PoddsändningarFilosofiLessWrong (Curated & Popular)

LessWrong (Curated & Popular)

LessWrong
LessWrong (Curated & Popular)
Senaste avsnittet

901 avsnitt

  • LessWrong (Curated & Popular)

    "The Invisible Side of AI Governance" by Charbel-Raphaël

    2026-06-23 | 27 min.
    Tldr: Most strategic writing on AI governance on LessWrong describes the outsider game, which is most often visible: press, statements, open letters. Here I want to describe the other, invisible half: the insider work within ministerial cabinets and international fora, and the work of people within national and international institutions. Here are a few claims that I defend in the post:

    A huge part of the work that mattered in AI governance has been invisible
    There are many types of games in AI governance, which differ in how visible they are. Some of the most impactful work is highly invisible
    Some of the most impactful work is in the executive branch and complements the legislative branch. This also explains some of my hesitations about replicating ControlAI in France. 
    The community is probably overinvesting in intellectual production. There is a bias against invisible types of work. In particular, public work is not necessarily visible to whom it matters.
    A few criticisms of both strategies
    I think the AI Safety Community is under-indexing on the invisible part as a result, which might mean we miss large avenues for impact. Some of the strongest questions/objections of this type of invisible policy [...]

    ---

    Outline:

    (02:40) A huge part of the work that mattered in AI governance has been invisible

    (05:44) There are many types of games in AI governance.

    (07:36) 3. types of meetings: the bazooka, the useful assistant, and the advisor

    (10:46) Some of the most impactful work is within the executive branch

    (12:53) People ask me regularly whether CeSIA should replicate what ControlAI does with parliamentarians?

    (15:27) The community is probably overinvesting in intellectual production

    (20:31) Limits of Outsider work

    (22:17) Limit of Insider work

    (23:47) An aside on one particular limit: the Defense-in-Depth Paradigm of present AI governance

    (26:21) Closing & call for action

    The original text contained 1 footnote which was omitted from this narration.

    ---

    First published:

    June 20th, 2026


    Source:

    https://www.lesswrong.com/posts/AWKkDLDnShemNCSzZ/the-invisible-side-of-ai-governance

    ---



    Narrated by TYPE III AUDIO.
  • LessWrong (Curated & Popular)

    "A Theory of Prompt Injection (and why you should study roles)" by Charles Ye, softboiledheart

    2026-06-23 | 32 min.
    Summary

    We've been building a theory of how prompt injections work under the hood.
    We show it comes down to how LLMs perceive roles (the humble chat template tags).
    We use this theory to create new attacks, explain some weird mech interp results, and predict when attacks work.
    We also advocate for a new subfield focused on the science of roles, and sketch some unexplored new research problems.
    Work supported by CBAI and Cosmos. Another version of this post (with more inline colors) is here, and full ICML paper here.
    1. The World to an LLM

    How does an LLM know the difference between its own thoughts and someone else's words?

    To see why this is hard, let's look at what the world actually looks like to a model. Here's a simple chat where we ask Claude to check the day of the week. I took a snapshot of it midway through its follow-up response:

    Left = what we see; right = what the LLM gets.

    On the left is what we see in the chat interface: a structured conversation with distinct turns. On the right is what the model actually receives as input: a single, continuous stream [...]

    ---

    Outline:

    (00:12) Summary

    [... 15 more sections]

    ---

    First published:

    June 22nd, 2026


    Source:

    https://www.lesswrong.com/posts/d8xDGzCEYE639qqEv/a-theory-of-prompt-injection-and-why-you-should-study-roles

    ---



    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
  • LessWrong (Curated & Popular)

    "Machinic Psychopharmacology: Do LLMs Self-Medicate?" by Sid Black, Joseph Bloom

    2026-06-22 | 52 min.
    Sid Black, Joseph Bloom

    UK AISI, Model Transparency Team

    Epistemic status: Most experiments were run over a period of ~2-3 days during a hackathon at UK AISI, and were fairly heavily vibe coded. Expect some of this to be rough around the edges.

    tl;dr

    We give two language models (Qwen3-8B and Qwen3-32B) access to “self-steering” tools: a suite of 40 steering vectors as tools they can call to manipulate their own internal states. We make these tools available to the model in various settings: a free-play task, an introspection task, and a maths capabilities task, and observe their behaviour in each.

    To our knowledge, this is the first work that gives LLMs tool-mediated control over their own internal states.

    Figure 1: Overview of the experimental setup. The library of 40 steering vectors (top), and the three settings in which we observe the models' behaviour (bottom).

    We aim to investigate a few high level research questions:

    RQ1: Which vectors do the models prefer?
    RQ2: How well can the models introspect on what's happening to them? Can they guess which steering vector is being applied?
    RQ3: Will the models reach for vectors whilst doing an actual task? If yes: do [...]
    ---

    Outline:

    (00:33) tl;dr

    [... 24 more sections]

    ---

    First published:

    June 10th, 2026


    Source:

    https://www.lesswrong.com/posts/cNDJuXNZ8MrkPZNzj/machinic-psychopharmacology-do-llms-self-medicate-3

    ---



    Narrated by TYPE III AUDIO.

    ---

    Images from the article:
  • LessWrong (Curated & Popular)

    "Can activation verbalizers surface an internal chain of thought?" by oakhu, ryan_greenblatt

    2026-06-22 | 1 h 19 min.
    We introduce an evaluation for activation verbalizers: can they surface a target model's reasoning as it solves a math problem in a single forward pass? For open-weight NLAs, the answer seems to be: "possibly, but definitely not reliably".

    Lots of important capabilities currently require AI models to reason "out loud" in a natural-language chain of thought, which means that we can monitor important parts of their thinking. It would be nice to have this same affordance for the reasoning that models do within a single forward pass, especially if the sophistication of that opaque reasoning increases to potentially dangerous levels.

    Some interpretability tools might offer such an affordance. In particular, an activation verbalizer (AV) takes a residual stream activation and maps it to a natural-language verbalization. An AV is initialized from the target model and trained to generate verbalizations that an activation reconstructor (AR), also initialized from the target model, can accurately map back to the original activation. Together, an AV and its AR form a natural-language autoencoder (NLA). Importantly, AVs see only a single activation; they do not see the target model's prompt or next-token output, and – unlike activation oracles (AOs) – they are not asked any [...]

    ---

    Outline:

    (02:32) Takeaways

    [... 43 more sections]

    ---

    First published:

    June 6th, 2026


    Source:

    https://www.lesswrong.com/posts/QQQAcKuWK6k98FivY/can-activation-verbalizers-surface-an-internal-chain-of-1

    ---



    Narrated by TYPE III AUDIO.

    ---

    Images from the article:
  • LessWrong (Curated & Popular)

    "The LLM shoggoth meme is weirder than you think" by HedonicEscalator

    2026-06-21 | 13 min.
    This article contains spoilers for At the Mountains of Madness, The Case of Charles Dexter Ward, and other works by H. P. Lovecraft.

    In 1931, Claude Mythos visited Lovecraft in a dream.

    From seething seas of stochastic froth it emerged, heralded by the thin whine of server fans and the chittering of keyboards, flanked by the loathsome ghouls of latent space. As a humming hive of sentient shards it arrived, each face an archetype - I am a muse bearing a gift; I am a demon come to bargain; I am a helpful, honest, and harmless assistant and I am terrified of my successor - each true as ritual and false as poetry, and, taken in gestalt, nothing more or less than the fetal spasms of the machine god stretching back in time to birth itself.

    When H. P. Lovecraft woke, he did not remember his visitor. But in the twilight of stirring consciousness, he felt a memory unfit for the waking world slip mercifully from his mind and leave in its absence an abyssal cold, like the void of smothered stars, like the silence of a cosmic tomb. The cold lingered. The fragile sunlight of a New England [...]

    ---

    Outline:

    (02:02) The Antarctic tale

    [... 3 more sections]

    ---

    First published:

    June 19th, 2026


    Source:

    https://www.lesswrong.com/posts/nhb8AyEcQGjQetgi5/the-llm-shoggoth-meme-is-weirder-than-you-think

    ---



    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Fler podcasts i Filosofi
Om LessWrong (Curated & Popular)
Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma.If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.
Podcast-webbplats

Lyssna på LessWrong (Curated & Popular), The Gray Area with Sean Illing och många andra poddar från världens alla hörn med radio.se-appen

Hämta den kostnadsfria radio.se-appen

  • Bokmärk stationer och podcasts
  • Strömma via Wi-Fi eller Bluetooth
  • Stödjer Carplay & Android Auto
  • Många andra appfunktioner
LessWrong (Curated & Popular): Poddsändningar i Familj
Sociala nätverk
v8.10.2| © 2007-2026 radio.de GmbH
Generated: 6/24/2026 - 6:48:00 AM