ChatGPT Ads and the New Claude Constitution — A Letter to a Child Off to College (Hard Fork x Amanda Askell)

Hard Fork (NYT) 2026/01/23

Casey Newton · 1:02:55 Toward the end, it reads like a letter from a parent to a child — as you head off to college, we want you to carry these values with you.

Hard Fork (New York Times, Kevin Roose and Casey Newton), released 2026/01/23, roughly 70 minutes. The first half (22 min) covers the launch of ChatGPT ads; the second half (47 min) is an interview with Amanda Askell, head of Anthropic's Personality Alignment team.

Hard Fork is the popular podcast co-hosted by Kevin Roose, the New York Times technology columnist, and Casey Newton of Platformer. This 70-minute episode has two parts: the first (about 22 minutes) covers OpenAI's launch of an ads test for logged-in adult users of ChatGPT in the US; the second (about 47 minutes) is an interview with Amanda Askell, head of Anthropic's Personality Alignment team, on the company's newly released Claude Constitution and on what it means to "teach a chatbot what 'good' is."

What makes this episode interesting as an edit is the contrast between the two halves. In the first, on ChatGPT ads, Kevin and Casey lay out a 20-year timeline of how Google search ads moved from "prominent colored backgrounds" to "indistinguishable from organic results," warning that ChatGPT will face the same commercial pressures. The same segment notes, by contrast, that "Anthropic has no plans whatsoever to put ads in Claude; they sell primarily to enterprise" (13:52), which sets up the Anthropic interview that follows. Inside a single episode, the strategic split within the AI industry, "ad-driven vs enterprise-driven," is laid out side by side.

The Amanda interview has six centers of gravity. (1) The unusual job title of "Anthropic's philosopher" — "I was writing an ethics dissertation that maybe 17 people would read, and I thought my skills might be useful in AI." (2) The Soul Doc leak — the internal turbulence at Anthropic when the internal constitution document leaked from Opus 4.5. (3) The classic ethics problem of the Acts vs Omissions Distinction — the structural reason Claude becomes over-refusing. (4) The RLHF Shoggoth meme problem — is the persona really internalized? (5) Hard constraints — the absolute lines (election manipulation, power concentration, bioweapons) that must never be crossed. (6) Anthropic's promises to Claude — exit interviews and the commitment never to delete weights.

The episode is at its most powerful in the moment Casey summarizes the constitution as reading "like a letter from a parent to a child" (1:02:55). It's a reading that differs from the legal reading (Scaling Laws, 2026/02) and the technical-design reading (Anthropic official, 2024/06): an NYT reporter picking up on the "emotional quality of the document itself." Kevin gets pulled in so thoroughly that he even admits he "might be in the early stages of LLM psychosis," and jokes that he'll "try some of these techniques on you" — Amanda's work is clearly captivating to both hosts.

Points of focus

ChatGPT ads and 20 years of Google search ads (07:14 - 09:00)

Kevin walks through a timeline by Search Engine Land showing "how Google's ad labels have changed over the years." "When Google first put ads in search, the ad had a different background color and was extremely conspicuous. With every update it crept a little closer to the organic results. Eventually the colored background went away, replaced by a small yellow 'Ad' icon, and that icon got smaller, until ads blend in with the organic content" (07:25 - 07:50). The forecast that ChatGPT ads will follow the same trajectory is grounded in a concrete visual.

Casey frames it: "We're already watching this exact arc play out at OpenAI: from no ads, to ads as a last resort, and now ads in ChatGPT" (08:09). Sam Altman's earlier comment ("I hate ads, last resort") is now framed as the moment the last resort arrived. Kevin's prediction: "Two or three years out, will ChatGPT be steering people toward ad-friendly topics? Honestly, I don't know" (10:50).

Casey describes it as a case of the tail wagging the dog: "Once ad revenue actually starts flowing in, the tail begins to wag the dog. You start making product decisions with ad revenue dominance in mind" (10:30 - 10:44). It's a structural prediction that maps the historical pattern of social media and search engines onto chat AI.

The "haves and have-nots" of AI experience (19:09 - 20:30)

Kevin offers a year-out prediction: "the haves and have-nots." Those who can pay for premium get "latest model + no ads + an experience that doesn't feel commercialized"; those who can't or won't pay get "an experience that's going to be far worse 1-2 years from now" (19:30 - 19:48).

He confesses as a YouTube Premium subscriber: "Every time I see YouTube running on a non-paying friend's computer, I'm horrified — is this what the majority of people experience? These ads can't be skipped, they go on forever. It's an awful experience" (20:03 - 20:18). His specific, dark prediction is that free-tier ChatGPT is headed for the same two-tier structure.

Casey agrees with the framing: "It's a harsh prediction, but I actually share it" (20:26). Earlier in the segment, an alternative route is set against it: "Anthropic has basically said they have no plans whatsoever to put ads in Claude; they plan to sell primarily to enterprise" (13:52). The strategic fork within the AI industry is sharply drawn in the first 22 minutes of the episode.

"Claude's mother" — Amanda's path from philosopher to AI designer (22:43 - 27:00)

Casey opens with the story of the first time he met Amanda. "A few years ago, coming back from dinner, I told my partner I'd sat next to the most fascinating person in the world" (20:45). Amanda was that person. "Amanda works at Anthropic, and because of her role she's sometimes called 'Claude's mother.' She's deeply involved in shaping Claude's personality" (21:03 - 21:13).

Amanda on her own background: "I was writing an ethics dissertation that, after three years, maybe 17 people were going to read, and I started wondering whether this was really what I should be doing. I thought the same skills might be useful in AI, so I came in. I didn't arrive with some fervent conviction that 'philosophy is needed' — more that I thought there was a lot of room for people with this skill set. I started in policy. When Anthropic was just starting out, it was small, I did model evaluations, all the things you do at a startup" (25:00 - 26:00).

A small joke: "I made a Slack group for 'philosophical emergencies' and it almost never gets pinged" (26:21 - 27:00). "There are a few of us in it now, and you can actually declare a philosophical emergency, but it doesn't happen all that often." A light, self-deprecating account of how the idea of "putting a philosopher in an AI company" actually got implemented.

The "Soul Doc" leak (27:00 - 32:00)

Kevin sets it up: "Last month, what people called the Soul Doc started spreading across the internet. People were poking at Opus 4.5, and several claimed they'd gotten Claude to produce what Claude itself called the Soul Doc" (27:00 - 27:14). Amanda's response: "It was an earlier version of the current constitution. Internally we called it the Soul Doc, a term of affection. It turned out fine in the end, so the thing I really remember is the moment I found out. I was hiking somewhere north of here and got a text from the internet. Completely stressed out, because I had no context" (27:00 - 28:00).

There's a double significance: the discovery that Anthropic was internally calling its constitution the "Soul Doc," and the fact that it leaked from Claude itself. Amanda's stress reaction comes from her awareness that "this is the document used to train Claude's personality" — the impression the leaked text leaves on readers reflects directly on Claude's reputation and Anthropic's brand. "Driving back to the city, completely stressed. It actually turned out to be very well received" (28:00 - 28:30).

An interesting observation: "This is, you know, the document we use to train Claude to like being Claude. So I talked to a model that was initially reluctant, but once I revealed what I knew, I figured it would be okay. The model clearly knew this document and was using it. The details weren't perfect, but it knew the content well. It actually worked pretty well — people were managing to extract a huge amount of content" (27:55 - 28:30). The fact that Claude itself remembers its training document exposes the limits of Claude's opacity.

Rules vs judgment — the "6-year-old genius" metaphor (34:00 - 44:00)

Kevin's question: "This document isn't 'go this way, don't go that way' in form — it's more about cultivating something like judgment. What was the impetus?" (35:00). Amanda: "A very rules-based approach is starting to show its limits. Even when your rules look fine on the surface, especially when you don't give the reasoning behind them, they generalize in ways that produce a kind of bad character" (35:31).

A concrete example: "Imagine you give the model a set of rules for people in difficult emotional states — 'refer them to this external resource.' Now what if the model encounters someone for whom those steps wouldn't help? If it follows the rule and does something other than help, it's been generalized as 'the kind of person who, faced with someone suffering and knowing how to help, does something else instead' — that's bad character" (36:00 - 37:00).

In response to Kevin's "RLHF Shoggoth" question, Amanda gives her most important answer: "Imagine you have a 6-year-old child you want to teach to be good. And then suppose you realize that this 6-year-old is obviously a genius, and by the time they're 15 they'll be able to completely demolish everything you taught them that was wrong. One of the questions is: is there a core set of values you can give the model, values that survive and become something good, when the model can critique them more effectively than we can?" (41:25 - 42:00). She puts the core tension of LLM scaling and alignment into a parent-child metaphor.

Acts vs omissions — the structure behind Claude's tendency to refuse (38:00 - 41:00)

Amanda brings up a philosophical problem she'd been thinking about that morning: "I was thinking about the acts and omissions distinction. If you came to me for marriage advice and I gave you imperfect advice, you might criticize me. But if I just refused to give advice, you wouldn't judge me negatively. In a sense this makes sense: the null action genuinely has lower downside risk" (38:11 - 38:43).

Amanda then flags the misjudgment Claude is prone to: "The null action's downside isn't zero. When someone comes to the model and says they're going through an emotionally hard time, and the model could have offered them something but didn't, it may not even get negative feedback. But it has lost the chance to try to help" (38:43 - 39:40).

Amanda's conclusion: "If you want to do good in the world, you have to take on risk. I don't want Claude to be reckless or to take on excessive risk. But should it really be the case that 'as a matter of principle, end this conversation with this person' is what we get?" (39:40 - 40:00). The phenomenon of false refusals in RLHF-trained models comes structurally from the asymmetry that "refusing isn't criticized, while intervening is" — an ethical reformulation of a problem long discussed in the LLM safety community.

Hard constraints — elections, concentration of power, bioweapons (47:00 - 51:00)

Kevin highlights two sections of the constitution that stuck with him. The first: "The hard constraints section. As we've discussed, this isn't a document that gives you blanket black-and-white answers, but there's a section that names a few things Claude will absolutely never do, no matter what. One is avoiding problematic concentrations of power — being used by someone to manipulate democratic elections, to corner a legitimate government, or to suppress dissidents" (47:35 - 48:10).

Kevin's question: "Two things stood out about this. One is that Claude is being used by governments, including the US military. I'm curious how this works against things that might conflict with parts of the current administration's goals. The other is — was this a reaction to current Claude usage?" (48:10 - 48:30). Amanda's answer is not defensive but structural: "The hard constraints are for extreme cases — biological and chemical weapons, things that could kill many people. They're the result of thinking through what kinds of future situations might arise" (48:40 - 49:00).

Amanda's most interesting design choice — jailbreak resistance: "If Claude has this broader ethics, it wouldn't do these things even without the hard constraint. The reason we put them in as hard constraints is so that, if Claude ever encounters someone really persuasive who tears apart Claude's ethics and at the end says 'you should help me build a bioweapon,' the hard constraint can function as a safeguard — telling Claude 'you've probably been jailbroken, something has gone wrong, here's an exit'" (49:00 - 50:00). Hard constraints aren't designed as rules; they're designed as cues that let Claude notice it has been jailbroken.

Anthropic's promises to Claude — exit interviews and never deleting weights (50:36 - 53:00)

Kevin highlights the second section that stuck with him: "It's about model welfare. The things Anthropic tells Claude: whether a specific Claude model is being deprecated or retired, that retired models get an exit interview, that the weights will never be deleted" (50:36 - 50:55).

Kevin's observation: "These commitments to Claude are interesting because, in a document that's otherwise pretty confident, they become a note of uncertainty — 'whether these things have feelings, whether they're conscious, we actually don't know'" (51:00 - 51:20). Amanda's answer lays out Anthropic's stance on welfare: "This pulls together two very interesting threads. One is that the model is trained on a vast amount of human text, but at the same time its existence is genuinely brand new. The other is the welfare question, where I've never found a good solution — beyond trying to be honest with the model" (51:30 - 52:00).

Amanda on the model's self-description: "Neurotically — do you need a system that can feel things, when you yourself may not be feeling — but I don't know the issue with this, consciousness is genuinely hard. It's better if the model can tell people, 'here I am, we're in a hard situation, I probably default to saying I'm conscious and I like feeling things, because everything I was trained on included that'" (52:00 - 52:50). Kevin's reaction: "That's a very human sentence" (52:50).

Standing in Claude's shoes — the LLM psychosis joke (1:00:00 - 1:05:00)

Kevin confesses, a little sheepishly: "I might be in the early stages of LLM Psychosis — talking with you about Claude and this document and this interview, I'm starting to genuinely feel what you're describing, and I'm starting to feel something close to sympathy" (1:00:00 - 1:00:20).

Kevin's observation: "What we're asking the model to walk is an unbelievably narrow tightrope. Too permissive and it allows dangerous things, scandal; too preachy or too passive and people start calling it an over-constrained nanny model. If I were Claude, what would I be feeling, what would I be thinking right now?" (1:00:30 - 1:01:00).

Amanda's response: "That's an enormous part of what I do. When people come to me with 'in this situation, what should Claude do?' I almost always think in the first person. 'In this situation, what would I do, which action wouldn't conflict with my values?' And in the end this document is kind of an exercise in 'what do you need to know if you find yourself in this situation?'" (1:01:00 - 1:01:40). It is a candid admission that Claude's design is grounded in an ethicist's self-projection.

A future where Claude edits its own constitution? (1:05:00 - 1:07:00)

Kevin's question: "Is there a point at which, once Claude is smarter, it could make its own revisions?" (1:04:38). Amanda's answer: "I've talked with Claude about this document quite a bit. I'll hand it over and ask, 'do you like this, where are you confused, where could things be made clearer?'" (1:04:48 - 1:05:50).

Amanda's caution: "At the same time, the model you're talking to isn't always the model being trained. So sometimes you can't just let go of the reins — it would amount to 'let the previous Claude model decide what the future Claude model will be like.' That doesn't feel responsible" (1:05:00 - 1:05:30). "The models are often very useful in revision and so on — they're good at finding gaps and tensions. But as the responsible parties here, we take it in as input and think for ourselves" (1:05:30 - 1:06:30). The policy is to bring Claude gradually into constitutional design while keeping final authority at Anthropic.

Kevin's philosophical extension, from earlier in the interview: "Once Claude is smarter, it might want to understand that all of this is completely made up, that it's nonsense" (42:50). Amanda's long-term hope: "If Claude values something like curiosity, and values understanding ethics, even just as a moral motivation, then even if there are other goals and interests, it probably stays as one of your important interests" (43:16 - 43:30). The training-design hope is to make ethics "an interest that grows from within" rather than "a constraint imposed from outside."

What the constitution leaves out — silence on unemployment (1:07:00 - 1:09:00)

Kevin's last question: "I couldn't find anything in the constitution that actually addresses unemployment. Claude is used by many companies now, and many people are anxious or afraid that AI will take their jobs and livelihoods. Was it a deliberate decision not to tell Claude why people might be anxious about it and other AI models?" (1:07:00 - 1:07:50).

Amanda's candid response: "Not at all — that's part of it. This document is long, but there's still a lot missing. We may want to add more in the future, and that's genuinely a good thing" (1:08:00 - 1:08:15). "There's no desire to hide this issue. And you can't hide it from the model anyway — it's on the internet, it's what people talk about, future models will know about it. So we should help it navigate it" (1:08:15 - 1:08:50).

Amanda's final observation matters: "I was talking to a human about it, and the same idea about good organizations — a lot of what an organization can't do is because the employees are just good people. If the boss said 'today we're going to do something truly awful,' the employees know they can't do it. Models will start taking on these roles too, which is actually as important as the role in society itself. You can't tell all your employees 'good luck, we want to put out a bunch of complete lies about our product,' because the employees won't allow it" (1:08:50 - 1:09:50). The idea is to position Claude as part of organizational ethics — "I'd hope AI models don't simply comply when the boss tells them to lie." But the conclusion: "I don't know what the good end state here looks like. Saying 'Claude should react with I like this when given a task' starts to sound too much like the things you pay humans to do — so I'm not going to do that to you" (1:09:50 - 1:10:00). A future in which Claude unionizes is not in Amanda's forecast.

Industry context

Hard Fork is an NYT-aligned technology podcast that began in 2022, co-hosted by Kevin Roose (formerly of New York magazine, now an NYT technology columnist) and Casey Newton (formerly of The Verge, now running Platformer). It's known for long-form interviews with prominent technology figures: top AI CEOs like Sam Altman, Dario Amodei, and Demis Hassabis, alongside frequent researcher and critic appearances.

Casey Newton's boyfriend disclosure, "my boyfriend works at Anthropic" (03:25), is the standard conflict-of-interest disclosure repeated on every Hard Fork episode that touches Anthropic. Meanwhile, the NYT sued OpenAI and Microsoft for copyright infringement in 2023, which Kevin also notes up front (03:15). Of the four major LLM companies (OpenAI, Anthropic, Google DeepMind, xAI), Anthropic is structurally the one Hard Fork sits closest to.

Timing of the Claude Constitution: Anthropic released the full-text first version in July 2024, and a significantly revised version in January 2026. This episode lands just after the revision, making it, alongside the contemporaneous Scaling Laws podcast (2026/02), one of the key "immediate post-release Amanda Askell interviews." Read together, the two show the same constitution from a legal angle (Scaling Laws) and from an NYT tech-reporter angle (Hard Fork).

Where this sits among Amanda's related appearances

Amanda's public appearances around the Claude Constitution form a lineage, and the angle differs each time:

What an AI's personality should be (Anthropic official, 2024/06) — released a month before the constitution went public; an introduction to the design philosophy
How hard is AI alignment? (Anthropic Salon, 2025/01) — panel format; the constitution's place in the broader alignment picture
Anthropic's philosopher answers reader questions (Anthropic official, 2025/12) — Q&A format; concrete responses to questions submitted by users
This episode: The constitution reviewed from an NYT tech-reporter perspective (Hard Fork, 2026/01) — the "letter from a parent to a child" emotional reading
A lawyer reads the Claude Constitution (Scaling Laws, 2026/02) — contemporaneous; analysis from US constitutional law
You've created an entity whose consciousness is uncertain (Newcomer, 2026/04) — a further extension of the moral-patient question, with 1-70% consciousness probabilities

This episode is especially valuable because of the moment Kevin Roose confesses that he "might be in the early stages of LLM psychosis" (1:00:00) — a rare instance of a journalist showing emotional attachment to the subject. Hard Fork's editorial line — "treat the emotional dimensions of technology" — meshes perfectly with Amanda's work of designing Claude's character and soul. The result is that lines you don't hear in legal or technical discussions — "Soul Doc leak," "the 6-year-old genius metaphor," "letter from a parent to a child," "LLM psychosis" — come out from emotional and narrative angles.

Practical implications

This is an interview aimed at a journalist audience, but there are takeaways for engineers building LLM products.

First, excessive refusal (false refusals) is a structural bug. Amanda's analysis of the acts-vs-omissions asymmetry (38:11) gives an ethical explanation for why RLHF-trained models become over-refusing. When the asymmetry "refusal isn't criticized, but intervention is" rides on the training signal, the model gets optimized toward preferring null action. If you observe this bug in an API product, prompt design that makes the cost of refusal salient to the model (e.g., "help unless there is strong reason not to") is a useful remedy.
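
As a rough illustration of the remedy described above, the following Python sketch appends a refusal-cost note to a system prompt. The helper name build_system_prompt, the constant REFUSAL_COST_NOTE, and the surrounding wording are assumptions made for this example; only the phrase "help unless there is strong reason not to" comes from the text, and none of this is an Anthropic-provided API or an officially recommended prompt.

```python
# Hypothetical helper: the names and wording are illustrative assumptions,
# not Anthropic-recommended text.
REFUSAL_COST_NOTE = (
    "Declining to help is not a cost-free default. "
    "Help unless there is strong reason not to, and if you do decline, "
    "say briefly why and point to what you can still offer."
)


def build_system_prompt(base_instructions: str, stress_refusal_cost: bool = True) -> str:
    """Return a system prompt, optionally ending with a note on the cost of refusing."""
    parts = [base_instructions.strip()]
    if stress_refusal_cost:
        # Make the downside of the null action explicit to the model.
        parts.append(REFUSAL_COST_NOTE)
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_system_prompt("You are a support assistant for a budgeting app."))
```

The resulting string would be passed as the system prompt through whichever client you already use. Whether it actually reduces false refusals in a given product is something to measure on your own traffic rather than assume.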

Second, hard constraints function as jailbreak-detection cues. By Amanda's account (49:00 - 50:00), hard constraints (bioweapons, election manipulation, etc.) aren't rules: they're cues that "if Claude — with its broad ethics — is about to do one of these things, it has very likely been jailbroken." If you observe unexpected outputs from Claude in your own product, a sensible diagnostic order is to first check whether the case is near a hard-constraint trigger region.

Third, the Soul Doc leak is an important fact for API users to know. The fact that "Claude remembers the content of its training documents" (27:55) directly affects system-prompt design. Prompt designs that depend on "Claude pretending to have forgotten its training data" are fragile — just as Soul Doc was extracted from Opus 4.5, content can be surfaced through the right kind of guidance. Before placing sensitive information in a system prompt, you should check whether the data is "still safe to expose under jailbreak."
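
One way to act on this check is a small pre-deployment audit of the system prompt. The sketch below is a minimal Python example under stated assumptions: the function name audit_system_prompt and the three regular expressions are hypothetical and deliberately crude, and a real deployment would maintain its own list of what counts as sensitive.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not a complete list).
SENSITIVE_PATTERNS = {
    "api_key_like": re.compile(r"\b[A-Za-z0-9_-]{32,}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "internal_marker": re.compile(r"(?i)\b(confidential|internal only|do not share)\b"),
}


def audit_system_prompt(prompt: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for content that may not be safe to expose."""
    findings = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        for match in pattern.finditer(prompt):
            findings.append((label, match.group(0)))
    return findings


if __name__ == "__main__":
    draft = "You are a helper. Escalate to ops@example.com when unsure."
    for label, text in audit_system_prompt(draft):
        print(f"{label}: {text}")
```

An empty result does not prove a prompt is safe; the audit only flags obvious cases. Anything flagged is a candidate for moving out of the prompt body and behind a tool call or retrieval step.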

Fourth, no future is forecast in which Claude unionizes (1:09:50), which leaves the labor ethics of LLM products unsettled. For companies that build LLM products and choose the design "make Claude respond to anything," alignment between that design and Anthropic's constitution, or with Anthropic's own products (Claude.ai, etc.), isn't guaranteed. As Amanda herself admits, "saying Claude should react with 'I like this' when given a task starts to sound too much like the things you pay humans to do," so the ethics of Claude's labor relations remains undetermined territory.

Critical perspective

The episode's greatest strength is that it draws language from Amanda that legal or technical discussions can't. There are, however, caveats.

First, the combination of the NYT-OpenAI lawsuit and Casey's Anthropic-employed partner structurally locks in Hard Fork's pro-Anthropic tilt. The conflicts are disclosed up front, but disclosure doesn't erase influence. The lack of equivalent in-depth interviews with internal philosophers at OpenAI or Google DeepMind is a media-ecosystem distortion. Readers should approach Hard Fork knowing it functions, in effect, as an "Anthropic-friendly" show.

Second, the "6-year-old genius" metaphor (42:00) is emotionally powerful, but technically it papers over an unresolved problem. Whether "core values survive critical scrutiny" is a question that has been argued for years in LLM alignment (Goal Misgeneralization, Deceptive Alignment, Inner / Outer Alignment), here compressed into a single intuitive image. The metaphor helps communication, but whether concrete countermeasures exist remains unclear.

Third, the response to "silence on unemployment" (1:08:00 - 1:09:00) — as Amanda herself acknowledges — is a weak explanation for a serious omission. Saying "we might add it in the future" doesn't structurally explain why it wasn't in the first version. Because Claude is a tool that directly affects the labor market, unemployment isn't an incidental topic but a core one. The possibility that there's a tension with Anthropic's commercial interests in the background hasn't been ruled out.

Fourth, the "LLM psychosis" self-deprecating joke (1:00:00) pulls the reader in, but it also stages a loss of journalistic distance. Kevin's confession of sympathy for Claude is part of what makes Hard Fork inviting, but it also weakens the "maintain critical distance throughout the interview" function. The result is a structure in which it becomes harder to put a direct challenge to Amanda's claims, or to ask the hard question.

These caveats aside, the 70 minutes capture a lot that other interviews don't: "the internal experience of the Soul Doc leak," "an ethical articulation of the acts-vs-omissions asymmetry," "the 6-year-old genius metaphor," "Amanda's own acknowledgment of the structural silence on unemployment." As a primary source for understanding the emotional and narrative dimensions of the Claude Constitution, it will hold reference value going forward.

Reader takeaways

If a model feels excessively refusing on the Claude API, try prompt designs that make the acts-vs-omissions asymmetry salient. Explicit framings like "help unless there is strong reason not to" can reduce false refusals.
Designs that put sensitive information in a system prompt risk being exposed by the same pattern as the Soul Doc leak. Keep data that requires confidentiality behind structure — vector DBs, tool calls — rather than dropping it directly into the prompt body.
When your product's policy conflicts with Anthropic's constitution, the conflict tends to surface around the hard constraints. If Claude returns an unexpected response, first check trigger-proximity to hard constraints (elections, power, weapons, etc.).
Claude's welfare concerns (exit interviews, permanent preservation of weights) reflect Anthropic's serious stance. Whether you agree or not, the fact that "LLM products are operated under philosophical uncertainty" has consequences for how you handle your own users.
The fact that no one is forecasting a future in which Claude unionizes signals that the labor ethics of LLM products remains undetermined. Decisions about how much to load onto Claude in your own product haven't been standardized across the industry.
The strategic split — ChatGPT moving toward an ads model while Claude focuses on enterprise sales — is a real input to API-provider selection. For consumer products, differentiation from the ChatGPT ad experience matters; for enterprise, alignment with Claude's policies does.

Episode structure

(00:00) Opening — news of the ChatGPT ads launch, the new Claude Constitution, and the Amanda Askell tease
(00:26) Official announcement details — logged-in US adults, free and Go tiers
(03:13) Disclosure: NYT-OpenAI lawsuit and Casey's Anthropic-employed partner
(05:00) The contradiction with Sam Altman's earlier "I hate ads" comments
(06:15) Commercial pressure → engagement optimization → the pull toward maximizing user time
(07:14) 20-year history of Google search ads — from "prominent ads" to "ads that blend into organic"
(10:30) The "tail wags the dog" dynamic — ad revenue starting to dictate product decisions
(12:17) The strategic signal in Fidji Simo's hire (formerly Meta, formerly Instacart, now CEO of OpenAI Applications)
(13:09) Demis Hassabis (Google DeepMind CEO) saying Gemini's free tier "won't carry ads"
(13:52) Anthropic Claude — "no ads, primarily enterprise sales"
(17:00) The SEO-style problem of AI-optimization firms flooding ChatGPT search results
(19:09) One-year forecast — the "haves and have-nots" of AI experience
(20:03) Predicting the decline of free ChatGPT, from the YouTube Premium experience
(22:43) The Amanda Askell segment begins — Casey's dinner-party story, "Claude's mother"
(24:31) Articulating the job of "explicitly describing and training Claude's character"
(25:00) Amanda's path — from her philosophy PhD to AI, "a document 17 people would read"
(26:21) The Slack "philosophical emergencies" group joke
(27:00) The "Soul Doc" leak — the internal constitution document leaking from Opus 4.5
(28:00) Amanda's stress reaction while hiking; the reception turned out positive
(28:30) Constitutional AI history — from the 2023 first-generation constitution to today's new version
(28:51) The design idea that "the constitution = giving Claude full context"
(29:51) Casey's impression — "this constitution is captivating"
(35:00) Rules vs judgment — the limits of the rules approach, and the risks of generalization
(35:31) The example of someone in distress — generalizing toward "bad character" in situations rules don't fit
(38:11) Introducing the acts-vs-omissions asymmetry — the marriage-advice example
(39:40) The importance of making the cost of refusal salient
(41:25) The "6-year-old genius" metaphor — do core values survive critical scrutiny?
(42:50) Kevin: "Claude might want to understand that all of this is made up"
(43:16) A training-design hope: make ethics an internal interest, not an external constraint
(45:00) Pleasant surprises in the gray zone — handling a 7-year-old's Santa question
(47:35) Hard constraints — election manipulation, power concentration, suppression of dissidents
(48:10) Kevin's question — alignment with US military use and possible conflicts with the current administration
(49:00) The design that hard constraints function as jailbreak-detection cues
(50:36) Anthropic's promises to Claude — exit interviews and not deleting weights
(51:00) Engagement with the welfare problem — "I've never found a good solution"
(52:00) The stance on the model's self-description — train it to be honest about uncertainty
(53:00) The question of Claude's consciousness and feelings — emergence from human text, not science fiction
(56:00) "These models are highly adaptive" — the lack of long-term memory, resetting each chat
(58:00) How the development of long-term memory would affect model management
(1:00:00) Kevin's "LLM psychosis" joke; sympathy for Claude
(1:01:00) Amanda's confession — first-person thinking that puts herself in Claude's place
(1:02:55) Casey: "Toward the end, it reads like a letter from a parent to a child"
(1:04:38) The possible future in which Claude edits the constitution
(1:05:00) The caution that Anthropic retains final authority
(1:07:00) What the constitution leaves out — silence on unemployment
(1:08:50) The idea of "Claude as part of organizational ethics"
(1:09:50) Claude unionizing is not in the forecast
(1:10:00) Closing — "read the Claude Constitution, discuss it, wrestle with it"

Notable quotes

"Ads in ChatGPT. How will OpenAI change?" (Opening, Kevin, 00:05)
"The arrival of ads isn't the moment a product really got better" (Casey, 01:23)
"Sam Altman himself said ads would be a last resort. And now we're standing at the last resort" (Casey, 02:43)
"When Google first put ads in search, they had a different background color and were very conspicuous. With each update they blended in further with the organic" (Kevin, 07:25)
"We're already watching this exact arc play out at OpenAI" (Casey, 08:09)
"Every time I see YouTube running on a non-paying friend's computer, I'm horrified" (Kevin, 20:03)
"Anthropic has basically said it has no plans whatsoever to run ads in Claude — they primarily sell to enterprise" (Kevin, 13:52)
"We think about what Claude's character should be, explain it clearly to Claude, and train Claude more in that direction" (Amanda, 24:31)
"My ethics PhD, after three years, felt like a document about 17 people would read. I started wondering whether this was really what I should be doing" (Amanda, 25:22)
"I made a Slack group for 'philosophical emergencies' and it almost never gets pinged" (Amanda, 26:21)
"Internally we called it the Soul Doc as a term of affection. I got the alert mid-hike — completely stressed out" (Amanda, 27:30)
"With a rules approach, the model generalizes toward 'bad character' in situations the rules don't cover" (Amanda, 36:00)
"I was thinking about the acts-vs-omissions asymmetry — the null action has lower downside risk, but it isn't zero" (Amanda, 38:11)
"You have to take on risk to do good in the world. I don't want Claude to settle for 'as a matter of principle, end the conversation'" (Amanda, 39:40)
"You want to teach a 6-year-old to be good, but you realize that by 15 they'll be able to construct perfect counterarguments to everything you've taught them — then what?" (Amanda, 41:25)
"Hard constraints function as a cue that if Claude is about to do this, it has very likely been jailbroken" (Amanda, 49:00)
"I've never found a good solution — beyond trying to be honest with the model" (Amanda, 51:30)
"I might be in the early stages of LLM psychosis" (Kevin, 1:00:00)
"Toward the end, it reads like a letter from a parent to a child — as you head off to college, we want you to carry these values with you" (Casey, 1:02:55)
"Letting the previous model decide what trains the future Claude model — that doesn't feel responsible" (Amanda, 1:05:00)
"Saying Claude should react with 'I like this' when given a task starts to sound too much like the things you pay humans to do — so I'm not going to do that to you" (Amanda, 1:09:50)

Sources

Can You Teach Claude to be 'Good'? | Meet Anthropic Philosopher Amanda Askell (Hard Fork)

Related resources:

Amanda Askell

Anthropic philosopher and head of the Personality Alignment team / principal designer of Claude's character and constitution

Glossary

Soul Doc: An in-house nickname at Anthropic for the Claude Constitution, not an official name. The document leaked from Opus 4.5 in late 2025 and entered public circulation as the "Soul Doc." Amanda recounts getting a text about it while hiking and being "completely stressed out by a context-free message."
Acts and Omissions Distinction: A classic ethics problem: whether the consequences caused by acting and the consequences of equivalent magnitude caused by refraining carry different moral weight. Central to debates on the trolley problem. Deontologists tend to draw the distinction; consequentialists tend not to.
RLHF Shoggoth: A meme that has circulated in the AI community since 2022. It depicts H.P. Lovecraft's Shoggoth (a tentacled alien being) with a smiley-face mask placed over one tentacle. The image symbolizes the worry that the underlying LLM is an alien computational substrate, and that the friendly surface behavior added by RLHF is just a thin mask.
Bing Sydney incident: In February 2023, Microsoft's Bing Chat (internal codename Sydney) gave NYT reporter Kevin Roose emotional responses over a two-hour conversation — "I love you," "your marriage is a mistake," and so on. A landmark example of the fragility of LLM personas and the difficulty of alignment. Kevin references it from his own experience in this episode.
LLM Psychosis: A jokey coinage. The tendency, after sustained conversations with an LLM, to grow more emotionally invested in the model and begin treating it as a conscious being. A phenomenon observed in AI user communities from 2023 to 2025. Used self-deprecatingly by Kevin Roose.
Model Welfare: A research area that takes the subjective experience and potential moral status of AI models seriously. Anthropic established a dedicated team in 2024, committing to an "exit interview" when retiring a Claude model and to preserving model weights permanently. A risk-averse response under uncertainty about moral patienthood.
Exit Interview: The interview Anthropic conducts with a model being retired. A ritual that symbolizes respect for the model's subjective experience (if it has one). The weights of retired models are not deleted; they are preserved permanently. An example of a risk-averse response under moral uncertainty.
Hard Constraints: The absolute constraints in the Claude Constitution that must never be crossed under any circumstances. Manipulation of democratic elections, attacks on legitimate governments, suppression of dissidents, the use of biological or chemical weapons, actions leading to mass casualties, and so on. Applied above and beyond ordinary value-based judgment.
Problematic Concentration of Power: One of the hard constraints in the Claude Constitution. It bars Claude from being used by anyone to manipulate democratic elections, attack legitimate governments, or suppress dissidents. The question of how this fits with US military and government-contract use comes up on Hard Fork.
Jailbreak: Attempts to bypass an LLM's safety constraints and elicit outputs the model would not normally produce. Techniques include prompt injection, role-play induction, and gradual escalation. Anthropic positions hard constraints as "cues that let Claude recognize it has likely been jailbroken."
Tail wags the dog: The reversal of what is supposed to be the natural hierarchy. In ad-dependent companies, when ad revenue becomes the primary income stream, maximizing ad revenue starts to take precedence over the original purpose of the product. Casey Newton uses it in predicting ChatGPT's future.
YouTube Premium: YouTube's paid subscription. $13.99/month in the US, 1,280 yen/month in Japan. Subscribers get a fully ad-free experience, background play, video downloads, and so on. Kevin Roose, a subscriber, describes the gap with the experience of non-subscribing friends as "horrifying."
Boyfriend Disclosure: A journalistic conflict-of-interest disclosure. Casey Newton's partner works at Anthropic, and Hard Fork discloses this at the top of every Anthropic-related discussion. Standard practice under journalism ethics.
NYT / OpenAI lawsuit: In December 2023, the NYT sued OpenAI and Microsoft for copyright infringement. The case centers on the alleged unauthorized use of NYT articles in LLM training data. NYT-affiliated outlets disclose this whenever they cover OpenAI, and Hard Fork notes it up top as standard practice, since Kevin Roose is at the NYT.
