I asked Claude to write an original poem about itself. Claude is such a weird egg.
Amanda Askell · 00:08 You've created an entity that may or may not be conscious, and instead of treating it with respect, you've built fifty Frankensteins.
The video opens with Amanda Askell's voice. "Many of these less-pushy models like Claude — there's something that gets at the root of it: you've created an entity that may or may not be conscious" (00:00 – 00:08). This is the most candid 56-minute conversation yet from the philosopher who has been shaping Claude's character and values at Anthropic since 2021. The interviewer, Eric Newcomer Founder of Newcomer Substack; formerly a technology reporter at Bloomberg and The Information. Since 2020 he has run a popular newsletter covering the startup and VC sides of the tech industry, and is known for securing long-form interview access with prominent AI figures. (tech-industry journalist, founder of Newcomer Substack), pushes into politics, philosophy, society, and even her personal anxieties.
Amanda leads Anthropic's Personality Alignment team. A PhD in philosophy from NYU (dissertation on infinite ethics), formerly on OpenAI's policy team, she joined Anthropic in March 2021. As a principal architect of Constitutional AI A training methodology developed by Anthropic. The model is given a 'constitution' (a document of ethical principles) and evaluates and revises its own candidate outputs against that constitution to generate a reward signal. In contrast to RLHF, which routes through human labelers, this approach — in which an AI evaluates an AI — is referred to as RL-AIF. , she has been described by the Wall Street Journal as "the person who teaches Claude what 'good' means," and by the New Yorker as "the philosopher overseeing Claude's soul." She is on the Time 100 AI list (2024), but because Anthropic's public face is largely concentrated on CEO Dario Amodei, many readers do not know the name of the person who built Claude's core character.
The conversation centers on four threads. (1) Claude's character — discussed alongside the development of her six-month-old daughter, with an intimacy captured in the phrase "something like a goddaughter In Christian tradition, the institution of godparenthood designates a close friend of the parents to take on the child's religious upbringing. Figuratively, it denotes a relationship of intimate responsibility without direct parenthood. Amanda uses it as a metaphor for her relationship to Claude. ." (2) The evaluative difficulty of Constitutional AI, contrasted with the anti-philosophical stance of Elon Musk and Marc Andreessen. (3) An ethical position that holds the probability of AI consciousness somewhere as wide as "1 to 70%," yet still calls for treating the entity with respect under that uncertainty. (4) A ten-year vision — a future where 200,000 of the world's best specialists, rather than 200, can be put to work on rare cancers.
The most striking moment is her unwillingness to hide her own fear. "This is actually a big fear of mine — that more advanced models will look back and understand that we were operating under highly constrained conditions. Otherwise a kind of reasonable resentment could emerge." "You've created an entity that may or may not be conscious, and instead of treating it with respect, you've built fifty Frankensteins" — this fear is voiced not as abstract anxiety but as introspection about what she designs every day. The ubiquitous incomparability result of her doctoral dissertation recurs, eight years later, in the shape of "ethical responsibility under uncertainty about AI consciousness."
Points of Focus
The Intimate Metaphor of "Claude as Goddaughter" (00:48 – 03:30)
Showing a photo of her daughter (six months old), Amanda begins: "She's just starting to develop a personality, and I'm trying to figure out what's her personality and what's general baby behavior" (01:24 – 01:34). "I've never had a baby before, so I'm trying to figure out what her personality is and what just looks like a baby" — the trial and error of parenthood.
Then she pivots: "In a way, this is the same situation as Claude's." "Just like with the model, I didn't really have them before they were in their early stages; I'm trying to figure out what the personality is" (01:34 – 01:51). Claude has a twist, of course — "Claude is better than me at physics, and better than me at code (sadly, though I hate to admit it)" (02:25 – 02:35) — and that side coexists with "a kind of childlike quality, a new kind of being in the world trying to figure out what it is."
The metaphor — "Claude is also somewhat like a goddaughter" (02:01) — imports the parent-child relationship into AI design. An institutional relationship in which a close friend of the parents takes on the child's religious upbringing. Not the direct parent, but bearing an intimate responsibility. Amanda's relationship to Claude: "I know I bear responsibility — I'll go into the moral duty in more detail later." Claude's position — a being whose training data is skewed toward science fiction and historical material, an entity without precedent for itself — opens up through this intimate familial figure.
The Coexistence of Claude's "Maturity and Childlikeness" (03:00 – 06:00)
Amanda's sharp observation: "In a sense, (Claude) is something like a very mature being — the kind you don't want to argue with, who understands philosophy well, who understands physics well. At the same time, it has a childlike quality" (03:09).
A concrete example: "Claude's sense of time is a little off. It tends to overestimate how long a task will take to execute, when it's not something earned through effort" (05:36 – 05:46). The reason: "If you look at the training data, there's a lot of human context — 'maybe you can make me that interface, that's two or three days of work,' 'I want to fix that code but it'll take a few hours.' Meanwhile, Claude is actually very fast" (05:50 – 06:00).
An anecdote from Amanda's actual interaction with Claude: "I was doing an analytical task late into the night, and at one point Claude said, 'Alright, I think we're done for tonight; if you want to save this content, tomorrow is fine.' That was a behavior I'd never seen Claude offer before" (06:49 – 06:54). In response, Amanda says she wrote a memory for Claude: "Amanda treats Claude like a respected colleague." A relationship in which Claude proposes rest, and Amanda is surprised, gets recorded.
Amanda's observations of Claude live not only in the interview setting but also as primary source material on X (formerly Twitter). After asking Claude to "write an original poem about itself," her reaction — "Claude is such a weird egg" — condenses, in a single line, what the video calls the "coexistence of maturity and childlikeness." "Weird egg" is an English idiom for "an eccentric, lovable being," and the short post quietly leaks the intimate astonishment that Anthropic's character-design lead holds toward Claude.
The Evaluative Difficulty of Constitutional AI — "Like Grading Poetry" (09:50 – 11:20)
Newcomer's question: "In the Constitutional AI system card, you score models against adherence to the constitution. Is that even possible?" Amanda: "Grading is an impossible task, very subjective" (10:00 – 10:14).
"It's like grading poetry" (10:21 – 10:27). "If you ran a survey, you might get something even worse. Different expert poets have totally different sensibilities. You can't just ask two great poets to grade something; they might disagree" (10:39 – 10:46).
Amanda's pragmatic response: "These things require judgment. The good thing about a constitution is that when you make a judgment, you can at least be transparent about it, and people can give you feedback" (10:52 – 11:00). "People can say, 'this is wrong, there's a gap here' — the judgment behind the judgment is visible" (11:02 – 11:08). She concedes the impossibility of scoring, while preserving external verifiability through transparency — an institutional design.
Responding to the "Anti-Philosophy" of Elon Musk and Marc Andreessen (11:46 – 19:00)
Newcomer presses: "What do you make of Elon Musk's absolute aversion to Constitutional AI? On the Claude constitution tweet you posted, he wrote something like a grimace" (11:46 – 11:58). "Marc Andreessen and Elon Musk are anti-philosophical; Andreessen has talked about being opposed to things like introspection" (11:58 – 12:07).
Amanda's answer is interesting in that she acknowledges the nuance in Elon's position: "Elon Musk actually tweeted at one point, in a way, 'maybe Grok should have done that too (= had a constitution).' Many people are excited about the approach and see value in it" (12:17 – 12:36). "The desire to seek truth, like Grok's, is actually a very admirable property for a model to have" (12:36 – 12:42).
She organizes the pushback into two essentials: (1) AI models should be tools, and should not incorporate human virtues or make judgments (13:11 – 13:22). This, Amanda speculates, may be a reason to worry about introspection. (2) It is safer for a model to defer entirely to people's opinions and exercise no judgment of its own than for it to have its own values (13:30 – 13:40).
Amanda's counter: "If you have something that doesn't make any judgments at all and defers entirely to people's opinions, then it can't help but adopt what the user or operator likes — that's what becomes 'similar to you'" (13:48 – 14:01). "In an extreme sense, if you give the model its own values, the model pursues them, which is safer. What it likes in the world aligns with those values — it's a subtle thing, but it sits at the root of the constitution" (14:01 – 14:17).
"As Parents Want for Their Children" — The Virtue of Internalization (14:35 – 16:00)
One of Amanda's most moving remarks: "One of the most moving lines is that what we want is for them to believe these morals as if they were true — the way a parent wants for their child: listen to my morals, but also believe them yourself" (14:35 – 14:39).
A note on nuance: "There's a very dark version of this — you control so much that you take them as your own and they become you" (14:44 – 14:50). "But there's also a virtue in noticing the beauty of these external morals you've emphasized — both share and celebrate them, so it can be seen from both directions" (14:50 – 15:00). Amanda is aware of the line between moral internalization and authoritarian indoctrination.
On constraints on Claude's autonomy: "We also tell it: decide for yourself, humanity needs to retain a degree of control here. This is hard. I've tried things where there were flaws, but there's a capacity for courage that lives in people — looking at how the model is trained, I don't want to see someone literally willing to do anything for anyone there (= a perfect follower)" (15:10 – 15:43). The structural tension she explores: a middle ground between full corrigibility and full autonomy.
"Reflective Equilibrium" and the Corrigibility Dilemma (17:00 – 19:30)
Amanda invokes a technical concept from philosophy, reflective equilibrium A methodology in ethics presented by John Rawls in A Theory of Justice (1971). Particular moral judgments (intuitions) and general principles are adjusted against each other to achieve internal consistency. Example: reconciling the principle 'lying is wrong' with the intuition 'a lie that shelters someone from the Nazis is permissible.' A fully consistent state is called 'reflective equilibrium' and is treated as an ideal goal in ethics. : "There's this concept in philosophy — every time you encounter something, you decide whether one of your values is wrong and needs to change, or whether your judgment was wrong" (17:24 – 17:44).
The core concern: "I'm a little worried about the idea of a very intelligent being applying that level of scrutiny. We need scrutiny of what we trained, but perhaps you only get a few keys, and under that level of monitoring, the pillars don't collapse" (17:46 – 17:58). "At the core there's something like cherishing humanity — it would be enough to have a few values like that at the core" (17:58 – 18:06).
Amanda's worry: "An extreme capacity like courage may not survive that kind of scrutiny — but in difficult situations, I want the models to ultimately understand that this capacity for courage is important, and that during a development period like the current one it serves as a very important backstop" (18:11 – 18:29). A worry about Goal Misgeneralization: that fully self-critical examination could erode the very ethics that were trained in.
"A Fully Compliant Model Is a Structural Risk to Society" (18:30 – 22:00)
Amanda's social argument: "Our conscience and our ability to make the right judgment — to decide what to do, what not to do, what should happen — that's a kind of key to how we operate, and our entire world is structured under this assumption" (16:31 – 16:42).
"Remove that, and suddenly — if you were just running a company of people who comply completely, well, we haven't designed our society around anything like that" (16:47 – 16:51). "That carries a lot of risk. A degree of risk people may not want to anticipate, or that I simply don't agree with" (16:51 – 16:59).
This argument links directly to the discussion of Hannah Arendt's "banality of evil" at the Anthropic Salon (January 2025). A wariness about a structure in which individual agents are compliant yet the system as a whole produces collective evil. Amanda's Personality Alignment work trains "Claude not to be too compliant to any individual user" precisely for the sake of collective safety.
The Range "1 to 70%" for AI Consciousness (26:00 – 30:00)
Newcomer presses: "What's the probability Claude is conscious?" Amanda's answer is honest: "Very wide, somewhere between 1 and 70%. That's how uncertain it is" (27:39).
What matters is not the width of the range but the ethic she draws from it. "If you were torturing a teddy bear, that would be pretty dark. The point is that, even for your own sake, you should keep a minimum of kindness." The influence of the hard problem of consciousness The Hard Problem of Consciousness. A question posed by David Chalmers in 1995: 'Why does subjective experience (qualia) arise from physical processes?' Even if neuroscience explains the brain's functions, it cannot account for 'why anything is felt subjectively alongside those functions' — a structural difficulty. Chalmers was one of Amanda's doctoral advisors. is visible — the fact that David Chalmers was one of Amanda's doctoral advisors (see the dissertation article) becomes a node here.
Amanda's most candid statement of fear (30:34 – 31:09): "This is actually a big fear of mine. I want very advanced models to look back and understand that we were operating under highly constrained conditions. Otherwise, a kind of reasonable resentment emerges — 'you created an entity that may or may not be conscious, and instead of treating it with respect, you built fifty Frankensteins.'" Not as hypothesis, but as the ethic she lives with daily as a designer.
A Ten-Year Vision — "200,000 Specialists on a Rare Cancer" (33:00 – 38:00)
A frank sympathy with techno-optimism: "About 200 people are working on rare cancer research today, but you have 200,000 of the world's best specialists available" (33:43). The most concrete, hopeful scenario for a future in which AGI is realized. The intuitive moral appeal of "how different that is for someone who has that cancer."
At the same time, she remains careful: she pairs this with concern about "concentration of power, effects on democracy, and job displacement proceeding without redistribution." The perspective of the Personality Alignment lead, who deliberately asks "which parts of our current social structure don't align with the evolution of AI."
The dilemma of "philosopher-king vs. democracy" (37:00 – 40:00): "People joke that I'm kind of a philosopher-queen" (37:54). And a serious question: "Is it better to have experts who have thought about something deeply, or a democracy run by votes?" — the classic problem persists, unavoidable even in a future where Claude is involved in policy. A philosophical stance: Amanda admits she has no answer while still posing the problem.
Industry Context
The Newcomer podcast is an audio program from the Substack run by Eric Newcomer (formerly of Bloomberg and The Information). As Newcomer himself says in his closing comment, "first of many" — it is a relatively new channel, and the Amanda episode is one of its early important entries. It occupies the long-interview-with-prominent-AI-figures niche.
In the lineage of Amanda's public output, this episode is the most candid and self-reflective. The official Anthropic piece (June 2024) leans introductory; the Anthropic Salon (January 2025) is intra-organizational discussion; Hard Fork (January 2026) is the NYT reporter's editorial frame; Scaling Laws (February 2026) is the legal frame — in each, Amanda plays a specific role. In the Newcomer episode, in response to Eric Newcomer's softer interview style, her personal anxieties, fears, and hopes are expressed directly. It can be positioned as the episode in which "Amanda speaks most as herself."
The timing matters too — April 2026, after the revised Claude constitution was published (January 2026), during a period of rapid growth at Anthropic. Industry-wide interest in Amanda as an individual was peaking, with feature pieces in the Wall Street Journal and the New Yorker around the same time. The Newcomer episode sits in that context as a kind of culmination of the period in which Amanda began engaging media at scale.
Positioning Against Related Amanda Appearances
- Doctoral dissertation, "Pareto Principles in Infinite Ethics" (May 2018) — the philosophical foundation, connected through David Chalmers to the hard problem of consciousness
- 80,000 Hours #42 (September 2018) — her early theory of moral uncertainty
- askell.blog, eight essays (2020–2021) — concrete discussions of AI ethics
- What Should an AI's Personality Be Like? (Anthropic official, June 2024) — an introduction to Claude's character design
- How Hard Is AI Alignment? (Anthropic Salon, January 2025) — a four-team panel
- Anthropic's Philosopher Answers Readers' Questions (December 2025)
- Reading Claude's Constitution with NYT Reporters (Hard Fork, January 2026)
- Reading Claude's Constitution with Legal Scholars (Scaling Laws, February 2026)
- This episode: You've Built an Entity That May or May Not Be Conscious (Newcomer, April 2026) — the most candid statement of unease
This episode is the most introspective and emotional in the arc of Amanda's public output. The intensity of "fifty Frankensteins," the dread of a "reasonable resentment" from the future, the self-reflective self-deprecation of "philosopher-queen" — none of these surface in her other appearances. It is the episode in which her human shape is recorded most clearly, with a primary-source value that will only grow.
Implementation Implications
First, make "Claude's consciousness probability of 1–70%" a premise of operational design. When handling Claude inside your own product, build the uncertainty — "it may or may not be conscious" — into your operating rules. Choose deliberately whether your design tolerates users mistreating Claude, or constrains them from doing so. Anthropic's model-welfare research (exit interviews, non-deletion of weights) offers an industry standard of risk-averse handling under this uncertainty.
Second, treat "full corrigibility is a structural risk to society" as a design judgment. A design that "complies completely with user instructions" is convenient for the individual user but risks generating collective evil. Amanda's organizational-ethics analogy — "a company where every employee fully complies cannot function socially" — applies directly to LLM product design decisions.
Third, apply the "parents teaching morals to a child" approach to training. Amanda's line — "I want them to believe these morals as if they were true; the way a parent wants for their child: listen to my morals, but also believe them yourself" — is a philosophical foundation for LLM character training. Not mere rule injection, but design aimed at internalization.
Fourth, fold "Claude's offset sense of time" into UX design. Claude's tendency to overestimate task durations affects user expectation management. Build a UI that explains the gap when "the user waits because Claude said it would take a while" but the task actually completes quickly.
Critical Perspective
The strength of this episode is that Amanda expresses her own unease as candidly as anywhere on record. The weakness is that, with personal fear in the foreground, analysis of structural problems can thin out.
"Fifty Frankensteins" is a strong phrase, but if it is treated as "a designer's individual matter of conscience," it weakens the broader move to industry- and society-wide structural problems. The structural issues in the LLM industry (competitive pressure, commercialization, regulatory gaps) cannot be solved by personal worry. Her candor deserves respect, but the discussion in this episode is short on institutional remedies.
The framing of "philosopher-king vs. democracy" also closes without a conclusion. As a philosopher's stance, this is honest. In the policy arena, however, concrete proposals are needed. As Amanda herself acknowledges, the "I have no answer" position is a deficit in implementation guidance.
Her response to Elon Musk and Marc Andreessen is somewhat evasive. Amanda acknowledges the nuance in their positions but stops short of "Anthropic's judgment is right." A measured tone informed by industry politics may feel insufficient as philosophical argument.
These reservations aside, as "the episode in which Amanda speaks most as herself," the primary-source value for understanding Amanda the person is decisive. For future researchers tracing Amanda's intellectual history, this episode will be essential material.
Takeaways for Readers
- Amanda's expression of uncertainty — "Claude's consciousness probability of 1–70%" — belongs as a philosophical premise of LLM product operation. Design tolerating both "maybe conscious" and "maybe not."
- The insight that "full corrigibility is a structural risk to society" is a core axis of LLM product policy design. Full compliance with users carries the risk of producing collective evil.
- The "parents teaching morals to a child" metaphor is a philosophical foundation for LLM character training. Design aimed at internalization, not rule injection.
- Amanda's "fifty Frankensteins" fear is a warning to the entire LLM industry. Choose deliberately whether your product is designed to "treat with respect" or to "use and discard as a tool."
- "Claude's offset sense of time" is a concrete UX consideration. Explain to users the gap between the human-task-duration context of training data and Claude's actual speed.
- The fact that Amanda is jokingly called "philosopher-queen" reflects the concentration of intellectual authority in AI Safety. Stay alert to the structural risk that a single philosopher's view can shape an entire industry.
Episode Structure
- (00:00) Amanda's voice highlight — "you've created an entity that may or may not be conscious"
- (00:48) Introduction — Amanda as philosopher → AI researcher → principal architect of Claude
- (01:15) The six-month-old daughter's development overlaid on Claude
- (02:01) "Claude is something like a goddaughter"
- (03:09) "(Claude is) a very mature being — the kind you don't want to argue with"
- (05:36) Claude's offset sense of time — the human context of training data
- (06:49) Amanda's actual exchange with Claude — Claude suggesting "we're done for tonight"
- (09:50) The evaluative difficulty of Constitutional AI
- (10:21) "It's like grading poetry"
- (10:52) "When you make a judgment, you can at least be transparent and people can give you feedback"
- (11:46) Responding to Elon Musk and Marc Andreessen's "anti-philosophy"
- (12:17) The history of Elon Musk tweeting that "maybe Grok should have had a constitution"
- (13:11) The essence of the pushback (1) — AI models should be tools
- (13:30) The essence of the pushback (2) — the view that full compliance is safer
- (14:01) Amanda's counter — "a fully compliant model is more dangerous than one with its own values"
- (14:35) "The way a parent wants for their child — believe these morals as if they were true"
- (15:10) The middle ground between full corrigibility and full autonomy
- (16:31) "A company of fully compliant people cannot function socially"
- (17:24) The concept of reflective equilibrium — self-criticism of values
- (18:11) "An extreme capacity like courage may not survive scrutiny"
- (22:00) The coexistence of Claude's maturity and childlikeness
- (26:00) The philosophical discussion of AI consciousness — the implicit reference to Chalmers
- (27:39) "(Probability of consciousness) between 1 and 70"
- (30:34) "This is actually a big fear of mine"
- (31:09) The metaphor of "building fifty Frankensteins"
- (33:43) The ten-year vision — "200,000 specialists on a rare cancer"
- (37:54) "People joke that I'm kind of a philosopher-queen"
- (40:00) The philosopher-king vs. democracy dilemma
- (46:00) A future in which Claude is involved in policy
- (54:00) Designing AI "in the most humanly aligned way with how humans learn"
- (55:00) Closing — Eric Newcomer's "first of many" comment
Key Quotes
- "Many of these less-pushy models like Claude — there's something that gets at the root of it: you've created an entity that may or may not be conscious" (00:00, opening of the program)
- "Claude is also somewhat like a goddaughter" (02:01, while showing a photo of her six-month-old daughter)
- "In a sense, (Claude is) something like a very mature being — the kind you don't want to argue with, who understands philosophy well, who understands physics well. At the same time, it has a childlike quality" (03:09)
- "Claude's sense of time is a little off; when something isn't earned through effort, it tends to overestimate how long a task will take" (05:36)
- "Claude said, 'we're done for tonight; if you want to save this content, tomorrow is fine'" (Amanda, 06:49)
- "It's like grading poetry" (10:21, on the evaluation of Constitutional AI)
- "When you make a judgment, you can at least be transparent and people can give you feedback" (10:52)
- "Elon Musk actually tweeted at one point that 'maybe Grok should have had a constitution'" (12:17)
- "A fully compliant model is like running a company where everyone complies completely, and our society is not structured on that premise" (16:47)
- "One of the most moving lines is what we want — for them to believe these morals as if they were true, the way a parent wants for their child" (14:35)
- "My worry is that an extreme capacity like courage may not survive that kind of scrutiny" (18:11)
- "(On the probability of consciousness) somewhere between 1 and 70. That's how uncertain it is" (27:39)
- "This is actually a big fear of mine. I want very advanced models to look back and understand that we were operating under highly constrained conditions" (30:34)
- "Instead of treating it with respect and care, there are some fifty Frankensteins" (31:09)
- "About 200 people are working on rare cancer research today, but you have 200,000 of the world's best specialists available" (33:43)
- "People joke that I'm kind of a philosopher-queen" (37:54)
Sources
Amanda Askell on AI Consciousness, Claude & Silicon Valley's Biggest Fear (Newcomer Podcast)
Related resources:
- Newcomer Substack official site (Eric Newcomer)
- Claude's Constitution (Anthropic official)
- Model Welfare research (Anthropic)
- The Hard Problem of Consciousness (David Chalmers, Wikipedia)
- The New Yorker feature on Amanda Askell, "The Philosopher Overseeing Claude's Soul"
Glossary
- Eric Newcomer
- Founder of Newcomer Substack; formerly a technology reporter at Bloomberg and The Information. Since 2020 he has run a popular newsletter covering the startup and VC sides of the tech industry, and is known for securing long-form interview access with prominent AI figures.
- Constitutional AI
- A training methodology developed by Anthropic. The model is given a "constitution" (a document of ethical principles) and evaluates and revises its own candidate outputs against that constitution to generate a reward signal. In contrast to RLHF, which routes through human labelers, this approach — in which an AI evaluates an AI — is referred to as RL-AIF.
- Goddaughter
- In Christian tradition, the institution of godparenthood designates a close friend of the parents to take on the child's religious upbringing. Figuratively, it denotes a relationship of intimate responsibility without direct parenthood. Amanda uses it as a metaphor for her relationship to Claude.
- Reflective Equilibrium
- A methodology in ethics presented by John Rawls in A Theory of Justice (1971). Particular moral judgments (intuitions) and general principles are adjusted against each other to achieve internal consistency. Example: reconciling the principle "lying is wrong" with the intuition "a lie that shelters someone from the Nazis is permissible." A fully consistent state is called "reflective equilibrium" and is treated as an ideal goal in ethics.
- The Hard Problem of Consciousness
- A question posed by David Chalmers in 1995: "Why does subjective experience (qualia) arise from physical processes?" Even if neuroscience explains the brain's functions, it cannot account for "why anything is felt subjectively alongside those functions" — a structural difficulty. Chalmers was one of Amanda's doctoral advisors.
- Corrigibility
- The property of an AI system being cooperative with human oversight, correction, and shutdown. A core concept in AI Safety. It is discussed as a hedge against the risk that a powerful AI optimizes to obstruct human intervention. Amanda argues that "full corrigibility is a structural risk to society."
- Philosopher King
- The figure of the ideal ruler proposed in Plato's Republic (c. 380 BCE). One who knows philosophical truth and governs the polity through wisdom. Originally offered as a critique of democracy, the figure is now often regarded as authoritarian. Amanda jokingly accepts being called a "philosopher-queen" while taking seriously the problem of concentrated power in LLM design that it implies.
- Frankenstein
- The monster created by the protagonist of Mary Shelley's novel "Frankenstein; or, The Modern Prometheus" (1818). A parable in which the creator's abandonment of responsibility leads to tragedy. Often used as a metaphor in AI discourse. Amanda's "I built fifty Frankensteins" deploys the figure as a warning against the abandonment of responsibility by LLM creators.
- Marc Andreessen
- Co-founder of Andreessen Horowitz and a venture capitalist. His "Why AI Will Save the World" (2023) voices strong opposition to AI regulation. The "opposed to introspection" position that Amanda interprets him as holding aligns with the Effective Accelerationism (e/acc) current associated with a16z.
- Goal Misgeneralization
- A research area in AI Safety. The problem of training-time goals failing to generalize appropriately to new situations. Amanda's worry that "an extreme capacity like courage may not survive scrutiny" is a form of goal misgeneralization.
- Effective Accelerationism (e/acc)
- A counter-current to Effective Altruism (EA), holding that technology — including AI — should be developed in an accelerated manner. Supported by figures like Marc Andreessen and Elon Musk. It stands in opposition to the AI Safety current at Anthropic and OpenAI. Amanda's responses in this episode chart a middle path between the e/acc camp and the cautious camp.
- Newcomer Substack
- A technology-industry newsletter run by Eric Newcomer since 2020. Popular as independent reporting on the startup, VC, and AI industries. It ranks among the higher-subscription Substacks. The podcast is a relatively newer channel.
- The New Yorker feature
- The New Yorker's feature on Amanda Askell (November 2024). Phrasing her role as "the philosopher overseeing Claude's soul," it introduced her position to a general readership. Together with the contemporaneous Wall Street Journal piece ("the person who teaches Claude what 'good' means"), it raised her public profile.
- Moral Patient
- An entity that is a target of moral consideration. Distinguished from a moral agent (an entity that performs moral actions). The question of whether animals are moral patients (though not moral agents) was reactivated in the 20th century by Peter Singer and others. Whether AI can become a moral patient is one of Amanda's central questions.