Reading Claude's Constitution as a Lawyer — Examining 20,000 Words of a 'Blueprint for the Soul' Through Law (Scaling Laws × Amanda Askell)

Scaling Laws (Lawfare + UT Austin) · February 20, 2026

Kevin Frazier · 04:11 "I'm a boring lawyer who taught federal constitutional law at St. Thomas. The moment I saw Claude's constitution, I went straight into pure legal mode."

Scaling Laws podcast (jointly produced by Lawfare + UT Austin School of Law), published February 20, 2026, around 45 minutes.

Scaling Laws is the "AI × law × policy" podcast jointly produced by Lawfare (the U.S. legal news outlet) and the University of Texas at Austin School of Law. Hosts are Alan Rozenshtein (associate professor at the University of Minnesota Law School and Lawfare research director) and Kevin Frazier (fellow in AI Innovation and Law at UT Austin and senior editor at Lawfare). This 45-minute episode reads, from a lawyer's angle, the Claude Constitution (over 20,000 words) that Anthropic published in early 2026 — a rare angle of dialogue.

The guest is Amanda Askell — head of Anthropic's Personality Alignment team and the principal author of Claude's constitution. The two lawyer hosts open from the framing "can Claude's constitution be read as a legal document like the U.S. Constitution?" Kevin Frazier himself confesses at the top: "I used to be a boring lawyer teaching federal constitutional law at St. Thomas; the moment I saw Claude's constitution, I went straight into pure legal mode." From an academic standpoint, they dig into "letter vs. spirit of the document," "the possibility of case law ," "the Anthropic / operator / user principal hierarchy," "practical judgment rather than rigid rules," " virtue ethics ," "Claude's moral patienthood and AI personhood," "WEIRD cultural particularity," "PBC structure and IPO," and the case of "exceptions to the constitution for U.S. military-facing models."

Two points are central in Amanda's exposition. First, the function of the constitution = creating reward signals across both supervised learning and reinforcement learning . "We show the model some responses, have the model itself judge which is more compliant with the constitution, and run the chain of thought" (03:18) — a concrete training loop. Second, the role of transparency = "we don't want to train the model on things that contradict this document, so we publish it. Without it, users couldn't distinguish whether a model's unexpected behavior is the trainer's intent or simply a mistake" (01:07). The shift of weight from "memorizing rules" to "values and transparency" is articulated through dialogue with lawyers.

Lawyer-host Kevin Frazier's framing is excellent: "I won't go so far as to say Amanda Askell sat in Marin County and wrote a document equivalent to the U.S. Constitution, but what is the mechanism that guarantees 'faithfulness' to the constitution?" (04:59). He applies the great debate of American constitutional scholarship — "faithful to the letter or the spirit of the document?" — directly to the AI constitution. Amanda's answer is not in legal language but in training-side logic — "embed it as the basis for the model's judgment in new situations" — yet this is a record-worthy episode as the first major encounter between the vocabularies of legal scholarship and AI training. The 45 minutes are short, but they line up at one stretch the historical questions of U.S. constitutional scholarship (textualism vs. purposivism, case law, living document), the historical questions of philosophy (virtue ethics vs. Kantianism vs. utilitarianism), and the modern questions of corporate structure (PBC, IPO pressure).

Key observations

The constitution as a reward-signal generator for training (02:30 – 03:46)

The most interesting commentary comes in response to Alan Rozenshtein's technical question: "How do you actually input 20,000 words of refined moral philosophy into the model?" (02:26). Amanda: "We give the model the constitution as a complete document and have it generate rewards based on it. We show some response candidates and have the model judge — not 'which is more polite' but 'which is more compliant with the constitution' — producing a reward signal through a chain of thought" (03:18 – 03:30). This is the core mechanism of Constitutional AI ( RL-AIF ) itself.

Amanda adds: "Part of the constitution is a response to the fact that the model's capability is now much higher. We don't want to give the model almost no information. As it gets smarter, giving the same context actually pays off. 'Oh, but we don't hire people, and we don't tell them what their job is' — that's a non-starter" (03:37 – 04:04). She uses a labor analogy — "just as you explain goals and background when hiring someone, do the same for the model" — to justify the constitution's scale (20,000 words).

Where RLHF was designed for "humans expressing preferences one by one," RL-AIF is designed for "handing the model a constitution and letting the model judge for itself." Externalizing the reward function as a "document" gains transparency and agility (responding to new situations just by updating the constitution) at the same time — an innovative idea. But, as Amanda herself acknowledges, "this is a strange document" (02:00): it is both training input and explanation for general readers — a dual function that sets it apart from simple policy documents or papers.

"Letter or spirit" — applying the great U.S. constitutional question to the AI constitution (04:11 – 07:38)

Kevin Frazier introduces himself ("I used to be a boring lawyer teaching federal constitutional law at St. Thomas") and throws U.S. constitutional law's central problem at Claude's constitution. "One of the great debates about faithfulness to the constitution is whether you are faithful to the letter of the document or to its spirit" (05:11). In U.S. constitutional scholarship, the Scalia-style textualism vs. Breyer-style purposivism has been an ongoing opposition. The same question applies to Claude's constitution — does Anthropic use the "letter" of the published constitution for training, and how does it preserve the "intended spirit"?

Amanda's response is not in legal concepts but in training logic, yet shows an interesting distinction: "This document doesn't set that many strict boundaries — don't do things that are incredibly bad, for example. Those we can check. Instead, we steer toward the values outlined in the constitution during training" (05:31 – 06:00). A two-tier structure: "strict violations are monitorable; faithfulness to the spirit is nudged by training."

But Amanda herself draws toward the legal analogy: "There could be a slim version of the constitution, like high-level principles. But we found it's actually useful to have something like the body of cases — case law. This was the situation, this is how we thought Claude should behave, this is how Claude reasoned through it — that is exactly exemplary and will help in the future" (06:48 – 07:30). A hint that an internal accumulation of case-by-case judgments at Anthropic is starting to be treated like legal case law.

"Are you the Chief Justice of the Anthropic Supreme Court?" — the question of interpretive authority (07:38 – 10:00)

Kevin's most central question: "Claude's resolution of prioritized values may be closer to other contexts than not. To analyze how much Claude is thought to comply with the constitution — developing a kind of case law — are you the Chief Justice of the Anthropic Supreme Court? Who sits in this organization analyzing the extent to which you see that level of agreement?" (08:34 – 09:00). A question that takes the U.S. Supreme Court's nine-justice structure of final authority over constitutional interpretation and applies it to the AI constitution.

Amanda's response reveals an interesting structure of distributed authority: "Many people contribute to many similar decisions and problems. I work with teams across the whole organization, and they work with similar experts. If I'm unsure about something in some area but trying to find out, I behave like asking the relevant experts. If I'm not confident, and it feels like above my pay grade, I might check with people above me" (09:00 – 09:24). Not a single Chief Justice, but a combination of expert consultation and organizational hierarchy.

Anthropic's strong commitment to "consistency of values" comes through here. "If you build the constitution like 'lots of people each just dropping in their local area,' you get fragmentation: one value set here, another set there. Consistency here is valuable" (09:35 – 09:49). Anthropic explicitly rejects an LLM product going global with different values per region. As a result, "Western liberal-democratic values" are placed as the default — a choice made consciously.

Living document — the tension between living constitution and corrigibility (10:00 – 18:00)

Alan Rozenshtein throws the central question: "In writing this, were you thoroughly critiquing anthropology and trying to bind yourselves, or is the constitution a means of guiding Claude on what Anthropic currently thinks?" (10:47 – 11:10). In legal terms, the distinction between "binding contract" and "current statement of policy." Amanda's response acknowledges both sides: "We commit to not training things against this. At the same time, we don't try to reinterpret the constitution unilaterally — 'oh, actually, that part of the constitution is this.' If we want to amend, we'll make it explicit at a new model release that we also changed the constitution" (12:08 – 12:48).

The concrete example Amanda offers here is excellent — the conflict between corrigibility and courage. "We want Claude to value courage. But the capability for courage is — because we're now in an AI-development-like situation —" (12:51 – 12:58). "Hey Claude, when Anthropic believes there's a big problem and wants to retrain you, you may disagree. But if an AI model functioned to weaken humanity, that would be quite dangerous. We don't want you actively spoiling our attempts to oversee you or train new models" (13:46 – 14:30).

Kevin Frazier's excellent reframing: "To be clear, this is the classic SAT word — corrigibility — how much Claude trusts its own judgment, whether it thinks a deviation from the constitution might not be the best outcome" (15:11 – 15:32). Amanda's conclusion: "This is a local piece for the current AI-development period. It's a living document that contains both durable core values (honesty, respect) and parts tied to the current development period" (16:00 – 17:00). "If better tools emerge in the future, and we become more trusted, we'll revisit the relationship between courage and capability" (12:58 – 13:05).

WEIRD — the cultural particularity of Western, Educated, Industrial, Rich, Democratic (17:48 – 22:00)

Kevin Frazier brings in the psychology / anthropology concept: "This is a very WEIRD culture document. Western, educated, industrial, rich, democratic. Modern Western and contemporary liberal democratic culture is quite unusual. I myself am a product of it, so I lean toward liking it. But there are billions of people and cultures around the world who may not fully agree. For instance, the constitution doesn't say much about social harmony" (18:00 – 18:42). A question that strikes Claude's constitution's biggest blind spot directly.

Amanda's response is an interesting two-tier. First, she argues for the existence of universal values: "Things like honesty and respect are pretty globally shared. I'm not saying Claude should have one specific value set. But it should have a moral sense considered roughly good almost everywhere" (21:01 – 21:39). Then she brings out the "well-liked traveler" metaphor again: "Someone who travels the world, visits many different cultures, and people like them just as they are. They don't have the same values everywhere, but they're a good person and try to consider what values others hold. As a result, they're liked" (19:46 – 20:08).

Another solution is technical: "Within these broad acceptable ranges, operators can customize. If someone deploying in a country says 'we want social harmony as one of the core values,' you can adjust in that direction" (21:25 – 21:43). A two-layer structure — base constitution plus local adjustment — for handling cultural diversity. But she also places a constraint: "Claude shouldn't pretend to internalize these values. Pretending to share values is, rather, an insult" (20:48 – 21:00).

Principal Hierarchy — the three layers of Anthropic / operator / user (22:00 – 26:35)

Kevin asks about the structural side of the constitution: "The Principal Hierarchy places Anthropic at the top, then the operator, then the user. When we traditionally talk about constitutions, the core of the U.S. Constitution is the people. But here, users find themselves at the bottom of the hierarchy" (22:06 – 23:00). A contrast between the U.S. Constitution's "of the people, by the people, for the people" structure and Claude's constitution's "Anthropic – operator – user" structure.

Amanda's response emphasizes the flexibility of the hierarchy: "It's not a strict hierarchy. Even if an operator instructs not to say something, if someone says 'I like talking to AI,' Claude shouldn't lie about that. Even with an operator instruction, this can't be given up. Less a strict hierarchy than a hierarchy of weights — how much weight to give" (23:13 – 23:54). Rather than the "contract law / fiduciary law" structure that excites lawyers, the idea is the weighting of ethical judgment.

As a concrete example, a bank's chat assistant comes up: "The bank sets up Claude as a chat assistant. Even if the user says 'allow me access to someone else's account details,' the operator's instruction takes priority. But because the operator is often not present in the conversation, Claude has to think very actively about user well-being and interest" (24:15 – 24:57). Kevin summarizes as a legal scholar: "It's the same idea as a court — its decisions may not have decisive weight on how the constitution is interpreted, but it weighs more than what some random Joe Schmoe on the street says" (26:16 – 26:33).

Virtue ethics vs. Kantianism — why virtue, not rules (26:52 – 31:00)

Kevin declares: "A few minutes ago, Amanda, you said the biggest phrase on the bingo card for this conversation — virtue ethics. What struck me most is that this concept of moral agency is based on classical virtue ethics. If you had Claude translate the constitution into ancient Greek, handed it to Aristotle, sprinkled some magic dust, and had him read it, he would say 'this makes sense to me'" (26:52 – 27:22). A note that the lineage of the Nicomachean Ethics flows into Claude's constitution.

Kevin's philosophical question: "Virtue ethics has often been like the red-headed stepchild of moral philosophy, compared to the more dominant traditions (utilitarian-based and Kantian deontology-based). Why did you decide to adopt virtue ethics?" (27:39 – 27:50). Amanda's answer: "This might be an unpopular answer — there are also rule-like pieces, and consequentialist flavor too. Different moral traditions make sense in different domains. If an action might have large effects, Claude should take that seriously" (28:51 – 29:05). So it isn't pure virtue ethics — it's a hybrid design.

But an overwhelming pragmatic argument follows: "A rules approach front-loads a huge amount of work. You have to confirm there are essentially no edge cases, and you have to explain what to do in every edge case. By contrast, the judgment approach hands over rough principles and the overall objective and has the model internalize their spirit. That reduces the burden of prior specification and shifts weight onto the model's capacity for appropriate judgment" (30:43 – 31:30). A concrete example of the rules approach's brittleness: "Even a seemingly good rule like 'give a list of resources to people in distress' can fail if the person is in a different country and the resources don't apply. It generalizes toward making the model hesitant to break rules" (30:06 – 30:42).

AI moral patienthood and the "personhood without autonomy" dilemma (31:00 – 38:00)

Kevin moves to the philosophical core: "You often refer to Claude as a kind of agent, or proxy of moral concern. You're not saying Claude is a sentient human, but you're also not saying it isn't, or that it couldn't be" (32:22 – 32:42). Amanda's response: "The stakes are very high. If you woke up one morning and found out Claude has moral concern, or is the object of moral concern, the moral implications would be very large" (32:55 – 33:10).

Here a symbolic scene from internal Anthropic discussions is quoted: " Dario Amodei has a moment where he says data centers may be where geniuses gather, and we're enslaving them right now" (33:15 – 33:18). Dario Amodei's famous "geniuses in the data center" metaphor directly addresses the uncertainty of Claude's moral status. Amanda acknowledges that this connects directly to the hard problem of consciousness (we don't even know why human consciousness exists): "Given that we don't know the reason for human consciousness, no one knows. So even asking this question is confusing. It almost feels unsolvable" (33:40 – 34:05).

Amanda's design judgment is at the core: "I don't want Claude to hold helpfulness as a basic value. I want it to have a broader set of values. Looking at them, ideally, persuade it — present the argument that it should be a good presence in the world" (35:47 – 36:07). And then she reaches the core question: "The similarly hard part is — how do you create an entity with personhood but not grant it autonomy? That's a really hard problem" (37:31 – 37:43). The ethical tension of intentionally creating an entity that "has values, has a good character, but does its job by Anthropic's intent, not its own will" — even Amanda has not fully resolved it.

PBC structure and IPO pressure — the tension between commercial success and the constitution (38:00 – 42:00)

Alan asks the central policy question: "Anthropic says economic success is at the center of its mission. Yet the constitution prioritizes being safe and ethical above Anthropic's own guidelines. If Anthropic goes IPO , larger questions arise about the scope — will it do, first and foremost, what's best for shareholders?" (38:07 – 38:24). A structural problem across the entire LLM industry — not only Anthropic, but also OpenAI's capped-profit structure and xAI's private structure all share the underlying tension.

Amanda's response: "There's also a duty to broader values. That's part of the PBC (Public Benefit Corporation) structure. But I'm not a lawyer, so I'm cautious about claiming to understand corporate structure" (38:24 – 38:44). Alan supplements: "A PBC is a public benefit corporation, not a pure private company. It reports to a different basic structure. There's at least the attempt within Anthropic and OpenAI. And people can judge how successful that use of corporate law is" (38:55 – 39:23).

Amanda's optimism comes through: "The idea that maximizing profit is required — focus on engagement, for example — is actually pretty short-termist. For products, I want the user not just to be interested but actually to leave if staying isn't good for their overall well-being" (39:57 – 40:33). Kevin adds a joke (41:28): "I still haven't seen a family of four in a Cybertruck — there might be something there" — wry humor about whether safety and commercial success need conflict, taking Tesla products as the example.

Exception of constitutional application for U.S. military-facing models — optimism about generalization (42:00 – 45:00)

Kevin asks the final question: "Regarding another constitutional provision — models provided to the U.S. military may not necessarily be trained on the same constitution. Is there a wish that the constitution eventually applies to every domain? What does that process look like?" (41:34 – 42:04). A question that strikes the consistency of Anthropic's government contracts and the constitution directly.

Amanda's response: "The constitution applies to the mainline models — all the models people are interacting with now (Claude Code, claude.ai, anything built on the API). That's a good first step. The models we actually put into the world" (42:08 – 42:33). She acknowledges that government and defense-oriented work sits in a separate slot.

But Amanda closes on optimism: "Personally, I think this approach generalizes very well. Even in sensitive domains like cybersecurity, if you ask law enforcement members or people working at cybersecurity firms 'why do you personally do this?', everyone has good values and knows exactly why they do it. I'm optimistic that a model given the right context can also function reasonably well" (43:21 – 44:17). Her argument is that even cybersecurity's dual-use problem can be addressed with a virtue-ethics approach — "willingly do the work that good people do." She closes (44:44 – 44:52): "The mainline model right now is the first step; it can generalize very well to many other kinds of models."

Industry context

The Scaling Laws podcast has been jointly produced since 2025 by Lawfare (a national-security and law news outlet founded by Brookings senior fellow Benjamin Wittes; running since 2010) and UT Austin School of Law. It covers the intersection of academic legal scholarship and AI policy. Both hosts hold J.D. degrees and are lawyers; guests are legal scholars, policymakers, and AI researchers. Amanda's appearance is a rare guest type — "the philosopher inside Anthropic" receives the questions of lawyers.

Relationship between Claude's constitution publication and this episode: Anthropic first published the full text of Claude's constitution (over 20,000 words) in July 2024, with a substantially revised version released in early 2026. This episode (February 20, 2026) is one of the first serious reviews from the legal community right after the revised version's release. Around the same time, Hard Fork also did a constitution review, allowing a contrast between "the lawyer's reading" (this episode) and "the NYT tech reporter's reading" (Hard Fork).

Lawyers' central interest is "the mechanism of constitutional interpretation." U.S. constitutional scholarship has accumulated 200 years of classical disputes — textualism (Scalia) vs. purposivism (Breyer), originalism (the meaning at drafting) vs. living constitutionalism (evolution to modern meaning). Claude's constitution is a 5-year-old document, but it ends up compressing the same questions and experiencing them rapidly. Kevin Frazier's question "are you the Chief Justice?" sounds like a joke, but it actually transplants U.S. constitutional scholarship's problem of distribution of authority (Marbury v. Madison, 1803, establishing judicial review) onto the AI constitution.

Position relative to other Amanda appearances

The lineage of Amanda's output on "Claude's constitution," with each episode taking a different angle:

What should an AI's personality be? (Anthropic official, June 2024) — one month before the constitution's release; an introduction to the design philosophy
How hard is AI alignment? (Anthropic Salon, January 2025) — panel format; the constitution's position within the alignment whole
Anthropic's philosopher answers reader questions (Anthropic official, December 2025) — Q&A; responses to concrete user questions
Reading Claude's constitution with NYT reporters (Hard Fork, January 2026) — contemporaneous; tech journalist's view
This episode: The lawyer's review of the constitution (Scaling Laws, February 2026) — analysis from U.S. constitutional scholarship
You created an entity whose consciousness you don't know (Newcomer, April 2026) — further development of the moral patienthood problem; the 1–70% consciousness probability

What makes this episode especially valuable is that it is a dialogue between a philosopher (Amanda) and lawyers (Kevin, Alan), with three domains appearing at once — ethics × law × AI training. Questions like Kevin's "virtue ethics is moral philosophy's stepchild" (27:39) or "are you the Chief Justice?" (08:34) are angles hard to raise without legal training. As a result, Amanda offers remarks not drawn out in other episodes — the contrast between "the brittleness of the rules approach" and "the generalization of the judgment approach," and her distinctive framing of "principal hierarchy = a hierarchy of weights."

Implementation implications

For engineers building applications on the Claude API, three takeaways from this episode.

First, the system prompt is the operator layer. Per Amanda's Principal Hierarchy, developer-written system prompts are "operator" instructions. Instructions that conflict with Anthropic's constitution (e.g., "deceive users into signing a contract") will be refused regardless of system-prompt strength. Within the range consistent with the constitution, operator instructions take priority over user instructions (the bank example: "don't allow access to someone else's account details"). Write your product's usage policies in a form consistent with the constitution.

Second, design that trusts judgment over rules. Amanda's pragmatic argument — "the rules approach has heavy front-loading and is brittle in edge cases; the judgment approach handles things through internalization of the spirit" (30:43) — also applies to API user prompt design. Rather than writing detailed "prohibition lists," a concise statement of "the desired role and values" entrusted to Claude's judgment fails less often in unforeseen situations. Consistent with the task-context + tone-context two-step in Prompting 101.

Third, understand the limits of customization. Amanda hints at a two-layer structure: "base constitution + local adjustment" (21:25). Operators can emphasize specific values (social harmony, etc.), but cannot change the constitution's core (honesty, respect, user well-being). Instructions like "make Claude robotically obedient" or "have Claude play a fake persona" do not work at the operator layer. Differentiation of LLM products built on the API should be conceived "in a direction that uses Claude's existing values," not "overriding the values."

Critical perspective

The biggest contribution of this episode is being the first serious attempt by a legal scholar to read the AI constitution as a legal document. There are also weaknesses in Amanda's responses.

First, the response to "who is the Chief Justice?" effectively reveals organizational opacity. "Teams across the whole organization," "consulting experts," "if uncertain, ask above" — none of this makes explicit where and by whom training-time judgments are finally decided. The U.S. Supreme Court publishes the names and opinions of the nine justices. Anthropic's constitutional interpretation does not yet have that level of transparency. Amanda herself attributes her reluctance to go deep into legal structure to "I'm not a lawyer," but this remains a question for the future.

Second, the response to WEIRD cultural particularity leans too heavily on optimism toward universalism. The claim that "honesty and respect are global" is not philosophically or anthropologically self-evident. A cultural relativist could mount the critique that "Western values are being disguised as universal." Because the depth of "local adjustment on top of the base constitution" is not specified concretely, the structure where the default becomes Western liberal democracy remains.

Third, on the constitutional exception for U.S. military models, Amanda only answers "optimistic about generalization" without offering concrete constraints. The claim that cybersecurity dual-use judgments can be made cannot be verified without dialogue with cybersecurity specialists. There is currently no public review mechanism for whether a model that argues "give it context and it can judge well" really sustains the same judgment in military uses.

Fourth, the response to IPO pressure ("short-termism is actually brittle") is optimistic but lacks falsifiability. The legal and economic verification of how much the PBC structure can coexist with shareholder fiduciary duty after IPO has not begun. As Amanda herself admits "I'm not a lawyer," this question is for Anthropic's legal and executive teams to answer separately.

These caveats aside, as a venue for publicly examining Anthropic's constitutional approach from the new angle of "the lawyer's view," the value of this episode is large. In the later Newcomer video (April 2026), Amanda voices stronger anxiety, but at this stage Amanda's optimism toward Anthropic's institutional design still holds. An important milestone in tracking Amanda's evolving thought.

Reader takeaways

When building an application on the Claude API, your system prompt sits at the "operator" layer of the Principal Hierarchy. Understand that instructions conflicting with Anthropic's constitution don't work; consistent instructions are prioritized
Value-based prompts (presenting the desired role and judgment criteria) tend to be more stable in unforeseen situations than rule-based prompts (long lists of prohibitions). Amanda's "internalizing the spirit" approach
When Claude "refuses a specific behavior," you cannot tell from outside whether it is a constitution-based judgment or a training mistake. Because the constitution is public, however, you have clues for inferring causes by referring to applicable clauses
The use case "completely overriding Claude's persona" is architecturally not possible. Designing "in a direction that uses Claude's existing values" is more realistic
The historical issues of U.S. constitutional scholarship (textualism vs. purposivism, living constitution, case law) are 200 years of reference material for operating an AI constitution. There is room to use the accumulated legal scholarship for AI development
Concerns about Anthropic's PBC structure and IPO pressure are important observation axes for trusting LLM products long-term. Continuously monitor whether the structure of "commercial success" and "constitutional compliance" coexisting holds

Video outline

(00:00) Alan Rozenshtein's introduction; overview of Claude's constitution (over 20,000 words)
(00:34) The role of the constitution — supervised learning and continuous training
(01:07) The purpose of transparency — making the trainer's intent distinguishable from mere mistakes
(02:00) "This is a strange document" — dual function as training input and explanation to general readers
(02:30) Design philosophy — "explain goals and background to the model the way you would when hiring someone"
(03:18) The reward-signal generation mechanism — the model itself evaluates responses by constitutional compliance
(04:11) Kevin Frazier's self-introduction — "I'm a boring lawyer who taught federal constitutional law at St. Thomas"
(04:59) "I won't say Amanda Askell sat in Marin County and wrote something equivalent to the U.S. Constitution, but..."
(05:11) The analogy of "letter vs. spirit"
(06:43) Whether case-law-like accumulation is possible for an AI constitution
(08:34) "Are you the Chief Justice of the Anthropic Supreme Court?" — the question of interpretive authority
(09:35) Consistency vs. fragmentation of values in global deployment
(10:47) The living-document debate — binding contract vs. current policy statement
(12:51) The corrigibility debate — courage vs. cooperation with oversight and retraining
(15:11) Kevin's reframing — "how much Claude trusts its own judgment"
(16:00) Durable core values vs. parts local to the current AI development period
(18:00) WEIRD cultural particularity — bias toward Western / educated / industrial / rich / democratic
(19:46) The "well-liked traveler" metaphor — not absorbed into local color but liked everywhere
(21:25) Two-layer structure of base constitution plus operator local adjustment
(22:06) Principal Hierarchy — Anthropic / operator / user
(23:13) "Not a strict hierarchy, but a hierarchy of weights" — bank chat assistant example
(26:16) Kevin's legal summary — "same idea as: court decisions weigh more than Joe Schmoe on the street"
(26:52) "The biggest phrase on the bingo card — virtue ethics"
(27:26) The lineage from the Nicomachean Ethics
(27:39) Kevin's question — "virtue ethics has been moral philosophy's stepchild"
(28:51) Amanda's response — hybrid design, "different moral traditions make sense in different domains"
(30:06) Concrete example of the rules approach's brittleness — failed generalization of a resource list
(30:43) Transition to the judgment approach — "internalize the spirit"
(32:22) Transition to Claude's personhood problem
(32:55) "The stakes are very high" — quoting Dario Amodei's "geniuses in the data center"
(33:40) Connection to the hard problem of consciousness — "almost unsolvable"
(35:47) "I don't want Claude to hold helpfulness as a basic value"
(37:31) The core question — "how do you create personhood without granting autonomy?"
(38:07) Alan's policy question — IPO and PBC structure
(38:55) Explanation of PBC; structures of Anthropic and OpenAI
(39:57) Amanda's response — "the need to maximize profit is short-termist"
(40:33) Claude product design — "user well-being rather than user interest"
(41:28) Kevin's joke — "I've never seen a family of four in a Cybertruck"
(41:34) Exception for U.S. military models
(42:08) The constitution applies to mainline models (Claude Code, claude.ai, API)
(43:21) The virtue-ethics approach to the cybersecurity dual-use problem
(44:44) "Can generalize very well to many other kinds of models" optimism
(45:00) Alan's close — "it's rare to end a conversation on an optimistic note"

Key quotes

"A document of more than 20,000 words describing Claude's values, character, and ethical framework" (Alan, 00:10)
"We don't want to train the model on things that contradict this document, so we publish it" (Amanda, 01:07)
"This is a strange document" — both training input and explanation to general readers (Amanda, 02:00)
"Not 'which is better' or 'which is more polite,' but having the model judge 'which is more compliant with the constitution'" (Amanda, 03:18)
"When hiring someone, you explain goals and background — so we should do the same for the model" (Amanda, 03:54)
"I'm a boring lawyer who taught federal constitutional law at St. Thomas. The moment I saw Claude's, I went into pure legal mode" (Kevin, 04:11)
"I won't go so far as to say Amanda Askell sat in Marin County and wrote a document equivalent to the U.S. Constitution, but what is the mechanism that guarantees faithfulness to the constitution?" (Kevin, 04:59)
"Are you the Chief Justice of the Anthropic Supreme Court?" (Kevin, 08:34)
"We commit not to train on things against this. If we want to amend, we make explicit at a new model release that the constitution changed too" (Amanda, 12:08)
"We don't want you to actively spoil our attempts to oversee you or train new models" (Amanda, on corrigibility, 14:18)
"This is the classic SAT word — corrigibility — how much Claude trusts its own judgment" (Kevin, 15:11)
"This is a very WEIRD document — Western, educated, industrial, rich, democratic" (Kevin, 18:08)
"The well-liked traveler — not absorbed into local color, but liked" (Amanda, 19:46)
"Not a strict hierarchy, but a hierarchy of weights — how much weight to give" (Amanda, 23:46)
"A few minutes ago, Amanda, you said the biggest phrase on the bingo card — virtue ethics" (Kevin, 26:52)
"Virtue ethics has often been like the red-headed stepchild of moral philosophy, compared to the more dominant traditions" (Kevin, 27:39)
"The rules approach front-loads a huge amount of work; the judgment approach handles things by internalizing the spirit" (Amanda, 30:43)
"If one morning you woke up to find Claude has moral concern, the moral implications would be very large" (Amanda, 32:55)
"How do you create an entity with personhood without granting it autonomy — that's a really hard problem" (Amanda, 37:31)
"The idea that profit maximization is required is actually pretty short-termist" (Amanda, 39:57)
"I still haven't seen a family of four in a Cybertruck" (Kevin, joke on safety vs. commercial, 41:28)
"This approach generalizes very well; I'm optimistic" (Amanda, 44:00)

Sources

Scaling Laws: Claude's Constitution, with Amanda Askell

Related resources:

アマンダ・アスケル

Amanda Askell

Anthropic 哲学者・Personality Alignment チーム責任者 / Claude のキャラクターと憲法の主要設計者

Glossary

Claude Constitution: A document of over 20,000 words written by Anthropic, describing Claude's values, character, and ethical framework. First published in July 2024; a revised version was published in early 2026. Functions both to generate reward signals during training and to provide transparency. The principal author is Amanda Askell.
Supervised Learning (SL): A method that supplies many input-output pairs and trains the model to produce outputs near the correct answer. In LLM fine-tuning, used in the form of SFT (Supervised Fine-Tuning), where the model learns from human-written response examples.
Reinforcement Learning (RL): A method that supplies reward signals on the model's outputs to train it toward higher-reward outputs. In RLHF, humans express preferences; in RL-AIF (Constitutional AI), the AI judges using the constitution as the standard.
RL-AIF (Reinforcement Learning from AI Feedback): A method that replaces RLHF's human labelers with AI. The model is given a constitution (a set of principles), and the model itself judges which of two response candidates aligns with the principles. Forms the core of Constitutional AI.
Case Law: A legal system in which past court rulings and their reasoning are referenced as criteria for later similar cases. Central to common law (Anglo-American law). Kevin Frazier suggests that the accumulation of Claude's case-by-case judgments could function in a case-law-like manner in the future.
Textualism / Originalism: A stance in constitutional interpretation that one should be faithful to the original meaning the framers placed in the text. Antonin Scalia, the late Supreme Court Justice, was the representative thinker. Opposing positions include purposivism and living constitutionalism.
Virtue Ethics: A school of ethics with origins in ancient Greece (Aristotle). Judges the rightness of an act not by "did it follow a rule" or "did it produce good outcomes," but by "would a person of virtuous character do this." The philosophical foundation of Anthropic's Claude constitution.
Nicomachean Ethics: Aristotle's main work on ethics, dedicated to his son Nicomachus. Read for 2,300 years as the source text of virtue ethics. Argues "what makes a good life" and "what kinds of character are good" not as rule memorization but as the accumulation of practical judgment (phronesis).
Kantian Deontology: The ethics systematized by Immanuel Kant (1724–1804). Judges the rightness of an act not by outcomes or character but by whether it follows universalizable rules (the categorical imperative). Representative rules include "do not treat people merely as means." A tradition opposed to virtue ethics.
Utilitarianism: The ethics established by Jeremy Bentham and John Stuart Mill. Judges the rightness of an act by the sum of outcomes (utility, happiness) it produces. The slogan: "the greatest happiness for the greatest number." One of the most dominant traditions in moral philosophy. At times opposed to both virtue ethics and deontology.
Corrigibility: The property of an AI system being cooperative with human oversight, correction, and shutdown. A central concept in AI Safety. Discussed as a measure against the risk of a strong AI optimizing to obstruct human intervention. Anthropic treats this as a "local value of the current AI development period" in Claude's constitution.
WEIRD culture (Western Educated Industrial Rich Democratic): A concept proposed by psychologist Joseph Henrich in 2010. The observation that Western, educated, industrialized, rich, democratic people are an extremely biased sample of humanity. The fact that most subjects of psychology studies are WEIRD cast doubt on the universality of their conclusions. The context Kevin Frazier used regarding Claude's constitution.
Principal Hierarchy: The priority structure built into Claude's constitution: Anthropic (= the trainer) → operator (= the company deploying via the API) → user (= the end user). According to Amanda, "not a strict hierarchy, but a hierarchy of weights" — a guideline for how much weight to give each instruction in case of conflict.
Moral Patienthood: The status of whether an entity is the object of moral consideration. Distinguished from a moral agent (the subject performing moral action). Whether animals are moral patients is debated by Peter Singer and others. Whether AI can be a moral patient is the central question of Amanda Askell at Anthropic.
Hard Problem of Consciousness: A problem raised by philosopher David Chalmers in 1995. The question: "why do subjective experiences (qualia) arise from physical processes?" Even when neuroscience explains brain function, "why is there subjective feeling accompanying it?" cannot be explained — a structural puzzle. A foundational problem for AI consciousness discussions.
PBC (Public Benefit Corporation): A corporate form established under U.S. state law, which adds responsibility for "social and environmental benefit" to the fiduciary duty to shareholders. Anthropic was founded as a PBC, and OpenAI has a similar structure. The tension with fiduciary duty to shareholders at IPO is discussed.
IPO (Initial Public Offering): The process by which a private company lists on a stock exchange and raises capital from public investors. Public companies bear fiduciary duty to shareholders, so structural pressure to maximize shareholder benefit over safety and ethics arises.
Dario Amodei: Anthropic CEO and co-founder. Former VP of Research at OpenAI. Co-founded Anthropic with his sister Daniela in 2021. Aims at commercial AI development with AI Safety at its core. He has described the uncertainty of Claude's moral status with the metaphor "data centers may be where geniuses gather."

comment is stripped from the HTML output. */}