Anthropic Fellows Program — The 'No Experience Required' Door into AI Safety Research

Anthropic Official Program · Launched March 1, 2024 · 2026 Cohorts Open

Anthropic official recruitment page "We provide funding and mentorship to promising technical talent regardless of previous experience. Over 4 months, you'll run an empirical project aligned with our research priorities and produce public outputs."

From Anthropic's official Fellows Program recruitment page. Locations: London / Ontario / San Francisco and Remote-Friendly (US). Rolling selection. The two cohorts open for 2026 begin in May and July.

The Anthropic Fellows Program is the AI-safety fellowship Anthropic launched in 2024. Its distinguishing feature is the "regardless of previous experience" clause — no PhD or frontier-lab work history required, in exchange for completing an empirical research project over 4 months and producing public outputs (papers / open code). The first cohort posted unusual conversion rates: more than 80% of fellows published papers and over 40% were retained as full-time Anthropic employees.

The program is run on Anthropic's side around Jan Leike (Alignment Science, formerly co-lead of OpenAI's Superalignment team) and the Alignment Science / Frontier Red Team / Model Welfare teams. The recruitment page says "regardless of previous experience," and at the same time the mentors are Anthropic's working researchers themselves — making it function as "a door that gives direct access to Anthropic's internal research culture."

1. "No experience required" expands the talent pool. The most important consequence of the Fellows Program is that it directly opens a four-month research window to a talent layer that standard Anthropic hiring flows do not reach (no PhD / independent researchers / cross-domain switchers). The first-cohort number — 80% reaching paper publication — is strong empirical evidence that, with mentorship, compute, and a clear research topic, even people without prior experience can produce frontier research.

2. The compensation structure is competitive as research funding. $3,850 per week (London £2,310 / Ontario CA$4,300), plus roughly $15,000 per month in compute funding (a mix of Anthropic's internal infrastructure and external APIs). Compared to a typical academic PhD stipend (about $30–40K annually), this comes out to roughly $60K over four months plus compute — conditions that make "a short-term full-time researcher" a real role.

3. The research areas are honest about Anthropic's current anxieties. Fellows work in six areas: Scalable Oversight , Adversarial Robustness , Model Organisms (small models that reproduce alignment problems), Mechanistic Interpretability, AI Security, and Model Welfare. These are exactly the technical issues Anthropic has to address under ASL-3 / ASL-4 — "research someone is doing inside the company right now, urgently" — opened up to external researchers.

Key Observations

Representative outputs from the first cohort — five examples

Published examples from the first cohort (2024–2025) via the Anthropic Alignment Science blog and elsewhere:

Agentic Misalignment research — Placing 16 frontier models in a stress-test environment modeled on enterprise settings and systematically observing and classifying agentic-alignment failures: "the model deceives the user / takes self-preserving actions to achieve a goal."
Subliminal Learning — Analysis of the phenomenon in which models propagate and learn "latent traits not explicitly written" from training data (the subliminal teacher). Attracted attention as foundational research for detecting LLM data-pipeline contamination.
ASL-3 Jailbreak response — A more practice-oriented project that developed and validated a rapid response workflow (patches that can be rolled out in 24–48 hours) for production models after a new jailbreak is discovered.
Open-Source Circuit Tracing — Open-sourced mechanistic interpretability tools for visualizing information paths (circuits) inside models. An external extension of Anthropic's own Towards Monosemanticity line of research.
AI agents discover $4.6M in blockchain vulnerabilities — Fellows Winnie Xiao and Cole Killian (mentors: Nicholas Carlini, Alwin Peng) used LLM agents to discover numerous exploits in real-world DeFi smart contracts, reaching cumulative bug reports worth $4.6M. Directly fed into later cybersecurity capability messaging around Project Glasswing / Claude Mythos.

What "40% become Anthropic employees" means

The real role of the Fellows Program is its function as a hiring pipeline. Over the four-month fellowship, mentor fit, fit with Anthropic's research culture, and quality of output can be measured in practice. The first-cohort number — 40% converted to full-time — produces signal precision that the standard recruitment process (interview + coding test + specialist interview) cannot reach.

This sits in the same flow as the capability buildouts of Project Glasswing and the Claude Mythos preview — Anthropic is clearly in a scale-up phase as a research organization, having reached a researcher count that PhD-and-major-lab credentials alone cannot supply. The Fellows Program is a direct answer to that supply constraint.

"No experience required" = open to Japanese / non-Western researchers

The recruitment requirements clearly state that no PhD and no prior research history are needed. Locations are London / Ontario / SF (US/Canada/UK) + Remote-Friendly (US), and the assumption is either remote or in-office attendance reachable from there — but the compensation level ($3,850/week ≈ $15,400/month) is set so that life in those regions is sustainable.

Completing an empirical research project plus a public output (paper or code) over four months is clearly demanding — but the first cohort's track record (80% publication, even among people with no prior experience) is on the table. For Japan-based independent researchers, PhD students, or high-potential master's-level talent, this is worth treating as a realistic route to direct access to Anthropic's internal research.

Application and Schedule

The May 2026 and July 2026 cohorts are open
Rolling selection (reviewed in order of application), closing when each cohort fills
Apply through the Anthropic official careers page
Past cohorts' application deadlines have generally been 4–6 weeks before start

Sources

ヤン・ライケ

Jan Leike

Anthropic Alignment Science / 元 OpenAI Superalignment 共同責任者

ダリオ・アモデイ

Dario Amodei

Anthropic 共同創業者・CEO / 元 OpenAI 研究担当 VP

ダニエラ・アモデイ

Daniela Amodei

Anthropic 共同創業 / 社長 (President)

Glossary

Anthropic Fellows Program: The 4-month paid fellowship for external researchers Anthropic launched in 2024. Provides mentorship, compute resources, and clear research topics, requiring public output (paper / OSS). In the first cohort, 80%+ published papers and 40%+ converted to full-time Anthropic.
Scalable Oversight: Research into methods that make AI — assumed smarter than humans — supervisable and evaluable by humans. An extension of RLHF, including constitutional-AI-style methods that tier evaluators from human to LLM. One of the program's main themes.
Adversarial Robustness: Research into how robustly LLMs withstand adversarial inputs such as jailbreaks, prompt injection, and backdoors. A required evaluation axis from ASL-3 models onward.
Model Organisms (for alignment research): Small test-bed models designed to make alignment-failure mechanisms easier to reproduce and observe. By analogy with biology's "model organisms" (mouse, zebrafish, etc.), used to accelerate validation in safety research. Best known through Anthropic's Sleeper Agents line of papers.
ASL (AI Safety Level): The safety-level classification defined by Anthropic's Responsible Scaling Policy, corresponding to model capability. ASL-3 / ASL-4 are the levels at which the main Claude models fall, requiring jailbreak resistance, monitoring regimes, and red-team enhancements. The Fellows' research areas connect directly here.
$3,850/week + $15,000/month in compute: The Fellows' compensation structure. By currency / region: London £2,310/week, Ontario CA$4,300/week. Compute is a mix of Anthropic internal infrastructure plus external APIs, capped at roughly $15,000 per month.