Niklas Gustavsson · 02:55 "By far, the majority of PRs we're shipping today are co-authored by an AI agent and a developer. Coding is no longer the bottleneck."
Niklas Gustavsson's Code w/ Claude London 2026 talk is the first public report to answer, with six months of data, the question "what happens in a large-enterprise engineering organization once AI coding tool adoption has run a full cycle?" At the same event, Anthropic's Boris Cherny laid out the overall Claude Code roadmap. Spotify, speaking from the customer side, corroborates what adoption actually looks like in practice.
What matters from a MEMEX editorial point of view: the talk is not an introduction to individual tools but an argument in organizational theory. Niklas presents Spotify's AI transition not as a tools-adoption story but as a structural claim: the infrastructure built in the pre-AI era (FleetShift, Backstage, standardization) produces secondary effects in the AI era. "FleetShift, Backstage, and the standardization philosophy were all built in the pre-AI era. That's why, when AI came, we could put Honk on top of it." The temporal implication is an implicit warning to latecomers: a company that tries to copy Spotify by copying only the tools cannot reproduce the result. Alongside Intercom's Claude Code 2x and PFF's post-engineer org, this is a core data point in the "AI engineering organization, one year after adoption" series.
The adoption curve — Opus 4.5 was the line
The fact Niklas opens with: "We're always rolling out internal tools, but we've never seen an adoption curve like the one we've seen with AI coding tools." On the graph, Claude Code (orange) rises almost vertically compared with the other tools. The inflection point is the November 2025 release of Opus 4.5. From there, AI tool use overall, not just Claude, became in Niklas's words "completely bananas."
Internal numbers six months later, as of May 2026:
- Over 99% of engineers (the figure Niklas presents) use AI coding tools weekly
- 94% of engineers self-report productivity improvements from AI tooling (a record high)
- PR frequency up 76% (Niklas notes he kept rewriting the number while preparing the talk because it kept rising)
- In Niklas's phrase, "by far the majority" of PRs are co-authored by an AI agent and a developer (not just auto-generated — including developer-agent collaboration)
Niklas points out that the PR-frequency growth curve also ascended sharply at the Opus 4.5 release.
FleetShift — a precondition built in the pre-AI era
The first reason Niklas cites for Spotify's smooth AI transition is groundwork laid before AI. Several years ago, Spotify noticed that its production codebase was growing seven times faster than its engineer headcount: code to maintain was piling up far faster than the people available to maintain it, so the maintenance burden per engineer kept climbing. As a result, engineers were pulled into maintaining the existing codebase instead of building new features.
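To see why "seven times faster" turns into a maintenance squeeze, a back-of-the-envelope sketch helps. The growth rates below (5% headcount, 35% code per year) are hypothetical, chosen only so that code grows seven times as fast as headcount; they are not Spotify figures.

```python
def code_per_engineer(years: int, headcount_growth: float, code_growth: float) -> float:
    """Code per engineer after `years`, relative to a starting value of 1.0.

    Both growth rates are annual fractions (0.05 means +5% per year).
    """
    return ((1 + code_growth) / (1 + headcount_growth)) ** years


# Hypothetical rates: code grows 7x as fast as headcount (35% vs 5% per year).
HEADCOUNT_GROWTH = 0.05
CODE_GROWTH = 7 * HEADCOUNT_GROWTH

for year in range(1, 6):
    ratio = code_per_engineer(year, HEADCOUNT_GROWTH, CODE_GROWTH)
    print(f"year {year}: each engineer owns {ratio:.2f}x the code they owned in year 0")
```

Under these assumed rates, code per engineer roughly doubles within three years even though hiring never stops.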
FleetShift, Spotify's in-house fleet-management system from the pre-AI era, replaced per-component manual migration with the notion of "mutating the entire fleet." Instead of sending migration paths to hundreds of teams and having each one update its components by hand, a single shift definition generates PRs against thousands of components, runs them through CI, and auto-merges those judged safe. Think of a fleet of 100 cars: rather than swapping a part in each car by hand, you change one line of configuration and all 100 are updated at once. Simple changes such as dependency bumps and straightforward API deprecations are almost fully automated, and as of May 2026 FleetShift has merged a cumulative 2.5 million automated maintenance PRs.
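The shift-to-PR flow described above can be sketched in a few lines. This is a minimal illustration, not FleetShift's implementation: `Shift`, `run_shift`, the guava version bump, and the fake CI check are all hypothetical stand-ins.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Component:
    name: str
    build_file: str  # simplified stand-in for the component's repository contents


@dataclass
class PullRequest:
    component: str
    new_build_file: str
    ci_passed: bool
    merged: bool = False


@dataclass
class Shift:
    """One fleet-wide change: a transform applied to every component."""
    name: str
    transform: Callable[[str], str]
    auto_merge: bool = True  # only shifts judged safe merge without a human


def run_shift(shift: Shift, fleet: list, ci: Callable[[str], bool]) -> list:
    prs = []
    for comp in fleet:
        updated = shift.transform(comp.build_file)
        if updated == comp.build_file:
            continue  # already compliant: no PR needed
        pr = PullRequest(comp.name, updated, ci_passed=ci(updated))
        pr.merged = shift.auto_merge and pr.ci_passed
        prs.append(pr)
    return prs


# Usage: bump every component to guava 32.1 (hypothetical dependency and versions).
fleet = [
    Component("playlist-api", "guava:31.0"),
    Component("search", "guava:32.1"),
    Component("ads", "guava:30.0 legacy-plugin"),
]
bump = Shift("bump-guava", lambda f: re.sub(r"guava:\d+\.\d+", "guava:32.1", f))
fake_ci = lambda f: "legacy-plugin" not in f  # pretend this plugin breaks the build
prs = run_shift(bump, fleet, fake_ci)
for pr in prs:
    print(pr.component, "merged" if pr.merged else "needs attention")
```

Already-compliant components produce no PR at all, and a CI failure leaves the PR open for a human rather than merging it.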
"This was pre-AI," Niklas emphasizes. For simple changes (dependency bumps, configuration updates), FleetShift's deterministic scripts worked decisively. But for complex changes like API replacements, the script needed to cover every corner case — and via Hyrum's Law An empirical rule formulated by Google's Hyrum Wright. Summarized as: 'with enough users of an API, every behavior — even ones not in the contract — becomes depended on by someone.' As a result, the API provider, in trying to maintain backward compatibility, must consider even undocumented side effects, and migration / deprecation costs become far larger than expected. Frequently invoked when executing API deprecation in large OSS or internal monorepos. — the migration script grew unrealistically large.
Honk — a harness wrapping the Claude Agent SDK inside a pod
After LLMs arrived, Spotify experimented repeatedly with one question: what if an LLM, instead of a deterministic script, did the code modification? "At first the models were too stupid, and the way we were using them was too stupid too," but model improvements plus trial and error eventually produced Honk.
The Honk architecture, in the range Niklas discloses:
- The core is the Claude Agent SDK
- Spotify's own harness wraps the SDK
- Scheduled in the cloud as Kubernetes pods, with many Honk jobs running in parallel
- A set of trusted tools provided to the agent (only verification tools are shown in the slide, but there are many others)
- Verification runs real code builds in Spotify's CI environment (multi-OS, since the client runs on multiple OSes)
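The architecture in the list above (many parallel agent jobs, each checking its work with a verification tool) can be sketched as follows. This is a hypothetical model, not Spotify's harness and not the Claude Agent SDK API: `Agent` and `Verifier` are made-up callable types, worker threads stand in for Kubernetes pods, and the toy agent and verifier only simulate a build-and-retry loop.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interfaces; NOT the Claude Agent SDK API.
Agent = Callable[[str, Optional[str]], str]   # (prompt, verifier feedback) -> patch
Verifier = Callable[[str], Optional[str]]      # patch -> error text, or None if the build passed


@dataclass
class JobResult:
    component: str
    patch: Optional[str]  # None means the job gave up and no PR is opened
    attempts: int


def run_job(component: str, task: str, agent: Agent, verify: Verifier,
            max_attempts: int = 3) -> JobResult:
    feedback = None
    for attempt in range(1, max_attempts + 1):
        patch = agent(f"{task} in {component}", feedback)
        feedback = verify(patch)  # stands in for a real multi-OS CI build
        if feedback is None:
            return JobResult(component, patch, attempt)
    return JobResult(component, None, max_attempts)


def run_fleet(components, task, agent: Agent, verify: Verifier, parallelism: int = 8):
    # Each worker thread stands in for one Kubernetes pod running an agent session.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(lambda c: run_job(c, task, agent, verify), components))


# Usage with toy stand-ins: the "agent" revises its patch once it sees build feedback.
def toy_agent(prompt: str, feedback: Optional[str]) -> str:
    return prompt + (" (revised after build error)" if feedback else "")


def toy_verify(patch: str) -> Optional[str]:
    if "legacy" in patch:
        return "compile error"  # this component never builds
    return None if "revised" in patch else "compile error"


results = run_fleet(["playlist-api", "legacy-billing"], "migrate to Java 21", toy_agent, toy_verify)
for r in results:
    print(r.component, "PR ready" if r.patch else "gave up", f"after {r.attempts} attempt(s)")
```

The key design point is that verification output flows back into the agent's next attempt, and a job that never passes verification produces no PR.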
The relationship between Honk and FleetShift: FleetShift handles orchestration, Honk handles per-component code modification — a two-layer structure. From the team-level UI, you can see "for the current shift, how many PRs created, how many merged, how many CI failures."
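The team-level view described above boils down to aggregating per-component PR states for one shift. A minimal sketch, assuming three hypothetical states (`open`, `merged`, `ci_failed`); the real UI's data model is not described in the talk.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PrStatus:
    component: str
    state: str  # hypothetical states: "open", "merged", "ci_failed"


def shift_summary(statuses: list) -> dict:
    """Aggregate per-component PR states into the counts a team-level view would show."""
    counts = Counter(s.state for s in statuses)
    return {
        "created": len(statuses),
        "merged": counts["merged"],
        "ci_failures": counts["ci_failed"],
    }


statuses = [
    PrStatus("playlist-api", "merged"),
    PrStatus("search", "merged"),
    PrStatus("ads", "ci_failed"),
    PrStatus("podcasts", "open"),
]
print(shift_summary(statuses))  # {'created': 4, 'merged': 2, 'ci_failures': 1}
```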
The concrete result Niklas cites: a Java migration completed in three days. The Spotify backend runs on the JVM in a 40-million-line monorepo (equivalent to about 800,000 pages as a single book). Work that previously meant distributing migration paths to hundreds of teams and taking "weeks to months" was completed in three days by a single engineer. This can be read as a production-scale implementation of Tejas Kumar's harness systematization.
Honk V2 — from Slack invocation to a multiplayer agent
Niklas reveals that he released the Honk V2 alpha the day before the talk, during hack week. "We call it v2 but it's actually the 8th revision, so the versioning is loose," he jokes, before presenting a set of major feature expansions.
The trigger was organic user behavior: engineers who wanted to use Honk for more than migrations began invoking it from Slack on their own. The pattern spread internally: mention @honk in a Slack conversation, Honk works in the background, and it comes back with a PR.
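The mention-to-background-job pattern can be sketched as a small dispatcher. Everything here is hypothetical (no Slack SDK is used): `on_message`, `HonkRequest`, and the in-memory queue are illustrative stand-ins, and matching the literal text "@honk" is a simplification, since Slack encodes mentions as user IDs such as `<@U012ABC>`.

```python
import re
from dataclasses import dataclass
from queue import Queue


@dataclass
class HonkRequest:
    channel: str
    thread_ts: str     # where the eventual PR link would be posted back
    instruction: str


# Simplification: matches a literal "@honk" in plain text. Real Slack events encode
# mentions as user IDs like <@U012ABC>, so a production bot would match its own ID.
MENTION = re.compile(r"@honk\b\s*(?P<instruction>.+)", re.IGNORECASE | re.DOTALL)

jobs: Queue = Queue()  # a background worker (not shown) would consume this queue


def on_message(channel: str, thread_ts: str, text: str) -> bool:
    """Enqueue a background job if the message mentions @honk; return whether it did."""
    m = MENTION.search(text)
    if m is None:
        return False
    jobs.put(HonkRequest(channel, thread_ts, m.group("instruction").strip()))
    return True


on_message("C-payments", "1718000000.0001", "@honk bump the logging library in checkout-service")
on_message("C-payments", "1718000000.0002", "lunch anyone?")
print(jobs.qsize())  # 1
```

Recording the thread timestamp is what lets the background job reply in the same conversation once the PR exists.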
Honk V2 redesigns this user pattern as first-class:
- Chirp: an agent orchestration tool Spotify released alongside Honk V2. It acts as a control plane that launches many agent sessions in parallel and coordinates their state. Functionally it is close to Claude Agents or AgentDeck; its differentiator is deep integration with Spotify's internal infrastructure (Backstage, FleetShift, Honk)
- Schedule Honk jobs via Chirp
- Multiplayer collaboration — the feature Niklas describes as "Google Docs for Claude." Multiple developers can join a single agent session and share feedback / ideas in real time
- Project-level upper structure — create a project per new feature or product, hold multiple Honk sessions under it, and have the team collaborate toward a shared goal
- Any-device support — not tied to the desktop, talk to the agent from anywhere
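The project and multiplayer features listed above imply a simple hierarchy: a project holds several agent sessions, and each session has multiple human participants sharing one feedback log. The sketch below is a hypothetical data model for that idea only; it is not Chirp's or Honk's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """One agent session; several developers can join and post into a shared log."""
    goal: str
    participants: set = field(default_factory=set)
    log: list = field(default_factory=list)  # (developer, message) pairs

    def join(self, dev: str) -> None:
        self.participants.add(dev)

    def post(self, dev: str, message: str) -> None:
        if dev not in self.participants:
            raise PermissionError(f"{dev} has not joined this session")
        self.log.append((dev, message))  # every participant (and the agent) sees the same log


@dataclass
class Project:
    """Upper structure: one feature or product, many Honk-style sessions."""
    name: str
    sessions: list = field(default_factory=list)

    def open_session(self, goal: str) -> Session:
        session = Session(goal)
        self.sessions.append(session)
        return session

    def contributors(self) -> set:
        return set().union(*(s.participants for s in self.sessions))


project = Project("offline-mode")
sync = project.open_session("persist the sync queue")
sync.join("ana")
sync.join("ben")
sync.post("ana", "retry on HTTP 503")
ui = project.open_session("offline banner")
ui.join("cho")
print(sorted(project.contributors()))  # ['ana', 'ben', 'cho']
```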
Niklas says the multiplayer feature is what excites him most personally. "We're entering a stage where we have to re-imagine how an agent collaborates with multiple developers and teams." This points to evolution beyond the solo model in Anthropic's long-running agent workshop — "a person babysitting an agent" — toward a multi-human + agent collaborative model.
Backstage and standardization — the soil that makes agents effective
In the second half, Niklas digs into another pre-AI investment, Backstage. Spotify built it as a developer portal and open-sourced it; internally it is run in production.
The origin is simple. Before Backstage, Spotify had about 100 different tools developers touched, with no clear sense of "which is deployment, which is CI, which is A/B testing." "All of those tools were kind of shit as well," so an integrating portal became necessary.
The first feature of Backstage was simply "a catalog where you can look up the owner of any software component." Starting from the need to page the owner team during an incident, a component → owner mapping was the entry point. Over several years, many tools were integrated on top of the catalog, and it grew into "the central convergence point for human developers when they take action."
The secondary effect of Backstage in the AI era: every action is now exposed to agents too, via MCP / CLI. Claude can look up an owner and even ping the team on Slack when needed. "Backstage was for human developers, but it works as an agent portal as-is."
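The catalog's original job (component to owner) and its reuse by agents can be sketched together. The entity dictionaries mirror the field layout of Backstage's `catalog-info.yaml` descriptors (`apiVersion`, `kind`, `metadata.name`, `spec.owner`), but the component and team names are made up, and the `AGENT_TOOLS` dict is only a stand-in for exposing functions to an agent via MCP or a CLI.

```python
from typing import Callable, Optional

# Entities shaped like Backstage catalog-info.yaml descriptors
# (apiVersion / kind / metadata.name / spec.owner). Names here are made up.
CATALOG = [
    {"apiVersion": "backstage.io/v1alpha1", "kind": "Component",
     "metadata": {"name": "playlist-api"},
     "spec": {"type": "service", "lifecycle": "production", "owner": "team-playlist"}},
    {"apiVersion": "backstage.io/v1alpha1", "kind": "Component",
     "metadata": {"name": "search"},
     "spec": {"type": "service", "lifecycle": "production", "owner": "team-search"}},
]


def lookup_owner(component: str) -> Optional[str]:
    """Backstage's first use case: component -> owning team."""
    for entity in CATALOG:
        if entity["kind"] == "Component" and entity["metadata"]["name"] == component:
            return entity["spec"]["owner"]
    return None


def notify_owner(component: str, message: str, send: Callable[[str, str], None]) -> bool:
    """Look up the owner and hand the message to a transport (e.g. a Slack client)."""
    owner = lookup_owner(component)
    if owner is None:
        return False
    send(owner, message)
    return True


# The same functions a human uses through the portal, registered as agent tools.
# A real harness would expose these behind MCP or a CLI; this dict is only a stand-in.
AGENT_TOOLS = {"lookup_owner": lookup_owner, "notify_owner": notify_owner}

sent = []
AGENT_TOOLS["notify_owner"]("search", "p99 latency alert", lambda team, msg: sent.append((team, msg)))
print(sent)  # [('team-search', 'p99 latency alert')]
```

The point of the design is that nothing agent-specific was built: the tools are the portal's existing actions, exposed through a second interface.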
Move fast with fewer technologies — standardization helps agent performance
The creed Niklas has lived by for 15 years: "the fewer technologies you use, the faster you can move." To implement this, Spotify has:
- Technology Radar — a list of available technologies with recommended status (recommended / not recommended)
- Golden State — a recommended stack per component type ("for this kind of backend service, this stack")
- Soundcheck UI — a UI for self-assessing how closely your team's components conform to the golden state (e.g., is a valid owner defined?)
- Static analysis + linting — lint checks that implement the above rules are built into the codebase, returning violations as immediate feedback
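The four mechanisms above combine naturally into a conformance check: compare each component against the golden state for its type, flag radar-discouraged technologies, and compute a team-level score. This is a hypothetical sketch in the spirit of Soundcheck; the radar entries, golden state, and scoring rule are invented for illustration and are not Spotify's.

```python
# Hypothetical Technology Radar entries and one golden state; not Spotify's actual lists.
RADAR = {"java": "recommended", "grpc": "recommended", "scala": "not recommended"}
GOLDEN_STATE = {"backend-service": {"language": "java", "rpc": "grpc"}}


def check(component: dict) -> list:
    """Return golden-state violations for one component; [] means it conforms."""
    violations = []
    if not component.get("owner"):
        violations.append("no valid owner defined")
    golden = GOLDEN_STATE.get(component.get("type"), {})
    for key, expected in golden.items():
        actual = component.get(key)
        if actual != expected:
            violations.append(f"{key}: uses {actual!r}, golden state is {expected!r}")
        if RADAR.get(actual) == "not recommended":
            violations.append(f"{actual} is marked 'not recommended' on the radar")
    return violations


def conformance(components: list) -> float:
    """Share of components with zero violations, like a team-level self-assessment score."""
    if not components:
        return 1.0
    return sum(1 for c in components if not check(c)) / len(components)


team = [
    {"name": "playlist-api", "type": "backend-service", "owner": "team-playlist",
     "language": "java", "rpc": "grpc"},
    {"name": "old-recs", "type": "backend-service", "owner": "",
     "language": "scala", "rpc": "grpc"},
]
for c in team:
    print(c["name"], check(c) or "conforms")
print(conformance(team))  # 0.5
```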
The unexpected effect in the AI era: "When Claude looks at the surrounding code and imitates it, Claude does a better job if the code is consistent." Conversely, in a fragmented codebase the quality of Claude's output drops — observed in internal data. "Even when Claude generates a non-optimal gRPC call, the linter returns the correct version immediately" — standardization and lint function as training wheels for the agent.
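The "linter as training wheels" loop can be sketched with a single rule. The talk does not say which gRPC pattern Spotify's lint enforces, so the rule here (every stub call must pass an explicit `timeout`) is an assumed example, `DEFAULT_TIMEOUT` is a made-up constant name, and regex matching is a simplification of what a real, syntax-tree-based linter would do.

```python
import re
from dataclasses import dataclass

# Hypothetical house rule for illustration: every gRPC stub call must pass an explicit
# timeout. DEFAULT_TIMEOUT is a made-up constant name. Regex matching is a simplification;
# a real linter would work on the syntax tree.
STUB_CALL = re.compile(r"stub\.(?P<method>\w+)\((?P<args>[^)]*)\)")


@dataclass
class LintFinding:
    line: int
    message: str
    suggestion: str  # the corrected line, returned to the agent as immediate feedback


def _with_timeout(m: re.Match) -> str:
    args = m.group("args")
    if "timeout=" in args:
        return m.group(0)
    sep = ", " if args.strip() else ""
    return f"stub.{m.group('method')}({args}{sep}timeout=DEFAULT_TIMEOUT)"


def lint(source: str) -> list:
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        fixed = STUB_CALL.sub(_with_timeout, line)
        if fixed != line:
            findings.append(LintFinding(lineno, "gRPC stub call without timeout", fixed))
    return findings


def apply_suggestions(source: str) -> str:
    lines = source.splitlines()
    for finding in lint(source):
        lines[finding.line - 1] = finding.suggestion
    return "\n".join(lines)


# Usage: a non-optimal call the agent generated, next to one that already follows the rule.
generated = "resp = stub.GetPlaylist(req)\nok = stub.Ping(req, timeout=2.0)"
for f in lint(generated):
    print(f.line, f.message, "->", f.suggestion)
fixed = apply_suggestions(generated)
assert lint(fixed) == []  # the revised code passes the linter
```

Returning the corrected line, not just an error message, is what lets the agent converge in one more step instead of guessing.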
The shift of the bottleneck — from coding to product decisions
The structural prediction Niklas offers at the end: "the era of coding being the bottleneck is ending." A dramatic change has already occurred at Spotify in prototyping.
The old prototype: "days to weeks. You had to persuade an engineer to make time" → after Claude + skills are in place: "literally minutes. Anyone can prompt Claude inside the production codebase to generate a prototype, distribute it as an installable app, and put it into internal validation." Niklas reveals that "one of Spotify's CEOs is making prototypes this way too."
The structure this change reveals: when the constraint of coding loosens, the bottleneck moves to human decision-making — what to ship, which ideas to explore — product judgments. In an era constrained by coding capacity, these were judgments you did not have to make as often; "our product development six months from now should look meaningfully different from today," Niklas predicts.
A side effect: a 76% PR-frequency increase is also a 76% PR-review-load increase. "Right now the most common complaint is 'too many freaking PRs to review.'" Spotify has already started, in some areas, operating with "PRs judged safe enough get auto-merged without human review," and the redesign of "where to concentrate human judgment" is ongoing.
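Deciding "where to concentrate human judgment" amounts to a routing policy over incoming PRs. The sketch below is hypothetical: the talk does not describe Spotify's actual auto-merge criteria, so the sensitive-path prefixes, the 50-line threshold, and the trusted-shift flag are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical policy knobs; the talk does not describe Spotify's actual criteria.
SENSITIVE_PREFIXES = ("payments/", "auth/")
MAX_AUTO_MERGE_LINES = 50


@dataclass
class PR:
    lines_changed: int
    ci_passed: bool
    touched_paths: tuple
    from_trusted_shift: bool  # e.g. a fleet-wide shift already proven on many components


def route(pr: PR) -> str:
    """Decide where human judgment is spent: 'blocked', 'auto_merge', or 'human_review'."""
    if not pr.ci_passed:
        return "blocked"
    if any(path.startswith(SENSITIVE_PREFIXES) for path in pr.touched_paths):
        return "human_review"  # sensitive code always gets a person
    if pr.from_trusted_shift and pr.lines_changed <= MAX_AUTO_MERGE_LINES:
        return "auto_merge"
    return "human_review"


prs = [
    PR(3, True, ("search/build.gradle",), True),       # small dependency bump from a shift
    PR(12, True, ("payments/ledger.java",), True),     # trusted shift, but sensitive path
    PR(400, True, ("podcasts/feed.java",), False),     # large feature work
    PR(5, False, ("ads/config.yaml",), True),          # CI failed
]
print([route(pr) for pr in prs])
# ['auto_merge', 'human_review', 'human_review', 'blocked']
```

Checking sensitive paths before the auto-merge rule means no size or provenance threshold can bypass human review for that code.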
Editorial Observations — as MEMEX's industry axis
Three lenses for covering Niklas's talk on MEMEX.
(1) Highly valuable as the first public data point in the "one year after AI adoption" series. Many companies stop at "we adopted Claude Code"; Spotify has published concrete numbers: 99%+ weekly usage, a 76% increase in PR frequency, and a Java migration completed in 3 days. Alongside Intercom's Claude Code 2x, this is one of the few cases of quantitative outcome reporting.
(2) The structural argument that pre-AI infrastructure investment produces secondary effects in the AI era. Put plainly, this is not an introduction to a single tool, Honk; it is a question of whether the soil that can support Honk exists. FleetShift (fleet management), Backstage (catalog), technology standardization, and company-wide lint were all built before AI spread. Niklas's implicit claim: "Copying only Honk won't reproduce Spotify's results. You need the soil." This claim sits above the individual-technique layer of Tejas Kumar's six harness elements or Pedro Rodrigues's skill design, and describes an organizational-capability layer.
(3) The structural prediction of bottleneck displacement. The end of "the era when coding was the bottleneck" shakes the fundamental premises of an engineering organization. "Coding constraint → product-decision constraint" is not just Spotify's story but a hypothesis about the industry as a whole. If it holds, the relative importance of product managers, designers, and executive judgment rises, and the engineer's role shifts from "writing" to "deciding and verifying." Spotify's data sits on the same industry axis MEMEX tracks, the redefinition of the engineer role, alongside 10 Downing Street's Insurgency Model and PFF's post-engineer org.
The reason MEMEX covers Niklas's talk is the larger question that ties (1)(2)(3) together — "after AI removes the constraint, where does engineering expertise remain?" Niklas answers, "verification and engineering practice will not fade," but that is a hypothesis, and a question MEMEX records in the archive to compare against data from other companies six months and one year out.
Video Outline
- (00:00) Introduction, Spotify's scale (about 3,000 engineers = the scale of one university faculty, 4,500 deployments a day, 40M LOC backend monorepo + several thousand polyrepos)
- (01:11) The AI adoption curve — explodes at Opus 4.5; 99%+ use AI tools weekly
- (02:08) 94% self-report productivity improvements, a record high
- (02:30) PR frequency +76%, the majority co-authored by AI agent + developer
- (03:23) Pre-AI: the problem of the codebase growing 7x faster than engineers
- (05:13) How FleetShift works, 2.5 million PRs auto-merged cumulatively
- (06:00) Sufficient for simple changes; complex changes blocked by Hyrum's Law
- (07:10) Tried code modification with LLMs from early on (pre-Claude)
- (07:44) The birth of Honk — silly name and icon, serious utility
- (08:17) Honk architecture — Claude Agent SDK + own harness + Kubernetes pods + verification tools
- (10:15) The case of a Java migration completed in 3 days
- (10:30) Available commercially via Backstage Developer Portal Premium
- (11:00) Slack-mediated Honk invocation arises organically
- (11:30) Honk V2 alpha release — Chirp, multiplayer, project, any device
- (13:30) Spotify's standardization creed "fewer technologies, faster" for 15 years
- (15:13) Observation that standardization helps agent performance too
- (16:11) The origin of Backstage — started as a catalog integrating ~100 internal tools
- (17:55) Agents also use Backstage via MCP / CLI for owner lookup and Slack notifications
- (18:11) Standardization via Technology Radar, Golden State, Soundcheck UI
- (19:39) Lint / static analysis function as immediate feedback to the agent
- (20:31) Summary — engineering practice will not fade; verification matters
- (22:06) Redesigning where to place human judgment — the side effect of a 76% PR-review load increase
- (23:19) Bottleneck displacement — from coding to product decisions
- (24:50) Prototypes shrink from days/weeks → minutes; even one of Spotify's CEOs prototypes
- (26:00) Prediction: "product development six months from now will look meaningfully different"
- (27:00) Closing, mention of the FleetShift / Honk commercial offering
Related Resources
- Boris Cherny (Anthropic) on the overall Claude Code roadmap
- Intercom's Claude Code 2x adoption report
- PFF's post-engineer organization
- Tejas Kumar (IBM) on the 6-element harness system
- Anthropic's long-running agent workshop
- Pedro Rodrigues (Supabase) on the 3 principles of skill design
- 10 Downing Street's Insurgency Model
- People profile: Niklas Gustavsson
Sources
- Coding is no longer the constraint: Scaling devex to teams and agents at Spotify (Code w/ Claude London 2026)
- Code w/ Claude London official session page
- Spotify cuts migration time by 90% with Claude Agent SDK (Anthropic customer story)
- Background Coding Agents: Honk Part 4 (Spotify Engineering blog)
- Kelsey Hightower x Niklas Gustavsson on Fleet Management (2023)
- Backstage (official)