Guillaume Vernade · 06:51 "Across DeepMind, we ship something new on average every five days. Look at GenMedia alone and it's more than one release a month. Some weeks we ship multiple times."
Guillaume Vernade's AI Engineer Europe 2026 Day 1 workshop is a rare opportunity to survey Google DeepMind's full set of GenMedia models in a single workflow. Normally the Nano Banana, Veo, and Lyria announcements happen separately, and developers learn each model in isolation. This workshop combines four models in a single project (illustrating a book), demonstrating the end-to-end workflow while also explaining the design philosophy behind the GenMedia API and the strategy inside the organization.
From the MEMEX editorial vantage, what matters is that a Developer Advocate inside the frontier lab lays out Google DeepMind's AI product strategy as a whole, which can then be set alongside Anthropic's B2C / enterprise strategy and OpenAI's Codex / Agent strategy. The talk supplies the DeepMind voice that has been missing from MEMEX so far, and it is a key node for reading the strategic divergence among the top three frontier labs.
"One ship every five days" — what the release cadence means strategically
The most striking number in the talk slips out during the self-introduction. "Across DeepMind, we ship something new on average every five days. Look at GenMedia alone and it's more than one release a month. Counting small features, some weeks we ship multiple times." It's one of the most concrete release-cadence numbers a frontier lab has put on the record.
Behind this pace lies DeepMind's world model vision. Long before Yann LeCun's new company began promoting "world models" in public, DeepMind was already pursuing the same vision. The endpoint is "one model that takes every modality as input and produces every modality as output." In practice, specific models (Nano Banana for images, Veo for video, Lyria for music, Gemini for text and multimodal) ship separately, with the same research investment circulating underneath. Guillaume states the reason for keeping a multi-model architecture explicitly: "For releases, specific models are easier to handle, and you avoid the risk of breaking something every time you update the main model."
The fight to unify the API — until the Imagen brand disappeared
A concrete example of what Guillaume "fought for internally" as a Developer Advocate: unifying the Imagen / Nano Banana API. "From a normal developer's perspective, each model having its own API set makes no sense. It should be enough to swap the model name." He kept arguing this for a long stretch, and eventually won by "default" — the Imagen brand disappeared (folded into Nano Banana).
This points to the weight of the Developer Advocate role inside DeepMind. A position that mediates "common sense" between the research side that builds the models and the developers who actually use them carries enough authority to move official brand strategy. Set against Anthropic's DevRel (Christian Ryan, Erik Schluntz, and others) or OpenAI's Developer Relations structure, the differences in organizational design across the top three frontier labs are worth watching.
Gemini 1.0 to 1.5 — the "multimodal got pulled" episode
Guillaume shares a behind-the-scenes story about Gemini's history. Gemini 1.0 was originally meant to ship as a multimodal model (every DeepMind model has been built multimodal-first from the start). Testing wasn't ready in time for launch, so 1.0 shipped with multimodal input removed. It came back in 1.5, but residual training from the 1.0 era ("I can't deal with images") would still surface in 1.5 from time to time. Only with 2.0 was it fully resolved.
This insight points to a structural fact about frontier models: feature releases cannot be separated from the assumptions baked into training. Surface-level capabilities can be changed, but the self-understanding burned into the base — "what I can and cannot do" — is hard to fully overwrite later. A technical observation that connects directly to the context of Amanda Askell's AI Personality discussion and Constitutional AI.
The four GenMedia models today — as of April 2026
The current model lineup Guillaume organized for the workshop:
| Model | Capability | Pricing / notes |
|---|---|---|
| Nano Banana 2 (Gemini 3.1 Flash Image) | Image generation | 520px to 4K, search grounding plus image grounding |
| Veo 3.1 / 3.1 Lite | Video generation (image to video, with audio) | Lite at 5 cents/second (40 cents for an 8-second clip) |
| Lyria | Music generation (30-second clip / 3-minute full song) | 4 cents per clip / 8 cents per full song |
| Lyria Real-Time | Live music generation (predict model) | Real-time mixing by swapping prompts |
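The per-unit prices in the table make a rough budget easy to work out. The sketch below is only illustrative arithmetic: it uses the Veo 3.1 Lite and Lyria prices quoted in the talk, leaves out image generation (no image price was given), and the function names are hypothetical.

```python
# Prices as quoted in the talk, in US cents. Image generation is excluded
# because no per-image price was given.
VEO_LITE_CENTS_PER_SECOND = 5
LYRIA_CLIP_CENTS = 4
LYRIA_FULL_SONG_CENTS = 8


def chapter_cost_cents(video_seconds: int, full_song: bool = False) -> int:
    """Cost of one video clip plus one piece of music for a single chapter."""
    video = VEO_LITE_CENTS_PER_SECOND * video_seconds
    music = LYRIA_FULL_SONG_CENTS if full_song else LYRIA_CLIP_CENTS
    return video + music


def book_cost_dollars(chapters: int, video_seconds: int = 8) -> float:
    """Total for a book with one video and one 30-second clip per chapter."""
    return chapters * chapter_cost_cents(video_seconds) / 100


if __name__ == "__main__":
    print(chapter_cost_cents(8))   # 44 (40 cents of video + 4 cents of music)
    print(book_cost_dollars(12))   # 5.28 for a 12-chapter book
```

At these prices the video dominates: an 8-second clip costs ten times as much as the 30-second music clip that accompanies it.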
Lyria Real-Time is Guillaume's personal favorite. Because it's a predict model, not a diffusion model, the design isn't "give a prompt and get something back" — it generates continuously, and you swap prompts mid-stream to mix it DJ-style. It's the model in GenMedia with the most unusual interaction pattern, and many developers haven't noticed it yet.
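The interaction pattern can be illustrated with a toy simulation. This is not the Lyria Real-Time API: `LiveMusicSession`, `set_prompt`, and `dj_set` are hypothetical names, and each "chunk" is a text label standing in for audio. The point is only that the stream keeps running while the prompt changes underneath it.

```python
from typing import Dict, Iterator, List


class LiveMusicSession:
    """Toy stand-in for a continuously generating music stream (hypothetical)."""

    def __init__(self, prompt: str) -> None:
        self.prompt = prompt

    def set_prompt(self, prompt: str) -> None:
        # Takes effect from the next chunk; the stream itself never restarts.
        self.prompt = prompt

    def stream(self, chunks: int) -> Iterator[str]:
        for index in range(chunks):
            # A real model would emit audio here; a label is enough for the sketch.
            yield f"chunk {index}: {self.prompt}"


def dj_set(session: LiveMusicSession, total_chunks: int,
           swaps: Dict[int, str]) -> List[str]:
    """Play a stream, swapping the prompt just before the given chunk indexes."""
    played = []
    for index, chunk in enumerate(session.stream(total_chunks)):
        played.append(chunk)
        next_prompt = swaps.get(index + 1)
        if next_prompt is not None:
            session.set_prompt(next_prompt)
    return played


if __name__ == "__main__":
    for line in dj_set(LiveMusicSession("ambient"), 4, {2: "techno"}):
        print(line)
```

Contrast this with a request/response model, where changing the prompt means starting a new generation from scratch.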
The heart of the workshop — Gemini writes prompts, GenMedia draws, in a loop
The hands-on demo illustrates Kenneth Grahame's "The Wind in the Willows" (1908, from Project Gutenberg) using GenMedia. The structure:
- Feed the whole book into Gemini: File Upload API plus chat mode, keeping the full context
- Gemini generates character prompts: portrait prompts for the major characters (mole, water rat, toad, badger), as structured output
- Nano Banana 2 generates the character images: global style instructions like "colorful building-block style" hold the look together
- Gemini generates chapter prompts: an illustration prompt for each chapter, plus the list of characters appearing in it
- Nano Banana 2 generates chapter images: passing in the relevant character images as references
- Veo turns each chapter image into video: Gemini writes a separate video prompt ("what happens a few seconds after this image"), and the image goes in as the first frame to Veo 3.1
- Lyria generates background music for each chapter: Gemini writes an instrumental song prompt, and a 30-second clip is produced per chapter
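The data flow of the steps above can be sketched as a small orchestration function. This is a minimal sketch, not the workshop's code: the four model calls are passed in as plain callables with hypothetical signatures, so the example shows only how prompts, reference images, and first frames are threaded through the loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical signatures standing in for the four model calls.
WritePrompt = Callable[[str], str]           # instruction -> prompt text (Gemini)
DrawImage = Callable[[str, List[str]], str]  # prompt, reference images -> image id
AnimateImage = Callable[[str, str], str]     # first-frame image, prompt -> video id
ComposeMusic = Callable[[str], str]          # prompt -> audio id


@dataclass
class ChapterAssets:
    title: str
    image: str
    video: str
    music: str


def illustrate_book(
    characters: List[str],
    chapters: Dict[str, List[str]],  # chapter title -> characters appearing in it
    style: str,
    write_prompt: WritePrompt,
    draw_image: DrawImage,
    animate_image: AnimateImage,
    compose_music: ComposeMusic,
) -> List[ChapterAssets]:
    # Character portraits: the shared style instruction keeps the look consistent.
    portraits: Dict[str, str] = {}
    for name in characters:
        prompt = write_prompt(f"Portrait prompt for {name}")
        portraits[name] = draw_image(f"{style}. {prompt}", [])

    results: List[ChapterAssets] = []
    for title, cast in chapters.items():
        # Chapter image: pass the portraits of the characters in this chapter.
        references = [portraits[name] for name in cast]
        image_prompt = write_prompt(f"Illustration prompt for chapter '{title}'")
        image = draw_image(f"{style}. {image_prompt}", references)

        # Video: a separate prompt, with the chapter image as the first frame.
        video_prompt = write_prompt(
            f"What happens a few seconds after the image for '{title}'")
        video = animate_image(image, video_prompt)

        # Music: one instrumental prompt per chapter.
        music = compose_music(write_prompt(f"Instrumental song prompt for '{title}'"))
        results.append(ChapterAssets(title, image, video, music))
    return results
```

Passing the model calls in as parameters keeps the sketch runnable with fake backends; in a real pipeline each callable would wrap the corresponding SDK call.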
The most interesting insight comes mid-demo. "Much of the training data for the GenMedia models is prompts written by Gemini. That's why Gemini is so good at writing prompts for GenMedia." This is a direct view into the circular structure of model development inside Google DeepMind — Gemini generates training prompts for GenMedia, GenMedia generates images, video, and music, and users have Gemini write prompts to drive GenMedia. The whole family is optimized as a single ecosystem.
The Interactions API — from stateless to stateful
Mid-workshop, Guillaume introduces what he calls a "release from a few months ago": the Interactions API, a pivot in how GenMedia is used. The earlier API was stateless, requiring the full context (such as the entire text of the book) to be resent every turn — heavy on both cost and latency.
The new API offers three improvements: (a) server-side context held against an interactions ID, (b) automatic caching, and (c) easy forking of a conversation, that is, branching from a single context in multiple directions (for example, generating a song and a cover image in parallel from the same source). It is currently in preview, but Guillaume hints it is likely to become "the default API by Google I/O 2026." This is a structural shift in the GenMedia API and a significant infrastructure change for developers.
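The difference between the two models can be made concrete with a toy in-memory version. This is a sketch under stated assumptions, not the Interactions API: `ToyInteractionStore`, `create`, and `previous_id` are hypothetical names, payload size is measured in characters rather than tokens, and model responses are ignored.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Interaction:
    id: str
    history: Tuple[str, ...]


class ToyInteractionStore:
    """In-memory stand-in for server-side context (hypothetical)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._store: Dict[str, Interaction] = {}

    def create(self, message: str, previous_id: Optional[str] = None) -> Interaction:
        # The client sends only the new message plus an ID; the history stays here.
        parent = self._store[previous_id].history if previous_id else ()
        interaction = Interaction(f"int-{next(self._ids)}", parent + (message,))
        self._store[interaction.id] = interaction
        return interaction


def stateless_chars_sent(turns: List[str]) -> int:
    """Every request resends all earlier turns plus the new one."""
    return sum(len("".join(turns[: i + 1])) for i in range(len(turns)))


def stateful_chars_sent(turns: List[str]) -> int:
    """Each turn is sent once; earlier context is referenced by ID."""
    return sum(len(turn) for turn in turns)


if __name__ == "__main__":
    store = ToyInteractionStore()
    root = store.create("<full text of the book>")
    # Fork: two branches continue from the same stored context.
    song = store.create("Write a song prompt", previous_id=root.id)
    cover = store.create("Write a cover image prompt", previous_id=root.id)
    print(song.history[0] == cover.history[0])  # True

    turns = ["a" * 1000, "bb", "ccc"]
    print(stateless_chars_sent(turns))  # 3007
    print(stateful_chars_sent(turns))   # 1005
```

With a whole book as the first turn, the stateless total grows by roughly one book per turn, which is the cost and latency problem described above.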
Service Tier — three SLA levels
During the workshop demo Guillaume introduces "something we shipped yesterday": Service Tier Priority. Three levels:
- Normal — standard price, standard queue
- Flex — 50% discount, tolerating up to several minutes of delay
- Priority — 2x price, fast-track guarantee
An API-side echo of AWS's Spot / On-Demand / Reserved structure. Developers can choose a cost-latency trade-off per request. It indicates that frontier-lab API design has reached cloud-infrastructure maturity.
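The per-request trade-off can be expressed in a few lines. The multipliers below come from the tiers as described in the talk (Flex at half price, Priority at double); the selection policy in `pick_tier` and all names are assumptions made for illustration, not part of any SDK.

```python
from enum import Enum


class ServiceTier(Enum):
    """Price multiplier relative to the standard price."""
    NORMAL = 1.0
    FLEX = 0.5      # 50% discount, may wait several minutes
    PRIORITY = 2.0  # 2x price, fast-track


def request_cost(base_cost: float, tier: ServiceTier) -> float:
    """Cost of one request at the given tier."""
    return base_cost * tier.value


def pick_tier(latency_sensitive: bool, tolerates_delay: bool) -> ServiceTier:
    """Hypothetical policy: pay more only when someone is waiting."""
    if latency_sensitive:
        return ServiceTier.PRIORITY
    if tolerates_delay:
        return ServiceTier.FLEX
    return ServiceTier.NORMAL


if __name__ == "__main__":
    # An 8-second Veo 3.1 Lite clip at the quoted 40 cents, per tier.
    for tier in ServiceTier:
        print(tier.name, request_cost(40, tier))
```

For a batch job such as illustrating a whole book overnight, a policy like this would send every request to Flex and halve the bill.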
Editorial reading — where Google DeepMind's strategy sits in the MEMEX map
Three angles for taking this workshop into MEMEX.
(1) Strategic divergence among the top three frontier labs. Anthropic integrates B2C (Cowork), developers (Claude Code), and enterprise ($350B valuation) on a single platform (see the Claude Cowork explainer and the Anthropic strategy piece). OpenAI splits Codex, Agent, and ChatGPT and casts a wide net. Google DeepMind runs a scale strategy of "unified multi-modality plus one ship every five days." Even within the same frontier competition, implementation philosophies differ.
(2) The structuring of the GenMedia market. Image generation (OpenAI DALL-E, Stable Diffusion, Black Forest Labs FLUX, Google Nano Banana), video generation (Sora, Runway, Pika, Veo), and music generation (Suno, Udio, Lyria) have so far competed as separate markets. Google DeepMind's move to integrate four modalities under one vendor reframes this as "the competition over multimodal integration," a thread that connects to Black Forest Labs (FLUX) and the Open Research strategy and to Roboflow's Transformers ate Vision discussion.
(3) The loop in which Gemini drives GenMedia training. The insider note that "much of the training data is written by Gemini" is a working example of self-bootstrapping in frontier model development. Read as the Google DeepMind implementation of the "AI driving AI" pattern that appears in Anthropic's Skills discussion and Karpathy's Software 3.0. Across the frontier labs, the direction is converging — fast iteration of AI by AI.
Video outline (highlights only)
- (00:00) Self-introduction, definition of the Developer Advocate role
- (03:00) The Imagen / Nano Banana API unification fight (internal battle)
- (04:30) DeepMind's world model vision
- (05:40) Gemini 1.0 to 1.5 and the "multimodal got pulled" episode
- (06:51) "One ship every five days" — DeepMind-wide release cadence
- (07:30) The latest models: Nano Banana 2, Veo 3.1, Lyria, Lyria Real-Time
- (10:00) Workshop kickoff — illustrating Kenneth Grahame's book
- (15:00) Gemini File Upload plus chat mode to hold the full book in context
- (20:00) Structured output to generate character prompts
- (23:00) Nano Banana 2 generates the character images
- (28:00) Chapter image generation (passing in character references)
- (35:00) The Interactions API explained (stateful)
- (38:00) Service Tier (Normal / Flex / Priority) — shipped yesterday
- (45:00) Veo 3.1 animates the chapter images
- (52:00) The "wrong character speaking" issue and prompt fixes (having Gemini generate the video prompt separately)
- (60:00) Lyria generates the chapter BGM — both instrumental and lyrics variants
- (68:00) Insider note that Gemini writes much of the GenMedia training data
- (75:00) Closing and Q&A