Give Your Chat Agent a Voice — Luke Harries / ElevenLabs (AI Engineer Europe)

AI Engineer Europe · May 9, 2026

Luke Harries · 06:43 A prediction. These chat agents will die.

AI Engineer channel (published May 7, 2026, around 8 minutes). A lightning talk at AI Engineer Europe 2026 in London (April 8–10).

"The home screen has become a chat interface" — SEO tweets from Linear and PostHog, GovUK's chat-agent policy, various other implementations all show that over the past year, chat has become the default entry point for talking to AI. On top of that, Luke's prediction is simple and provocative: chat agents will die, and the choice is to either move to voice or quietly disappear — an 8-minute lightning talk.

The presenter is Luke Harries — Growth + Engineering at ElevenLabs. A cross-disciplinary career: Cambridge pre-med → reinforcement learning at Microsoft Research → co-founder of Y Combinator-backed Fella Health → interim head of product at PostHog. When ElevenLabs CEO Mati Staniszewski first reached out, he questioned the go-to-market and passed on the investment; six months later, after the company hit 1 million users and a $3.3B valuation, he joined.

Why voice. Luke offers several reasons. (1) Fast and interactive. (2) Accessibility for users who struggle with keyboards or reading (dyslexia). (3) Omnichannel — an AI agent that joins a Zoom call and corrects a wrong statistic in real time; customer support that uses the existing phone line as-is; voice can be layered on top of existing interaction designs. He summarizes: "what we should ultimately do is upgrade all these chat agents into voice agents."

But on the ground, there was a different problem, Luke says. ElevenLabs started in TTS, and has since built a full-stack voice agent platform with enterprise customers like Revolut customer support. Most of those customers already have chat agents and have invested heavily in evaluation and transcript curation. "Start over from scratch? For what?" — that resistance became the adoption barrier. ElevenLabs' answer this time is to wrap the existing chat agent with a new primitive called Voice Engine — announcing a research preview to launch within the next few weeks.

Key observations

The design decision to carve out a "voice engine" as a primitive (02:46)

Until now, the ElevenLabs platform packaged LLM + RAG + tool calls + STT + TTS together. The new Voice Engine is a structural change — the "voice engine" portion has been pulled out into a "first-class primitive." The Server SDK works by adding a loop to an existing chat agent: "create a voice engine → attach a wrapper → proxy on each new session." The internals are ElevenLabs' best models — Scribe for STT, V3 for TTS, an advanced turn-taking layer that recognizes emotion and context. The design's elegance is that the customer gets to choose the granularity — the full package, or a thin layer on top of what already exists.

3-line Client SDK + ShadCN/Vercel-style UI (04:15)

The Client SDK that pairs with the Server SDK puts a voice widget on the site with three lines of code. It also ships with UI components matched to ShadCN and Vercel's style, so a coding agent (Claude Code, etc.) can be told "use ElevenLabs components" and produce a prototype. The intent to invest in developer experience is explicit. Luke's message — "we genuinely care about the people in this room (= developers)" — shows up directly in the structure of the implementation demo.

"Convert to a voice agent in a single prompt" (04:48)

What Luke demoed live: an existing chat support agent, with a single Claude Code prompt of "convert to a voice agent," returns code already integrated with Voice Engine in seconds and runs locally. He notes that Skills (Anthropic) will also be bundled at launch — the intent is to automate everything down to "analyze the codebase → detect the chat agent → propose how to deploy and how to wrap it." The strategy of cutting the "tedious integration work" to lower the entry barrier is clear.

Leave tool calls to the existing agent, but provide DOM-direct tools too (07:01)

A point that came up in Q&A. Existing chat agents normally handle tool calls in the backend, so the Voice Engine wrapper only needs to transparently proxy them. That said, ElevenLabs also has its own client-side / server-side tool concepts, allowing usage like "expose a frontend tool that directly manipulates the DOM" on the spot. A two-tier design that respects the existing flow but lets you go deeper when needed.

Video outline

(00:00) Giving chat agents a voice; 2025 was the year of chat
(00:30) Linear / PostHog home screens = chat interfaces; the GovUK policy
(01:00) Why voice — a natural medium, speed, accessibility, omnichannel
(01:24) Implementation scenarios — joining Zoom calls, customer support phone lines
(01:39) The theme: upgrade chat agents into voice agents
(01:50) ElevenLabs history — starting in TTS, expanding with Revolut and others into full voice agents
(02:23) The customer voice on the ground — "we already have an agent; is it worth starting over?"
(02:46) Voice Engine primitive design — wrapping the existing agent
(03:04) Inside the voice engine — Scribe (STT) + V3 (TTS) + advanced turn-taking
(03:42) Server SDK structure — generate the client → Voice Engine → attach the wrapper to the existing agent
(04:15) 3 lines of Client SDK for the widget; Telephony / CSAT bundled
(04:35) ShadCN / Vercel-style UI components
(04:48) The "one prompt converts to a voice agent" demo
(05:31) Walkthrough of the generated code — attach Voice Engine per session → proxy
(06:00) Design philosophy — from pure TTS to a higher-abstraction bundle
(06:43) Prediction — chat agents will die; add voice or end with chat
(06:51) Calling for design partners
(07:01) Q&A — handling tool calls (leave to the existing agent + DOM tools)

Sources

Give Your Chat Agent a Voice — Luke Harries, ElevenLabs (AI Engineer)

ルーク・ハリーズ

Luke Harries

ElevenLabs Growth + Engineering / 元 Microsoft Research・PostHog

comment is stripped from the HTML output. */}