80% of Context Engineering Is Agentic Search — Leonie Monigatti / Elastic (AI Engineer Europe)

AI Engineer Europe · May 8, 2026

Leonie Monigatti · 02:12 "About 80% of context engineering is agentic search."

AI Engineer channel (published May 2026, around one hour). Workshop from AI Engineer Europe 2026 (April 8–10) in London.

Context engineering, the practice of deciding what goes into an LLM's context window, has risen rapidly as a topic over the past year. Leonie frames it with the claim that "80% is agentic search" and develops that claim systematically across a one-hour live workshop. The session traces the design evolution from fixed RAG to agentic RAG, then to handling multiple context sources, and finally to dedicated tool suites, pairing each stage with concrete failure patterns and remedies.

The speaker is Leonie Monigatti, a Developer Advocate at Elastic (the company behind Elasticsearch). She previously worked as a Developer Advocate at Weaviate (the vector database company), and she writes a series of technical blog posts for Elasticsearch Labs that are referenced across the industry, including "Search tools for context engineering," "Database retrieval tools for context engineering," and "Agent Journey Map." Her focus is developer education on the cross-cutting theme of building agents on top of search.

The talk opens with "the past three years of history." Early RAG was a fixed pipeline: take the user message verbatim as the search query, retrieve from a database, and append the results to the context. That design hit two limits, "searching every time regardless of whether it's actually needed" and "only being able to query once when multi-hop is required," so search became an independent "tool" that the agent itself decides whether to call, depending on the situation. That is agentic search as practiced today, and it is the groundwork for the rest of the workshop.
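The contrast between the two designs can be sketched in a few lines. This is a minimal illustration, not code from the workshop: `toy_search` and `toy_decide` are hypothetical stand-ins for a real retriever and for the LLM's decision about whether to search again.

```python
from typing import Callable, List, Optional

Search = Callable[[str], List[str]]
# Given the user message and the context so far, return the next query or None to stop.
Decide = Callable[[str, List[str]], Optional[str]]


def fixed_rag(user_message: str, search: Search) -> List[str]:
    """Fixed pipeline: always search exactly once, with the message verbatim."""
    return [user_message] + search(user_message)


def agentic_rag(user_message: str, search: Search, decide: Decide,
                max_steps: int = 5) -> List[str]:
    """Agentic variant: the agent chooses whether to search and may search repeatedly."""
    context = [user_message]
    for _ in range(max_steps):  # guard against endless tool loops
        query = decide(user_message, context)
        if query is None:  # the agent judges that it has enough context
            break
        context.extend(search(query))
    return context


def toy_search(query: str) -> List[str]:
    # Hypothetical retriever that returns exactly one result per query.
    return [f"result for: {query}"]


def toy_decide(user_message: str, context: List[str]) -> Optional[str]:
    # Hypothetical policy: skip search for greetings, otherwise run two hops.
    if user_message.lower().startswith("hello"):
        return None
    plan = ["first hop", "second hop"]
    hops_done = len(context) - 1  # valid because toy_search adds one item per hop
    return plan[hops_done] if hops_done < len(plan) else None
```

With these stand-ins, `fixed_rag("hello", toy_search)` still performs a search, while `agentic_rag("hello", toy_search, toy_decide)` returns only the message, and a non-greeting question triggers two searches in sequence.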

The core of the talk is "three patterns by which agentic search breaks in the field." (1) The agent doesn't call the tool (it judges that its own parametric knowledge suffices); (2) it calls the wrong tool (e.g., reaching for web search when web search is not wanted); (3) it calls the right tool, but with wrong parameters. As remedies, Leonie shows how to write tool descriptions (central purpose → parameters → trigger conditions → relationship to other tools), how to reinforce them via the system prompt, and how to apply Anthropic Skills-style progressive disclosure (skill loading), demonstrated through a concrete Elasticsearch ESQL example.
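When evaluating an agent against labeled test cases, the three failure patterns map onto a simple comparison between the expected and the actual tool call. The sketch below is one possible way to bucket them; the `ToolCall` type, the label strings, and the tool names in the usage note are assumptions made for illustration, not part of the talk.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToolCall:
    name: str
    params: Dict[str, Any] = field(default_factory=dict)


def classify(expected: ToolCall, actual: Optional[ToolCall]) -> str:
    """Bucket one agent turn into the three failure patterns, or 'ok'."""
    if actual is None:
        return "tool_not_called"   # pattern 1: the agent answered from its own knowledge
    if actual.name != expected.name:
        return "wrong_tool"        # pattern 2: e.g. web search instead of the intended tool
    if actual.params != expected.params:
        return "wrong_parameters"  # pattern 3: right tool, wrong arguments
    return "ok"
```

For example, `classify(ToolCall("search_sessions", {"day": "2026-04-08"}), ToolCall("web_search", {"q": "sessions"}))` yields `"wrong_tool"`. Counting these labels across a test set shows which remedy to reach for first.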

Key observations

Offloading counting to the search tool: "agents are famously bad at counting things" (34:07)

The demo loads AI Engineer Europe's own schedule data into Elasticsearch and runs an ESQL aggregation for the question "How many sessions are on April 8?" The design is framed as: "LLMs are famously bad at counting, so let the search tool execute the aggregation and return only the result." When ESQL's COUNT returns the concrete number (27 sessions), the point lands at once: don't make the LLM count; outsource counting to the search tool. Using the workshop's own schedule as the demo material is a nice touch.
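A rough sketch of such a counting tool follows. The index name `sessions`, the keyword field `day`, and the column name `session_count` are hypothetical, and the query runner is injected so the example runs without a cluster; in a real setup it would wrap an Elasticsearch client call. The toy data below is invented for the example and is not the conference schedule.

```python
import re
from typing import Callable, Dict, List

QueryRunner = Callable[[str], List[Dict[str, int]]]


def build_count_query(index: str, day: str) -> str:
    """Build an ESQL query that filters by day and aggregates on the server."""
    # Validate the value before interpolating it into the query string.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", day):
        raise ValueError(f"expected YYYY-MM-DD, got {day!r}")
    return f'FROM {index} | WHERE day == "{day}" | STATS session_count = COUNT(*)'


def count_sessions_tool(day: str, run_query: QueryRunner) -> int:
    """Tool body: the search engine counts; the LLM only sees the final number."""
    rows = run_query(build_count_query("sessions", day))
    return int(rows[0]["session_count"])


# Stand-in for Elasticsearch, so the sketch is runnable offline.
TOY_ROWS = [{"day": "2026-04-08"}, {"day": "2026-04-08"}, {"day": "2026-04-09"}]


def fake_run_query(query: str) -> List[Dict[str, int]]:
    day = query.split('"')[1]  # only understands queries built by build_count_query
    return [{"session_count": sum(row["day"] == day for row in TOY_ROWS)}]
```

The design choice is that the agent never receives the matching documents at all: it receives a single integer, so there is nothing for it to miscount.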

Three failure modes and the tool-description remedy (10:00 – 12:30)

After laying out the three failures ("the tool isn't called," "the wrong tool is called," and "the parameters are wrong"), Leonie shows a staged recipe for what to write in a tool description. Starting from one minimal sentence, she expands it in order: (1) central purpose, (2) parameter explanations, (3) trigger conditions (when to use the tool, when not to), (4) relationships to other tools ("call this skill first," "confirm with the user before this tool"). If even a carefully written tool description doesn't fix the behavior, layer additional instructions into the system prompt. A colleague's remark, "training the agent not to call the web search tool was the hardest part," lends weight to her framing.
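The four-stage recipe can be made concrete by assembling a description from its parts in the order given. The helper, the tool name `search_sessions`, its parameters, and the companion tool names are all invented for illustration; the wording of a real description would come from observing the agent's failures.

```python
from typing import Dict, List


def build_description(purpose: str, parameters: Dict[str, str],
                      use_when: List[str], avoid_when: List[str],
                      relationships: List[str]) -> str:
    """Assemble a description: purpose, parameters, triggers, relationships."""
    lines = [purpose, "", "Parameters:"]
    lines += [f"- {name}: {text}" for name, text in parameters.items()]
    lines += ["", "Use this tool when:"] + [f"- {item}" for item in use_when]
    lines += ["", "Do not use this tool when:"] + [f"- {item}" for item in avoid_when]
    lines += ["", "Relationship to other tools:"] + [f"- {item}" for item in relationships]
    return "\n".join(lines)


# Hypothetical tool definition, shaped like a typical function-calling spec.
SEARCH_SESSIONS_TOOL = {
    "name": "search_sessions",
    "description": build_description(
        purpose="Search the conference schedule for sessions matching a topic.",
        parameters={
            "query": "Natural-language topic to search for.",
            "day": "Optional date filter in YYYY-MM-DD format.",
        },
        use_when=["The user asks about sessions, speakers, or times."],
        avoid_when=["The question can be answered from the conversation so far."],
        relationships=[
            "Prefer this tool over web_search for schedule questions.",
            "Confirm with the user before calling book_session.",
        ],
    ),
}
```

Keeping the sections in a fixed order makes it easy to start from the one-sentence purpose and add the later stages only when logs show a specific failure.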

"Low floor, high ceiling" — a quality criterion for agent tool design (47:11)

This is the practical guideline from the talk's second half. Low floor means the agent can use the tool without errors, with few mistakes, efficiently, and without needing many calls. High ceiling means the agent can respond on the spot to unexpected, complex queries that narrow, dedicated tools could not handle alone. Semantic search tools exemplify the floor; shell tools and general-purpose query execution tools exemplify the ceiling, each illustrated with concrete cases. The recommended approach: "if you don't yet know how the agent behaves, start from general-purpose tools, log them, and once you spot a pattern where the same tool is called 4–5 times in a row, carve it out as a dedicated tool." The vocabulary comes from user experience theory and is repurposed here for agent tool design, an interesting transfer.
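Spotting the "same tool called several times in a row" pattern is a small log-analysis task. The sketch below assumes the log has already been reduced to an ordered list of tool names; the function name, the default threshold of four, and the tool names in the usage note are illustrative choices rather than anything prescribed in the talk.

```python
from itertools import groupby
from typing import Dict, Iterable


def repeated_runs(tool_log: Iterable[str], min_run: int = 4) -> Dict[str, int]:
    """Return, per tool, its longest consecutive run of at least min_run calls."""
    longest: Dict[str, int] = {}
    for name, group in groupby(tool_log):  # groups adjacent identical names
        run_length = sum(1 for _ in group)
        if run_length >= min_run:
            longest[name] = max(run_length, longest.get(name, 0))
    return longest
```

For a log such as five `run_query` calls, one `semantic_search` call, two `shell` calls, and four more `run_query` calls, the function reports `{"run_query": 5}`, which flags `run_query` as a candidate for a narrower, dedicated tool.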

Demonstrating progressive disclosure via "Elasticsearch ESQL skill loading" (29:43 – 33:00)

This is a concrete implementation of progressive disclosure, the same design as Anthropic Skills, built on top of an Elasticsearch client. The tool itself is a general-purpose search, but its description says "first load the Elasticsearch ESQL skill before using it," and the LLM middleware injects the skill into the context as needed. Writing ESQL syntax rules (double quotes, wildcard patterns, etc.) into the skill file lets the agent generate correct ESQL, a flow demonstrated live. At the same venue, Sam (Mistral) and Luke (ElevenLabs) noted that "Anthropic Skills will ship bundled at launch," and the spread of Skills as a shared industry foundation can be seen here in another company's (Elastic's) implementation.
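The mechanism can be approximated with a small middleware object that keeps skill bodies out of the context until the agent asks for them. Everything named here (`SkillContext`, `load_skill`, the skill key, and the placeholder skill text) is an assumption for illustration; the actual skill file from the demo is not reproduced.

```python
from typing import Dict, List, Set

# Placeholder skill body; a real skill file would spell out the ESQL syntax rules.
SKILLS: Dict[str, str] = {
    "elasticsearch-esql": "(placeholder) ESQL rules on double quotes, wildcard patterns, etc.",
}

# Only this short description is visible to the agent up front.
RUN_QUERY_DESCRIPTION = (
    "Run a general-purpose Elasticsearch query. "
    "First load the elasticsearch-esql skill before using this tool."
)


class SkillContext:
    """Minimal middleware: injects a skill body into the context on demand."""

    def __init__(self, skills: Dict[str, str], system_prompt: str) -> None:
        self._skills = skills
        self.messages: List[str] = [system_prompt]
        self.loaded: Set[str] = set()

    def load_skill(self, name: str) -> str:
        """Tool the agent calls; the skill text enters the context once."""
        if name not in self._skills:
            return f"Unknown skill: {name}"
        if name not in self.loaded:  # avoid injecting the same skill twice
            self.loaded.add(name)
            self.messages.append(self._skills[name])
        return f"Skill loaded: {name}"
```

The context starts with only the system prompt and the short tool descriptions; the longer syntax rules cost tokens only in conversations where the agent actually needs to write ESQL.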

Video outline

  • (00:00) Introduction, workshop goals
  • (01:20) What context engineering is — the technique of selecting from context sources into the window
  • (02:12) "80% of context engineering is agentic search"
  • (02:24) The past three years — the arrival of fixed RAG and its limits
  • (03:46) Transition to agentic RAG — search becomes an independent "tool"
  • (04:23) Expanding context sources — local files, working memory (scratchpads), DBs
  • (08:04) "Doing the right search is very hard" — variations of search methods
  • (08:55) Vector vs. keyword; dense, sparse, and multi-vector embeddings
  • (09:00) Three patterns by which agentic search breaks in the field
  • (10:00) Failure 1: the tool isn't called; Failure 2: the wrong tool is called; Failure 3: parameters are wrong
  • (11:00) A staged recipe for tool descriptions — central purpose → parameters → triggers → relationships
  • (11:50) Reinforcement via system prompts
  • (12:00) Handling parameter complexity
  • (29:43) Progressive disclosure and Skills
  • (30:30) Live demo of Elasticsearch ESQL skill loading
  • (32:00) ESQL aggregation demo — "How many sessions on April 8?" → 27
  • (34:07) "Agents are bad at counting" — design that outsources aggregation to search tools
  • (45:30) "Low floor, high ceiling" tool design guideline
  • (47:11) Recommended approach — start with general-purpose tools and carve out dedicated tools from logs
  • (48:50) Q&A — whether the required tool stack depends on the model

Sources

Agentic Search for Context Engineering — Leonie Monigatti, Elastic (AI Engineer)
