AI and ambiguity: Building legal AI systems that recognise uncertainty before it becomes risk

Insights

Feb 17, 2026

A conversation with Thomas Barillot, VP AI at Avantia

As AI becomes more embedded in legal and compliance workflows, the conversation is shifting. It is no longer enough for systems to retrieve information quickly. They must retrieve it reliably. At Avantia, that means going beyond returning the top-ranked similarity results and asking a harder question: can an AI system recognise when it does not fully understand the task in front of it?

The hidden assumption behind semantic search

Semantic search sits at the centre of modern AI systems. Even long-context agents that can ingest entire books still rely on it to pick the right tool and the right document when answering a question or solving a problem. That search relies on one basic idea: that similar pieces of text can be grouped together.

In legal AI, that assumption carries real consequences. When systems are used to review contracts, conduct KYC checks, or retrieve precedent language, there is little room for error. Ambiguity is not just inconvenient. It can create risk.

Thomas Barillot, VP AI at Avantia, along with Alex De Castro from Imperial College London, recently explored a simple but important question: can we detect when a question is unclear, or when there is not enough context, before a system retrieves the wrong information?

“At a high level,” he explains, “the problem is straightforward. When we search semantically, can we retrieve the best results? And just as importantly, can we recognise when the search itself is uncertain?”

Semantic retrieval systems work by encoding text sequences into vectors and comparing how close those vectors are. If two pieces of text are similar, their numerical representations will sit close together. In many cases, this works well.
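To make that comparison step concrete, here is a minimal sketch in Python. The hand-written numpy vectors stand in for the output of whatever embedding model a system uses; in production they would come from that model, not be typed by hand.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Closeness of two embedding vectors: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In a real system these vectors come from an embedding model applied to text.
query    = np.array([0.9, 0.1, 0.0])
clause_a = np.array([0.8, 0.2, 0.1])   # similar wording, similar meaning
clause_b = np.array([0.0, 0.1, 0.9])   # unrelated topic

print(cosine_similarity(query, clause_a))  # high score -> retrieved
print(cosine_similarity(query, clause_b))  # low score  -> ignored
```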

But this approach assumes that meaning is neatly organised. It assumes that similar ideas are always grouped together in a clean and predictable way.

Barillot’s research suggests that this is often not true.

When similarity is not enough

“When a question is ambiguous,” he says, “its surrounding information doesn’t form a single, tidy cluster. It can sit between different topics. And that changes how the system should respond.”

Rather than only asking which documents are closest to a question, the research looks at how those documents are arranged. Are they tightly grouped around one topic? Or are they scattered across several different areas?

The key insight is simple: ambiguity leaves a pattern.

If a question is clear and focused, and contains relevant context, the results it retrieves tend to cluster. If the question is vague or spans multiple topics, the results look more fragmented.

That structural difference matters.
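One illustrative way to turn that pattern into a number, not necessarily the measure used in the paper, is to look at how spread out the top retrieved embeddings are, for example via their average pairwise cosine distance:

```python
import numpy as np
from itertools import combinations

def dispersion(retrieved: np.ndarray) -> float:
    """Mean pairwise cosine distance among retrieved embeddings.
    Low: results cluster around one topic. High: results are fragmented."""
    normed = retrieved / np.linalg.norm(retrieved, axis=1, keepdims=True)
    return float(np.mean([1.0 - float(a @ b) for a, b in combinations(normed, 2)]))

# Toy top-k result sets: the first sits on one topic, the second is scattered.
focused   = np.array([[0.9, 0.1, 0.0], [0.88, 0.12, 0.02], [0.85, 0.15, 0.05]])
scattered = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]])

print(dispersion(focused))    # small -> question looks well specified
print(dispersion(scattered))  # large -> possible ambiguity, worth flagging
```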

Two types of ambiguity

Barillot highlights two common scenarios.

The first is an under-specified question. This is where the user has not provided enough context. In a legal setting, it might be an instruction such as “check compliance exposure” without clarifying jurisdiction, regulation, or transaction type. The system is forced to guess what the user means.

The second is a multi-topic question. Here, the request genuinely spans different areas. For example, a KYC instruction that touches onboarding requirements, sanctions screening, and licensing obligations at the same time.

“A standard retrieval system may treat both cases the same,” Barillot explains. “But they require very different handling.”

If the system cannot recognise the difference, problems follow.

With an under-specified question, the system may retrieve loosely related material. That extra noise can distort the answer, trigger LLM hallucinations, and increase the risk of confident but incorrect output.

With a multi-topic question, the system may retrieve only part of what is relevant. The result may look coherent but still be incomplete.

“In legal AI,” Barillot notes, “you cannot verify answers in the same way you might verify a financial calculation or a piece of code. You need a way to assess whether the system is actually operating within the right context.”

From research to practical control

This is where the research becomes practical.

By examining how the retrieved documents are grouped, the system can estimate whether it is dealing with an ambiguous request or a multi-topic one. It can detect when it is operating in a coherent area of meaning, and when it is sitting between different themes.
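As a rough sketch of that idea, the heuristic below separates three cases: a focused result set, results that split into a few well-separated groups (multi-topic), and results that are simply scattered (under-specified). The thresholds and the use of k-means are illustrative choices for this sketch, not the method from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def classify_retrieval(embeddings: np.ndarray,
                       tight: float = 0.25, separated: float = 0.4) -> str:
    """Rough heuristic for the shape of a top-k retrieval set.
    Thresholds are illustrative and would need tuning on real data."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(normed)
    spread = 1.0 - float((sims.sum() - n) / (n * (n - 1)))  # mean off-diagonal distance
    if spread < tight:
        return "focused"            # one coherent topic: answer as usual
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(normed)
    if silhouette_score(normed, labels) > separated:
        return "multi-topic"        # distinct, internally tight groups
    return "under-specified"        # scattered results with no clear structure

# Example: a tight result set versus one that falls into two separate topics.
print(classify_retrieval(np.array([[0.9, 0.1, 0.0], [0.88, 0.12, 0.02], [0.85, 0.15, 0.05]])))
print(classify_retrieval(np.array([[0.9, 0.1, 0.0], [0.92, 0.08, 0.0],
                                   [0.0, 0.1, 0.9], [0.02, 0.08, 0.92]])))
```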

Importantly, this pattern appears consistently across different AI models. It is not specific to one provider or one architecture. That suggests ambiguity is not a technical quirk. It is a structural feature of how meaning is represented in embedding models.

The implications are significant.

If a system can detect ambiguity, it does not have to follow a fixed set of rules. It can adjust how many documents it retrieves. It can broaden its search when a question spans multiple topics. Or, in some cases, it can pause and ask the user for clarification.

“In some situations,” Barillot says, “the correct response is not to retrieve more information. It is to go back to the user and ask for more detail.”

Instead of assuming every question is well formed, the system becomes capable of recognising its own uncertainty.
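In code, that behaviour change can be as simple as routing on the detected pattern. The function below is a hypothetical policy sketch: the action names, the retrieval multiplier, and the clarification message are placeholders for illustration, not Avantia's implementation.

```python
def retrieval_policy(kind: str, base_k: int = 5) -> dict:
    """Illustrative policy: adapt behaviour to the detected retrieval pattern
    instead of applying one fixed rule to every question."""
    if kind == "focused":
        return {"action": "answer", "k": base_k}
    if kind == "multi-topic":
        # Broaden the search so each sub-topic contributes evidence.
        return {"action": "answer", "k": base_k * 3}
    # Under-specified: retrieving more will mostly add noise, so ask instead.
    return {"action": "clarify",
            "message": "Please specify jurisdiction, regulation or transaction type."}
```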

Why this matters for legal AI

For Avantia, this matters because legal workflows frequently involve layered instructions and overlapping regulatory domains. In areas such as contract review, investor onboarding, and transaction-level KYC, ambiguity is common rather than exceptional.

The goal is not to replace existing retrieval systems. It is to strengthen them.

“The foundation works,” Barillot explains. “This is about adding a verification layer. It allows us to handle exceptions more intelligently.”

In practical terms, that means building AI systems that do not just search, but evaluate the quality of their own search before generating an answer.

As AI becomes more embedded in regulated environments, reliability becomes the defining issue. Precision is not just about ranking documents correctly. It is about recognising when the system lacks sufficient clarity to proceed safely.

Barillot sees this as part of a broader evolution in enterprise AI.

“We now have a structured way to measure ambiguity,” he says. “The next step is applying that insight in production systems, where uncertainty is operational, not theoretical.”

The strategic takeaway

For legal and compliance leaders assessing AI tools, the message is clear.

Similarity alone is not enough.

The next generation of legal AI will not simply retrieve information faster. It will understand when it is uncertain, adapt its behaviour, and, where necessary, ask for clarification.

In regulated environments, that shift is not technical detail. It is the difference between automation and trusted autonomy.

Read the full paper here: https://arxiv.org/html/2406.07990v2