Synthetic research is scaling faster than we can evaluate it

Synthetic approaches are increasingly being scaled and defended with a level of confidence that the current evidence base doesn’t fully justify. It’s not that these methods lack merit – it’s that the infrastructure to properly assess them is still largely absent.

That’s a problem. And, perhaps ironically, it’s one the industry has been slow to address, given how much value it places on evidence quality and rigour.

Nick Baker Chief Research Officer 2 July 2026

Reassessing the go-to evaluation method

To evaluate a synthetic approach, the most common instinct is to run an AI-simulated survey alongside a real one and then compare the results. This can be useful – but it’s not enough on its own. At best, it tells you whether the outputs look similar, not whether they’re genuinely useful in practice.

Convergent validity – the degree to which synthetic and human outputs align – helps assess replication. But it doesn’t tell you whether synthetic methods can add something new, or whether they can generate insights that traditional approaches can’t.

And that matters, because some of the most promising use cases for synthetic research – including rapid hypothesis testing, low-cost scenario exploration and privacy-safe augmentation of limited data – aren’t about replication at all. In those contexts, simple convergence testing misses the point entirely.
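For concreteness, the side-by-side comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended standard: the choice of total variation distance as the convergence metric, the function names and the response data are all assumptions made for the example.

```python
from collections import Counter


def answer_shares(responses, options):
    """Return the share of responses falling on each answer option."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    return {opt: counts.get(opt, 0) / len(responses) for opt in options}


def total_variation_distance(human, synthetic, options):
    """Half the summed absolute gap between two answer distributions.

    0.0 means identical distributions; 1.0 means no overlap at all.
    """
    h = answer_shares(human, options)
    s = answer_shares(synthetic, options)
    return 0.5 * sum(abs(h[o] - s[o]) for o in options)


# Made-up responses to a single closed question (illustrative only).
OPTIONS = ["yes", "no", "unsure"]
human = ["yes"] * 50 + ["no"] * 30 + ["unsure"] * 20
synthetic = ["yes"] * 55 + ["no"] * 30 + ["unsure"] * 15

gap = total_variation_distance(human, synthetic, OPTIONS)
print(f"Convergence gap: {gap:.2f}")
```

A small gap here shows only that the two sets of answers look alike on this question. It says nothing about whether the synthetic output would hold up in an exploratory use case where no human benchmark exists.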

What a proper evaluation framework looks like

A fit-for-purpose framework should assess synthetic approaches across at least four dimensions:

Construct validity
Does the synthetic representation actually model what it claims to model, and how has that been established? “Trust us” is not a methodology.
Temporal stability
How does performance hold up over time, and when does recalibration become necessary? All models decay – the key question is whether suppliers understand when theirs do.
Boundary conditions
Where does the approach stop working, and is that clearly communicated to users? Every method has failure modes: transparency about them is essential.
Decision utility
Does the output improve the quality of decisions compared with available alternatives at similar cost?

Decision utility: the dimension that matters most

It’s also the one most often absent from vendor conversations – and, increasingly, from client briefs. That absence is telling.

Decision utility is harder to measure than convergence. It requires longitudinal tracking and a willingness to examine the baseline quality of decisions, which many organisations are reluctant to do. But it is the only measure that ultimately matters: the one that links synthetic research to the commercial outcomes it is meant to support.
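One way to picture the longitudinal tracking this implies is a simple decision log, scored against success criteria agreed before each decision was taken. The sketch below is an assumption-laden illustration: the `Decision` record, the `summarise` helper, the method labels and every figure in the log are hypothetical, and a real comparison would need far more decisions and a proper test of significance.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    method: str       # research input used: "synthetic" or "baseline" (hypothetical labels)
    cost: float       # research spend behind the decision, in any consistent unit
    succeeded: bool   # did the outcome meet the pre-specified success criterion?


def summarise(decisions, method):
    """Success rate and cost per successful decision for one research method."""
    subset = [d for d in decisions if d.method == method]
    if not subset:
        raise ValueError(f"no decisions logged for method {method!r}")
    wins = sum(d.succeeded for d in subset)
    total_cost = sum(d.cost for d in subset)
    return {
        "n": len(subset),
        "success_rate": wins / len(subset),
        "cost_per_success": total_cost / wins if wins else float("inf"),
    }


# Made-up decision log (illustrative only).
log = [
    Decision("baseline", 10.0, True),
    Decision("baseline", 10.0, True),
    Decision("baseline", 10.0, True),
    Decision("baseline", 10.0, False),
    Decision("synthetic", 4.0, True),
    Decision("synthetic", 4.0, True),
    Decision("synthetic", 4.0, False),
    Decision("synthetic", 4.0, False),
]

for method in ("baseline", "synthetic"):
    print(method, summarise(log, method))
```

In this invented log the synthetic route is cheaper per successful decision but succeeds less often. That is the kind of trade-off a convergence score alone would never surface.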

A synthetic approach that converges beautifully with traditional research but adds no improvement in decision quality, even at lower cost, is a solution in search of a problem. And while faster, cheaper insight is naturally appealing, fast and cheap and wrong is still just wrong – at scale.

There are early signs of progress

Encouragingly, some technically sophisticated suppliers are starting to publish calibration methods and uncertainty bounds alongside their outputs. A small number of progressive clients are also running structured pilots with pre-specified success criteria. Industry bodies are beginning, slowly, to develop guidance.

But this progress is not yet keeping pace with the speed of commercial adoption. Collectively, the industry is still building the plane while flying it. That need not be a problem – provided we’re honest about it and put the right safeguards in place.

The opportunity – and the risk

The opportunity is clear: a sector that develops credible, shared evaluation standards for synthetic methods will hold a durable competitive advantage over one that does not. Many in the industry are already working to get this right, and the commercial case for doing so is strong.

The risk is just as real: a high-profile failure – where a synthetic output significantly mispredicts real-world behaviour in a commercially important context – could set the category back years. If that happens, the backlash is unlikely to distinguish between responsible practitioners and those taking a lighter-touch approach.
