
What Makes a Synthetic Persona Trustworthy? The Case for First-Party Data
By Toluna
The question of whether synthetic personas can be trusted in market research is, in many ways, the wrong starting point. The more useful question is: what are they built on? The answer to that determines whether the output has any real value.
I want to explain how we arrived at synthetic personas at Toluna, because the journey matters. It is not a story that begins with generative AI. It begins much earlier, and understanding that origin is important for anyone thinking seriously about what synthetic research can and cannot do.
Learn more by watching or listening to Frédéric-Charles Petit on the Founders and Leaders Series podcast here:
Episode 10: Frédéric-Charles Petit, Founder & CEO, Toluna
The Starting Point: A Commitment to AI Before We Knew Why
In 2019, our engineering team came to me and said there was something we absolutely needed to embark on. They said we needed to accelerate using machine learning and AI. I asked why. Their answer was honest: we might not know why yet, but if we do not start now, it might be too late.
That is how it began. Not with a clear use case, not with a product roadmap, but with a conviction that the foundation needed to be built – and built early.
The first applications were practical and largely internal. We used machine learning and deep learning to vet the quality of the panel and responses. Then we used it for fraud detection. These are not glamorous applications, but they are fundamental. The integrity of a consumer panel depends on knowing who is in it and whether the responses it generates can be trusted.
Respondents as Mathematical Equations
At some point, something became clear to us. Every individual in our panel could be represented as a mathematical equation. That realisation changed the direction of what we were building.
From there, we could start to predict which survey would be most relevant for a given panel member, based on the modelling we had done. We could begin to understand each individual not just as a set of demographic data points, but as a pattern of behaviours, preferences, and tendencies that could be modelled and, to some extent, anticipated.
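As a rough illustration of what treating a respondent "as an equation" can mean in practice, here is a minimal Python sketch. The feature names, vectors, and cosine-similarity scoring are illustrative assumptions of mine, not Toluna's actual model; the point is only that once each panelist is a vector of modelled signals, predicting which survey is most relevant becomes a scoring problem.

```python
# Illustrative sketch only: panelists as feature vectors, surveys ranked by
# relevance. Feature names and scoring are hypothetical, not Toluna's model.
import numpy as np

# Each panelist is a vector of (hypothetical) modelled signals:
# [tech_affinity, fmcg_engagement, response_quality, open_end_richness]
panelists = {
    "p001": np.array([0.9, 0.2, 0.8, 0.7]),
    "p002": np.array([0.1, 0.9, 0.6, 0.3]),
}

# Each survey is described in the same feature space.
surveys = {
    "smartphone_concept_test": np.array([1.0, 0.1, 0.7, 0.8]),
    "snack_claims_test":       np.array([0.1, 1.0, 0.6, 0.2]),
}

def relevance(panelist_vec: np.ndarray, survey_vec: np.ndarray) -> float:
    """Cosine similarity as a stand-in relevance score."""
    return float(
        panelist_vec @ survey_vec
        / (np.linalg.norm(panelist_vec) * np.linalg.norm(survey_vec))
    )

for pid, pvec in panelists.items():
    best = max(surveys, key=lambda s: relevance(pvec, surveys[s]))
    print(pid, "->", best)
```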
In parallel, we developed conversational research using AI – what we call QProbe. This is a system in which, within a survey, AI asks questions that we have not decided in advance. The system reads the respondent’s sentiment and probes accordingly. It gives a much richer picture of how individuals respond, not just what they say, but the texture of how they say it.
We also used AI to code sentiment at scale across qualitative data. Processing open-ended, unstructured responses at volume became another layer in the foundation we were building.
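QProbe itself is proprietary, but the underlying pattern described here, reading the sentiment of an open-end and choosing the follow-up probe accordingly, can be sketched in a few lines. The classifier below is an off-the-shelf stand-in and the follow-up rules are invented for illustration; this is not the actual QProbe logic.

```python
# Illustrative sketch of sentiment-conditioned probing, not QProbe's actual
# implementation. Requires: pip install transformers torch
from transformers import pipeline

# Off-the-shelf sentiment classifier as a stand-in for a production model.
classify = pipeline("sentiment-analysis")

def next_probe(open_end: str) -> str:
    """Pick a follow-up question based on the sentiment of the answer."""
    result = classify(open_end)[0]  # e.g. {'label': 'NEGATIVE', 'score': 0.98}
    if result["label"] == "NEGATIVE":
        return "What specifically disappointed you?"
    if result["score"] < 0.75:  # low confidence suggests an ambivalent answer
        return "You seem to have mixed feelings. Can you say more?"
    return "What did you like most about it?"

print(next_probe("The packaging felt cheap and the ad was confusing."))
```

The at-scale coding step is the same idea in batch form: the classifier accepts a list of texts, so thousands of open-ends can be coded with a single mapped call.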
Building the Persona
The synthetic persona is built on all of that. It is important to understand what it is and what it is not.
It is not a process of taking a real person from the panel and constructing a digital copy of them. The persona takes a number of attributes, and then the system builds the rest. The attributes are the starting point; what the system generates from there is derived from the deep modelling of real panel behaviour over time.
What this means is that the persona is grounded in truth. It is not a fictional construct generated from general training data. It is rooted in the panel and in the accumulated understanding of how real respondents behave, built up over many years.
The most accurate way I know to describe it is as a large agentic system that presents itself as a synthetic persona.
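To make the "attributes in, rest derived" idea concrete, here is a toy data-structure sketch. The field names and the hard-coded derivation below are purely hypothetical; in the real system the derived attributes come from years of modelled first-party panel behaviour, which is precisely what a toy lookup cannot capture.

```python
# Toy sketch of "seed attributes in, derived profile out". The derivation is
# a hard-coded lookup for illustration only; a real system would derive it
# from models trained on first-party panel behaviour.
from dataclasses import dataclass, field

@dataclass
class SyntheticPersona:
    # Seed attributes supplied by the researcher.
    age_band: str
    country: str
    category_user: bool
    # Attributes the system fills in from panel modelling.
    derived: dict = field(default_factory=dict)

def build_persona(age_band: str, country: str, category_user: bool) -> SyntheticPersona:
    persona = SyntheticPersona(age_band, country, category_user)
    # Stand-in for the modelled fill: behavioural tendencies observed in
    # comparable real panelists.
    persona.derived = {
        "price_sensitivity": 0.7 if age_band == "18-24" else 0.4,
        "likely_channels": ["social", "mobile"] if age_band == "18-24" else ["tv", "email"],
    }
    return persona

print(build_persona("18-24", "UK", category_user=True))
```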
From Personas to Business Questions
Once we had this foundation, the question became: how do we use it to create real value for clients?
We could have generated a very large number of personas. But the more useful frame was: what business questions do our clients need to answer every day? Starting from that question led us to a different kind of product.
First, we tackled claims testing. Claims are something clients need to validate frequently and at speed. The persona could answer those questions with high correlation to results from real respondents, and with fast turnaround. From there, we extended to ad pretesting.
Take ACT Instant AI as an example. A client puts their ad into the system. The system creates the questionnaire, interrogates two hundred or more personas, and returns results in almost no time. You can apply that to a wide variety of formats and build benchmarks very quickly. The quality of the information on which decisions are being based changes significantly.
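The ACT Instant AI pipeline is proprietary, but the shape of the workflow described here, generate a questionnaire from the stimulus, put it to a couple of hundred personas, aggregate the answers, can be sketched as an orchestration loop. Every function below is a hypothetical placeholder, not a real API.

```python
# Hypothetical orchestration sketch of the "ad in, results out" workflow.
# None of these functions correspond to a real ACT Instant AI API; they only
# show the shape of the loop.
from collections import Counter

def generate_questionnaire(ad_description: str) -> list[str]:
    # Stand-in for the step that derives questions from the stimulus.
    return [
        f"After seeing this ad ({ad_description}), how appealing is the product?",
        "Would you consider buying it?",
    ]

def ask_persona(persona_id: int, question: str) -> str:
    # Stand-in for querying one synthetic persona; a real system would call
    # the agentic persona model here.
    return "yes" if persona_id % 3 else "no"

def run_pretest(ad_description: str, n_personas: int = 200) -> dict:
    questions = generate_questionnaire(ad_description)
    results = {q: Counter() for q in questions}
    for pid in range(n_personas):
        for q in questions:
            results[q][ask_persona(pid, q)] += 1
    return results

print(run_pretest("30-second snack ad"))
```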
This is the shift. Synthetic research is not being used simply for fieldwork. It is used to answer the business questions clients ask every day, but at a speed and scale that traditional methods cannot match.
The Quality Condition
None of this works without quality at the foundation. And I want to be clear about what quality means here, because I think it is sometimes misunderstood.
The quality of a synthetic persona presents exactly the same challenge as any other method of collecting insight – whether telephone, face-to-face, or online. The method differs, but the fundamental requirement remains the same. If the underlying data is poor, the output will be poor. That is not a problem unique to AI research. It is the same problem that has always existed in this industry.
What changes with AI is the scale at which the consequences, positive or negative, are felt. The output of a synthetic persona is only as reliable as the data and the modelling that produced it. That is why the history of how it was built matters. The claim to reliability does not come from the AI itself. It comes from what the AI was built on.
Where Adoption Stands
When I look at how clients are engaging with these tools today, I see a familiar pattern. When we launched online research, we saw a range of responses. There were cautious clients and progressive clients – those who wanted to go all in. The same range exists now with AI.
What is different is that the piloting phase appears to be maturing. Clients are beginning to move from experiment to embedded use. The revenue we generated from AI-first and AI-infused solutions, across both qualitative and quantitative, in the second half of last year already exceeded the total for the full year before it.
The market is still in its early days, but it is moving. And the clients who are moving with the most confidence are those who understand that the quality of what synthetic research delivers depends entirely on the quality of what underpins it.
First-party data, grounded in truth, is not an optional extra in this process. It is the foundation on which everything else rests.