Top Tools: Synthetic Data for Research

By Insight Platforms

What is Synthetic Data?

Synthetic data is artificially generated to mimic the attributes of data from real humans.

Companies often use synthetic data because it offers a safe, privacy-preserving alternative to actual personal or sensitive data. Gartner estimates that by 2030, synthetic data will completely overshadow real data in AI models.

Some common uses include:

  • Robotics: The robotics industry pioneered the creation and use of synthetic data. Robotics firms use artificially generated data to simulate a virtually unlimited range of scenarios and real-world environments, so they can test self-driving cars and drones without risking damage to objects or people. For example, Waabi World is being used to test self-driving vehicles and simulate crashes in a virtual world.
  • Healthcare/pharma: In healthcare, synthetic data can be used to study disease patterns, personalise patient treatments and improve patient outcomes without compromising patients’ privacy or sensitive personal data. It can also serve as a foundation for clinical trials when real data isn’t widely available. Pharmaceutical companies are also using generative AI to design new drugs.
  • Finance: Banks must comply with strict data privacy and data retention laws and handle all personal and financial information carefully. Privacy-preserving synthetic data, generated from real customer data, lets them run analyses without exposing customers’ actual personal information. American Express, for instance, has been experimenting with training fraud detection models on synthetic data.

Apart from these examples, social media platforms, insurance companies, and many other industries and sectors use synthetic data in some shape or form.

What is Synthetic Data for Research?

Synthetic data for research refers to artificially generated data that mimics the data you might collect from surveys, qualitative research or even behavioural observations.

It is created using AI trained on existing data. Large language models (LLMs) such as GPT-4 are increasingly being used to create synthetic personas for qualitative research and large-scale samples of ‘digital twins’ for quantitative simulations.

Synthetic data can be used to simulate survey responses in market research by creating virtual personas that represent different groups of respondents – allowing researchers to simulate scenarios and gather insights without relying on actual survey participants.

Synthetic data can also help researchers protect the privacy of survey respondents and comply with data protection regulations, or even develop hypotheses around niche and sensitive topics such as rare disease experiences, as in this example from Day One Strategy.
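The survey-simulation idea can be made concrete with a deliberately simple sketch. It is not how the LLM-based tools in this article work. Instead of prompting a language model, it counts how often each answer was given within each segment of a small, made-up set of "real" responses, then samples synthetic respondents from those frequencies. The data, segment names and function names (`fit_segment_distributions`, `simulate_respondents`) are all hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter

# Hypothetical "real" responses to one closed question, tagged by segment.
# In practice these would come from an actual survey; these rows are made up.
REAL_RESPONSES = [
    ("students", "too expensive"),
    ("students", "too expensive"),
    ("students", "fair price"),
    ("retirees", "fair price"),
    ("retirees", "fair price"),
    ("retirees", "good value"),
]


def fit_segment_distributions(responses: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count how often each answer was given within each respondent segment."""
    dists: dict[str, Counter] = {}
    for segment, answer in responses:
        dists.setdefault(segment, Counter())[answer] += 1
    return dists


def simulate_respondents(dists: dict[str, Counter], segment: str, n: int, seed: int = 0) -> list[str]:
    """Draw n synthetic answers for one segment, weighted by the observed answer counts."""
    rng = random.Random(seed)  # fixed seed so the synthetic sample is reproducible
    counts = dists[segment]
    answers = list(counts)
    weights = [counts[a] for a in answers]
    return rng.choices(answers, weights=weights, k=n)


dists = fit_segment_distributions(REAL_RESPONSES)
synthetic_students = simulate_respondents(dists, "students", n=1000)
print(Counter(synthetic_students))
```

Every synthetic answer is drawn from answers that real respondents in the same segment actually gave. As a result, the sketch can only echo patterns already present in the source data. It cannot surface an opinion that no one expressed.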

In qualitative, product and UX research, synthetic data can be used to generate small groups of personas with specific needs, behaviours or product usage characteristics. These personas can then answer open-ended questions and engage in discussions.

One of the principal benefits of synthetic data for research is in hypothesis development and exploring research designs: testing surveys or discussion guides, question formats and response patterns to optimise research before going into the field.

And synthetic data for research is going mainstream, as shown by this article from Marketing Week columnist Mark Ritson.

But is synthetic data all it’s cracked up to be?

The use of synthetic respondents or participants in research has certainly sparked some controversy.

Tools like ChatGPT are built on large language models, which use natural language processing (NLP) techniques to build statistical models of text. They generate text by repeatedly predicting a likely next word (strictly, a token) in a sequence. But such algorithms do not understand meaning: they can only make predictions based on patterns identified in the model’s training data.
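The idea of predicting the next word from patterns in training data can be illustrated with a toy bigram model. This is a large simplification, used here only for illustration. Real LLMs use neural networks over sub-word tokens, condition on much longer context and usually sample rather than always picking the single most likely word. The sketch simply counts which word follows which in a tiny made-up corpus. The function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> str | None:
    """Return the word most often seen after `word`, or None if `word` never appeared.

    Ties are broken by whichever follower was encountered first in the text.
    """
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(follows: dict[str, Counter], start: str, max_words: int) -> list[str]:
    """Extend a sequence greedily by always appending the most likely next word."""
    sequence = [start.lower()]
    for _ in range(max_words):
        nxt = predict_next(follows, sequence[-1])
        if nxt is None:
            break
        sequence.append(nxt)
    return sequence


corpus = "the survey asked the panel about price . the panel liked the price ."
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # panel
print(generate(model, "the", 4))   # ['the', 'panel', 'about', 'price', '.']
```

Nothing in this model represents what "price" or "panel" mean. It only reproduces word-order statistics from its training text, which is the kind of limitation the critics below have in mind.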

Critics argue that, because of how large language models (LLMs) work, data from synthetic respondents may not accurately capture the nuances and complexities of real-world data or the meanings humans convey. For example, this article argues that generative AI cannot create patterns of meaning when used to build ‘synthetic users’ for design research.

There have also been concerns about the reliability and validity of research findings based on synthetic respondent data, as well as the potential misuse or misinterpretation of synthetic data – leading to misleading conclusions or unethical practices.

Others in the research industry are concerned about the broader pace of change and worry that synthetic data solutions will lead to blind over-reliance on dubious evidence.

The Top Tools for Synthetic Data for Research

Whether you’re an enthusiast or a sceptic, take this opportunity to explore some of the leading solutions in this emerging field. We will update this collection of tools over time, as this is a fast-growing area.

Yabble

Yabble is an AI platform offering a suite of tools: Count, Gen, Summarize and Augmented Data. Count automates the analysis of unstructured data to identify key drivers of business growth. Summarize generates accurate, rich summaries of long-form data, while Augmented Data enhances data analysis capabilities. Gen serves as an AI research assistant that answers questions about data to provide insights.

Yabble’s AI-powered insights tools combine custom-built algorithms with OpenAI’s GPT models to generate automated insights from unstructured text data. Teams can use Yabble to create meaningful themes and sub-themes from data, validate multiple hypotheses in real time, and explore variations on ideas in a protected data environment.

Yabble has also recently introduced Virtual Audiences: dynamic, AI-driven personas powered by Yabble’s proprietary Augmented Data model. These personas respond as if they were real individuals, dynamically analysing data from diverse sources to offer real-time, in-depth insights on a variety of topics, markets and target groups. The Yabble ChatGPT plugin runs on the same model. It takes the user’s query, turns it into a set of survey questions and uses augmented data to answer them.

PersonaPanels

PersonaPanels uses AI-powered synthetic respondents to provide consumer insights to businesses. The platform creates machine-learning representations of key market segments by exploring the internet and uncovering trends. These synthetic respondents are used to test products, advertising, messaging, and other concepts.

PersonaPanels offers several solutions, including Living Segmentation, PersonaPanels Monitoring, KnowNow Messaging and Heartbeat by PersonaPanels.

Living Segmentation uses machine learning to transform existing segmentation research into virtual customer segments that evolve as markets change. PersonaPanels Monitoring uses synthetic respondents to monitor targeted population segments and spot changing customer interests in real time. The synthetic respondents are created from traditional research data. They can also access social media sites and YouTube videos, and process sounds and images.

KnowNow Messaging allows quick testing of new products and messaging ideas using synthetic respondents. Heartbeat by PersonaPanels tracks and anticipates changing trends among US generational segments, helping businesses understand the preferences and behaviours of each generation.

Synthetic Users

Synthetic Users offers a platform for testing ideas or products using AI participants. The company develops a conversational interface for creating surveys and building AI models of customers to simulate personas, validate concepts, test market fit, learn about preferences and behaviours, highlight blockers and get feedback.

Synthetic Users are built on large language models (LLMs) and generate outputs based on patterns in those models’ training data.

Roundtable

Roundtable offers an AI-powered survey platform for gathering instant feedback on any topic from any audience. Its conversational interface lets users create surveys on any subject they want to research.

The platform also enables users to build AI models of their customers and simulate new surveys, asking any question to any customer segment. Users can explore the data behind the AI model to understand how it works and gain confidence in knowing when to take action, and when to gather more data.

The suite of tools provided by the platform helps users identify optimal pricing, products, and messaging.

Native AI

NativeAI is a platform that uses natural language processing (NLP) and other artificial intelligence (AI) techniques to extract insights from large datasets.

With the power of generative AI, the platform aggregates qualitative data into actionable recommendations and comprehensive reports. It continuously scans the market for new data and synthesizes improvement recommendations using AI.

The platform provides a comprehensive insights dashboard, reporting features, visualisations, and alerts for specific products. It allows brands to analyse product and brand performance compared to competitors, identify consumer preferences, and fulfil unmet needs.

NativeAI’s proprietary Digital Twins solution uses generative AI to create clones of target customers using billions of parameters. Users can ask their custom digital twins specific questions about their products and get unbiased answers in minutes.

OpinioAI

OpinioAI is a platform that uses AI language models to provide relevant insights, data, and opinions without the need for polls, surveys, or other traditional methods.

The platform offers features such as Persona Builder, which helps users build buyer personas; Ask Away, where predefined personas respond to specific questions; Analyze, which processes and analyzes datasets, reports, and research publications using AI; and Evaluate, which allows users to assess messaging and receive feedback from the perspective of their core personas.

OpinioAI aims to replace traditional data collection methods, enabling researchers to use synthetic sampling and synthetic data generation powered by large language models (LLMs).

Synthetic data for research is still in its early days, but its potential applications are vast. Stay up to date on this rapidly changing topic by signing up for a free Insight Platforms account.

