Designing for Data Quality: How Survey Researchers Can Help Themselves

By Dig Insights



I have been on both sides of the table when it comes to insights. I have been the person in front of stakeholders, presenting findings, and feeling that little pang in the back of my mind about whether someone is going to question the sample, the results, or the numbers in the deck. I have also been the person defending a dataset in front of a group that genuinely wants to use the work in a meaningful way, but needs confidence in the inputs.

That experience shapes how I think about market research data quality. The conversation is shifting. It used to be about whether the insights are correct. Now it is often about whether we can trust where the data is coming from in the first place.

This article covers part of the webinar “Can You Trust Your Data? A New Model for Data Quality”. Rewatch the entire webinar here:


Why “Filtering Later” Is Not Enough

There are a few forces in the ecosystem that are making this harder.

One is professional survey takers. They are not new, but they are extremely good at getting through surveys quickly and passing the checks that many teams rely on.

Another is AI-assisted survey responses. You can get open ends that look good, read well, and even seem thoughtful. But they are not coming from genuine human engagement.

A third is the complexity of the supply chain. Whether you are a research agency collecting data, or a client trying to understand the sample, it can be a complex web of suppliers and routing. That makes it harder to know what is happening beneath the surface.

Put those together and you get a situation where poor quality is not always obvious, and what looks like clean data can still carry unseen risk.

So yes, checks matter. But we also need to design research in a way that reduces the conditions that lead to bad data in the first place. I think of this as designing for quality upstream.

Designing the Survey Experience for Engagement

One of the areas we have been focused on for many years is designing surveys for mobile. This was not originally positioned as a “data quality feature”, but in practice it has a big impact.

More and more respondents complete surveys on mobile devices. If the survey experience is awkward, long, or difficult to navigate on a phone, you increase disengagement. You also increase the chance that people speed, satisfice, or drop out. That affects both quality and completion.

We built question types and experiences intended to mirror how someone uses their phone. The goal is to collect information in a way that is intuitive and aligned with the context of the device.

One of our major fieldwork partners, a global panel company, ran their own tests on successful completion rates on mobile. They were trying to avoid putting respondents into experiences that could not be completed. They came back and told us we had the highest successful completion rate on mobile compared with other suppliers they tested.

I bring that up because it points to something practical. When you make the survey easier to complete in the environment where people are actually taking it, you discourage disengaged behaviour and reduce some of the conditions that make downstream cleaning necessary.

Working with Sample Partners Who Have Strong Protocols

Another part of upstream quality is working with sample partners who have their own human verification layers and strong protocols. If a partner is ISO-vetted, that matters. It is one of the ways you can improve confidence before a respondent ever reaches the first question.

That does not remove the need for in-survey checks, but it strengthens the foundation. It also helps reduce the temptation to treat quality as something you can bolt on at the end.

Using Analytics to Find Inconsistencies Inside Each Dataset

Even with strong design and vetted partners, you still need sophisticated detection. We have invested heavily in advanced analytics to build and refine additional checks and measures. Some of these are familiar, including bot detection, red-herring checks, and speeding checks.
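
To make that concrete, here is a rough sketch of what a speeding check and a red-herring check can look like once a dataset comes back. The column names and thresholds are placeholder assumptions for illustration, not a description of our production checks.

```python
import pandas as pd

# Illustrative column names; a real study would define its own.
RED_HERRING_COLUMN = "q12_attention"      # e.g. "Select 'Somewhat agree' for this row"
RED_HERRING_EXPECTED = "Somewhat agree"

def flag_basic_checks(df: pd.DataFrame) -> pd.DataFrame:
    """Add simple quality flags for speeding and a red-herring (attention) check."""
    out = df.copy()
    # Speeding: flag anyone who finished in under a third of the median duration.
    speed_floor = out["duration_seconds"].median() / 3
    out["flag_speeder"] = out["duration_seconds"] < speed_floor
    # Red herring: flag anyone who missed the instructed-response item.
    out["flag_red_herring"] = out[RED_HERRING_COLUMN] != RED_HERRING_EXPECTED
    out["flag_any"] = out[["flag_speeder", "flag_red_herring"]].any(axis=1)
    return out
```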

We have also introduced AI detection tools that look for more advanced patterns. One that I am particularly proud of is a machine learning model that adapts to each individual dataset. Rather than being trained only on surveys in general, it is deployed on a specific survey dataset and flags responses whose inconsistencies follow no viable pattern within that survey.

That matters because each survey is different: the questions, batteries, and structure change, so you want tools that can assess validity in the context of the study you just ran, not only against generic expectations.
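
As a simplified illustration of that per-dataset idea, and not our actual model, you could fit an off-the-shelf anomaly detector on a single survey's battery responses so that "typical" is defined by that study alone. The column names, the choice of IsolationForest, and the contamination rate below are all placeholder assumptions.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

def flag_inconsistent_responders(df: pd.DataFrame, battery_cols: list[str],
                                 contamination: float = 0.05) -> pd.Series:
    """Fit an anomaly detector on this survey's own battery responses and flag
    respondents whose answers do not fit any viable pattern in this dataset."""
    X = df[battery_cols].astype(float)                 # e.g. 1-5 agreement scales
    model = IsolationForest(contamination=contamination, random_state=0)
    model.fit(X)                                       # trained only on this dataset
    return pd.Series(model.predict(X) == -1, index=df.index, name="flag_inconsistent")
```

Because the model is refit for every study, "unusual" always means unusual relative to the questions and batteries that study actually contained.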

Using Multiple Sources to Add Context, Not Just Control

Upstream design also includes thinking about how you validate and contextualise what you are hearing in a survey.

Survey data captures what happens in a controlled environment. That does not always reflect real-world behaviour. When you can incorporate another layer of data, you can move beyond validation into contextualisation.

In our work, we have integrated social conversation data into the platform, so we can collect what people are saying across many different sources and understand sentiment around a topic. When we can marry what people say and feel in organic conversation with what we hear in primary research, it gives a fuller picture. It also makes us less reliant on a single data source.

That does not mean social data replaces surveys – it means it provides another lens. When findings align with real-world discourse, confidence increases. When they diverge, it is a signal that something needs to be understood before decisions are made.
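
As a toy illustration of that alignment check, and not our actual methodology, you could compare aggregate sentiment from the two sources on a common -1 to 1 scale and flag when they diverge beyond a tolerance chosen in advance.

```python
def sentiment_gap(survey_scores: list[float], social_scores: list[float],
                  tolerance: float = 0.2) -> dict:
    """Compare mean sentiment (each score scaled to -1..1) from survey and
    social sources, and flag when the gap exceeds the chosen tolerance."""
    survey_mean = sum(survey_scores) / len(survey_scores)
    social_mean = sum(social_scores) / len(social_scores)
    gap = survey_mean - social_mean
    return {"survey": survey_mean, "social": social_mean,
            "gap": gap, "diverges": abs(gap) > tolerance}
```

A divergence flag is not a verdict that either source is wrong; it is the prompt to understand why the two lenses disagree before decisions are made.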

Why Length of Interview Matters, and Why It Is Not Just a Preference

A question that came up in the Q&A was whether we have a way to grade the questionnaire itself. That is a fair question because the burden of quality does not sit only with panelists or platforms. Survey design plays a role.

I do not have a single scoring mechanism I can point to that covers every case, but I can say we manage this actively on a project-by-project basis, and we care deeply about completion and engagement.

There is a familiar negotiation that happens in research. A client starts with a long list, maybe a 25-statement battery. The research team tries to reduce it, perhaps down to 15. Then panel partners may push for it to be lower still. That pushback is not arbitrary – it is grounded in respondent experience and in what happens to data quality when surveys become too long or too boring.

Inside our platform, we make choices about the number of statements we include, the way stimuli are shown, how respondents are routed, and how many follow-ups are asked. These choices affect length of interview, and they affect quality.

There are always cases where you need longer surveys. A segmentation survey might be twenty minutes because of the nature of the work. But in many situations, we are designing shorter surveys and pushing for shorter length of interview because it makes it easier for respondents to stay engaged and complete the work properly.

A Practical Tension: AI Open Ends and Accessibility

One of the most immediate issues we are trying to solve right now is AI in open-ended responses. There is a concern that open ends are being copied and pasted from AI tools. The response can read as articulate and meaningful. Historically, someone might have treated that as a “gold” quote and put it on a slide.

Clients are savvy. They know what is going on. So we have to respond.

One idea is to prevent copy and paste inside the survey environment. But that raises accessibility concerns. Some people need to be able to copy and paste to participate effectively. So even something that looks like a clean technical fix can create a different problem.

Another direction we are exploring is relying more on audio or video open-ended responses, so you can hear the person speak or see them respond, rather than only reading typed comments. Again, this is not a silver bullet. It is part of ongoing work to design experiences that lead to genuine engagement.


Bringing It Together: Data Quality Is a System, Not a Checklist

When I step back, I do not think the right approach is to rely only on cleaning after the fact. That treats market research data quality as if it were a single step at the end of a project.

I think quality is a system. It starts with how you design the survey. It includes who you work with for sample, the analytics and detection you run inside each dataset, the additional sources you use to validate and contextualise what you are seeing, and the recognition that the ecosystem has changed, so visibility and learning need to extend beyond a single survey.

That is how we reduce the nervousness people feel when presenting insights, and how we make it easier to stand behind the decisions built on top of the data.



Author

Kevin Hare
Kevin is a data-driven executive with 20 years of experience leveraging insights to drive decision making and strategy for some of the largest global brands.