Why Data Quality Is Not Just One Problem


By Walr




We have spent years talking about data quality in this industry. It’s been a persistent topic on the conference circuit, in trade publications, and in client conversations. But I think the way we’ve been framing it has made the problem harder to solve, not easier.

Data quality isn’t a single problem. Rather, it’s several distinct problems, each with different causes and different remedies. Until we are more precise about which problem we are actually trying to address, we’ll keep reaching for broad solutions that only partially work.

Learn more by watching or listening to Lewis on the Founders and Leaders Series podcast here:

Episode 14: Lewis Reeves, Founder & CEO, Walr

Founders & Leaders Series interview: Lewis Reeves of Walr on building the tech and data infrastructure to power market research at scale.


Why The Current Framing Falls Short

The phrase “data quality” has become a catch-all. When a dataset comes back with results that don’t look right, or when a client raises a concern about reliability, the response is often to talk about data quality as if it were a single lever you could pull. But there is no single lever.

At Walr, we have an unusually broad vantage point on this as an end-to-end platform covering survey design, audience access, and data structuring. We can see the whole chain, and what the data tells us is that the sources of poor quality genuinely differ from one another, and that the weight of each varies considerably from study to study.

Problem Category #1: Survey Design

The first category is survey design. A poorly designed survey produces poor data, full stop. This isn’t a sampling problem or a fraud problem. It’s a design problem, and it shows up in ways that are sometimes mistaken for respondent failure.

We can see in our own data what attention spans actually look like in practice. Once a survey goes past twenty minutes, dropout rates increase, and so do post-survey disqualification rates. It is unlikely that those respondents suddenly became bad actors at the twenty-minute mark; the more likely explanation is that they became bored.

The rest of the world has adapted to shorter attention spans. Matt Damon recently spoke about how films now need to establish what is happening four times in the first twenty minutes to hold an audience. Yet our surveys are largely unchanged in their design assumptions. There is a significant opportunity to use new technology not just to process data faster, but to design experiences that generate richer and more reliable responses in the first place.

Respondents who give us their time are a resource worth protecting, because if we design badly, they don’t come back. The people gaming the system, on the other hand, always come back; they will run through brick walls if necessary. That’s why the design of the experience is far more than a cosmetic concern: it directly affects who ends up in your panel over time.

Problem Category #2: Respondent Inattention

Related to design, but distinct from it, is inattention. A respondent who is distracted or disengaged partway through a survey isn’t the same as a fraudulent respondent. They may have started the survey in good faith, but at some point, their attention dropped.

This matters because the solutions are different. Inattention is addressed through better survey design, shorter instruments, more engaging question formats, and smarter use of the data we already have about how people behave in surveys. Fraud requires different tools entirely.

Lumping inattention and fraud together as a single “data quality” problem means that interventions aimed at one often do not reach the other. You can have the most sophisticated fraud detection in place and still produce unreliable data from disengaged but legitimate respondents.

Problem Category #3: Fraud

Fraud is a real and separate problem. Bad actors deliberately entering surveys, providing false responses, or using bots to inflate completion rates are issues the industry has been dealing with for years. With large volumes of data, the scale of this problem can be measured and tracked fairly precisely.

What I would say, though, is that the impact of fraud needs to be weighed carefully against the other categories. In our experience, the volume of genuinely bad actors is significant, but its contribution to poor data quality sits alongside, not above, that of poor design and inattention. The industry has tended to focus disproportionately on fraud because it is a more dramatic story. However, the more unglamorous work of improving survey design often has at least as much impact on data reliability.

The tools to address fraud are also improving. Technology that identifies behavioural patterns, timing anomalies, and duplicate entries has become considerably more sophisticated. Keeping bad actors out of surveys while maintaining a seamless experience for genuine respondents is a solvable problem, and one that benefits from continued investment.
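As a rough illustration of the timing and duplicate checks mentioned above, the Python sketch below flags completes that are much faster than the median and groups of responses that share a device identifier or an identical answer pattern. It is not Walr’s detection logic: the `Response` record, its field names, and the one-third-of-median speed cut-off are all assumptions chosen for illustration.

```python
from collections import defaultdict
from collections.abc import Hashable
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Set, Tuple


@dataclass
class Response:
    # Hypothetical record shape; real platforms capture far more metadata.
    respondent_id: str
    duration_seconds: float
    fingerprint: str           # assumed hashed device/browser identifier
    answers: Tuple[str, ...]   # closed-ended answers in question order


def _mark_shared(pairs: List[Tuple[str, Hashable]], reason: str,
                 flags: Dict[str, Set[str]]) -> None:
    """Flag every respondent whose key value is shared with another respondent."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for respondent_id, key in pairs:
        groups[key].append(respondent_id)
    for ids in groups.values():
        if len(ids) > 1:
            for respondent_id in ids:
                flags[respondent_id].add(reason)


def flag_suspicious(responses: List[Response],
                    speed_ratio: float = 1 / 3) -> Dict[str, Set[str]]:
    """Map respondent_id -> reasons flagged; unflagged respondents are omitted.

    speed_ratio is an assumed cut-off: completes faster than this fraction
    of the median duration are treated as timing anomalies.
    """
    if not responses:
        return {}
    flags: Dict[str, Set[str]] = defaultdict(set)

    # Timing anomalies: far faster than the typical respondent.
    cutoff = median(r.duration_seconds for r in responses) * speed_ratio
    for r in responses:
        if r.duration_seconds < cutoff:
            flags[r.respondent_id].add("too_fast")

    # Duplicate entries: the same device, or an identical answer pattern.
    _mark_shared([(r.respondent_id, r.fingerprint) for r in responses],
                 "duplicate_device", flags)
    _mark_shared([(r.respondent_id, r.answers) for r in responses],
                 "duplicate_answers", flags)
    return dict(flags)
```

Exact matching on answer patterns is deliberately crude: in a short survey with few closed questions, legitimate respondents can easily give identical answers, so flags like these are better treated as a prompt for review than as grounds for automatic removal.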

Problem Category #4: Error

The fourth category is straightforward human error, both in survey construction and in data handling. Ambiguous questions, inconsistent scales, and logic that misfires: these produce data that looks like a quality problem but originates in the build rather than the response.

This is an area where the agentic tools we have been developing at Walr have direct application. A verification agent that checks survey structure against the original questionnaire intent can catch a meaningful number of these errors before the survey goes live. That isn’t a complete solution, but it reduces a category of problem that has historically relied entirely on human review.
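The article does not describe how the verification agent works internally. As a much simpler, rule-based illustration of the build errors listed above (routing logic that misfires, inconsistent scales), the sketch below checks a questionnaire definition before launch. The `Question` structure, its `scale_group` and `routing` fields, and the specific rules are hypothetical and do not reflect Walr’s tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Question:
    qid: str
    options: List[str]
    scale_group: Optional[str] = None  # hypothetical: questions meant to share one scale
    routing: Dict[str, str] = field(default_factory=dict)  # hypothetical: option -> next qid


def check_survey(questions: List[Question]) -> List[str]:
    """Return human-readable structural problems in a questionnaire definition."""
    problems: List[str] = []

    # Record each question's position; repeated ids are a build error.
    position: Dict[str, int] = {}
    for i, q in enumerate(questions):
        if q.qid in position:
            problems.append(f"{q.qid}: duplicate question id")
        else:
            position[q.qid] = i

    scale_sizes: Dict[str, int] = {}
    for i, q in enumerate(questions):
        # Routing that misfires: unknown options, missing or backward targets.
        for option, target in q.routing.items():
            if option not in q.options:
                problems.append(f"{q.qid}: routing on unknown option {option!r}")
            if target not in position:
                problems.append(f"{q.qid}: routes to missing question {target!r}")
            elif position[target] <= i:
                problems.append(f"{q.qid}: routes backwards to {target!r}")
        # Inconsistent scales: same group, different number of points.
        if q.scale_group is not None:
            expected = scale_sizes.setdefault(q.scale_group, len(q.options))
            if len(q.options) != expected:
                problems.append(
                    f"{q.qid}: {len(q.options)}-point scale, "
                    f"expected {expected} for group {q.scale_group!r}"
                )
    return problems
```

Rules like these only catch structural mistakes. Judging whether a question is ambiguous, or whether the build still matches the intent of the original questionnaire, is the harder part that the verification agent is meant to address.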

What Better Categorisation Makes Possible

I think this clearer categorisation of data quality issues matters in practice because it gives research teams more levers to pull.

If you’re seeing quality problems driven primarily by inattention and survey length, the right intervention is different from the one for problems driven by fraud. If you suspect design is the issue, that’s a conversation about questionnaire structure and format. If fraud is the main driver, the conversation is about panel vetting and behavioural monitoring.

Being joined up across survey, audience, and data, as we are at Walr, makes it possible to see these distinctions clearly. But even without that level of integration, the first step is to ask the more precise question: which type of quality problem are we actually dealing with here?

The industry has done a good job of raising awareness of data quality as a general concern. The next step is to get more specific about what we mean, because that is where the solutions are.


Author

Lewis Reeves

Lewis Reeves is the founder and CEO of Walr, an enterprise platform providing end-to-end online data collection for market research.


Learn more about Walr

Walr combines survey software and audience access in an intuitive user interface with interactive sharing functions and customisable outputs.
