Cluster 05 · The Trust Deficit · Blog 11 · Long-form

How Do You Prove Your Data Is Clean? Most Agencies Can't.

1,490 words · 6 min read · Trust & Integrity

At some point in almost every significant client relationship, the question is asked — sometimes directly, sometimes implied in the shape of a concern or a sceptical response to a finding: how do we know the data is real?

It is a reasonable question. Online survey data has a well-documented quality problem. Major industry bodies have published extensively on fraud rates. Journalists have written about the gap between claimed and actual respondent quality. Sophisticated clients who commission market research regularly understand that some proportion of online survey data is unreliable.

The question is: what can you show them?

The Standard Answer and Why It Fails

The standard answer to "how do you ensure data quality" is a process description. "We use speed traps and attention checks. We apply consistency filters. We run deduplication. We clean the data before delivery."

This answer describes what most agencies do. It does not differentiate any particular agency from any other. And for a sophisticated client who has read about the limitations of these methods, it does not answer the underlying question: how many fraudulent responses made it into my data before you removed them, and how confident are you that you found them all?

"We clean the data" is a description of recovery. What clients increasingly want is evidence of prevention.
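The standard recovery-based approach can be sketched in a few lines. This is an illustrative sketch, not any agency's actual pipeline: the field names (`duration_seconds`, `attention_answer`, `fingerprint`) and thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Response:
    respondent_id: str
    duration_seconds: int
    attention_answer: str  # answer to an embedded attention-check item
    fingerprint: str       # device/browser fingerprint, used for deduplication

def clean(responses, min_seconds=120, expected="strongly agree"):
    """Post-hoc cleaning: drop speeders, failed attention checks, and duplicates."""
    seen = set()
    kept, excluded = [], 0
    for r in responses:
        if r.duration_seconds < min_seconds:   # speed trap
            excluded += 1
        elif r.attention_answer != expected:   # attention check
            excluded += 1
        elif r.fingerprint in seen:            # deduplication
            excluded += 1
        else:
            seen.add(r.fingerprint)
            kept.append(r)
    return kept, excluded
```

Note what this returns: a cleaned dataset and a single exclusion count. There is no record of which check caught which respondent, which is exactly why the methodology note can say no more than "n excluded for quality reasons."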

What a Defensible Data Quality Claim Looks Like

A defensible claim is one that can be supported by evidence — specific, documented, reproducible evidence that does not require the client to take anything on trust.

It includes: the detection mechanisms applied to every respondent, the specific signals that triggered each flag, the disposition of each flagged respondent, and the aggregate fraud rate by supplier.

It means being able to say: "Of twelve thousand respondents who attempted this survey, eight hundred and forty were blocked at entry for the following reasons, distributed across your suppliers as follows, and here is the detailed log if you would like to review it."

Most agencies cannot produce this report because the detection happens — if it happens at all — in ways that do not generate auditable records. The cleaning is done, the exclusions are counted, and the methodology note says "n=340 excluded for quality reasons" without further detail.
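The difference between a count and an audit trail is structural: each respondent needs a record of the checks run, the signals triggered, and the disposition, from which the supplier-level report falls out as a simple aggregation. A minimal sketch, assuming per-respondent entries; the field names and the `supplier_fraud_report` helper are hypothetical, not a description of any particular product's schema:

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditEntry:
    respondent_id: str
    supplier: str
    checks_run: list      # every detection mechanism applied to this respondent
    signals: list         # specific signals that triggered a flag (empty if none)
    disposition: str      # "admitted" or "blocked"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def supplier_fraud_report(log):
    """Aggregate fraud rate by supplier from the per-respondent audit log."""
    attempts = Counter(e.supplier for e in log)
    blocked = Counter(e.supplier for e in log if e.disposition == "blocked")
    return {
        s: {
            "attempted": attempts[s],
            "blocked": blocked[s],
            "fraud_rate": blocked[s] / attempts[s],
        }
        for s in attempts
    }
```

Because every entry is time-stamped and carries its triggering signals, the detailed log offered to the client in the example above is just this list serialised, and the per-supplier breakdown is derived from it rather than asserted.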

The Competitive Dimension

The ability to produce detailed, auditable data quality documentation is not just a client reassurance mechanism — it is a competitive differentiator. In a market where most agencies are offering the same quality claim ("we use rigorous quality controls"), the agency that can substantiate that claim with evidence is offering something qualitatively different.

This matters most in high-stakes research contexts: regulated industries where data provenance is subject to scrutiny, research that will be used in public policy or litigation, and major strategic decisions where the client knows that the cost of a bad decision dwarfs the cost of the research.

In these contexts, the agency that can show its working — that can produce a complete, time-stamped, auditable record of every quality decision made on the project — has a genuine competitive advantage that is difficult to replicate without the underlying infrastructure.

SoftSight — SurveyGuard generates a complete audit log for every study. softsight.io