Leveraging AI to Improve Data Quality

11/25/2024

Dr. Tammy Rosner and Keith Phillips, Dynata

Over the last few years, the market research industry has undergone a major transformation to keep pace with advances in machine learning, increasingly sophisticated fraud, and declining respondent engagement. A big part of that transformation is driven by machine learning (ML) and artificial intelligence (AI) more broadly, which have allowed the tools and systems used in data collection to become far more sophisticated than they once were.

A Full-System View

Manual data checks and simple quality systems are no longer enough to catch and block fraudsters from entering surveys. This is why at Dynata we have a sophisticated, holistic system that evaluates panelists at every interaction, as well as their historical behaviour. At multiple points in our system, panelists are scored by different ML models that estimate the likelihood they are fraudulent or chronically disengaged, allowing us to detect and block poor-quality panelists at multiple stages. ML makes it possible to leverage the vast amount of data we have on our panelists and make informed decisions about them using all the data points at our disposal. These kinds of complex decisions would not be possible using basic systems or manual cleaning alone.
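As a rough illustration of what scoring a panelist at more than one stage could look like, here is a minimal Python sketch. The stage names, weights, and threshold are invented for this example and do not describe Dynata's actual models or architecture.

```python
# Illustrative only: a toy two-stage panelist evaluation. The stage names,
# weights, and threshold are invented for this sketch and do not describe
# Dynata's actual models.

from dataclasses import dataclass


@dataclass
class PanelistScores:
    panelist_id: str
    signup_risk: float    # hypothetical registration-time model output, 0-1
    history_risk: float   # hypothetical historical-behaviour model output, 0-1


def allow_into_survey(scores: PanelistScores, block_threshold: float = 0.85) -> bool:
    """Combine per-stage risk scores and decide whether to admit the panelist."""
    combined = 0.6 * scores.signup_risk + 0.4 * scores.history_risk  # arbitrary weights
    return combined < block_threshold


print(allow_into_survey(PanelistScores("P-001", signup_risk=0.2, history_risk=0.3)))   # True
print(allow_into_survey(PanelistScores("P-002", signup_risk=0.95, history_risk=0.9)))  # False
```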

Moreover, the use of ML allows us to more seamlessly integrate client survey respondent feedback into our systems. When clients report rejected respondents, we provide those respondent IDs back to one of our models, which then looks for patterns within the IDs at both the project level and the system level. By feeding this feedback directly to one of our ML models, we can better spot new fraud patterns as they emerge, quickly identify sources that may be experiencing a fraud issue, and pinpoint respondents who consistently provide poor data across projects and should be blocked from our systems. In this way, our models learn directly from the feedback our clients give us.
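To make the feedback loop concrete, here is a simplified Python sketch of how client-reported rejections might be aggregated across projects and sources. The field names, thresholds, and data are assumptions for illustration only, not Dynata's implementation.

```python
# Illustrative only: aggregating client-reported rejections to surface
# cross-project patterns. The field names and thresholds are assumptions
# made for this sketch, not Dynata's implementation.

from collections import Counter, defaultdict


def summarise_rejections(rejections, min_projects=3):
    """rejections: iterable of dicts with respondent_id, project_id, source_id."""
    projects_per_respondent = defaultdict(set)
    rejections_per_source = Counter()

    for r in rejections:
        projects_per_respondent[r["respondent_id"]].add(r["project_id"])
        rejections_per_source[r["source_id"]] += 1

    # Respondents rejected on several distinct projects are candidates for blocking.
    repeat_offenders = [
        rid for rid, projects in projects_per_respondent.items()
        if len(projects) >= min_projects
    ]
    return repeat_offenders, rejections_per_source.most_common()


feedback = [
    {"respondent_id": "R1", "project_id": "A", "source_id": "S9"},
    {"respondent_id": "R1", "project_id": "B", "source_id": "S9"},
    {"respondent_id": "R1", "project_id": "C", "source_id": "S9"},
    {"respondent_id": "R2", "project_id": "A", "source_id": "S3"},
]
offenders, sources = summarise_rejections(feedback)
print(offenders)  # ['R1']
print(sources)    # [('S9', 3), ('S3', 1)]
```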

Fraud Versus Disengagement

One of our key models is QualityScore, an AI-driven model that evaluates in-survey behaviour. It uses standard industry checks, such as speeding, straight-lining, and open-end quality, while also taking into account background behaviours, such as copy/pasting answers, mouse movements, and survey acceleration. All told, QualityScore uses over 175 data points to make determinations about respondent quality. This allows QualityScore to root out poor-quality responses, both by identifying fraud and by differentiating normal amounts of disengagement from severe disengagement. In this way, QualityScore ensures poor-quality data are removed in real time without being over-zealous with removals.
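As an illustration of how a few such signals could be combined into a single score, consider the simplified Python sketch below. QualityScore itself draws on more than 175 data points; the handful of features, weights, and cut-off shown here are invented for this example.

```python
# Illustrative only: combining a few in-survey signals into one quality score.
# The features, weights, and cut-off are invented for this sketch and are not
# the QualityScore model.


def quality_score(response):
    """response: dict of simple behavioural signals for one completed survey."""
    signals = {
        # 1.0 = strong indicator of poor quality, 0.0 = no concern
        "speeding": 1.0 if response["seconds_taken"] < 0.3 * response["median_seconds"] else 0.0,
        "straight_lining": response["straight_line_ratio"],  # share of grids answered identically
        "thin_open_ends": 1.0 if response["avg_open_end_chars"] < 5 else 0.0,
        "copy_paste": 1.0 if response["pasted_open_ends"] else 0.0,
    }
    weights = {"speeding": 0.35, "straight_lining": 0.25, "thin_open_ends": 0.2, "copy_paste": 0.2}
    return sum(weights[k] * signals[k] for k in signals)


resp = {
    "seconds_taken": 95,
    "median_seconds": 600,
    "straight_line_ratio": 0.9,
    "avg_open_end_chars": 2,
    "pasted_open_ends": True,
}
score = quality_score(resp)
print("remove" if score >= 0.7 else "keep")  # -> remove
```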

While we use AI to ensure that respondents are demonstrating unique behaviour, as we would expect from real, unique survey takers, we also use it to help differentiate fraud from disengagement at the system level. For example, we use AI to identify data anomalies at the router and traffic level, which helps us detect and block fraudulent survey takers.
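One simple way to flag traffic anomalies of this kind is to compare recent volumes against historical norms. The Python sketch below uses a basic z-score purely for illustration; the data, threshold, and method are assumptions, not a description of Dynata's traffic monitoring.

```python
# Illustrative only: flagging an unusual spike in survey entries from one
# traffic source with a simple z-score. The history and threshold are
# invented for this sketch.

from statistics import mean, stdev


def is_anomalous(hourly_entries, latest, z_threshold=3.0):
    """hourly_entries: recent history of entries per hour for one source."""
    mu, sigma = mean(hourly_entries), stdev(hourly_entries)
    if sigma == 0:
        return latest != mu
    return (latest - mu) / sigma > z_threshold


history = [110, 95, 102, 108, 99, 104, 97, 101]  # typical hourly volumes
print(is_anomalous(history, latest=450))  # True: sudden surge worth investigating
print(is_anomalous(history, latest=105))  # False
```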

We also use AI to differentiate fraud from disengagement when it comes to respondents we’ve blocked from our systems. As noted, we’ll block respondents who are reported across projects as providing poor-quality data, along with those we’ve identified as potentially fraudulent. To become unblocked, respondents must pass an AI-based government ID validation check. This tool compares a selfie with the photo on a government-issued ID and verifies that the ID is valid and that its information matches what we have on file. Fraudsters won’t be able to pass this check, whereas real respondents should be able to pass and can re-enter our system. In these cases, we can determine that the respondents are real and have most likely become chronically disengaged rather than fraudulent.

Conclusion

These are just a handful of ways that we have leveraged AI in recent years to improve data quality and insights for our clients at Dynata. Indeed, the efficacy of these methods is clear, as they have led to an average 96% sample acceptance rate across our clients.

Interested in finding out more about how Dynata applies ML and AI to drive survey data quality? Visit our website or get in touch with our quality experts.