Garbage In, Garbage Out: How Auto-Coding Impacts Predictive Quality in Market Research


In this article, Dave Birch, Chief Technology Officer at Zappi, and Steven Perianen, Product Owner, discuss how auto-coding impacts the quality of market research and the insights that artificial intelligence technologies and humans derive from it.

In market research, there's a disparity between what people want to know and the questions they ask. Scale and choice-based questions are great at quantifying how people feel when they rate stimuli, but open-ended questions help us understand why stimuli are rated that way.

That said, there is a problem when asking the latter: responses take the form of unstructured data which demands arduous, and sometimes inaccurate, coding practices. At Zappi, we’ve been asking whether we can have our cake and eat it too. Can we use technology to significantly reduce the cost of coding open-ended questions in research?

The Truth in Open-Ends

Let’s take a look at two areas where open-ended questions have real value.

First: A key component in understanding mental availability is knowing what proportion of respondents are aware of a brand. This question is typically asked in one of two ways: as an open-ended question (like, "Which banks have you heard of in the UK?") or by showing respondents a list of brands and asking a closed-ended question (like, "Which of the following UK banks have you heard of?").

These two methods of exploring brand awareness are both common. They’re referred to as unaided awareness and aided awareness, respectively. There is a significant body of evidence that shows unaided awareness leads to a more accurate assessment of brand equity than aided awareness, though many brands rely on the latter’s data as it’s easier to work with.

To understand the drivers of equity, it's important to uncover consumer perceptions, usually by way of open-ended questions like, "What three words or phrases come to mind first when you think of this brand?" Answers to these unaided questions can be more volatile and more prone to human bias during manual coding, but technology can address this by applying a consistent coding approach.

Second: Open-ended questions let researchers pick up on items they may have missed when designing the survey. Answers to open-ended questions not only provide qualitative insight into the subject of the survey, but also warn of unintended consequences.

For example, a respondent might say, "I really liked the music but thought the joke was sexist." That is not a technical hiccough that merely fails to maximize behavior change; it is potentially negative press coverage. This feedback might never have surfaced had the survey offered only a list of preconceived options instead of an open-ended response.

Open-ends also encourage honesty. Diana Kander, Author and Keynote Speaker, says, "Open-ended questions demonstrate to someone that you actually care what they think. It makes it obvious that your agenda is to learn, not to convince."

This was also addressed by Susan Farrell for Nielsen Norman Group, leaders in research-based user experience. She says, "When you ask people to explain things to you, they often reveal surprising mental models, problem-solving strategies, hopes, fears, and much more."

So, the value of open-ended questions is arguably greater than that of closed-ended questions, but so is the cost of quantifying the resulting unstructured data. To draw conclusions quantitatively, these answers must all be coded.

If we ask 500 respondents the aforementioned bank brand question and one answers "HSBC" while another mistakenly types "HSBD," both need to be represented in the data. So, if coding each response comes in at roughly 39 cents, coding 1,000 respondents becomes a costly $394. This expense, and other economic pressures on the industry, drive a trend toward closed-ended questions.
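To make the economics concrete, here is a purely illustrative sketch of how that per-response rate scales; the rate is simply the figure quoted above, and the survey sizes and question counts are hypothetical, not taken from any Zappi project:

```python
# Illustrative cost model for manually coding open-ended responses.
# The per-response rate is the ~$0.39 figure quoted above (assumed);
# respondent counts and question counts are hypothetical.

COST_PER_RESPONSE = 0.394  # dollars per coded verbatim (assumed rate)

def manual_coding_cost(respondents: int, open_ended_questions: int = 1) -> float:
    """Total cost of manually coding every open-ended answer in a survey."""
    return respondents * open_ended_questions * COST_PER_RESPONSE

print(manual_coding_cost(1_000))     # 394.0 -- one open-end, 1,000 respondents
print(manual_coding_cost(1_000, 5))  # 1970.0 -- the same sample with five open-ends
```

The point of the sketch is simply that the cost grows linearly with both sample size and the number of open-ends, which is why surveys drift toward closed questions.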

Structuring Unstructured Data

The quality of data has a direct impact on the quality and efficacy of insights (the flip side of "garbage in, garbage out": quality in, quality out). Machines have so far been incapable of understanding the intricacies of human language because of how varied and complex our references can be. Without the ability to build on context, it's incredibly difficult to establish meaning.

A good example comes from Max Tegmark's "Life 3.0: Being Human in the Age of Artificial Intelligence": "[…] a typical Winograd challenge (a test of machine intelligence) asks what 'they' refers to here: 'The city councilmen refused the demonstrators a permit because they feared violence.'"

To introduce context, we can combine open-ended and closed-ended questions (e.g., a respondent has already told us they liked an ad and is now describing precisely what they liked). The key is not to think broadly, but to shrink the scope of the problem by using other, previously established, structured data to build context and help inform future readings (it may help to picture this as "breadcrumb data," as in Hansel and Gretel). Humans have evolved to draw conclusions in the context of their environment; we're training machines to do the same.
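As a minimal sketch of the breadcrumb idea, the toy coder below uses a respondent's structured closed-ended answer to disambiguate their open-ended verbatim. The codes, keywords, and matching logic are all hypothetical and deliberately simplistic; a real auto-coding system would rely on a trained language model rather than keyword lookup, and this is not a description of Zappi's actual implementation:

```python
# Toy example: use structured ("breadcrumb") data to give an auto-coder context.
# All codes, keywords, and rules are hypothetical.

from dataclasses import dataclass

@dataclass
class Response:
    liked_ad: bool   # closed-ended answer, already structured
    verbatim: str    # open-ended answer, unstructured

# Which code a keyword maps to depends on the context established
# by the earlier closed-ended question.
POSITIVE_CODES = {"music": "LIKED_MUSIC", "joke": "LIKED_HUMOUR"}
NEGATIVE_CODES = {"music": "DISLIKED_MUSIC", "joke": "OFFENSIVE_HUMOUR"}

def auto_code(response: Response) -> list[str]:
    """Assign codes to a verbatim, using the closed-ended answer as context."""
    frame = POSITIVE_CODES if response.liked_ad else NEGATIVE_CODES
    text = response.verbatim.lower()
    return [code for keyword, code in frame.items() if keyword in text]

print(auto_code(Response(liked_ad=True, verbatim="Loved the music")))
# ['LIKED_MUSIC']
print(auto_code(Response(liked_ad=False, verbatim="The joke was sexist")))
# ['OFFENSIVE_HUMOUR']
```

The same word ("music", "joke") lands in a different code frame depending on the structured answer that preceded it, which is the whole point of building context from breadcrumb data.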

But does an auto-coded solution beat the human bias of manual coders? Yes. Traditionally, verbatims are coded by humans and, naturally, there is a level of subjectivity in coding this way. At Zappi, we wanted to know how an auto-coding solution compares to its manual equivalent, so we ran an experiment.

We had the same set of verbatims coded by four different manual coding companies, then created a master set of coded verbatims from the codes the providers had in common. We then measured how well each coding company matched this master set: on average, the manual coding companies matched it 37 percent of the time, while our auto-coding solution matched it 45 percent of the time.
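For concreteness, here is one way such a match rate could be computed. The verbatim IDs, codes, and the exact match criterion below are hypothetical (the article doesn't specify how a "match" was scored), so treat this as a sketch of the calculation rather than the experiment's actual scoring method:

```python
# Sketch: score a coding provider against a master set of codes.
# The master set is assumed to hold the codes common across providers;
# all data below is made up purely to show the calculation.

def match_rate(provider_codes: dict[str, set[str]],
               master_codes: dict[str, set[str]]) -> float:
    """Share of verbatims where the provider assigned all the master codes."""
    matched = sum(
        master_codes[vid] <= provider_codes.get(vid, set())
        for vid in master_codes
    )
    return matched / len(master_codes)

master = {"v1": {"PRICE"}, "v2": {"TASTE"}, "v3": {"PACKAGING"}}
provider_a = {"v1": {"PRICE", "VALUE"}, "v2": {"SMELL"}, "v3": {"PACKAGING"}}

print(f"{match_rate(provider_a, master):.0%}")  # 67% for this toy example
```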

The Next Evolution

According to Edd Gent for New Scientist, "The key innovation is realizing you can build a model of the user merely based on what they have said in the past."

Picture market research's equivalent of Maslow's hierarchy of needs. At the base level, we have high-quality data. Next, that data is auto-coded (as discussed). Above that sit charts, of secondary importance but currently our way of displaying the story. At the top of the pyramid: codified, automated analysis.

With an open-ended AI algorithm, it's much harder to input vague, surface-level responses, as users tend toward evocative, honest statements (in other words, better data). In that case, the norms we're used to seeing, built for closed questions without accounting for nuance, may stray from accuracy. If we can materially improve the data we produce, we risk paralyzing ourselves with the fear that these norms databases will be invalidated.

Resisting such huge technological strides would be a mistake. Letting go of our attachment to norms, and choosing to measure in absolutes rather than in relatives, may prove an essential pivot for the industry.

But our vision extends beyond the top of that pyramid. Rather than calibrating new norms, we want to treat the analyst as a brand-new data point. We think the patterns uncovered in their data, in the actions they take and in the effectiveness of those actions, can function as a feedback loop, anchoring the reliability of our data in real-world business decisions and their outcomes.