AI-Based Fault Tolerant Qualitative Research

By David M. Schneer, Ph.D./CEO

4-Minute Read

Imagine if you could conduct an interview and observe when the respondent’s words do not match their body language.

Now, imagine that, at the end of the interview, you could feed the video into an AI-based software program that would also highlight verbal-physical inconsistencies.

What would that mean?

It would be like the notion of fault-tolerance in the computing sector. Webster defines the term as, “Relating to or being a computer or program with a self-contained backup system that allows continued operation when major components fail.” [1] Or, in finance parlance, the term “Belt and Braces” was used in early 19th Century Britain to describe an ultra-conservative approach to a deal. [2]

Image Credit: Mark Sardella

Of course, the term could apply to any system—even mechanical. But now we think the term can also stand next to qualitative research.

Well, you don’t have to imagine this scenario anymore. At Merrill Research, we have conducted the first fault-tolerant qualitative research study using a highly trained nonverbal intelligence expert (me) backed up by a highly tuned AI-based analytical tool (LightBulb AI).

And so, we put on our belts and suspenders for this study.

And what did we learn?

Of course, given that our research is custom, I cannot offer specific details about the project, but this much I can tell you: I observed concrete instances where respondents’ words did not match their facial expressions, and the AI-based software tracked right along with me, second by second. At the end of each interview, we had a complete longitudinal record of the respondent’s facial reactions, complete with verbatim transcripts. As such, we were able to pinpoint which questions elicited the most engagement, joy, and surprise, or, on the contrary, contempt, disgust, fear, and anger. Neutral facial expressions were also measured.

This is how we did it.

Respondents in this study agreed to be digitally recorded, whether virtually or in person. We conducted 18 60-minute in-depth interviews: six via Zoom video and 12 in person, six in a facility on the East Coast and six in a facility on the West Coast. These respondents were technical professionals. I conducted all of the interviews, both video and in person. Professional videographers were hired to record in 4K video. In both venues, the videographers shot from the observation rooms to mitigate intrusion.

Once digitally recorded, the raw video was fed into an AI-based software program designed to detect the slightest emotion: joy (happiness), surprise, contempt, fear, anger, sadness, disgust, and neutral.

The software also derived an overall engagement measure.

While all these emotions were measured, some (fear, sadness, anger, and disgust) barely registered, if at all. This made sense, given the context of the conversation.

Would the respondents react emotionally like other consumers, or would they live up to their reputation as being linear, rational, and objective?

Spoiler alert: our research showed, beyond doubt, that these individuals emoted just like the rest of us, although that did not come through in their verbal comments.

Both the AI-based software and I were looking for the seven universal human micro-expressions, as defined below.

Joy/Happiness: As measured by a symmetrical smile with crow’s-feet engagement. Usually accompanied by surprise.
Surprise: As measured by a quick rise in the eyebrows and eyelids with the mouth agape. An indication that one has been caught off guard, either positively or negatively. Often accompanied by joy/happiness.
Contempt: As measured by an asymmetrical smile. Those showing contempt may feel negatively toward the stimulus, or contempt can be a feeling of superiority.
Disgust: As measured by a “crinkled” nose and the mouth agape. Intense dislike.
Anger: As measured by the eyebrows together and down with lip compression.
Fear: As measured by a quick but subtle rise in the eyebrows and eyelids.
Sadness: As measured by the eyebrows moving up and in with a pout or pressed lips.
Neutral: The absence of the above-mentioned emotions.
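To make the idea concrete, here is a rough sketch of the kind of per-second record a tool like this might produce. The field names and structure are purely illustrative assumptions on my part, not LightBulb AI’s actual output schema:

```python
from dataclasses import dataclass

@dataclass
class EmotionFrame:
    """One second of an interview: facial emotion scores plus transcript.

    Hypothetical structure for illustration only; each emotion is
    assumed to be scored 0.0-1.0 by the analysis software.
    """
    timestamp_s: int   # seconds from the start of the interview
    joy: float
    surprise: float
    contempt: float
    disgust: float
    anger: float
    fear: float
    sadness: float
    neutral: float
    transcript: str    # verbatim words spoken during this second

# Example frame: the respondent sounds positive and looks it too.
frame = EmotionFrame(
    timestamp_s=312, joy=0.62, surprise=0.21, contempt=0.02,
    disgust=0.01, anger=0.0, fear=0.01, sadness=0.0, neutral=0.13,
    transcript="I think that approach makes sense.",
)
```

Pairing each second of emotion scores with the words spoken in that second is what makes the longitudinal, question-by-question analysis described above possible.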

Next, we combined these emotions into three main categories: engagement, likability, and dislike.

Engagement comprised focus or interest and was measured on a 100-point scale. Any time engagement dipped below 80%, it was cause for further evaluation.

Next, we combined happiness and surprise into a 100-point likability scale; the higher the score, the better. Any time likability spiked above 40%, the moment was considered for further review.

We measured dislike as defined by contempt, disgust, fear, and sadness. Here, the lower the score the better, and any time dislike rose above 40%, it was considered for further evaluation.
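The three rules above can be sketched as a simple flagging pass over per-second scores. This is a minimal illustration of the logic as described, assuming each category is already scored 0-100 per second; the function name and data layout are my own, not the software’s:

```python
def flag_moments(engagement, likability, dislike):
    """Return (second, reasons) pairs flagged for further review.

    Inputs are parallel lists of per-second scores on a 0-100 scale.
    Thresholds follow the rules in the text: engagement below 80,
    likability above 40, dislike above 40.
    """
    flags = []
    for t, (e, l, d) in enumerate(zip(engagement, likability, dislike)):
        reasons = []
        if e < 80:
            reasons.append("low engagement")
        if l > 40:
            reasons.append("likability spike")
        if d > 40:
            reasons.append("dislike spike")
        if reasons:
            flags.append((t, reasons))
    return flags

moments = flag_moments(
    engagement=[95, 92, 70, 88],
    likability=[10, 55, 12, 8],
    dislike=[2, 3, 45, 1],
)
# → [(1, ['likability spike']), (2, ['low engagement', 'dislike spike'])]
```

In practice each flagged second would then be reviewed alongside the verbatim transcript for that moment, which is how the questions driving each reaction were pinpointed.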

And finally, much like me, the AI software was trained to identify contradictions between speech and expression.
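One simple way to illustrate that idea is to compare the sentiment of what was said against the dominant facial emotion at the same moment and flag mismatches. This is an assumed heuristic for illustration, not the actual method the software uses:

```python
# Illustrative grouping of facial emotions by valence.
POSITIVE = {"joy", "surprise"}
NEGATIVE = {"contempt", "disgust", "anger", "fear", "sadness"}

def is_contradiction(speech_sentiment, dominant_emotion):
    """Flag a moment where stated sentiment and facial emotion conflict.

    speech_sentiment: "positive", "negative", or "neutral", e.g. from
    a sentiment model run over the transcript (hypothetical input).
    dominant_emotion: the highest-scoring facial emotion at that moment.
    """
    if speech_sentiment == "positive" and dominant_emotion in NEGATIVE:
        return True
    if speech_sentiment == "negative" and dominant_emotion in POSITIVE:
        return True
    return False

is_contradiction("positive", "contempt")  # → True: words and face disagree
is_contradiction("positive", "joy")       # → False: words and face agree
```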

This is what the output looked like (again, redacted for privacy):

So, what did we learn?

  1. For the most part, respondents exhibited a very high level of engagement throughout the interview. This made sense, given that the questions we asked were provocative.
  2. We observed a low level of dislikes (negative emotions). Again, this makes sense since our topics, while provocative, were not incendiary or controversial.
  3. On those questions where I noticed inconsistencies, the AI software corroborated my observations. We were able to conclude that indeed some of the answers we were getting were contradictory to what respondents’ emotions were showing.
  4. We matched minute-by-minute, longitudinal verbatim quotations with observed emotional states. This allowed us to pinpoint the cause of the emotion and add quotations with context.
  5. The AI-based software was far more accurate than I was. I expected this: not only did I have to interpret the answers, I had to ask the questions. And probe. Not so for the AI software.
  6. The AI software was highly accurate in identifying contradictions.
  7. We did not see any so-called AI “hallucinations,” but we did see some aberrations. We registered false negatives, not because of software mistakes, but rather because of the study design. For example, we noticed considerable drops in engagement when respondents looked away from the camera.

We believe this is the future of qualitative research: highly trained moderators backed up by AI make for mighty powerful insights, delivered with confidence.

Need help with designing AI-based qualitative research? Let Merrill Research guide you through your next research study!