People are terrible at predicting what they will do in the future.
“What people do, what people say, and what people say they do are entirely different things.” — Anthropologist Margaret Mead
People create narratives about themselves when attempting to predict their future behavior. They construct a version of the person they want to be, or feel they should be — and the result is an optimistic bias: predictions built on strong intentions that rarely match real-world actions.
More realistic self-predictions require that someone consider situational barriers and competing demands — factors that often conflict with their self-narrative. Think of New Year’s resolutions as an example. People plan a path forward where they will eat healthy foods or exercise regularly, but those resolutions frequently aren’t actualized. Humans are not rational agents when it comes to decision-making; they are emotional. Implicit factors drive the vast majority of human thought and action.
And yet…
Businesses rely on self-prediction anyway.
As many as 90% of companies use stated intentions in their market research as a key metric when evaluating the expected effectiveness of a marketing campaign. Stated intentions are a form of self-reported data produced when individuals are asked to predict what future actions they are likely to take.
Why the disconnect? Here’s what we think:
- It’s much easier to rely on an age-old fallacy than to come up with something new.
- Self-prediction questions are all around us, causing people to believe they must surely be a useful and accurate measure (think of how many Net Promoter Scores you have been asked to give this month).
- There have been, up until recently, relatively few tools to help companies gain insights in a way that challenges these conventions.
Enter Bioacoustics.
Bioacoustics is a branch of science concerned with the production of sound by living organisms and its effect and influence on others around them.
Human communication is a complex process; it depends not only on the words we use (though these are very important) but also on the many factors that shape how those words are expressed. There is an incredible wealth of data encoded in how we speak. We know that infants can understand the tone and pitch of a parent’s voice well before they can talk or even comprehend language.
It’s not that we haven’t been listening to people talk; plenty of research methods rely on verbal communication. What has kept this crucial aspect of communication out of market research is how difficult it has been to measure and quantify emotional data beyond what a listener can intuitively sense in what is being said.
When it comes time to make a business decision, it is very difficult for the person who conducted (or listened in on) the research interviews to make a case for something that was not measured or transcribed as part of the research process. Vocal cues, like facial expressions or body language, are therefore treated as anecdotal data points, discernible only by the most intuitive of moderators whose keen sense of human expressive nuance has been sharpened by years of practice. Ultimately, however, this form of data cannot be quantified, and therefore it cannot be used to inform critical decisions.
This paradigm is ready to change. The instruments for analyzing the acoustics of a speaker’s voice are becoming more sophisticated, and for the first time we can bring quantified emotional analysis into research.
Science + Technology = New Solutions for Researchers
In health care, speech emotion recognition (SER) has already been established as a viable and reliable tool for detecting certain neurological and psychological conditions, such as Parkinson’s disease, depression, and autism, because subtle changes in voice quality, intensity, and pitch are detectable at early stages of these conditions.
inVibe, a company that specializes in simulating phone interviews through a fully-automated research engine, has been experimenting with machine learning systems that measure and analyze the acoustic signals in a respondent’s voice in real time. Now, after 12 months of beta testing, the inVibe insights platform is being made available to the world. It works by processing the waveform produced by a respondent’s voice and providing a visual analysis based purely on acoustic signals rather than the specific words being used. In other words, it’s not analyzing what was said, but how it was said.
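For readers curious what “processing the waveform” can look like in practice, here is a minimal sketch of basic acoustic feature extraction (frame-level energy and pitch) from a recorded response. This is not inVibe’s implementation; the library choice (librosa), the feature set, and the file name are illustrative assumptions only.

```python
# Illustrative only: NOT inVibe's pipeline. A toy extraction of acoustic
# features from a voice recording, ignoring the words entirely.
# Requires: pip install numpy librosa
import numpy as np
import librosa

def acoustic_summary(wav_path: str) -> dict:
    """Summarize a recording purely from its acoustic signal."""
    y, sr = librosa.load(wav_path, sr=16000)       # waveform samples + sample rate
    rms = librosa.feature.rms(y=y)[0]              # frame-level energy (loudness)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)  # frame-level pitch estimate (Hz)
    return {
        "mean_energy": float(np.mean(rms)),
        "energy_variability": float(np.std(rms)),
        "mean_pitch_hz": float(np.mean(f0)),
        "pitch_range_hz": float(np.max(f0) - np.min(f0)),
    }

# Hypothetical usage:
# print(acoustic_summary("respondent_042.wav"))
```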
The algorithm we currently use to label responses combines three individual metrics of emotion:
- Temper — a speaker’s overarching mood range. Temper ranges from gloomy or depressed to confrontational or aggressive.
- Valence — a speaker’s overall sentiment. Valence ranges from negative to positive.
- Arousal — a speaker’s energy level, activation, and stimulation. Arousal ranges from tranquil or bored to alert or excited.
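To make the three metrics above concrete, here is a toy sketch of how per-response scores might be represented and mapped to a coarse label. The real combination logic is not described here; the score ranges, thresholds, and label names below are hypothetical.

```python
# Hypothetical illustration only: the actual inVibe labeling logic is not shown here.
from dataclasses import dataclass

@dataclass
class EmotionScores:
    temper: float   # 0.0 = gloomy/depressed ... 1.0 = confrontational/aggressive
    valence: float  # 0.0 = negative         ... 1.0 = positive
    arousal: float  # 0.0 = tranquil/bored   ... 1.0 = alert/excited

def label_response(s: EmotionScores) -> str:
    """Map the three 0-1 scores to a coarse, human-readable label (toy thresholds)."""
    if s.arousal >= 0.7 and s.valence >= 0.5:
        return "energized-positive"
    if s.arousal >= 0.7:
        return "agitated"
    if s.arousal <= 0.3:
        return "flat"
    return "neutral"

# Example: a calm-tempered, positive, highly energized response
# print(label_response(EmotionScores(temper=0.4, valence=0.8, arousal=0.9)))
```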
We will be sharing more information about the algorithm, the training data, and the painstaking process that we undertook to validate the output in a future article.
Speech Emotion Analytics and Predicting Future Behavior
The feedback we have received when we demo our voice analytics technology has ranged from “Holy cow!” to “What the hell?” So, to help companies understand how they can use this new layer of data to support their marketing efforts, we have decided to present a case study that illustrates how one client used it to solve a challenge they were facing.
Client Challenge:
A pharmaceutical company needed to motivate patients to seek treatment for a highly stigmatized condition.
Background:
For over a year, the pharmaceutical company had been running a consumer campaign for a brand that addressed a particularly embarrassing and stigmatized health condition — something people found difficult to discuss with close friends, let alone medical professionals. However, despite positive feedback from their pre-launch market research studies, the advertising campaigns had failed to achieve the desired objective (triggering a patient-doctor dialogue) in the real world.
Project:
A new set of ad concepts was in the works, and the company needed to identify what, if any, creative concepts and messages would drive patients to have a dialogue with their doctors.
Although they had already conducted traditional market research with another vendor, their previous experience made them reluctant to move forward with the concept that had received the highest scores. So, they decided that they would validate these findings with an alternate form of research.
Having heard about inVibe’s unique methodology, which combines virtual interviews with speech emotion analytics, they decided to reach out.
inVibe Approach
Since successful ads are typically measured by their ability to motivate individuals to take action, we focused our analysis on arousal as a key measure for predicting the effectiveness of the ads being tested. Action requires energy: a change in behavior requires not only a trigger but also the energy to act. In previous studies, we had seen that the concepts that garnered the highest arousal scores from respondents in testing were typically the most likely to initiate an action requiring physical energy in the real world. This premise has been studied in great detail.
See:
- Jonah Berger, “Arousal Increases Social Transmission of Information”
- Eyben et al., “Automatic Multi-lingual Arousal Detection from Voice Applied to Real Product Testing Applications”
The Study:
Part 1: Quantitative Survey
The first part of this study consisted of a traditional survey that asked participants to rank and rate the concepts they were shown. The results were consistent with the findings from the previous research, and, as is often the case, there were only marginal differences between the bottom and top performing ads.
Part 2: inVibe Virtual Interviews
In the next part of the study, we asked participants to answer questions about the different concepts by recording their responses during a virtual interview conducted through inVibe’s fully-automated research platform. Once again, their stated responses (what they told us) matched the results from the survey in Part 1.
Part 3: inVibe Speech Emotion Analytics
Finally, the voice responses were analyzed for emotional markers. The arousal scores, extracted by running the recorded audio through our machine learning algorithm, revealed something unique and exciting: the concept that was the most risqué (one that had produced average overall scores via traditional market research measures) delivered an arousal score nearly double that of the other concepts.
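As an illustration of what that comparison looks like in practice, the sketch below averages per-response arousal scores by concept and ranks the concepts. The concept names and numbers are made up; they simply mirror the pattern described above.

```python
# Illustrative only: made-up scores mirroring the pattern described in the case study.
from collections import defaultdict
from statistics import mean

# One (concept_id, arousal_score) pair per recorded response
responses = [
    ("concept_A", 0.31), ("concept_A", 0.28), ("concept_A", 0.34),
    ("concept_B", 0.35), ("concept_B", 0.30), ("concept_B", 0.33),
    ("concept_C", 0.62), ("concept_C", 0.58), ("concept_C", 0.65),  # the risque concept
]

by_concept = defaultdict(list)
for concept, arousal in responses:
    by_concept[concept].append(arousal)

ranked = sorted(
    ((concept, mean(scores)) for concept, scores in by_concept.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for concept, avg in ranked:
    print(f"{concept}: mean arousal = {avg:.2f}")
```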
The bioacoustic data was so clear and compelling that the company decided to move forward with the concept that received the highest arousal scores.
Six Months Later
We recently heard from the client, who provided an update on how the campaign has been performing. They were delighted to share that early reports show an increase in prescriptions among a patient population that had previously seemed unmovable. Success!
What’s Next?
We are very excited about the early promise that speech emotion recognition shows in the field of market research. As our body of data grows, so will our ability to home in on which signals matter most, and when.
In the meantime, we welcome all brands out there to put our platform to the test:
If you are dealing with a challenging situation that traditional market research hasn’t been able to solve, call us—we are looking for the hardest cases out there.
We believe it’s time for companies to do more than just ask people what they think, and to start measuring how they feel. Now that there’s a way to do it, why not try it out?