As humans, speaking and listening are two of the oldest and most natural ways we express our emotions and empathize with others. In fact, as we have come to rely more on text and email to communicate with one another, the cultural phenomenon of emojis has exploded. These visual proxies clarify our emotions and reflect our desire to ensure that our true feelings come through. People have realized that when they are not using their voice to communicate, their intent and mood can be lost in the absence of the emotional tones, inflections, and other cues the voice carries.
If you’re not sure about the importance of voice, think about how many times you have read an email or text and misinterpreted the sender’s intent.
The study of detecting and interpreting emotions in speech is called Speech Emotion Recognition, or SER. The field is only a few decades old but is growing rapidly, driven by breakthroughs in technology and in our understanding of how emotions shape decision-making. SER has applications in health, education, security, and entertainment.
In market research, SER allows researchers to better understand participants’ emotions through analysis of voice recordings. By measuring how the voice changes during articulation, we can decode the acoustic signal and assign discrete emotions to the specific words or phrases being uttered.
Facial expressions and biometrics like pulse and skin temperature are also viable options for analysis, but speech offers several unique advantages: (1) ease of data collection; (2) ready integration with machine learning algorithms; and (3) the ability to analyze the data in combination with other qualitative methodologies to create rich interpretations.
Deep Listening
Acoustic features such as pitch, energy, timing, voice quality, and articulation correlate strongly with underlying emotions. At inVibe Labs, we use validated algorithms to process the voice data and classify these features into broader categories (e.g., activation, dominance, valence). Our expert analysts use this data to identify patterns and translate the signals into meaningful emotional insights brands can use to better connect with customers.
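To make this concrete, here is a minimal sketch of how two of these classic features, pitch and energy, can be pulled out of a recording with the open-source librosa library. This is an illustration only, not inVibe’s validated pipeline, and the file name "interview.wav" is a hypothetical path.

```python
# Illustrative feature extraction, not inVibe's pipeline.
import librosa
import numpy as np

# Load the recording (librosa resamples to 22,050 Hz by default).
y, sr = librosa.load("interview.wav")  # hypothetical file

# Pitch (fundamental frequency) via the pYIN algorithm; unvoiced
# frames come back as NaN, so ignore them when summarizing.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7")
)
mean_pitch = np.nanmean(f0)

# Short-term energy via root-mean-square amplitude per frame.
rms = librosa.feature.rms(y=y)[0]
mean_energy = rms.mean()

print(f"mean pitch: {mean_pitch:.1f} Hz, mean RMS energy: {mean_energy:.4f}")
```

Summary statistics like these, tracked over the course of an interview, are the raw material that downstream models classify into the broader emotional categories described above.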
The key SER metrics we examine in our analysis are Activation, Dominance, and Valence. We listen to see if people are interested (Activation), confident in their perspective (Dominance) and whether they are negative or positive (Valence). While each of these metrics is important in isolation, their relationship with one another is key in helping our experts recognize emotional signals.
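The sketch below shows why the combination matters: high activation with positive valence reads very differently from high activation with negative valence, and dominance helps separate the two negative cases. The thresholds and labels are hypothetical, chosen only to illustrate the idea; they are not inVibe’s scoring model.

```python
# Toy mapping from dimensional scores to a coarse emotion label.
# Thresholds and labels are hypothetical, for illustration only.
def coarse_emotion(activation: float, dominance: float, valence: float) -> str:
    """Map scores in [-1, 1] on each dimension to a rough emotion label."""
    if activation > 0.3 and valence > 0.3:
        return "excited / enthusiastic"
    if activation > 0.3 and valence < -0.3:
        # Dominance separates outward-directed anger from anxious distress.
        return "angry" if dominance > 0 else "anxious"
    if activation < -0.3 and valence < -0.3:
        return "sad / disengaged"
    if activation < -0.3 and valence > 0.3:
        return "calm / content"
    return "neutral"

print(coarse_emotion(activation=0.7, dominance=0.5, valence=-0.6))  # angry
```

No single dimension would have distinguished the angry speaker from the anxious one here, which is exactly the kind of relationship our analysts listen for.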
In our upcoming posts, we will go deeper into Deep Listening and explore the importance (and art) of question design.
Contact us to learn how inVibe is helping life science brands unlock customer insights and transform voice into value with spoken language analysis.
We’re always ready to listen.