
Responsible AI: Validating LLM Outputs

By Christopher Farina, PhD

Thu Dec 19 2024

Early Adopters

At inVibe, we’ve been working with AI since our founding in 2013. We were among the first market research agencies to adopt large language models (LLMs) back in 2021, and we released our LLM tools to clients in 2023. AI isn’t a gimmick or a buzzword for us; it’s central to our founding mission: to develop technology that provides cost-effective, high-speed healthcare market research while maintaining the highest quality.

As AI takes center stage in healthcare following the release of popular LLMs like ChatGPT and Claude, we’ve continued to integrate the latest AI models, features, and functionality into more of the work we do at inVibe. Working with AI every day, we quickly began to see the cracks, and we’ve prioritized finding ways to ensure the accuracy and integrity of AI-generated content.

Responsible Innovators

To ensure our clients have reliable AI insights, we designed and implemented an evaluation system to validate our AI tools. Our language experts and prompt engineers leverage this system to iteratively test the quality and reliability of the outputs. By testing different prompts and LLMs in this system, we’re able to ensure that our AI tools meet our clients’ needs and our high standards.

Practically speaking, we do this by having two trained language experts compare two outputs presented side by side on a single screen. Once we’ve reviewed both outputs, we score each across six categories based on best practices for evaluating the quality of language data and conversational LLMs (a code sketch of this rubric follows the list):

  • Completeness: To what extent does the output address everything it was asked to?
  • Accuracy: How closely does the output align with our understanding of the voice data? (Rating: 1-4)
  • Integrity: How clear is the relationship between the findings and the verbatim evidence in the output? (Rating: 1-4)
  • Organization: To what extent is the output presented coherently and in the prescribed way? (Rating: 1-4)
  • Formatting: Does the output include consistent formatting that enhances readability? (Rating: 1-2)
  • Citation: Are findings in the output supported by citations in the prescribed format? (Rating: 1-2)
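
To make the rubric concrete, here is a minimal sketch of how these categories and scales could be represented in code. Everything in it (the `RubricScore` class, the field names, and the assumption that Completeness shares the 1-4 scale of the other qualitative categories) is our illustration, not inVibe’s actual implementation.

```python
from dataclasses import dataclass, fields

# Maximum score per category, mirroring the scales listed above.
# Completeness is assumed to share the 1-4 scale of the other
# qualitative categories; all names here are illustrative only.
SCALES = {
    "completeness": 4,
    "accuracy": 4,
    "integrity": 4,
    "organization": 4,
    "formatting": 2,
    "citation": 2,
}

@dataclass
class RubricScore:
    """One language expert's scores for a single LLM output."""
    completeness: int
    accuracy: int
    integrity: int
    organization: int
    formatting: int
    citation: int

    def __post_init__(self):
        # Reject any score outside its category's scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= SCALES[f.name]:
                raise ValueError(f"{f.name} must be between 1 and {SCALES[f.name]}")

    @property
    def total(self) -> int:
        """Sum across all six categories."""
        return sum(getattr(self, f.name) for f in fields(self))
```

Under this sketch, one expert’s judgment of one output might look like `RubricScore(completeness=4, accuracy=3, integrity=4, organization=3, formatting=2, citation=2)`, with a `.total` of 18.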

Let’s take a look at one of these comparisons in action. In the image below, we see a quality test of two different LLMs that were prompted to report the key findings from the same set of voice data. After reviewing both outputs, the language expert scrolls down to assign scores to each output and select a ‘winner.’

Once we’ve collected a sufficient number of ratings for the task we’re assessing, we compare the total scores, sub-scores, and selected winners to determine an overall winner (i.e., the model that performed better). This overall winner becomes our new baseline for the task in the next round of testing and the default way our AI tools perform this operation in our dashboard, at least until the next upgrade.
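
As a sketch of that aggregation step (again, an assumption about the shape of the data, not inVibe’s actual pipeline), one could tally expert picks and rubric totals across all rated comparisons and break ties on total score:

```python
from collections import Counter

def pick_overall_winner(ratings):
    """ratings: list of (score_a, score_b, winner) tuples, where the
    scores are RubricScore objects (see the sketch above) and winner
    is 'A' or 'B' as selected by the language expert."""
    totals = {"A": 0, "B": 0}
    picks = Counter()
    for score_a, score_b, winner in ratings:
        totals["A"] += score_a.total
        totals["B"] += score_b.total
        picks[winner] += 1
    # Prefer the model experts picked more often; break ties on total score.
    return max(("A", "B"), key=lambda model: (picks[model], totals[model]))
```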

Through many iterations of this testing, we’ve improved the quality of our models by 46.73% over the last 12 months (and we expect to continue this pace in the next 12 months)! This means that our clients can rest assured that our AI tools will remain best-in-class: offering comprehensive, accurate, transparent, and comprehensible outputs tuned to market research best practices for qualitative voice data.

Service Partners

Are you interested in learning more about how our AI tools can make your qualitative research simpler, more systematic, and more scalable? Schedule a demo with us today and see for yourself!

Thanks for reading!

Be sure to subscribe to stay up to date on the latest news & research coming from the experts at inVibe Labs.
