Ethan Elenberg

Ethan Elenberg, PhD is a Research Scientist at ASAPP. Before joining ASAPP, he completed his PhD at UT Austin under the supervision of Alex Dimakis and Sriram Vishwanath. His research interests include optimization, metric learning, and interpretable AI. Ethan has a BE in Electrical Engineering from The Cooper Union and has worked at Twitter, Apple, and MIT Lincoln Laboratory.


How model calibration leads to better automation

by Ethan Elenberg
Dec 4 · 2 min read

Machine learning models offer a powerful way to predict properties of incoming messages such as sentiment, language, and intent based on previous examples. We can evaluate a model’s performance in multiple ways:

Classification error measures how often its predictions are correct.

Calibration error measures how closely the model’s confidence scores match the percentage of time the model is correct.

For example, if a model is correct 95% of the time, we’d say its classification error is 5%. If the same model always reports it is 99% sure its answers are correct, then its calibration error would be 4%. Together, these metrics help determine whether a model is accurate, inaccurate, overconfident, or underconfident.
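As a rough sketch (not ASAPP's evaluation code), both quantities can be estimated in Python from a held-out set of predictions; the simulated data below simply mirrors the numbers in the example above.

import numpy as np

# Hypothetical validation results: `correct` marks whether each prediction
# matched the true label, and `confidences` holds the model's reported
# probability for its predicted class.
rng = np.random.default_rng(0)
correct = rng.random(1000) < 0.95      # right about 95% of the time
confidences = np.full(1000, 0.99)      # but always reports 99% confidence

classification_error = 1.0 - correct.mean()                   # roughly 5%
calibration_gap = abs(confidences.mean() - correct.mean())    # roughly 4%

print(f"classification error: {classification_error:.1%}")
print(f"average confidence vs. accuracy gap: {calibration_gap:.1%}")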

Reducing both classification error and calibration error over time is crucial for integrating models into human workflows, and it lets us maximize customer impact iteratively. For example, well-calibrated models let mature platform features apply more automation only when it has a high chance of succeeding. Proper calibration also puts every model on an intuitive, common scale, so the overall system can act on the most confident prediction and trust that this confidence matches how well the system will actually perform.
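Because calibrated scores share a common scale, combining models can be as simple as acting on whichever one is most confident. A minimal illustration, where the model names, predictions, and scores are all invented:

# Illustrative only: each entry maps a model to its (prediction, calibrated score).
candidates = {
    "intent_classifier": ("PAYBILL", 0.95),
    "kb_retrieval": ("billing_faq", 0.62),
}

# With calibrated scores, the highest confidence is also the best bet system-wide.
best_model, (prediction, score) = max(candidates.items(), key=lambda item: item[1][1])
print(best_model, prediction, score)  # intent_classifier PAYBILL 0.95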

Automating workflows requires a high degree of confidence that the ML model's prediction is what is needed. Proper calibration increases that confidence.

Ethan Elenberg, PhD

The models developed by ASAPP provide value to our customers not from raw predictions alone, but rather from how those predictions are incorporated into platform features for our users. Therefore, we take several steps throughout model development to understand, measure, and improve the accuracy of confidence measures.

For example, consider the difference between predicting 95% chance of rain versus 55% chance of rain. A meteorologist would recommend that viewers take an umbrella with them in the former case but might not in the latter case. This weather prediction analogy fits many of the ASAPP models used in intent classification and knowledge-base retrieval. If a model predicts “PAYBILL” with score 0.95, we can send the customer to the “Pay my bill” automated workflow with a high degree of confidence that this will serve their need. If the score is 0.55, we might want to disambiguate with the customer whether they wanted to “pay their bill” or do something else.
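A minimal sketch of what that decision rule might look like; the threshold value and the workflow strings are assumptions for illustration, not ASAPP's actual configuration:

AUTOMATION_THRESHOLD = 0.85  # assumed cutoff for fully automated handling

def route(intent: str, score: float) -> str:
    # Only hand off to automation when the calibrated score is high enough;
    # otherwise ask the customer to confirm what they meant.
    if score >= AUTOMATION_THRESHOLD:
        return f"start automated '{intent}' workflow"
    return f"disambiguate: did you mean '{intent}' or something else?"

print(route("PAYBILL", 0.95))  # confident enough to automate
print(route("PAYBILL", 0.55))  # not confident enough, so clarify first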

Figure: An uncalibrated model is overconfident, while a calibrated model accurately reflects the customer's intent.

Following our intuition, we would like a model to return 0.95 when it is 95% accurate and 0.55 when it is only 55% accurate. Calibration enables us to achieve this alignment. Throughout model development, we track the mismatch between a model’s score and its empirical accuracy with a metric called expected calibration error (ECE). ASAPP models are designed with a method called temperature scaling, which adjusts their raw scores. This changes the average confidence level in a way that reduces calibration error while maintaining prediction accuracy. The results can be significant: for example, one of our temperature-scaled models was shown to have 85% lower ECE than the original model.
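As a rough sketch of the technique (not our production implementation), temperature scaling divides a model's logits by a single scalar fit on held-out data, and ECE is computed by binning predictions by confidence:

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import softmax

def expected_calibration_error(confidences, correct, n_bins=10):
    # Weighted average, over confidence bins, of |bin accuracy - bin confidence|.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

def fit_temperature(logits, labels):
    # Choose the temperature that minimizes negative log-likelihood on validation data.
    def nll(temperature):
        probs = softmax(logits / temperature, axis=1)
        return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

Because dividing every logit by the same positive temperature never changes which class scores highest, accuracy stays the same while the confidence scores, and therefore the ECE, shift.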

When ASAPP incorporates AI technology into its products, we use model calibration as one of our main design criteria. This ensures that multiple machine learning models work together to create the best automated experience for our customers.

In Summary

Machine learning models can be either overconfident or underconfident in their predictions. The intent classification models developed by ASAPP are calibrated so that prediction scores match their expected accuracy—and deliver a high level of value to ASAPP customers.
