Introducing CLIP: A Dataset to Improve Continuity of Patient Care with Unsupervised NLP
Continuity of care is crucial to ensuring positive health outcomes for patients, especially in the transition from acute hospital care to outpatient primary care. However, information sharing across these settings is often imperfect.
Hospital discharge notes easily run to thousands of words and are structured with billing and compliance in mind rather than the reader, making poring over these documents for important pending actions especially difficult. Compounding the issue, primary care physicians (PCPs) are already short on time—receiving dozens of emails, phone calls, imaging reports, and lab reports per day (Baron 2010). Lost in this sea of hospital notes and time constraints are important actions for improving patient care, and that can cause errors and complications for both patients and primary care physicians.
Thus, in order to improve the continuity of patient care, we are releasing one of the largest annotated datasets for clinical NLP. Our dataset, which we call CLIP, for CLInical Follow-uP, makes the task of action item extraction tractable by enabling us to train machine learning models to select the sentences in a document that contain action items.

By leveraging modern methods in unsupervised NLP, we can automatically highlight action items from hospital discharge notes for primary care physicians—saving them time and reducing the risk that they miss critical information.
James Mullenbach
We view the automatic extraction of required follow-up action items from hospital discharge notes as a way to enable more efficient note review for caregivers. In alignment with the ASAPP mission to augment human activity by advancing AI, this dataset and task provide an exciting test ground for unsupervised learning in NLP. By automatically surfacing relevant historical data to improve communication across care settings, this work is another key way ASAPP is augmenting human work with AI. In our ACL 2021-accepted paper, we demonstrate this with a new algorithm.
The CLIP Dataset
Our dataset is built upon MIMIC-III (Johnson et al., 2016), a large, de-identified, and open-access dataset from the Beth Israel Deaconess Medical Center in Boston, which is the foundation of much fruitful work in clinical machine learning and NLP. From this dataset, with the help of a team of physicians, we labeled each sentence in 718 full discharge summaries, specifying whether the sentence contained a follow-up action item. We also annotated 7 types to further classify action items by the type of action needed; for example, scheduling an appointment, following a new medication prescription, or reviewing pending laboratory results. This dataset, comprising over 100,000 annotated sentences, is one of the largest open-access annotated clinical NLP datasets to our knowledge, and we hope it can spur further research in this area.
How well does machine learning accomplish this task? In our paper we approach the task as sentence classification, individually labeling each sentence in a document with its follow-up types, or “No follow-up”. We evaluated several common machine learning baselines on the task, adding tweaks to better suit it, such as including more than one sentence as input. We find that the best models, based on the popular transformer-based model BERT, provide a 30% improvement in F1 score relative to the linear model baseline. The best models achieve an F1 score around 0.87, close to the human performance benchmark of 0.93.
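As a concrete illustration of this framing, here is a minimal sketch of multi-label sentence classification using the Hugging Face transformers library. The checkpoint and label names are illustrative assumptions, and the classification head below is untrained; the paper’s actual setup (multi-sentence inputs, clinically pre-trained weights) is richer.

```python
# Minimal sketch of follow-up sentence classification. The label set and
# checkpoint are illustrative assumptions; the classification head is
# untrained here and would need fine-tuning on CLIP before use.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["appointment", "medication", "lab", "imaging", "procedure",
          "case", "other", "no_followup"]  # stand-ins for the 7 types + none

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # a sentence may carry several types
)

def predict_followups(sentence: str, threshold: float = 0.5) -> list[str]:
    """Return every follow-up type whose predicted probability clears threshold."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.sigmoid(model(**inputs).logits).squeeze(0).tolist()
    chosen = [label for label, p in zip(LABELS, probs) if p > threshold]
    return chosen or ["no_followup"]

print(predict_followups("Please follow up with cardiology to review pending labs."))
```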
Model pre-training for healthcare applications
We found that an important factor in developing effective BERT-based models was pre-training them on appropriate data. Pre-training exposes models to large amounts of unlabeled data and serves as a way for large neural network models to learn general features of language, like proper word ordering and which words often appear in similar contexts. Models that were pre-trained only on generic data from books or the web may not have enough knowledge of how language is used specifically in healthcare settings. We found that BERT models pre-trained on MIMIC-III discharge notes outperformed the general-purpose BERT models.
For clinical data, we may want to take this focused pre-training idea a step further. Pre-training is often the most costly step of model development due to the large amount of data used. But can we reduce the amount of data needed by selecting data that is highly relevant to our end task? In healthcare settings, with private data and fewer computational resources, this would make automating action item extraction more accessible. In our paper, we describe a method we call task-targeted pre-training (TTP) that builds pre-training datasets by selecting the unlabeled sentences that look most like the sentences in our annotated data that do contain action items. We find that it’s possible, and maybe even advantageous, to select pre-training data in this way, saving time and computational resources while maintaining model performance.
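Here is a sketch of that selection step, under the assumption that a lightweight relevance scorer trained on the labeled sentences is used to rank the unlabeled ones; the paper’s exact scoring method may differ.

```python
# Sketch of task-targeted pre-training (TTP) data selection: rank unlabeled
# sentences by how much they resemble labeled action-item sentences and keep
# the top-k as the pre-training corpus. The TF-IDF + logistic regression
# scorer is an illustrative stand-in, not necessarily the paper's method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def select_pretraining_data(labeled_sents, is_action_item, unlabeled_sents, top_k):
    vectorizer = TfidfVectorizer(max_features=50_000)
    X = vectorizer.fit_transform(labeled_sents)
    scorer = LogisticRegression(max_iter=1000).fit(X, is_action_item)
    # Probability that each unlabeled sentence contains an action item.
    scores = scorer.predict_proba(vectorizer.transform(unlabeled_sents))[:, 1]
    ranked = sorted(zip(scores, unlabeled_sents), key=lambda pair: -pair[0])
    return [sentence for _, sentence in ranked[:top_k]]
```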
Improving physician performance and reducing cognitive load
Ultimately, our end goal is to make physicians’ jobs easier by reducing the administrative burden of reading long hospital notes, bringing their time and focus back where it belongs: on the patient. Our methods can condense notes down to what a PCP really needs to know, reducing note size by at least 80% while keeping important action items readily available. This reduction in “information overload” can reduce physicians’ likelihood of missing important information (Singh et al., 2013), improving their accuracy and the well-being of their patients. Through a simple user interface, these models could enable a telemedicine professional to more quickly and effectively aid a patient who recently visited the hospital.
Read more and access the data
Our goal in open-sourcing CLIP is to enable much more future work in summarizing clinical notes and reducing physician workload, with our approach serving as a first step. We anticipate that further efforts to incorporate the full document into model decisions, exploit sentence-level label dependencies, or inject domain knowledge will be fruitful. To learn more, visit our poster session at ACL on Monday, Aug. 2, 11:00 a.m.–1:00 p.m. ET.
Paper
CLIP Dataset
Code Repository
Citations
Richard J. Baron. 2010. What’s keeping us so busy in primary care? A snapshot from one practice. The New England Journal of Medicine, 362(17):1632–1636.
Hardeep Singh, Christiane Spitzmueller, Nancy J. Petersen, Mona K. Sawhney, and Dean F. Sittig. 2013. Information overload and missed test results in electronic health record-based settings. JAMA Internal Medicine, 173(8):702–704.
Alistair E. W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Mohammad M. Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific Data, 3:160035.
Why I joined ASAPP: Taking AI to new levels in enterprise solutions
I have spent the past 20 years working in natural language processing and machine learning. My first project involved automatically summarizing news for mobile phones. The system was sophisticated for its time, but it amounted to a number of brittle heuristics and rules. Fast forward two decades and techniques in natural language processing and machine learning have become so powerful that we use them every day—often without realizing it.
After finishing my studies, I spent the bulk of these 20 years at Google Research. I was amazed at how machine learning went from a promising tool to one that dominates almost every consumer service. At first, progress was slow: a classifier here or there in some peripheral system. Then progress came faster, and machine learning became a first-class citizen. Finally, end-to-end learning started to replace whole ecosystems that a mere 10 years before were largely based on graphs, simple statistics, and rules-based systems.
After working almost exclusively on consumer-facing technologies, I started shifting my interests toward enterprise. There were so many interesting challenges in this space: the complexity of needs, the heterogeneity of data, and often the lack of the clean, large-scale training sets that are critical to machine learning and natural language processing. However, there were properties that made enterprise tractable. While the complexity of tasks was high, the set of tasks any specific enterprise engaged in was finite and manageable. The users of enterprise technology are often domain experts and can be trained. Most importantly, these consumers of enterprise technology were excited to interact with artificial intelligence in new ways—if it could deliver on its promise to improve the quality and efficiency of their efforts.
This led me to ASAPP.
I am firm in my belief that to take enterprise AI to the next level, a holistic approach is required. Companies must focus on challenges with systemic inefficiencies and develop solutions that combine domain expertise, machine learning, data science, and user experience (UX) in order to elevate the work of practitioners. The goal is to improve and augment sub-tasks that computers can solve with high precision, so that experts can spend more time on more complex tasks. The core mission of ASAPP is exactly in line with this, specifically directed toward customer service, sales, and support.

To take enterprise AI—and customer experience—to the next level a holistic approach is required.
Ryan McDonald, PhD
The customer experience is ripe for AI to elevate to the next level. Everyone has experienced bad customer service, but also amazing customer service. How do we understand the choices that the best agents make? How do we recognize opportunities where AI can automate routine and monotonous tasks? Can AI help automate non-deterministic tasks? How can AI improve the agent experience, leading to less burnout, lower turnover, and higher job satisfaction? This is in an industry that employs three million people in the United States alone but suffers from an average of 40 percent attrition—one of the highest rates of any industry.
ASAPP is focusing its efforts singularly on the Customer Experience and there are enough challenges here to last a lifetime. But, ASAPP also recognizes that this is the first major step on a longer journey. This is evident in the amazing research group that ASAPP has put together. They are not just AI in name, but also in practice. Our research group consists of machine learning and language technology leaders, many of whom publish multiple times a year. We also have some of the best advisors in the industry from universities like Cornell and MIT. This excites me about ASAPP. It is the perfect combination of challenges and commitment to advanced research that is needed in order to significantly move the needle in customer experience. I’m excited for our team and this journey.
To realize Forrester’s vision of conversational intelligence, a human focus is needed.
For the CX industry, success has always relied on the ability to deliver high-quality customer interactions at scale. The availability of omnichannel opened up new, convenient avenues for customers to engage with organizations, yet it also increased the volume of interactions needing resolution. But thanks to modern advances in AI research, conversational and speech intelligence is having a renaissance moment, improving CX revenue and efficiency at this rising scale.
As proof of this trend, Forrester Research released their new Q2 2021 report, “Now Tech: Conversation Intelligence” which names ASAPP among the leading conversation intelligence providers. The report guides forward-looking organizations to harness conversational intelligence in three key areas:
- Delivering CX insights at scale: solutions which help organizations understand the voice of the customer and the agent at every interaction.
- Improving CX behavior at scale: solutions which monitor and guide agents on what to say, actions to take, or areas to coach an agent.
- Accelerating revenue: solutions which give sales teams the insights they need to drive a greater volume of better leads and to ensure they are acted upon.
In looking at these areas, it’s no surprise that organizations like American Airlines and DISH are turning to ASAPP for real-time insights that empower customer service and sales agents to achieve peak performance. At ASAPP, we believe that intelligence is best deployed at the point where it matters: in real time, where the interactions occur. It’s why we’re committed to advancing true AI that redefines automation in contact center operations to triple throughput, increase digital adoption, and lower operational costs.

Real-time insights from a continuously learning system improve a company’s ability to deliver highly personalized customer experiences—and substantially improve efficiency at the same time.
Macario Namie
This real-time conversational intelligence replaces yesterday’s rules-based systems by capitalizing on the insights of your agents and customers. A rules-based system, whether it feeds chatbots or humans, captures only a fraction of the available knowledge and doesn’t take advantage of the lessons buried in today’s data pools. Rigid rules-based systems aren’t flexible or generalizable enough for diverse customer needs; no rules-based system will deliver the customized, real-time intelligence that equips agents with what to say in the moment.
It’s time for us to harness conversational intelligence that applies the knowledge of agents at scale. CX leaders who combine conversational intelligence with automation understand how this leads to better voice and digital experiences that increase Customer Satisfaction (CSAT) and Net Promoter Scores (NPS). It’s why organizations that deploy ASAPP see an exponential improvement in performance that delivers measurable results in less than 60 days.
That’s all to say that we’re proud to see further recognition of ASAPP’s value in conversational intelligence. The Forrester Research report builds on our distinction as a “Cool Vendor” by Gartner. How are you thinking of using conversation intelligence at your organization?
Read the full report by Forrester Research for more details.
See the press release here.
Why AHT isn’t the right measure in an asynchronous and multi-channel world
Operations teams have been using agent handle time (AHT) to measure agent efficiency, manage workforce, and plan operation budgets for decades. However, customers have been increasingly demonstrating they’d prefer to communicate asynchronously—meaning they can interact with agents when it is convenient for them, taking a pause in the conversation and seamlessly resuming minutes or hours later, as they do when multitasking, handling interruptions, and messaging with family and friends.
In this new asynchronous environment, AHT is an inappropriate measure of how long it takes agents to handle a customer’s issue: it overstates the amount of time an agent actively spends working with a customer. We consider agent throughput, the number of issues an agent handles over some period of time (e.g., 10 issues per hour), a better measure of agent efficiency and a better metric for operations planning.
One common strategy for increasing throughput is to merely give agents more issues to handle at once, which we call concurrency. However, attempts to increase throughput by simply increasing an agent’s concurrency without giving them better tools to handle multiple issues at once are short-sighted. Issues that escalate to agents are complex and require significant cognitive load, as “easier” issues have typically already been automated.
Therefore, naively increasing agent concurrency without cognitive load consideration often results in adverse effects on agent throughput, frustrated customers who want faster response times, and agents who burn out quickly.
The ASAPP solution is an AI-powered flexible concurrency model: a machine learning model measures and forecasts the cognitive demand on agents and dynamically adjusts concurrency accordingly. The model considers several factors, including customer behaviors, the complexity of each issue, and the expected work required to resolve it, to determine an agent’s concurrency capacity at any given point in time.
We’re able to increase throughput by reducing demands on the agent’s time and cognitive load, resulting in agents more efficiently handling conversations, while elevating the customer experience.
Measuring throughput
In equation form, throughput is the number of issues an agent can handle at once (concurrency) multiplied by the inverse of agent handle time (AHT).
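In symbols:

Throughput = Concurrency × (1 / AHT) = Concurrency / AHT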

For example, if it takes an agent half an hour on average to handle an issue, and she handles two issues concurrently, then her throughput is 4 issues per hour.
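Throughput = 2 issues ÷ 0.5 hours = 4 issues per hour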

The equation shows two obvious ways to increase throughput:
- Reduce the time it takes to handle each individual issue (reduce the AHT); and
- Increase the number of issues an agent can concurrently handle.
At ASAPP, we pursue both of these approaches to increasing throughput, particularly as customers adopt more asynchronous communication.

AHT as a metric is only applicable when the agent handles one contact at a time—and it’s completed end-to-end in one session. It doesn’t take into account concurrent digital interactions, nor asynchronous interactions.
Heather Reed, PhD
Reducing AHT
The first piece of the throughput-maximization problem entails identifying, quantifying, and reducing the time and effort required for agents to perform the tasks to solve a customer issue.
We think of the total work performed by an agent as a function of both the cognitive load (CL) and the time required to perform a task. This definition is analogous to the definition of work in physics, where Work = (Load applied to an object) × (Distance to move the object).
The agent’s cognitive load during a conversation (picture it as a curve whose height varies over the course of the interaction) is affected by:
- crafting messages to the customer;
- looking up external information for the customer;
- performing work on behalf of the customer;
- context switching among multiple customers; etc.
The total work performed is the area under the curve, which can be reduced by decreasing the effort (CL) and time to perform tasks. We can compute the average across the interaction—a flat line—and in a synchronous environment, that can be very accurate.
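Written in the same form as the physics analogy above:

Total work = Average cognitive load × Interaction time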

ASAPP automation and agent augmentation features are designed to reduce both handling time and the agents’ cognitive load—the amount of energy it takes to solve a customer’s problem or upsell a prospect. For example, Autosuggest provides message recommendations that contain relevant customer information, saving agents the time and effort they would otherwise spend looking up information about customers (e.g., their bill amount) as well as the time spent physically crafting the message.
For synchronous conversations, that means each call is less tiring. For asynchronous conversations, that means agents can handle an increasing number of issues without corresponding increases in stress.
In some cases, we can eliminate the cognitive load from a part of a conversation entirely. Our auto-pilot feature automates whole portions of the interaction—for example, collecting a customer’s device information—freeing up agents’ attention.

Using multiple augmentation features during an issue reduces overall AHT as well as the total work performed.
When the customer is asynchronous, the majority of the agent’s time is spent simply waiting for the customer to respond. This is not an effective use of the agent’s time, which brings us to the second piece of the throughput-maximization problem.
Increasing concurrency
We can improve agent throughput by increasing concurrency. Unfortunately, this is more complex than simply increasing the number of issues assigned to an agent at once. Issues that escalate to agents are complex and emotive, as customers typically get basic needs met through self-service or automation. If an agent’s concurrency is increased without forecasting workload, the added load will actually have an adverse effect on the AHT of individual issues.
If increasing concurrency results in increased AHT, then the impact on overall throughput can be negative. What’s more, customers can become frustrated at the lack of response from the agent and bounce to other support channels, or worse—consider switching providers; and agents may feel overwhelmed and risk burning out or churning out.
Flexible concurrency
We can alleviate this with flexible concurrency: an AI-driven approach in which a machine learning model keeps track of the work the agent is doing and dynamically adjusts the agent’s concurrency to keep cognitive load manageable.
Combined with ASAPP augmentation features, our flexible concurrency model can safely increase an agent’s concurrency, enabling higher throughput and increased agent efficiency.
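A simplified sketch of the idea follows; the load model, thresholds, and caps are hypothetical stand-ins for illustration, not ASAPP’s production logic.

```python
# Simplified sketch of a flexible-concurrency controller. The load weights,
# thresholds, and caps are hypothetical; a production system would learn them
# from customer behavior, issue complexity, and expected remaining work.
from dataclasses import dataclass

@dataclass
class IssueState:
    expected_remaining_work: float  # forecast effort left on this issue
    customer_active: bool           # False while waiting on an async customer

def forecast_load(issues: list[IssueState]) -> float:
    """Estimate an agent's current cognitive load from their open issues."""
    # Issues waiting on the customer contribute far less load than active ones.
    return sum(issue.expected_remaining_work * (1.0 if issue.customer_active else 0.2)
               for issue in issues)

def open_slots(issues: list[IssueState], max_load: float = 3.0,
               per_issue_load: float = 1.0, hard_cap: int = 6) -> int:
    """How many additional issues this agent can safely take right now."""
    headroom = max_load - forecast_load(issues)
    slots = int(headroom // per_issue_load)
    return max(0, min(slots, hard_cap - len(issues)))

# Example: one active issue plus two quiet asynchronous ones leaves headroom.
agent = [IssueState(1.0, True), IssueState(0.8, False), IssueState(0.5, False)]
print(open_slots(agent))  # -> 1
```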


In summary
As customers increasingly prefer to interact asynchronously, AHT becomes less appropriate for operations planning. Throughput (the number of issues handled within a time period) is a better metric for measuring agent efficiency and managing workforce and operations budgets. ASAPP’s AI-driven agent augmentation, paired with a flexible concurrency model, enables our customers to safely increase agent throughput while maintaining a manageable agent workload—and still deliver an exceptional customer experience.
Gartner Recognizes ASAPP for Continuous Intelligence in CX
Every year Gartner scans the horizon for companies who offer technology or services that are innovative, impactful, or intriguing. Gartner analysts might ask themselves: What’s something that customers could not do before? What technical innovation is focused on producing business impact? Or what new technology or service appears to be addressing systemic challenges?
This year’s Gartner report naming ASAPP as a “Cool Vendor” affirms our efforts at the intersection of artificial intelligence (AI) and customer experience (CX). We entered into this $600 billion industry because we wanted to create real change—building machine learning products that augment and automate the world’s workflows—and address the most costly and painful parts of CX that are largely ignored today.
Despite billions of dollars spent on technology designed to keep customers away from speaking with agents—starting with IVRs a few decades ago and most recently, chatbots—the human agent is still there. And in record numbers. Most large B2C organizations have actually increased their agent population over the last several years. And it is these human agents, the ones who represent your brand to millions of customers, who have been most ignored by innovators.

By embracing automation—not as a replacement for human agents, but as an augmentor—the entire performance of sales and service contact centers is dramatically elevated.
Macario Namie
As ASAPP followers know well, this is why we exist. By embracing automation—not as a replacement for human agents, but as an augmentor—the entire performance of sales and service contact centers is dramatically elevated. Real-time continuous intelligence techniques tell every agent the right thing to say and do, live during an interaction. The company benefits from radical increases in organizational productivity, while customers get exactly what they want—the right answer in the fastest possible time.
We’re proud of the academic recognition ASAPP Research has achieved for advancing the state of the art in automatic speech recognition (ASR), NLP, and task-oriented dialogue. However, it’s the business results of this applied research that keep ASAPP moving forward. We celebrate this Gartner recognition with our customers like American Airlines, DISH, and JetBlue, who are seeing the business results of AI in their customer service.
So what makes a company applying artificial intelligence to customer experience a “Cool Vendor”? Well, check out the Gartner report. In short, I would say it’s our exclusive focus on human performance within CX. Learn more by reading this year’s Gartner Cool Vendor report.
GARTNER DOES NOT ENDORSE ANY VENDOR, PRODUCT OR SERVICE DEPICTED IN ITS RESEARCH PUBLICATIONS, AND DOES NOT ADVISE TECHNOLOGY USERS TO SELECT ONLY THOSE VENDORS WITH THE HIGHEST RATINGS OR OTHER DESIGNATION. GARTNER RESEARCH PUBLICATIONS CONSIST OF THE OPINIONS OF GARTNER’S RESEARCH ORGANIZATION AND SHOULD NOT BE CONSTRUED AS STATEMENTS OF FACT. GARTNER DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, WITH RESPECT TO THIS RESEARCH, INCLUDING ANY WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Task-oriented dialogue systems could be better. Here’s a new dataset to help.
Dialogue State Tracking has run its course. Here’s why Action State Tracking and Cascading Dialogue Success are next.
For call center applications, dialogue state tracking (DST) has traditionally served as a way to determine what the user wants at that point in the dialogue. However, in actual industry use cases, the work of a call center agent is more complex than simply recognizing user intents.
In real-world environments, agents are typically tasked with strenuous multitasking. Tasks often include reviewing knowledge base articles, evaluating guidelines on what can be said, examining dialogue history with a customer, and inspecting customer account details all at once. In fact, according to ASAPP internal research, call center phone agents spend approximately 82 percent of their total time looking at customer data, step-by-step guides, or knowledge base articles. Yet none of these aspects are accounted for in classical DST benchmarks. A more realistic environment would impose a dual constraint, where the agent needs to obey customer requests while considering company policies when taking actions.
That’s why, in order to improve the state of the art of task-oriented dialogue systems for customer service applications, we’re establishing a new Action-Based Conversations Dataset (ABCD). ABCD is a fully-labeled dataset with over 10k human-to-human dialogues containing 55 distinct user intents requiring unique sequences of actions constrained by company policies to achieve task success.
The major difference between ABCD and other datasets is that it asks the agent to adhere to a set of policies that call center agents often face, while simultaneously dealing with customer requests. With this dataset, we propose two new tasks: Action State Tracking (AST)—which keeps track of the state of the dialogue when we know that an action has taken place during that turn; and Cascading Dialogue Success (CDS)—a measure for the model’s ability to understand actions in context as a whole, which includes the context from other utterances.

The major difference between ABCD and other datasets is that it asks the agent to adhere to a set of policies that call center agents often face, while simultaneously dealing with customer requests.
Derek Chen
Dataset Characteristics
Unlike other large open-domain dialogue datasets, often built for general chatbot entertainment purposes, ABCD focuses on increasing the count and diversity of actions and text within the domain of customer service. Dataset participants were additionally incentivized through financial bonuses for properly adhering to policy guidelines when handling customer requests, mimicking real customer service environments and realistic agent behavior.
The training process to annotate the dataset, for example, at times felt like training for a real call center role. “I feel like I’m back at my previous job as a customer care agent in a call center,” said one MTurk agent who was involved in the study. “Now I feel ready to work at or interview for a real customer service role,” said another.
New Benchmarks
The novel features in ABCD challenge the industry to measure performance across two new dialogue tasks: Action State Tracking and Cascading Dialogue Success.
Action State Tracking (AST)
AST improves upon DST metrics by detecting the pertinent intent from customer utterances while also taking into account constraints from agent guidelines. Suppose a customer is entitled to a discount which will be offered by issuing a [Promo Code]. The customer might request 30% off, but the guidelines stipulate only 15% is permitted, which would make “30” a reasonable, but ultimately flawed slot-value. To measure a model’s ability to comprehend such nuanced situations, we adopt overall accuracy as the evaluation metric for AST.
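Read one way, AST evaluation reduces to exact-match accuracy over predicted (action, slot-value) pairs. A sketch, with illustrative field names:

```python
# Sketch of AST exact-match accuracy; the field names are illustrative.
def ast_accuracy(predictions: list[dict], references: list[dict]) -> float:
    """Each item looks like {"action": "offer-promo-code", "values": ["15"]}."""
    correct = sum(pred["action"] == ref["action"] and pred["values"] == ref["values"]
                  for pred, ref in zip(predictions, references))
    return correct / len(references)

# The flawed "30" above fails exact match against the guideline-constrained
# reference value "15", even though it faithfully echoes the customer.
```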
Cascading Dialogue Success (CDS)
Since the appropriate action often depends on the situation, we propose the CDS task to measure a model’s ability to understand actions in context. Whereas AST assumes an action occurs in the current turn, the task of CDS includes first predicting the type of turn and its subsequent details. The types of turns are utterances, actions, and endings. When the turn is an utterance, the detail is to respond with the best sentence chosen from a list of possible sentences. When the turn is an action, the detail is to choose the appropriate slots and values. Finally, when the turn is an ending, the model should know to end the conversation. This score is calculated on every turn, and the model is evaluated based on the percent of remaining steps correctly predicted, averaged across all available turns.
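One way to implement the scoring described here, assuming each turn has already been reduced to a comparable step (a chosen utterance, an action with its slots and values, or an ending marker):

```python
# Sketch of Cascading Dialogue Success scoring as described above: from each
# starting turn, count the remaining steps predicted correctly before the
# first mistake, then average that fraction over all starting turns.
def cds_score(predicted_steps: list, gold_steps: list) -> float:
    num_turns = len(gold_steps)
    per_turn_scores = []
    for start in range(num_turns):
        remaining = num_turns - start
        correct = 0
        for pred, gold in zip(predicted_steps[start:], gold_steps[start:]):
            if pred != gold:
                break  # the cascade stops at the first wrong prediction
            correct += 1
        per_turn_scores.append(correct / remaining)
    return sum(per_turn_scores) / num_turns
```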
Why This Matters
For customer service and call center applications, it is time for both the research community and industry to do better. Models relying on DST as a measure of success offer little indication of performance in real-world scenarios, and discerning CX leaders should look to other indicators grounded in the conditions that actual call center agents face.
Rather than relying on general datasets built around an opaque array of knowledge base lookup actions, ABCD presents a corpus for building more in-depth task-oriented dialogue systems. The availability of this dataset and two new tasks creates new opportunities for researchers to explore better, more reliable models for task-oriented dialogue systems.
We can’t wait to see what the community creates from this dataset. Our contribution to the field with this dataset is another major step to improving machine learning models in customer service.
Read the Complete Paper, & Access the Dataset
This work has been accepted at NAACL 2021. Meet the authors on June 8th, 20:00–20:50 EST, where this work will be presented as part of “Session 9A-Oral: Dialogue and Interactive Systems.”
Why a little increase in transcription accuracy is such a big deal
A lot has been written lately about the importance of accuracy in speech-to-text transcription. It’s the key to unlocking value from the phone calls between your agents and your customers. For technology evaluators, it’s increasingly difficult to cut through the accuracy rates being marketed by vendors—from the larger players like Google and Amazon to smaller niche providers. How do you determine the best transcription engine for your organization to unlock the value of transcription?
The reality is that there is no one speech transcription model to rule them all. How do we know? We tried them.
In our own testing, some models performed extremely well on industry benchmarks. But they then failed to reproduce anything close to the same results when put into production contact center environments.
Benchmarks like LibriSpeech use a standardized set of audio files which speech engineers optimize against on many dimensions (vocabulary, audio type, accents, etc.). This is why we see WERs in the <2% range. These models now outperform the human ear (roughly 4% WER) on the same data, which is an incredible feat of engineering. Doing well on industry benchmarks is impressive—but what evaluators really need to know is how these models perform in their real-world environment.
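For reference, word error rate compares a system transcript against a human reference transcript of N words:

WER = (Substitutions + Deletions + Insertions) / N

so a 2% WER means roughly two word-level mistakes for every hundred reference words.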
What we’ve found in our own testing is that most off-the-shelf Automatic Speech Recognition (ASR) models struggle with the variety of contact center telephony environments and the business-specific terminology used within those conversations. Before ASAPP, many of our customers had gotten transcription live after months of integration, and even with domain-specific ASRs saw accuracy rates in the area of 70%, nudging closer to 80% only under the most ideal conditions. That is certainly a notch above where it was 5 or 10 years ago, but most companies still don’t transcribe 100% of phone calls. Why? Because they don’t expect to get enough value to justify the cost.
So how much value is there in a higher real-world accuracy rate?

The words that are missed in the gap between 80% accuracy and 90% accuracy are often the ones that matter most. They’re the words that are specific to the business and are critical to unlocking value.
Austin Meyer
More than you might imagine. The words that are missed are often the most important ones—specific to the business and critical to unlocking value. These include things like:
- Company names (AT&T, Asurion, Airbnb)
- Product and promotion names (Galaxy S12, MLB League Pass, Marriott Bonvoy Card)
- People’s names, emails and addresses
- Long numbers such as serial numbers and account numbers
- Dollar amounts and dates
To illustrate this point, let’s look at a sample of 10,000 hours of transcribed audio from a typical contact center. There are roughly 30,000 unique words within those transcripts, yet the system only needs to recognize 241 of the most frequently used words to get 80% accuracy. Those are largely words like “the”, “you”, “to”, “what”, and so on.
To get to 90% accuracy, the system needs to correctly transcribe the next 324 most frequently used words, and even more for every additional percent. These are often words that are unique to your business—the words that really matter.
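The arithmetic behind these numbers is simple cumulative coverage. Here is a sketch, with naive whitespace tokenization, of how you could reproduce it on your own transcripts:

```python
# Sketch of cumulative word-frequency coverage: how many of the most frequent
# words are needed to account for a target share of all spoken tokens.
from collections import Counter

def words_needed_for_coverage(transcripts: list[str], target: float) -> int:
    counts = Counter(word for text in transcripts for word in text.lower().split())
    total_tokens = sum(counts.values())
    covered = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total_tokens >= target:
            return rank
    return len(counts)

# On the sample described above, words_needed_for_coverage(corpus, 0.80)
# would land near 241, and 0.90 near 241 + 324 = 565.
```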

Context also impacts accuracy and meaning. If someone says, “Which Galaxy is that?”, depending on the context, they could be talking about a Samsung phone or a collection of stars and planets. This context will often impact the spelling and capitalization of many important words.
Taking this even further, if someone says, “my Galaxy is broken”, but they don’t mention which model they have, anyone analyzing those transcripts to determine which phone models are problematic won’t know unless the transcript is tied to additional data about that customer. The effort of manually integrating transcripts with other datasets that contain important context dramatically increases the cost of getting value from transcription.
When accuracy doesn’t rise above 80% in production and critical context is missing, you get limited value from your data—nothing more than simple analytics like high-level topic/intent categorization, maybe tone, basic keywords, and questionable sentiment scores. That’s not enough to significantly impact the customer experience or the bottom line.
It’s no wonder companies can’t justify transcribing 100% of their calls despite the fact that many of them know there is rich data there.
The key to mining the rich data that’s available in your customer conversations—and to getting real value from transcribing every word of every call—is threefold:
- Make sure you have an ASR model that’s custom tuned and continuously adapts to the lexicon of your business.
- Connect your transcripts to as much contextual metadata as possible.
- Have readily accessible tools to analyze data and act on insights in ways that create significant impact for your business—both immediately and long term.
ASAPP answers that call. When you work with us, you’ll get a custom ASR model trained on your data to transcribe conversations in real time and improve with every call. Our AI-driven platform will deliver immediate business value through an array of CX-focused capabilities, fed by customer conversations and relevant data from other systems. Plus, it provides a wealth of voice-of-the-customer data that can be used across your business. When you see your model in action and the tremendous value you get with our platform, it makes a lot more sense to transcribe every word of every call. Interested? Send us a note at ask@asapp.com and we’ll be happy to show you how it’s done.
An urgent case to support contact center agents with AI built for them
A colleague describes customer service as the heartbeat of a company. I have yet to think of a better description. And in the midst of this global pandemic, that heartbeat is working at about 200 beats per minute.
Why are customer service agents under so much strain?
There are a variety of factors putting pressure on customer service organizations:
- Volume of questions / calls / chats from customers is at an unprecedented high.
- The responses to their questions are changing daily as this situation unfolds.
- Many agents have been relocated to work from home.
- Many agents are unable to get into work and cannot work from home, so total staffing is lower.
- Customers are scared and frustrated (after long wait times). They need answers to their questions and more than ever, they want to hear those answers from a human.

During this crazy time you can either let that heartbeat keep going up until it can no longer do what’s needed, or you can provide the necessary tools to make sure it can keep supporting the other organs / functions.
Rachel Knaster
Why isn’t anyone helping?
Unfortunately, the trend in this space over the last several years has been to “contain” or “deflect” customers from connecting with agents. While AI and ML have become familiar terms within contact centers, their primary use has been to power bots—aimed at preventing as many customers as possible from talking to agents.
How can you help your agents?
Our philosophy on AI and ML in this space is: Let’s use this powerful technology to augment the humans. Let’s allow conversations between customers and agents, learn from them, and use those learnings to drive better, more efficient interactions. This philosophy rings through our platform from our proprietary language models, to our intuitive UI/UX, to our ongoing engagement with agents through focus groups and roundtables to make sure what we are building is working for them.
Why focusing on agents is most important
- It drives the best results: Increased agent efficiency with increased customer AND agent satisfaction.
- Agents are the bottleneck right now.
- Your agents are on the front line — an important face of your brand to your customers.
- Better performing agents lead to happier customers.
- Agents provide the best feedback loop for what works and what doesn’t work.