Why companies who want true VoC need to engage the power of AI
The best businesses succeed by developing a holistic understanding of their customers. Most, if not all, consumer companies have a Voice of the Customer (VoC) program, intended to capture and analyze feedback, leveraging the insights to drive both strategic and operational improvements across the business. While the intent of these programs is critical to constant improvement, the tools that have been available to CX professionals fall short of delivering what they really need.
Surveys and samples only give a partial view
Many organizations build VoC programs solely on a “survey and score” foundation. When done right, surveys can play an important role in any VoC program. But because of their low average response rates and general bias, they give organizations a limited view of the overall customer experience and the quality of service being delivered.
An overreliance on surveys has other pitfalls, too. Relationship-based surveys, for example, evaluate general brand satisfaction, but often fail to provide clear feedback on the internal processes, people, and frontline events that contribute to customer experience. On the flip side, transaction-based surveys capture feedback in the moment, but tend to lose sight of what the overall relationship looks like from the customer’s point of view.
Other companies might record calls, then either listen to or transcribe a subset of these calls. This approach also limits analysis to a small sample of customer interactions.
Analyzing only a fraction of your calls fails to tell the whole story. Yet companies rely on this data to make important decisions about product, sales, and marketing initiatives as well as contact center operations.
What’s more, with both of these approaches, there can be a significant time lapse between capturing the data, gaining insight from that data, and putting that insight into action. The truth is that most of us in the customer experience world have never had a full view of the quality of service we are delivering to our customers, and the opportunities that exist to improve the way in which we serve our customers across the enterprise.
AI elevates VoC with new possibilities
Artificial intelligence fuels new options for gaining more comprehensive customer insight and for putting that insight into action. Forward-thinking CX leaders are excited about mining this wealth of data and are heartened to learn that they won’t need an army of data scientists on staff to do it.
Highly accurate transcription is key
The best of these new solutions start with highly accurate real-time transcription of every call. Transcription is not the goal, but a means to an end. However, the importance of transcription quality can’t be overstated, as it is the fuel for meaningful analysis.
AI solutions that use machine learning models custom trained on a company’s lexicon are—not surprisingly—far more accurate than solutions using generic models trained on everyone’s data. Consequently, they can deliver far more value.
Michael Lawder
Getting this data in real time gives companies the opportunity to take action instantly instead of waiting weeks, months, or even longer to address customer needs. And having it for every call gives companies a much fuller customer perspective.
Rich actionable insights
The real value comes not in just getting the data, but in being able to put it to use in meaningful ways. Beyond accurately transcribing customer conversations, an AI-driven VoC program can:
- Analyze sentiment and even predict CSAT and NPS scores
- Capture customers’ problem statements
- Classify intent at a useful level of detail
- Spot correlations between things—for example: callbacks or sentiment by agent, intent, or length of call
- Highlight trends and anomalies in customer conversations
- Alert supervisors to coaching needs by agent or topic
- Automate summary notes, providing cleaner data for analysis and better records for future customer contact
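To make the correlation idea concrete, here is a minimal sketch of grouping per-call data by agent or intent. The record fields (`agent`, `intent`, `sentiment`, `callback`) and the sample values are hypothetical stand-ins for whatever a transcription and classification pipeline would produce; this is an illustration, not a description of any particular product.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-call records. In practice these values would come from
# transcription, sentiment, and intent models, which are not shown here.
calls = [
    {"agent": "ana", "intent": "billing", "sentiment": 0.6, "callback": False},
    {"agent": "ana", "intent": "billing", "sentiment": -0.2, "callback": True},
    {"agent": "raj", "intent": "outage", "sentiment": -0.5, "callback": True},
    {"agent": "raj", "intent": "billing", "sentiment": 0.4, "callback": False},
]


def summarize(records, key):
    """Group records by `key` and report mean sentiment and callback rate."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return {
        name: {
            "calls": len(rows),
            "mean_sentiment": mean(r["sentiment"] for r in rows),
            "callback_rate": sum(r["callback"] for r in rows) / len(rows),
        }
        for name, rows in groups.items()
    }


by_agent = summarize(calls, "agent")
by_intent = summarize(calls, "intent")
```

Because every call is included rather than a sample, the same function can be pointed at any field (agent, intent, product) to see where callbacks or negative sentiment cluster.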
For the first time, you can effectively measure the quality of service you are delivering for every product, every interaction, every agent.
Cultivating VoC of this depth can do more than help manage and optimize CX operations. It has the power to influence the business as a whole. CX leaders become the ultimate advocates for the customer, able to synthesize customer wants and needs as they relate to every stage of the customer journey. This elevates their stature in the organization, as they become trusted sources for insights that inform key decisions and strategy aimed at building customer loyalty and growing revenue. If you’d like to hear how companies in your industry are using AI-driven speech intelligence solutions in their VoC programs, drop us a line at ask@asapp.com.
Modern CX Teams: What do they have that you don’t?
Long before the pandemic, consumers made the digital shift, but business has been slow to catch up. For over a decade, texting and digital messaging have been the most popular ways to communicate. Yet the average Fortune 500 company still spends 80% or more of its contact center budget on voice calls, where the experience is worse and the cost is higher. Customers are handled more like anonymous callers, and the levers to drive efficiency and quality are limited.
For brands to stay competitive, customer service needs to modernize, using new technologies to really know customers and be there for them where and how they want to be supported. Call centers will still be part of the mix, but they need to deliver more value, and be integrated into a larger, holistic omni-channel support strategy. By embracing innovative capabilities, companies can deliver cost-efficient human-centric digital experiences that help win and keep customers.
Rethinking digital communications
When customers have a question or a problem, they want quick, convenient resolution on their preferred communication channel. Today that most often means digital. Customers may want to use self-service tools, reach out on social media, and chat, text, or message in a variety of ways. The best brands will make it an easy, personalized experience that nurtures loyalty and increases lifetime value, regardless of channel. There is real power in engaging with your customers in the same place they talk to their friends and family.
Many budget-minded executives see digital communications solely as a way to minimize costly engagement with customers. They measure success in deflection and containment. But that’s not a sustainable (or logical) approach. What if, instead, your digital capabilities made every interaction substantially better?
Reimagining CX with AI at the core dramatically changes the paradigm. Companies can deliver great experiences AND drive new levels of efficiency.
Michael Lawder
AI innovations are making that possible. Today, businesses can not only deliver smarter digital channels, but empower better customer service communications all around.
One of the biggest wins is meeting customer needs on the first contact. When an AI-powered solution integrates all your digital communication channels, both agents and customers have a greater opportunity to get it right the first time. If a customer does need to follow up on something, a smart system provides all the relevant details so any agent can quickly move things forward instead of starting all over.
That’s one part of a better, modernized customer experience. Taking it further, AI-driven solutions also make agents significantly more productive through every interaction. Machine learning augments CX teams with the right knowledge at the right time for faster, easier resolution. I love that some of the most advanced machine learning technologies are being used to help make people (in this case, agents) better. With dramatically greater efficiency, customer care agents can deliver a more personalized and contextual interaction, turning potentially negative experiences into real loyalty moments.
Fueling conversation-powered operations
Conversations are the heart of customer service, and digital-first technologies can make those communications operationally more efficient. In real time, it means faster resolution for shorter and more concurrent interactions, at lower costs. In the long view, it provides a wealth of data for machine learning insights, providing ‘conversational analytics’ to fuel strategies and actions that help the business.
Think about ‘voice of the customer’ programs, where feedback mostly comes through surveys. What if you could easily uncover insights from conversations across many touch points?
After decades leading contact center organizations, I think the future of workforce management is in conversation-powered operations. Companies have a goldmine of data that isn’t being tapped. With the best of modern technology, machine learning can extract valuable customer sentiment from 100% of contacts with zero manual work instead of the old way of listening to calls and reading transcripts.
It can rapidly analyze thousands of agent conversations for internal quality assurance to improve compliance, soft skills, and process optimization. This modernization provides a significantly greater ability to understand the quality of service agents are delivering, and opens a window of insights into the enterprise. It’s a faster, smarter way to identify how and where to make both transactional and strategic improvements that are better for the business and the customer experience.
Humanizing your agents
Circling back to empowering agents, modern CX teams recognize customer service agents are the voice of their brand and should be brand ambassadors. We all know the adage about ‘making every agent as good as your best agents’—but you can only get there if you support them well. Instead of juggling multiple tools and inefficient processes, agents need to be able to focus on ensuring each customer feels known, valued, and supported.
That’s the kind of experience that powers a brand. And that’s where ASAPP really shines.
ASAPP modernizes CX with a streamlined, unified platform to easily support customers across channels. With AI-driven predictive knowledge, agents know what to say and do to serve each customer better and faster. It uses the power of technology not to replace the human touch or hide from customers, but to make contact more personalized. Ultimately, that’s the path to greatness for a brand…empowering both sides of the conversation with the right balance of digital efficiency and an emotional connection. That builds relationships with your customers and makes great customer experience a core part of the value proposition of your brand.
The brittleness of RPA is failing you
Every day contact center agents help millions of customers. To assist in each one of those contacts, an agent must first listen to the customer, diagnose the issue, and apply problem-solving to determine the correct sequence of actions to address customers’ needs. In each step of this process, agents must fetch, read, and update information from multiple back-office applications that, more often than not, are complex systems optimized to support business operations, not for providing a great user experience to agents.
A high learning curve
In practice, agents must know the purpose of each of those back-office systems to determine which one holds the information they seek. In addition, they must know the specific navigation flow that leads to the information within each system. Enabling this level of knowledge is expensive because it requires significant agent training, documentation, and learning environments for trainees. And while training is a good start, experience is also key. But deep experience is rare in an industry where attrition averages 30-45% and can exceed 100% annually. As a result, agents spend an appreciable amount of time wandering through applications looking for information. This leads to longer waiting times, incomplete answers, and more frustration for customers.
The limitations of RPA
In legacy systems, Robotic Process Automation (RPA) is a standard way of automating tasks in User Interfaces (UIs). However, RPA is highly resource-intensive since it requires manual scripting of each sequence of actions to be performed across many applications with potentially hundreds of navigation flows each. Hence, RPA simply doesn’t scale efficiently or effectively.
Where RPA is brittle and resource-intensive to scale, a machine learning system creates navigation flows automatically—and readily adapts when there are changes in the UI or agents’ behavior.
Nicolás D'Ippolito, PhD
Adding to the challenge, to automate tasks in RPA, developers must provide a list of actions to be performed. Each action definition requires a detailed description that allows the robot to identify the specific object in the UI with which it has to interact. As a result, RPA scripts are tightly coupled to the UI structure, making them very fragile in the face of even subtle UI changes. Although backend systems tend to change slowly, frontend systems often change frequently. Since the trend in the industry is to adopt web-based systems, both internal and SaaS, the fragility of RPA tools is an increasingly large problem.
Meeting the challenge with AI
In contrast, ASAPP AI-powered features can automatically determine the back-office system and the navigation flow that gets the agent to the required information. Our models evaluate the conversation context and identify potential navigation suggestions. When a recommendation is found, the agent is presented with a compact description of the system and flow leading to the required information. If the agent takes the recommendation, the ASAPP platform leads the agent to the information.
To implement these UI augmentation features, we combine machine learning with stochastic analysis to generate behavior models that abstract the potential interactions with the UI. The system is trained by analyzing historical user interaction data. These models allow for efficient recall of high-probability navigation paths from the current system state to a navigation goal. This capability gives us the power to automatically create a robust navigation tool for any given application. We then combine our navigation tools with NLP and classification models to determine, for each conversation context, which navigation tool and flow to use.
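As a rough illustration of the idea of recalling high-probability navigation paths, the sketch below fits a first-order Markov chain to logged sessions and searches it for the likeliest route to a goal screen. The choice of a Markov chain, the function names, and the screen names are all assumptions made for this example; the post does not specify which models are actually used.

```python
import heapq
import math
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Estimate P(next_screen | screen) from logged navigation sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def most_probable_path(transitions, start, goal):
    """Return (path, probability) of the likeliest route from start to goal.

    Maximizing a product of probabilities is the same as minimizing the
    sum of -log(p), so Dijkstra's algorithm applies.
    """
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, state, path = heapq.heappop(queue)
        if state == goal:
            return path, math.exp(-cost)
        if state in settled:
            continue
        settled.add(state)
        for nxt, p in transitions.get(state, {}).items():
            if nxt not in settled:
                heapq.heappush(queue, (cost - math.log(p), nxt, path + [nxt]))
    return None, 0.0  # goal is unreachable from start


# Hypothetical screen names from a made-up billing application.
sessions = [
    ["home", "account", "billing", "invoice"],
    ["home", "account", "billing", "invoice"],
    ["home", "search", "invoice"],
    ["home", "account", "profile"],
]
model = fit_transitions(sessions)
path, prob = most_probable_path(model, "home", "invoice")
```

Here the route through `account` and `billing` wins over the shorter route through `search` because agents took it more often in the logged sessions. Nothing about the UI layout is scripted by hand, which is the contrast with RPA.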
In addition to the automation benefits, our UI augmentation greatly increases the resiliency to changes in the UI. Since we have a behavior model of every system we can detect deviations from standard usage patterns due to changes in navigation flows in the UI. We can also detect new states in the system that we didn’t observe before due to changes in the UI or agents’ usage patterns. In both cases, these deviations are considered during the automatic retraining cycles. Our models adapt to the changes in the UI, and navigation tools are re-generated, ensuring agent access to needed information is always current.
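One simple way to picture the deviation detection described above is to compare new sessions against the states and transitions already seen. This is only a sketch under that assumption; the helper names and screen names are hypothetical, and a production system would likely use probabilistic thresholds rather than exact set membership.

```python
def known_behavior(sessions):
    """Collect the states and transitions seen in historical sessions."""
    states, transitions = set(), set()
    for session in sessions:
        states.update(session)
        transitions.update(zip(session, session[1:]))
    return states, transitions


def find_deviations(states, transitions, new_session):
    """Return states and transitions in new_session absent from history."""
    new_states = [s for s in new_session if s not in states]
    new_transitions = [
        step
        for step in zip(new_session, new_session[1:])
        if step not in transitions
    ]
    return new_states, new_transitions


history = [["home", "account", "billing"], ["home", "search", "billing"]]
states, transitions = known_behavior(history)

# A redesigned UI (hypothetical) inserts a "payments" screen before billing.
new_states, new_transitions = find_deviations(
    states, transitions, ["home", "account", "payments", "billing"]
)
```

Anything flagged this way can be fed into the next retraining cycle so the navigation model reflects the updated UI.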
UI augmentation enables agents to spend less time navigating back-office systems and more time helping customers. This leads to faster issue resolution and, therefore, happier customers. In addition, reducing cognitive load for agents opens the possibility for digital agents to engage with more than one customer at a time. So more customers can be served with the same number of agents, further increasing agent productivity while also improving customer satisfaction as their issues are addressed more quickly and accurately. With the power of machine learning, we can train UI augmentation features on any enterprise’s infrastructure, enabling those companies to quickly get the benefits of agent augmentation.
It's time to rethink customer service
It’s time to rethink customer service. For decades, I’ve seen how contact centers are pressured to focus on costs and end up sacrificing quality. It’s easy to understand why: Customer service is one of the largest operating expenses at most consumer companies. But, when the drive for efficiency comes at the expense of customer experience—and ultimately loyalty—companies often lose as much as they gain.
A no-win situation for companies and the consumers they serve
Here’s a typical scenario: To save money, companies hire thousands of contact center workers in low-cost locations who have never used their product or service, give them 2-3 weeks of training, and unleash them to interact with customers. They invest in bots to keep customers away. And they may try some chat channels, with the hope of having agents help more than one customer at a time. None of these works particularly well to serve the customer. So customers may try one channel and then another, and because each channel is in a separate system, they have to start all over with every attempt.
The critical need for a new strategy
Meeting customer needs and controlling your brand have never been more challenging—nor more important. Consumers now have a powerful voice to sway opinion, their expectations can change with a tweet, and brand loyalty is fleeting. People in this mobile-first, digital-first world expect fast, anytime, anywhere service, but they also want companies to know and value them. Customer service needs to deliver all of that, yet too often it misses the mark because frontline workers aren’t set up to be successful.
Instead of using technology to keep customers away from your agents, use it to empower those agents to deliver great customer experiences. It will pay off in both greater productivity and increased customer and agent satisfaction.
Michael Lawder
Customer care agents are the voice of your brand. But what happens if they have to struggle to solve problems with insufficient knowledge and an array of inefficient tools? It’s no wonder many businesses have an agent annual attrition rate that often reaches 100% or higher—while customer satisfaction is going down.
After two decades working every level of customer service at top brands like Apple and Samsung, I know there’s a better way.
Empower every agent to be their best
Companies don’t have to choose between reducing cost and providing great customer experience. We just need to leave old thinking behind, and take advantage of emerging technologies that enable contact center employees to do their best work.
The key is in supporting agents, helping them to be more productive—and more engaged and satisfied with their jobs. When I think about the agent and contact center of the future, a few essentials rise to the top:
- Make it easy to focus on customers. Typically, agents spend most of their time and attention hunting for information across multiple systems, and manually working through processes. Meanwhile the customer waits and gets frustrated. Giving agents a streamlined, integrated system they can learn in hours instead of days will both increase operational efficiency and drive higher customer satisfaction.
- Give every agent instant access to the best knowledge available. It’s time we put the information agents need at their fingertips using the power of AI. No more relying on shoulder-tapping a coworker for answers or putting customers on long holds while culling through virtual libraries. Innovative technologies can predictively deliver the right knowledge and procedures, so agents know exactly what to say and do to best serve customers and solve problems faster. Augmenting workers with AI and machine learning means you don’t eliminate the human touch but make it dramatically more productive.
- Support customers seamlessly anywhere they are. Digital-first is now the name of the game, and that includes making it easy for customers to solve a problem or meet a need using any channel they prefer. Suppose someone starts on a call but needs to drop, and wants to finish resolving their issue later via chat or texting. Contact centers need to provide that continuity across channels, so agents can instantly pick up where things left off and customers have an effortless experience and feel that you know them. The more convenience and simplicity you provide, the more likely you will earn loyalty and build trust, driving lifetime value along the way.
The new demands of customer experience require a new approach for customer service. That’s why I’m so passionate about working at ASAPP. For the first time in decades, I see the promise of artificial intelligence in action. With a unique technology solution, ASAPP solves two primary challenges that are typically in conflict. Empowering agents to be more productive reduces costs and improves the bottom line, and at the same time creates a simpler and effortless customer experience that drives loyalty and retention for your brand.
It’s customer service for the future—right now, when consumers and businesses need it most.
ASAPP tops ASR leaderboard with E‑Branchformer
ASR technology has been beneficial for businesses and their customers for many years. ASR, or Automatic Speech Recognition, is software that converts human speech into text. With continual advancements in research and AI modeling, accuracy has improved immensely over time. Developing the most accurate ASR possible has become a high priority for many top tech companies because of how much it benefits businesses when it’s done correctly.
ASR’s primary goal is to maintain high recognition accuracy. Recognition rates or error rates can be evaluated over various units, such as phonemes, characters, words, or sentences. The most commonly used measure of ASR accuracy is Word Error Rate (WER).
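For readers new to the metric, WER is conventionally defined as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer's output, and N is the number of words in the reference. The sketch below computes it with a standard word-level edit distance. The function name and the sample sentences are made up for illustration, and it skips the text normalization (casing, punctuation) that real evaluations apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("account" -> "count") and one insertion ("now")
# against a five-word reference.
wer = word_error_rate(
    "please reset my account password",
    "please reset my count password now",
)
```

Note that WER can exceed 100% when the hypothesis contains many inserted words, since the denominator is the reference length only.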
Healthy competition
To fairly compare AI speech recognition research studies across the industry, we evaluate WER on public datasets. Librispeech, one of the most widely used datasets, consists of about 1000 hours of read English speech with transcriptions and an additional text corpus. Researchers worldwide have been competing for years to substantiate their methods’ superiority using the Librispeech dataset and WER.
Recently, the speech community has been trending towards end-to-end (E2E) modeling for ASR. Instead of having separate acoustic and language models, as in conventional ASR methods, E2E modeling has achieved great success in both efficiency and accuracy by simultaneously training a single integrated model.
Although several E2E model structures, such as Transducer and Attention-based Encoder-Decoder (AED) have been explored, most of them share a common encoder, the module that extracts meaningful representative information from the input speech.
Speech scientists, looking to create a more powerful encoder, are actively studying novel training objectives, acoustic feature engineering, data augmentation methods, and self-supervised learning using untranscribed speech.
But these research areas don’t address a fundamental question, “What is the optimal neural network architecture for constructing the encoder?”
To address this question, ASAPP researchers recently developed the E-Branchformer model, a highly accurate neural network. Other similar models include Transformer, Conformer, and Branchformer; however, the E-Branchformer surpasses these models in accuracy. Here’s a quick overview of the different models ASAPP used to develop E-Branchformer.
Transformer
The Transformer has shown promising performance in several sequence modeling tasks for ASR and NLU (natural language understanding). This potential is due to the strength of multi-headed self-attention, which can effectively extract meaningful information from the input sequence, while considering the global context.
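To show what "considering the global context" means in practice, here is a stripped-down sketch of scaled dot-product self-attention. It is a simplification made for illustration: it uses a single head and omits the learned query, key, and value projections that a real Transformer layer has, so each frame is compared directly with every other frame.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Single-head scaled dot-product self-attention, no learned projections.

    Every output frame is a weighted average of ALL input frames, which is
    how attention captures global context across the sequence.
    """
    dim = len(frames[0])
    outputs, all_weights = [], []
    for query in frames:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(w * value[d] for w, value in zip(weights, frames))
                for d in range(dim)
            ]
        )
        all_weights.append(weights)
    return outputs, all_weights


# Three toy two-dimensional "acoustic frames".
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(frames)
```

The first frame attends more strongly to itself and to the third frame than to the second, because those are the frames it overlaps with. Multi-headed attention runs several such comparisons in parallel, each with its own learned projections.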
Conformer
To improve the Transformer, many methods have been introduced and utilized to create synergy by applying convolution, which has advantages in modeling the local context.
In particular, Conformer was introduced and is widely considered the state of the art in accuracy on Librispeech ASR tasks.
When evaluated with an external Language Model (LM) trained on the Librispeech text corpus, Conformer achieves 1.9% and 3.9% WER on test-clean and test-other, respectively. Although Conformer demonstrates that stacking convolution sequentially after self-attention is a better method than using them in parallel, other research studies, like Branchformer, have applied these two neural networks in parallel and reported notable performance.
Branchformer
Branchformer was introduced with three main components:
- Local-context branch using MLP with convolutional gating (cgMLP)
- Global-context branch using multi-headed self-attention
- Merging module using a linear projection layer
Each branch is computed in parallel before the results are merged. Through intensive experiments, Branchformer showed performance comparable to Conformer’s. Other experiments stacked different combinations of Branchformer and Conformer, but didn’t achieve better results.
E-Branchformer
Inspired by the Branchformer studies, ASAPP researched how convolution and self-attention can be combined more effectively and efficiently.
This resulted in the highest performing model, E-Branchformer, setting the new state-of-the-art WER at 1.81% and 3.65% on Librispeech test-clean and test-other with an external LM.
Kwangyoun Kim
To develop E-Branchformer, we made two primary contributions to Branchformer that significantly improved performance.
- We enhanced the merging module, which combines the output of the global and local branches, by introducing additional convolutions. This change has the effect of combining self-attention with convolution sequentially and in parallel. Through extensive experiments on several types of merge modules, we proved that adding a single depth-wise convolution can significantly improve accuracy with negligible computational increase.
- We revisited the point-wise Feed-Forward Network (FFN). Transformer and its variants commonly stack FFN with self-attention in an interleaving pattern. We experimentally demonstrated that even in Branchformer, stacking FFNs together with the Branchformer blocks is more effective in improving the model’s capacity. For example, we found that a stack of 17 Branchformers and 17 FFNs in an interleaving pattern has a model size similar to that of 25 Branchformers, but is considerably more accurate.
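The enhanced merging step in the first point can be sketched in a few lines. This is a toy version under stated assumptions: the two branch outputs are concatenated, a depth-wise convolution over time is added back to the concatenation as a residual, and the result is linearly projected. The residual arrangement, the function names, and all numbers are illustrative choices for this example, so treat the upcoming paper and the ESPnet recipes as the authoritative description.

```python
def depthwise_conv(frames, kernels):
    """Convolve each channel over time with its own kernel (zero padding)."""
    time, channels = len(frames), len(frames[0])
    half = len(kernels[0]) // 2
    out = [[0.0] * channels for _ in range(time)]
    for t in range(time):
        for c in range(channels):
            for k, weight in enumerate(kernels[c]):
                src = t + k - half
                if 0 <= src < time:
                    out[t][c] += weight * frames[src][c]
    return out


def merge(global_out, local_out, kernels, projection):
    """Concatenate branches, add a depth-wise convolution, then project.

    `projection` is a (2 * d) x d matrix given as a list of rows.
    """
    concat = [g + l for g, l in zip(global_out, local_out)]
    conv = depthwise_conv(concat, kernels)
    mixed = [
        [a + b for a, b in zip(row_c, row_d)]
        for row_c, row_d in zip(concat, conv)
    ]
    return [
        [sum(x * w for x, w in zip(row, col)) for col in zip(*projection)]
        for row in mixed
    ]


# Toy branch outputs: three time steps, one channel per branch.
global_out = [[1.0], [2.0], [3.0]]   # stand-in for the self-attention branch
local_out = [[0.5], [0.5], [0.5]]    # stand-in for the cgMLP branch
projection = [[1.0], [1.0]]

# All-zero kernels reduce the merge to plain concatenation + projection.
no_conv = merge(global_out, local_out, [[0.0] * 3, [0.0] * 3], projection)
# A non-zero kernel lets each frame mix in its neighbors during the merge.
with_conv = merge(
    global_out, local_out, [[0.5, 0.0, 0.5], [0.0] * 3], projection
)
```

With zero kernels the function behaves like a concatenate-and-project merge; the depth-wise convolution adds local mixing across neighboring frames at the cost of only one small kernel per channel, which is consistent with the negligible computational increase mentioned above.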
ASAPP has topped the leaderboard of WER in Librispeech ASR tasks by using the newly proposed E-Branchformer. We are confident that this new model structure can be applied to other tasks and achieve impressive results.
We’re sharing our findings with the community so that everyone can benefit from them. You’ll be able to find all of the detailed methods and experimental results in our upcoming paper, which has been accepted and will be presented at SLT 2022. We’ll also release more information about how we implemented E-Branchformer. Our models’ recipes will be available through ESPnet, so anyone who wants to can reproduce our results. If you’d like to talk about E-Branchformer in person, please reach out to us during the SLT 2022 conference.