

Reducing the high cost of training NLP models with SRU++

by Tao Lei · Feb 24

Natural language models have achieved groundbreaking results across NLP and related fields [1, 2, 3, 4]. At the same time, the size of these models has increased enormously, growing to millions (or even billions) of parameters, along with a significant increase in financial cost.

The cost associated with training large models limits the research community's ability to innovate, because a research project often requires extensive experimentation. Consider training a top-performing language model [5] on the Billion Word benchmark. A single experiment would take 384 GPU days (6 days * 64 V100 GPUs), or as much as $36,000 using AWS on-demand instances. The high cost of building such models hinders their use in real-world business and makes monetizing AI & NLP technologies more difficult.
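The arithmetic behind those numbers is straightforward; a quick sketch (the per-GPU-hour rate is our assumed approximation of AWS V100 on-demand pricing, not a quoted figure):

```python
# Back-of-envelope training cost for the Billion Word experiment.
gpus = 64
days = 6
gpu_days = gpus * days             # 384 GPU days
gpu_hours = gpu_days * 24          # 9216 GPU hours
hourly_rate = 3.90                 # assumed $/GPU-hour for a V100 (approximation)
cost = gpu_hours * hourly_rate     # lands near the $36,000 cited above
print(f"{gpu_days} GPU days, ~${cost:,.0f}")
```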

Our model obtains better perplexity and bits-per-character (bpc) while using 2.5x-10x less training time and cost compared to top-performing Transformer models. Our results reaffirm the empirical observations that attention is not all we need.

- Tao Lei, Research Leader and Scientist, ASAPP

The increasing computation time and cost highlight the importance of designing computationally efficient models that retain top modeling power while requiring less computation.

The Transformer architecture was proposed to accelerate model training in NLP. Specifically, it is built entirely upon self-attention and avoids the use of recurrence. The rationale of this design choice, as mentioned in the original work, is to enable strong parallelization (by utilizing the full power of GPUs and TPUs). In addition, the attention mechanism is an extremely powerful component that permits efficient modeling of variable-length inputs. These advantages have made Transformer an expressive and efficient unit, and as a result, the predominant architecture for NLP.

A couple of interesting questions arise following the development of the Transformer:

  • Is attention all we need for modeling?
  • If recurrence is not a compute bottleneck, can we find better architectures?

SRU++ and related work

We present SRU++ as a possible answer to the above questions. The inspiration for SRU++ comes from two lines of research:

First, previous works have tackled the parallelization/speed problem of RNNs and proposed various fast recurrent networks [7, 8, 9, 10]. Examples include the Quasi-RNN and the Simple Recurrent Unit (SRU), both of which are highly parallelizable RNNs. These advances eliminate the need to sacrifice recurrence for training efficiency.

Second, several recent works have achieved strong results by combining recurrence with self-attention. For example, Merity (2019) demonstrated that a single-headed attention LSTM (SHA-LSTM) is sufficient to achieve competitive results on character-level language modeling tasks while requiring significantly less training time. In addition, RNNs have been incorporated into Transformer architectures, yielding better results on machine translation and natural language understanding tasks [8, 12]. These results suggest that recurrence and attention are complementary for sequence modeling.

In light of the previous research, we enhance the modeling capacity of SRU by incorporating self-attention as part of the architecture. A simple illustration of the resulting architecture SRU++ is shown in Figure 1c.

Figure 1: An illustration of SRU and SRU++ networks. (a) the original SRU network, (b) the SRU variant using a projection trick to reduce the number of parameters, experimented with in Lei et al. (2018), and (c) SRU++ proposed in this work. Numbers indicate the hidden size of intermediate inputs/outputs. A more detailed description of SRU and SRU++ is provided in our paper.

SRU++ replaces the linear mapping of the input (Figure 1a) by first projecting the input into a smaller dimension. An attention operation is then applied, followed by a residual connection. The dimension is then projected back to the hidden size needed by the elementwise recurrence operation of SRU. In addition, not every SRU++ layer needs attention. When attention is disabled, an SRU++ layer reduces to an SRU variant that uses dimension reduction to cut the number of parameters (Figure 1b).
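The shape flow of a single SRU++ layer can be sketched in a few lines of NumPy. This is a minimal illustration of the steps just described, not the authors' implementation: the parameter names, the single attention head, and the simplified gate equations are all our assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def srupp_layer(x, params, attn_dim):
    """One SRU++ layer on x of shape (seq_len, d) — shape flow only."""
    L, d = x.shape
    q = x @ params["W_down"]                  # project down to attn_dim
    k = q @ params["W_k"]
    v = q @ params["W_v"]
    scores = softmax(q @ k.T / np.sqrt(attn_dim))
    a = q + scores @ v                        # attention + residual connection
    u = a @ params["W_up"]                    # project back up to (L, 3*d)

    cand, f_in, r_in = np.split(u, 3, axis=-1)
    c = np.zeros(d)
    h = np.empty_like(x)
    for t in range(L):                        # elementwise recurrence: no matmuls
        f = sigmoid(f_in[t])                  # forget gate
        r = sigmoid(r_in[t])                  # reset gate
        c = f * c + (1 - f) * cand[t]
        h[t] = r * c + (1 - r) * x[t]         # highway connection to the input
    return h

rng = np.random.default_rng(0)
d, attn_dim, L = 8, 4, 5
params = {
    "W_down": rng.normal(scale=0.1, size=(d, attn_dim)),
    "W_k": rng.normal(scale=0.1, size=(attn_dim, attn_dim)),
    "W_v": rng.normal(scale=0.1, size=(attn_dim, attn_dim)),
    "W_up": rng.normal(scale=0.1, size=(attn_dim, 3 * d)),
}
out = srupp_layer(rng.normal(size=(L, d)), params, attn_dim)
print(out.shape)  # (5, 8)
```

The key point the sketch makes concrete: the expensive matrix multiplications all happen in the down-projected attention block, while the per-step recurrence is purely elementwise and therefore cheap and parallelizable across the hidden dimension.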

Results

1. SRU++ is a highly-efficient neural architecture

We evaluate SRU++ on several language modeling benchmarks such as the Enwik8 dataset. Compared to Transformer models such as Transformer-XL, SRU++ achieves similar results using only a fraction of the resources. Figure 2 compares the training efficiency of the two under directly comparable training settings. SRU++ needs 8.7x less training compute to surpass the dev result of Transformer-XL, and 5.1x less to reach a BPC (bits-per-character) of 1.17.

Figure 2: Dev BPC on the Enwik8 dataset vs. GPU hours used for training. The SRU++ and Transformer-XL models both have 41-42M parameters and are trained with fp32 precision and comparable settings (such as learning rate).

Table 1: Comparison of reported training costs (measured in total GPU days) and test results for SRU++ and various Transformer models. (*) indicates mixed-precision training. Lower numbers are better.

Table 1 further compares the training cost of SRU++ and reported costs of leading Transformer-based models on Enwik8 and Wiki-103 datasets. Our model can achieve over 10x cost reduction while still outperforming the baseline models on test perplexity or BPC.

2. Little attention is needed given recurrence

Similar to the observation of Merity (2019), we found that a couple of attention layers are sufficient to obtain state-of-the-art results. Table 2 shows an analysis where the attention computation is enabled only in every k-th layer of SRU++.
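A minimal sketch of such a layer schedule (exactly which layers receive the attention sub-layer is our assumption for illustration; the paper may place them differently):

```python
def attention_schedule(n_layers, k):
    """Enable the attention sub-layer only in every k-th SRU++ layer."""
    return [layer % k == 0 for layer in range(1, n_layers + 1)]

# A 10-layer model with attention every 5 layers uses just 2 attention sub-layers.
schedule = attention_schedule(10, 5)
print(schedule.count(True))  # 2
```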

Table 2: Test BPC on the Enwik8 dataset, varying the number of active attention sub-layers in SRU++ models. We tested two 10-layer SRU++ models with 42M and 108M parameters, respectively. Most of the gains are obtained with just 1 or 2 attention sub-layers. Lower numbers are better.

Conclusion

We present a recurrent architecture with optional built-in self-attention that achieves leading model capacity and training efficiency. We demonstrate that highly expressive and efficient models can be derived using a combination of attention and fast recurrence. Our results reaffirm the empirical observations that attention is not all we need, and can be complemented by other sequential modeling modules.

For further reading, ASAPP also conducts research to reduce the cost of model inference. See, for example, our published work on model distillation and pruning.


Why companies who want true VoC need to engage the power of AI

by Michael Lawder · Feb 12

The best businesses succeed by developing a holistic understanding of their customers. Most, if not all, consumer companies have a Voice of the Customer (VoC) program, intended to capture and analyze feedback, leveraging the insights to drive both strategic and operational improvements across the business. While the intent of these programs is critical to constant improvement, the tools that have been available to CX professionals fall short of delivering what they really need.

Surveys and samples only give a partial view

Many organizations build VoC programs solely on a "survey and score" foundation. When done right, surveys can play an important role in any VoC program. But due to their low average response rate and general bias, they provide organizations with a limited view of the overall customer experience and the quality of service being delivered.

An overreliance on surveys has other pitfalls, too. Relationship-based surveys, for example, evaluate general brand satisfaction, but often fail to provide clear feedback on the internal processes, people, and frontline events that contribute to customer experience. On the flip side, transaction-based surveys capture feedback in the moment, but tend to lose sight of what the overall relationship looks like from the customer’s point of view.

Other companies might record calls, then either listen to or transcribe a subset of these calls. This approach also limits analysis to a small sample of customer interactions.

Analyzing only a fraction of your calls fails to tell the whole story. Yet companies rely on this data to make important decisions about product, sales, and marketing initiatives as well as contact center operations.

What’s more, with both of these approaches, there can be a significant time lapse between capturing the data, gaining insight from that data, and putting that insight into action. The truth is that most of us in the customer experience world have never had a full view of the quality of service we deliver to our customers, or of the opportunities to improve the way we serve them across the enterprise.

AI elevates VoC with new possibilities

Artificial intelligence fuels new options for gaining more comprehensive customer insight, and for putting that insight into action. Forward-thinking CX leaders are excited about mining this wealth of data and are heartened to learn that they won’t need an army of data scientists on staff to do it.

Highly accurate transcription is key

The best of these new solutions start with highly accurate real-time transcription of every call. Transcription is not the goal, but a means to an end. However, the importance of transcription quality can’t be overstated, as it is the fuel for meaningful analysis.

AI solutions that use machine learning models custom trained on a company’s lexicon are—not surprisingly—far more accurate than solutions using generic models trained on everyone’s data. Consequently, they can deliver far more value.

- Michael Lawder, ASAPP

Getting this data in real time gives companies the opportunity to take action instantly instead of waiting weeks, months, or even longer to address customer needs. And having it for every call gives companies a much fuller customer perspective.

Rich actionable insights

The real value comes not in just getting the data, but in being able to put it to use in meaningful ways. Beyond accurately transcribing customer conversations, an AI-driven VoC program can:

  • Analyze sentiment and even predict CSAT and NPS scores
  • Capture customers’ problem statements
  • Classify intent at a useful level of detail
  • Spot correlations between things—for example: callbacks or sentiment by agent, intent, or length of call
  • Highlight trends and anomalies in customer conversations
  • Alert supervisors to coaching needs by agent or topic
  • Automate summary notes, providing cleaner data for analysis and better records for future customer contact

For the first time, you can effectively measure the quality of service you are delivering for every product, every interaction, every agent.

Cultivating VoC of this depth can do more than help manage and optimize CX operations. It has the power to influence business as a whole. CX leaders become the ultimate advocate for the customer, able to synthesize customer wants and needs as they relate to every stage of the customer journey. This elevates their stature in the organization, as they become trusted sources for insights that inform key decisions and strategy aimed at building customer loyalty and growing revenue. If you’d like to hear how companies in your industry are using AI-driven speech intelligence solutions in their VoC programs, drop us a line at ask@asapp.com.


Think beyond the bot: 3 proven strategies for digital-first business success

by Michael Lawder · Feb 12

Everyone’s talking about the importance of digital-first customer experience, especially these days—but you have to think bigger than that. It’s not enough to just have a great mobile app or self-service chatbot. Market-leading success relies on having a digital-first culture that drives your business and meets your customers where they are.

In simplest terms, ‘digital-first’ means making your operations as digital and mobile-friendly as possible, including enabling seamless communication across multiple responsive, persistent and asynchronous digital channels. Particularly in customer service, digital-first operations and the notion of engaging with your customers in the same place that they talk to their friends and family is a game-changer.

I’m not talking about technology replacing people, but about using the latest digital capabilities to make your organization radically more efficient and productive, and improving the customer experience at the same time. It’s all about empowering employees to do their best work, and meeting customer needs faster and better, wherever they are and whenever they need you.

Where should you focus first? I think a digital-first culture for customer service relies on a few fundamentals for meeting today’s demands. Let’s look at three proven strategies that are powering market leading companies.

1—Be where customers are—service on the go

Customer service is all about building relationships with your customers. In a digital-first world, those conversations now span many channels, from online support portals and webchat to email, in-app messaging, standard text messaging, and social media. Among all the options for digital engagement, asynchronous messaging has emerged as a clear winner with consumers. In fact, messaging has been the dominant way people communicate for over a decade. It makes sense that they want to engage with the brands they love in the same place, given the familiarity and convenience.

For digital-first customer service, providing messaging capabilities is now the essential way to be there for customers in the moment. It gives customers a fast, convenient way to reach out, and can make it easier to resolve service issues cost-efficiently on the first contact.

Big tech leaders paved the way with technologies like Apple Business Chat and Google Business Messaging, enabling companies to integrate messaging at multiple touch points. New AI-powered solutions are now taking it a step further.

I like to call this the ‘asynchronous revolution.’ Digital-first leadership now demands multimodal customer service that supports ongoing conversations. The “start and stop” flexibility that people love so much in their personal messaging is now a business-critical interaction model that today’s digital consumers expect.

2—Make conversations seamless across channels

Being there for customers in digital messaging is key—but what if they need to pause the conversation or switch channels in the middle of an issue? Typically it means starting all over again, which is a frustrating experience. When people have to explain their problem to multiple agents, customer satisfaction drops off a cliff.

Delivering true omni-channel support is the next essential in a digital-first business, and it is finally here.

Whether consumers reach out via messaging, webchat, or voice, all of these channels must be intelligently integrated to ensure customer service agents really know each person. Employees need to be equipped with the right interaction history and procedural knowledge to quickly move toward resolution, no matter where or when that agent stepped into the conversation.

Many companies are finding that’s the most critical customer experience problem to solve, and the biggest source of frustration for consumers who need help. That’s why I’m so excited when I see new technologies seamlessly unify and thread conversation histories across channels. Innovative solutions are finally bringing a holistic approach to customer service.

Someone can request assistance in an app, then switch to a webchat, mobile messaging, or even voice, and any agent will know all the relevant details. It ensures the customer has an easy, cohesive experience, and enables agents to work more efficiently.

In this way, AI helps fuel a digital-first business and empower support and sales teams to treat every customer interaction as a moment that matters.

AI amplifies the effectiveness of digital-first channels and new techniques help to accelerate adoption. Powering digital and mobile touch points with data-driven intelligence also radically increases productivity at a lower cost—while delivering responsive, personalized customer experiences that win brand loyalty.

- Michael Lawder, ASAPP

3—Empower employees with predictive capabilities

Once you’ve got digital messaging and omni-channel support in place, it’s time to kick it up a notch by augmenting customer service agents and sales teams with predictive and highly contextualized knowledge, powered by self-learning AI models. Most companies have vast volumes of data that can be harnessed to make support and marketing efforts substantially more productive.

Here’s where cutting-edge AI solutions really stand out. They can transcribe and analyze digital and voice conversations in real time, and integrate those insights with other transactional and historical data. The system can then proactively guide agents and sales reps with the most accurate and relevant information for a given customer need or situation, ensuring the best answer and outcome every time.

Those ‘conversational analytics’ translate into predictive insights that deliver powerful benefits:

  • Dramatically improves productivity when your workforce knows exactly what to say and do through every interaction, without having to dig for details.
  • Captures the knowledge of your best agents and sales reps. Sometimes that’s the best ‘data’ you’ve got, and it’s a lot more valuable when everyone can access it.
  • Delivers data-driven intelligence to sales and marketing teams, making it easier to improve your operation and tailor customer experiences based on predictive insights instead of best guesses.

Ultimately, I believe the power of AI and predictive technologies like ASAPP are defining the future of digital-first business. This is the first time in the history of the customer service industry that we can simultaneously meet customers where they are, drive revenue growth, deliver a better customer and employee experience—and do it all at lower costs. And it’s about time.
