Using Machine Learning to Serve CSRs Resolutions for Their Customers

By feeding details from customer interactions into our machine learning platform, and drastically cutting time on task for our search functionality, we turned our customer service reps into superheroes.
Project Overview
Shortly into the pilot phase of our newest internal Customer Relationship Management (CRM) module, we were tasked with leveraging machine learning to suggest routing for case assignments, classify the root cause of customer issues, and recommend highly relevant knowledge articles to assist customer service reps (CSRs) in resolving live customer interactions.
Project Details
  • Timeline: November 2021 through January 2022
  • My Role: Design and Dev Manager
  • Related Topics: Machine Learning, CRM, KMS, Case Management
  • Estimated Reading Time: 9 minutes
The Problem with Having Too Much Information
Understanding the User Story

The Context

Over the last year and a half, we have been developing and piloting a brand new CRM tool—designed to enable CSRs to route and track things like customer complaints, questions, service requests, and issues in the form of cases or service tickets, which we call "inquiries." Our reps use the CRM tool to service inquiries submitted by our customers, or to transfer inquiries to other relevant teams across various lines of business for servicing. As the reps work through each customer inquiry, they often need assistance in classifying inquiries, reaching a resolution, and identifying root cause.

In order to differentiate our in-house CRM and drive adoption across all business units, our stakeholders challenged my team to present them with some ideas for how we might leverage Machine Learning to provide this assistance.

The Need

The user needs help in the form of reference materials when working to resolve inquiries with complex resolution steps, when the relevant resolution steps have recently changed, or when handling inquiry types they are not familiar with. To get that help, they reference resolved inquiries and knowledge articles with characteristics that match the inquiry or problem they are currently working to resolve.

As a result, they'll get good ideas on how to properly route inquiries, identify root cause more accurately, resolve inquiries faster, and walk customers through relevant troubleshooting or resolution steps rather than wasting time on irrelevant or outdated procedures.

The Problem

As we see it, the problem is that basic search and filter functionality in our tool doesn't go far enough to minimize the time CSRs have to spend researching older, resolved inquiries or searching for relevant knowledge articles. They have to search among hundreds of thousands of old inquiries—many of which were resolved using out-of-date procedures or reflect now-defunct underlying causes. There is simply too much information, and it changes too frequently for our users to reduce their handle time.

The Path Forward

To solve this problem and lead our users forward, we plan to use machine learning algorithms that leverage Natural Language Processing (NLP) to constantly ingest the vast, mutable stores of data, and to present only the most relevant, digested data to the CSRs in highly actionable ways.
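To make that concrete, here is a minimal sketch of the general "ingest, score, surface the top few" pattern, using simple TF-IDF cosine similarity in place of the far more capable Watson-based models we actually rely on. The function and field names below are hypothetical illustrations, not our production code:

```python
# Minimal sketch: rank historical inquiries by textual similarity to a new one.
# The production models are IBM Watson-based and trained on our domain-specific
# language; this only illustrates "ingest, score, surface the top few."
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_similar_inquiries(new_text, resolved_inquiries, top_n=5):
    """Return the top_n resolved inquiries most textually similar to new_text."""
    corpus = [inq["description"] for inq in resolved_inquiries]  # hypothetical field
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(corpus)
    query_vec = vectorizer.transform([new_text])
    scores = cosine_similarity(query_vec, doc_matrix).flatten()
    ranked = sorted(zip(resolved_inquiries, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]
```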

Our Manual Inquiry Search
A Solution in Search of a Problem...or Something More?
Understanding the Priorities

Solution 1: Routing and Classification Based on Similar Inquiries

A large part of any CRM is Case Management (inquiry management), and the name of the game there is accountability. So, it's not surprising that enhancements to assignments/routing, audit trails, and reporting often get the focus on our feature roadmap. But no matter how well we refine those components, we always push up against the human factor. CSRs play a huge role in the accountability chain through inquiry classification at the point of capture and root cause classification at the point of resolution.

Improper root cause classification skews our data, and leads us down dead ends on the product roadmap. Worse, inquiry transfers and misrouting can cost the company hundreds of thousands in lost efficiency, and even millions in missed SLAs. As we have become more confident with ML in other applications (like building chatbots and voice-driven virtual assistants)—seeing the power of NLP trained with context data on our business' domains—we knew that we could dramatically improve outcomes and insight by leveraging historical data to suggest classification and routing to our agents.
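As a rough illustration of what "leveraging historical data to suggest classification and routing" can look like, here is a minimal sketch using a simple scikit-learn text classifier trained on past inquiries and the teams that ultimately resolved them. The real models are Watson-based, and the field names below are hypothetical:

```python
# Sketch: suggest a routing queue based on historical inquiries and the teams
# that ultimately resolved them. Field names are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_routing_model(historical_inquiries):
    texts = [inq["description"] for inq in historical_inquiries]
    teams = [inq["resolving_team"] for inq in historical_inquiries]
    model = make_pipeline(TfidfVectorizer(stop_words="english"),
                          LogisticRegression(max_iter=1000))
    model.fit(texts, teams)
    return model

def suggest_routing(model, new_text, top_n=3):
    """Return the top_n candidate teams with their predicted probabilities."""
    probs = model.predict_proba([new_text])[0]
    ranked = sorted(zip(model.classes_, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]
```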

Solution 2: Surfacing Relevant Knowledge Articles

When our integrations with the primary Knowledge Management systems used across the company proved uninspiring to the users we spoke with, we asked why. With hundreds of articles provided for each individual team—many of which featured the same "topics"—increasing the available information in the KMS didn't result in a direct increase in usefulness. Users were already able to rate the helpfulness of articles, and they could link a resolved inquiry to a knowledge article—features originally meant to help the writers of those articles. We believed it was possible to do a lot more with that data—saving our users and their customers from confusion, and reducing resolution time.

The pre-existing KMS that has been integrated into our tool.

Increasing Our Bandwidth

I presented both of these ideas to our stakeholders, and they agreed we could really move the needle in either direction—which is why, of course, they asked if we could work on them in parallel! With such a heavy lift, we knew it would be necessary to leverage our relationship with IBM, which builds and supports the Watson product that powers a significant portion of our Machine Learning applications. I had my team design an approach that would fit our existing service orchestration, and IBM built new ML models into that approach for us to invoke based on our use cases.
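At a high level, the hand-off pattern looked something like the sketch below: our orchestration layer packages the inquiry text, calls a scoring service for a given use case, and passes ranked suggestions back to the UI. The endpoint URL, payload fields, and response shape here are illustrative placeholders, not IBM's actual API:

```python
# Sketch of the orchestration hand-off: the CRM backend sends inquiry text to an
# ML scoring service and returns ranked suggestions to the UI layer.
# The URL, payload, and response fields are illustrative placeholders only.
import requests

SCORING_ENDPOINT = "https://ml-gateway.example.internal/v1/score"  # placeholder

def get_suggestions(inquiry_id, inquiry_text, use_case, timeout=3):
    payload = {
        "inquiry_id": inquiry_id,
        "text": inquiry_text,
        "use_case": use_case,  # e.g. "similar_inquiries" or "suggested_articles"
    }
    try:
        resp = requests.post(SCORING_ENDPOINT, json=payload, timeout=timeout)
        resp.raise_for_status()
    except requests.RequestException:
        return []  # fall back to manual search if the model service is unavailable
    return resp.json().get("suggestions", [])
```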

Purpose Must Make Progress
Design Approach

Whenever we're adding features with the potential to fundamentally change how users engage with a tool, it's often tempting to completely rework the interface with new, hypothetical workflows prioritized over existing ones. Although we were very excited to get our ML-enhanced features in front of our users, we had to decide on an approach that would truly test the machine learning models' performance while interfering with the rest of the interface as little as possible.

It's worth noting that machine learning models improve over time, but—with only "pilot" volumes of user data to work with—we knew it could take a while to train the model to the point where most users would prefer it over the manual search methods they had been using. For this reason, we worked with IBM—our partner for Artificial Intelligence and Machine Learning—to help us train our ML models on our domain-specific language and the existing data we had. Our stakeholders were concerned that poor performance on the ML model's part would confuse CSRs, and suggested we simply add ML results to the existing standalone KMS search function rather than bring them into the inquiry details.

Wireframe Design for Suggested Articles

To test whether this would fit users' mental models, we ran an exercise sometimes called an Open Card Sort with 18 pilot users selected randomly from across all the major divisions that comprise our user base. We provided each participant with topics of content that already existed in the tool, and found that knowledge articles were consistently positioned as subordinate items to inquiries. Based on our findings, the stakeholders agreed to the more semantically appropriate approach of including suggested articles within the inquiry details portion of the interface.

Whenever we're adding features with the potential to fundamentally change how users engage with a tool, it's often tempting to completely rework the interface with new, hypothetical workflows prioritized over existing ones.

Inquiry Details are broken into several components, like classification data, submitter data, comments, attachments, etc. Each of these components lives under a separate "tab" in the inquiry detail panel of our interface. To support our users’ mental models, we created a corresponding tab for "Similar Inquiries" and for "Suggested Articles." Users handling these inquiries at any point—from just after creation through resolution—are able to access the information in these tabs to assist them in routing, investigating, resolving, or re-classifying the inquiry.

Federated Search now incorporates relevancy scoring.

Keeping things consistent—and by way of addressing our stakeholders' original request—we also incorporated the ML model's relevancy scoring into the existing standalone knowledge search function.
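One way to picture how relevancy scoring folds into the existing search: blend the model's relevancy score with the original keyword-match score and sort on the combined value. A minimal sketch, with hypothetical field names and an illustrative weighting rather than our production tuning:

```python
# Sketch: fold the ML relevancy score into existing keyword search results.
# `keyword_score` and `ml_relevancy` are hypothetical, normalized (0-1) fields,
# and the 60/40 weighting is illustrative only.
def rerank_results(results, ml_weight=0.6):
    def combined_score(result):
        return (1 - ml_weight) * result["keyword_score"] + ml_weight * result["ml_relevancy"]
    return sorted(results, key=combined_score, reverse=True)
```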

Turning a Base Hit into a Home Run
Rollout & Results

Through the design and development process, we socialized these features with our pilot users. When we finally rolled things out, people were very interested to see whether this promising technology would deliver them any real value.

Our initial feedback was highly positive, and users adopted the features right away—linking ML-suggested knowledge articles to inquiries as they resolved them. Anecdotal evidence suggested that misrouting and misclassification were diminishing steeply on some teams.

It wasn't long, however, before we found we had a significant gap—something the users could ignore, but we couldn't. The API call we made to our IBM-based machine learning models focused on positive identification of relevant knowledge article matches. We had let the technology dictate the terms of engagement, and had forgotten to build in a way for users to unlink knowledge articles that were linked erroneously. This wasn't much of a problem for the users themselves, who could see at a glance whether something was relevant—skipping on to the next suggested article if needed. But it had the potential to degrade the integrity of our ML model.

There's a saying we use frequently in machine learning:
“Garbage in, garbage out.”

The data underlying an ML model is weighted so that any one mistake is minimized over time, but keeping this erroneous data in the model still posed too many potential problems. Our business units agreed it needed to be rectified before we could put any real weight on the feature's performance.

Within 2 weeks, we had rolled out a change to let CSRs unlink knowledge articles from inquiries. I also started working with team managers and reporting groups to build in checkpoints for keeping the data clean with minimal effort.
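Conceptually, the fix treats an unlink as an explicit negative signal: the erroneous pairing is removed and recorded so the next training cycle can exclude or down-weight it rather than learn from it. A rough sketch of that bookkeeping, with hypothetical table and field names and a DB-API style connection (such as sqlite3) standing in for our actual data store:

```python
# Sketch: record an unlink as a negative signal so the article/inquiry pairing
# is excluded from future training data. Table and field names are hypothetical.
from datetime import datetime, timezone

def unlink_article(db, inquiry_id, article_id, csr_id):
    # Remove the link the CSR flagged as irrelevant...
    db.execute(
        "DELETE FROM inquiry_article_links WHERE inquiry_id = ? AND article_id = ?",
        (inquiry_id, article_id),
    )
    # ...and keep a record of the correction so the training pipeline can skip
    # (or down-weight) this pairing instead of learning from the mistake.
    db.execute(
        "INSERT INTO link_feedback (inquiry_id, article_id, csr_id, feedback, recorded_at) "
        "VALUES (?, ?, ?, 'unlinked', ?)",
        (inquiry_id, article_id, csr_id, datetime.now(timezone.utc).isoformat()),
    )
    db.commit()
```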

Measuring Our Success

Less than 2 months after initial rollout, it was already clear people were excited to use these features:

  • Nearly two-thirds of new inquiries have at least 1 knowledge article linked to them on resolution, which is up from under a quarter of inquiries resolved over the 3 months prior to rollout (when the search process was entirely manual).
  • The average relevancy score given to the top suggested result for Similar Inquiries has increased steadily by an average of 6% week-over-week.
  • Nearly 600 survey respondents from among our pilot users (a 49% response rate) indicated an average 8.8/10 satisfaction rating for the Suggested Articles feature, and an average 9.4/10 satisfaction rating for the Similar Inquiry feature.
  • Monthly aggregate inquiry misrouting has dropped from a rate of 1 in 148 to 1 in 220 according to our latest reconciliation.
  • First Contact Resolution has increased by nearly 8% for some teams.

Further testing and additional data will show the team how they can improve these features, but in my last months on the team, they were already making plans to expand usage of ML to things like real-time outage tracking, automated incident response, and cross-team resolution. Machine learning and "AI" aren't just leading the way for increased automation. They put a spotlight on patterns of user behavior in ways that don’t replace, but rather increase the humanity of what we make.