Will chatbots ever live up to the hype?
Despite years of hype—and some incredible technological breakthroughs—many people think of chatbots as an even more frustrating replacement for offshore call centers. I’m the CEO of a chatbot company, and even I can’t name a single chatbot that’s great.
I don’t think that was anybody’s game plan for chatbots. It certainly wasn’t my game plan when I built the first version of Botpress in 2015.
So how did we get here? And will chatbots ever live up to the hype? I wouldn’t be writing this if I didn’t believe the answer was yes, but don’t take my word for it. Let’s explore what’s happening with the technology together and see if you agree.
Why do chatbots suck?
Despite all of the advances in natural language processing (NLP), most chatbots only use the most basic form of it. They parse conversations through intent classification—trying to organize everything a customer might say into a preconceived bucket based on the intention of their inquiry.
For example, “Hello, I would like to change my billing address” might be classified into the change billing address bucket—and the chatbot would reply accordingly.
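A minimal sketch of this kind of intent classifier helps show how shallow it is. The intent names and trigger phrases below are invented for illustration; production systems train an ML classifier rather than counting word overlap, but the bucket-matching logic is the same:

```python
# Toy intent classifier: score each intent bucket by word overlap
# with its trigger phrases, then return the best-scoring bucket.
# Intent names and trigger phrases are made up for illustration.

INTENTS = {
    "change_billing_address": [
        "change my billing address",
        "update the address on my bill",
    ],
    "reset_password": [
        "reset my password",
        "forgot my password",
    ],
}

def classify(query: str) -> str:
    words = set(query.lower().split())
    def score(triggers):
        return max(len(words & set(t.split())) for t in triggers)
    return max(INTENTS, key=lambda intent: score(INTENTS[intent]))

print(classify("Hello, I would like to change my billing address"))
# → change_billing_address
```

Every user utterance, however nuanced, gets flattened into one of these preconceived buckets.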
While this can handle some common requests, it’s difficult to provide and maintain a satisfying customer experience. To classify a customer conversation, the conversation designer must anticipate the correct intent and add all possible conversation triggers for those intents.
Even if those intents have been well anticipated with good trigger phrases, the chatbot can only deliver on a single intent. Conversations with real people can be messy and full of nuance, and people often want multiple things at once. Intent-based classification just finds the intent that a conversation resembles and pushes the canned response for that bucket. The chatbot might know the answer to the question, “Can I change my billing address so it matches other profiles on my account?”, but have the information under account settings, not address change.
In my view, this hardly qualifies as AI—it’s closer to a search function. You’re not having a conversation; you’re interacting with a conversation machine, playing a text adventure like Zork. This function is error-prone and a lot of work to build, but more importantly, it’s fundamentally wrong. Here’s an example of why:
Imagine an image classification program that identifies animals and furniture. You feed it a pile of labeled images—this is an animal, this is furniture—and it learns to recognize them. Suppose at some point you need to differentiate bears from dogs as well as couches from chairs. Now you have to relabel everything to be more specific. Now suppose the program encounters a bear skin on a couch—that’s multiple matches, so now you have a conflict.
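The same failure shows up in text. A single-label classifier must collapse a two-intent query into exactly one bucket, even when both intents clearly matched. A toy sketch (labels and keyword sets are invented for illustration):

```python
# A single-label classifier returns exactly one bucket via argmax,
# so a query that matches two intents silently loses one of them.

def scores(query, intents):
    words = set(query.lower().split())
    return {name: len(words & keywords) for name, keywords in intents.items()}

INTENTS = {
    "change_address": {"change", "address"},
    "cancel_subscription": {"cancel", "subscription"},
}

query = "please change my address and cancel my subscription"
s = scores(query, INTENTS)
top = max(s, key=s.get)                        # forced single-label decision
matched = [k for k, v in s.items() if v > 0]   # what actually matched

print(top)       # only one intent survives the argmax
print(matched)   # but both intents were present in the query
```

The bot acts on `top` and never sees that `matched` contained two requests.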
Ideally, a chatbot should be able to ingest a user’s query, understand what the user is trying to achieve, and then help the user achieve their objective—either by taking action or by generating a helpful, human-sounding response.
This isn’t what today’s chatbots deliver. Instead, they are glorified Q&A bots that classify queries and issue a canned response. Recent advances in several areas of AI, however, offer opportunities for producing something better.
What are chatbots capable of, and why aren’t we there yet?
Today, there are better, “intentless” ways to design chatbots. They rely on advances in AI, Machine Learning (ML), and NLP fields such as information retrieval, question answering, natural language understanding (NLU), and natural language generation (NLG).
Soon, chatbots will leverage these advances to deliver a customer experience that far exceeds today’s rudimentary Q&A bots. Imagine a chatbot that can:
- Understand complex queries with all the messy nuance of human speech.
- Generate human-like answers to complex queries by drawing from a knowledge base.
- Use natural language to query structured tables, such as a flight information database.
- Generate human-sounding phrases that match a specific dialect or brand tone.
While a chatbot with these features would provide fantastic user experiences, upgrading from existing versions takes more effort than most organizations are willing to bear. They’ve already invested years in building intent-based chatbots, including training datasets and writing canned responses.
Ideally, developers would pull the data out of these chatbots and build it into newer, more sophisticated bots. Unfortunately, this isn’t possible. Chatbot datasets are created and maintained to match the specific way in which a bot works. A dataset trained against a particular set of intent classifications is no use to a newer, intentless chatbot that uses more advanced NLP concepts. As a result, an organization that wants to implement a more advanced chatbot would have to rebuild it—and its dataset—from scratch.
Why would it be worth rebuilding? NLP is a rapidly evolving field, and changes are coming that will help chatbots live up to their promise. Let me give you some concrete examples.
Four NLP advances that will help chatbots live up to the hype
NLP models are essentially a chatbot’s “brain.”
Intent-based chatbots use basic NLP models that match user inputs against a dataset of labeled examples and try to categorize them. However, in recent years, we’ve seen huge advances in NLP models and related technologies that will profoundly impact chatbots’ ability to interpret and understand user queries. These include:
Support for larger NLP models
Since 2018, NLP models have grown hyper-exponentially. The graph below shows how quickly the number of parameters in modern NLP models has grown.
You can think of a parameter as comparable to a single synapse within a human brain. Nvidia estimates that by 2023 it will have developed a model that matches the average human brain parameter-for-synapse at 100 trillion parameters. To support models at this scale, Nvidia just announced its Hopper architecture, which can train them up to six times faster.

While model size isn’t the only factor in measuring the intelligence of an NLP model (see the controversy surrounding several existing trillion-plus parameter models), it’s undoubtedly important. The more parameters an NLP model has, the greater the odds it will be able to decipher and interpret user queries—particularly when they are complicated or include more than one intent.
Tooling
The evolution of frameworks and libraries such as PyTorch, TensorFlow, and others makes it faster and easier to build powerful learning models. Recent versions have made it simpler to create complex models and run deterministic model training.
These toolsets were initially developed by world leaders in AI/ML—PyTorch was created by Facebook’s AI Research Lab (FAIR) and TensorFlow by the Google Brain team—and have subsequently been made open-source. These projects are actively maintained and provide proven resources that can save years of development time, allowing teams to build sophisticated chatbots without needing advanced AI, ML, and NLP skills.
More recently, new tools have further amplified what NLP models can do. For teams that want this capability without the configuration burden, MLOps platforms like Weights & Biases offer model optimization, training, and experiment tracking as a service. As the ML field matures, expect even more powerful tooling to come along.
Parallel computing hardware
Whereas a CPU provides general-purpose processing for any given function, GPUs evolved to run huge numbers of simple mathematical operations in parallel. This massively parallel capability makes them ideal for NLP workloads. TPUs and NPUs/AI accelerators take the same idea further, with silicon designed specifically for ML and AI applications.
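The workload these chips target is dominated by many independent, simple operations (dot products, for the most part). The sketch below illustrates that data-parallel pattern with Python threads; note that CPython threads share the GIL, so this is only an illustration of the shape of the computation, and the real speedups come from GPUs/TPUs:

```python
from concurrent.futures import ThreadPoolExecutor

# Each row-by-weights dot product is independent of the others,
# which is exactly the kind of work GPUs parallelize across
# thousands of cores.

def dot(pair):
    a, b = pair
    return sum(x * y for x, y in zip(a, b))

rows = [[i, i + 1, i + 2] for i in range(4)]   # toy "matrix"
weights = [1, 2, 3]

with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(dot, ((r, weights) for r in rows)))

sequential = [dot((r, weights)) for r in rows]
print(parallel)  # identical result either way; only the scheduling differs
```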
As hardware grows in power, it becomes faster and cheaper to build and operate large NLP models. For those of us who aren’t shelling out the money for these powerful chipsets, many cloud providers are offering compute time on their own specialized servers.
Datasets
NLP datasets have grown exponentially, partly due to the open-sourcing of commercially built and trained datasets by companies like Microsoft, Google, and Facebook. These datasets are a huge asset when building NLP models, as they contain the highest volume of user queries ever assembled. New communities like HuggingFace have arisen to share effective models with the larger community.
To see the effect of these datasets, look no further than SQuAD, the Stanford Question Answering Dataset. When SQuAD was first released in 2016, building an NLP model that could score well against it seemed an impossible task. Today, the task is considered easy, and many models achieve very high accuracy.
As a result, new test datasets have emerged to challenge NLP model creators. SQuAD 2.0 was meant to be a more difficult version of the original, but even it is becoming easy for current models. Benchmarks like GLUE and SuperGLUE now offer multi-sentence tasks that still stretch cutting-edge NLP models.
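To get a feel for what SQuAD-style question answering asks of a model, here is the crudest possible extractive baseline: return the passage sentence with the most question-word overlap. The passage and question are invented for illustration; real SQuAD models predict exact answer spans and are scored by exact-match and F1, which this toy would fail badly:

```python
# Crudest extractive-QA baseline: pick the sentence sharing the most
# words with the question. Models that top the SQuAD leaderboard
# instead predict the exact answer span using large trained NLP models.

PASSAGE = (
    "SQuAD was released in 2016. "
    "It contains questions posed on Wikipedia articles. "
    "The answer to every question is a span of text from the passage."
)

def answer(question: str, passage: str) -> str:
    q_words = set(question.lower().replace("?", "").split())
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))

print(answer("When was SQuAD released?", PASSAGE))
# → SQuAD was released in 2016
```

The gap between this baseline and span-level exact match is precisely what the benchmark measures.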
Should you build or buy?
Having heard about all these advances in AI, ML, NLP, and related technologies, you may think it’s time to chuck out your chatbot and build a new one. You’re probably right. There are fundamentally two options for development teams:
- Build a chatbot from the ground up to incorporate today’s superior technologies.
- Purchase a toolset that abstracts the difficult NLP side of things—ideally with some additional features—and build from there.
This is the classic “build or buy” dilemma, but in this case, the answer is simpler than you might think.
For a smaller development team with limited resources, building a chatbot from scratch to incorporate the latest AI, ML, and NLP concepts requires great talent and a lot of work. Skills in these areas are hard (and expensive) to come by, and most developers would prefer not to spend years acquiring them.
What about development teams at larger organizations with resources to hire data scientists and AI/ML/NLP specialists? I believe it still likely isn’t worthwhile to build from scratch.
Imagine a big bank with a dedicated team working on its latest chatbot, including five data scientists working on a custom NLP pipeline. The project takes perhaps 18 months to produce a usable chatbot—but by that time, advances in open-source tooling and resources have already caught up with anything new the team has built. As a result, there’s no discernible ROI from the project compared to working with a commercially available toolset.
Worse, because the chatbot relies on a custom NLP pipeline, there’s no simple way to incorporate further advances in NLP or related technologies. Doing so will require considerable effort, further reducing the project’s ROI.
I confess I am biased, but I honestly believe that building, maintaining, and updating NLP models is simply too difficult, too resource-intensive, and too slow to be worthwhile for most teams. It would be like building your own cloud infrastructure as a startup, rather than piggybacking on a big provider with cutting edge tooling and near infinite scale.
What’s the alternative?
A toolset like Botpress can abstract the NLP side of things and provide an IDE for developers to build chatbots without hiring or learning new skills—or building the tooling they need from scratch. This can provide a series of benefits for chatbot projects:
- Significantly reduced development time.
- Easy upgrades to the latest NLP technologies without significant reworking.
- Less effort to maintain chatbots as updates are automatic.
Best of all, developers can focus on building and improving the experience and functionality of their own software—not learning AI/ML/NLP.
Start building chatbots today
If I’ve piqued your interest in building chatbots, you can start right now. At Botpress, we provide an open-source developer platform you can download and run locally in under a minute.
To get started, visit our chatbot developer page. For a walkthrough on how to install the platform and build your first chatbot, refer to our getting started with Botpress guide.
You can also test out the live demo of our latest product—a radically new method of creating knowledge-based, “intentless” chatbots, called OpenBook, announced this week.
The Stack Overflow blog is committed to publishing interesting articles by developers, for developers. From time to time that means working with companies that are also clients of Stack Overflow’s through our advertising, talent, or teams business. When we publish work from clients, we’ll identify it as Partner Content with tags and by including this disclaimer at the bottom.
Tags: ai, chatbots, NLP, partner content
19 Comments
Two facts that prove chatbots will never be prolific in successful companies:
1) When customers call, it’s because they want to talk to a human ASAP – PERIOD, full stop.
2) When a customer calls you, it presents the rarest and most valuable occurrence in your organization – the opportunity to speak directly with your client one-on-one.
Companies that recognize and capitalize on these two points will dwarf all other competition and be loved by their customers. Companies that ignore these facts will continue to leave their customers angry, disloyal, and looking for anything else.
Stop wasting time/money trying to make your customers angrier.
I definitely agree. Unlike browsing Facebook, customers won’t reach out if they don’t have any problem. When I call in, surely I want to talk to someone. Even if I “have to” chat, I prefer chatting with a real person.
I disagree, I placed an order a few days ago and when it arrived it was missing an item. I got back on Amazon and with their latest AI tool I was able to get a new part sent to me with no trouble at all. It took less than a minute.
I think you’ve hit on something but I’m afraid it might be by design.
It might be true that a person who thought about it would agree with you, that it’s a great opportunity to talk to customers, and a support person would be able to take that information and relay it to product designers/etc. to either fix the problem’s root cause, find new project ideas, etc.
But that’s not really what people who buy chatbots are thinking. The customers of chatbots are probably people who want to “reduce support costs” so that their own metrics get prettier for their bosses. There are whole genres of products designed to make one particular silo look good at the expense of another one (or at the expense of your customers — sales is another silo). The “support” department and the “product” department probably aren’t even in the same silo, and even if they are, the more time a “product” person spends talking to customers the less story points they get done.
Nevermind that engaging customers is vital to product design, the computerized schedule doesn’t understand that, it only knows “if story point count > 36 then good, otherwise alert next level up”. Neither support nor product design is measurable, but that doesn’t stop people from trying. It sounds bad and it kind of is, but if everyone learns to recognize this sort of thing, and does whatever they can to counter it (remember servant leadership, if you don’t have it, start it), then eventually the good deeds will add up.
Work hard and take on hard important problems, the most important stuff you can find. People who just look at numbers and don’t ask questions are very easily fooled. Get real success and you may find that, well, the numbers work (as if, by magic?) and nobody will complain about it if you’re “winning”. If you fail in real life but succeed on paper, no amount of metrics will save you, and the bean counters will not come to your aid, a well measured failure will just shorten your career.
I don’t bother with the chatbots anyway because they have never ever solved a non-trivial problem for which I actually need customer service.
I absolutely agree with your comments. We are calling in to speak to a real person. It’s frustrating for me when you call for customer service help and the AI or chatbot keeps sending you in a loop.
When companies replace humans with computers to talk to you, it’s the ultimate way of saying “we don’t care at all about you and aren’t willing to spend a second on you” without saying it to your face. “You are not even worth the time of a human-being employed by us. We have better things to do than to deal with you, our paying customer”. See? It feels good to interact with a computer when you’re too busy and really need to get stuff done and really need a human to help you fast, doesn’t it?
Chat bots are the no. 1 way to be sure I take my business elsewhere. The only useful thing I’ve ever managed to get them to do is offer discount codes for checkouts. Other than that they’re a waste of JavaScript that stand between me and what I want to achieve as a paying customer.
As you nicely put it, current chatbots suck, period.
Future chatbots … Who knows?
Let’s keep trying.
Next subject 👉
“Understand complex queries with all the nuance of human speech”
“Generate answers to complex queries by drawing from a knowledge base”
“Use natural language to query structured tables”
“Generate phrases that match a specific dialect”
Assuming the transcription tech was perfect, this kind of reminds me of “natural language programming”. English is a pretty hokey and forgiving programming language, so it’s not surprising that intent-based chatbots only implemented a very narrow subset of it.
Amen. No one likes a company with a dehumanizing perception.
*Starts the article with “Why do chatbots suck?” then at the end pitches their own chatbot service*
Not doing a real good job of selling me…
> Conversations with real people can be messy and full of nuance, and people often want multiple things at once.
I have heard AI researchers define the ability to deal with nuance as the essence of “intelligence,” as in the quality that humans possess and machines have great difficulty emulating.
Chatbots suck because they don’t have the power to fix my problems. If you have an automated solution to a problem, don’t connect it to a chat-bot. Connect it to a user-facing button. Until someone builds a chat-bot that can solve rare problems that require privileged access, I consider them a waste of time.
You’re exactly right. People that design chatbots aren’t thinking about people like us that read docs / try very hard to find any way to do something *before* talking to somebody.
I’m pretty sure chatbots are the product of managers who are extremely frustrated that their team has to spend time answering questions like “how do I reset my password” and are convinced that 99% of their customers will be satisfied with a chatbot doing it for them.
But by the time we try to contact the company, we’ve exhausted every single option available to us.
Our problems are probably more like:
– “we didn’t put that button in the GUI”
– “it’s a bug in the software”
– “something on the company’s end is messed up”
– “company policy actually requires an employee to do this for you because we don’t trust you to do it yourself”
I guess that’s why I feel like automated chatbots and AI assistant phone answers are a little patronizing. I *know* the AI can’t help, because it’s obvious to me that the designers didn’t think about my problem, so just let me talk to a human.
At the end of the day, at best, the “chat bot” is absolutely nothing more than a complicated and long-winded way of searching through some documentation database. Please stop wasting our time.
They will for automated Tech Support but for normal use nah
The biggest advancement you could possibly make is for the bot to understand the phrase “I want to talk to a person”. The experiment has already been completed. Ask a million people if they would rather talk to a bot than a person, you’ll get 900,000 who say “no”, 99,999 who say “what’s a bot?”, and 1 who invested in a chatbot company.
I’ve been developing conversational interfaces since early 2016 as part of a start-up, so I have lots to say here – I saw this advertisement ~~article~~ when it was published, but responding over Easter seemed like ‘work’ so (opens a beer) here we are now.
> Why do chatbots suck?
We’ve in the past used wit.ai (ok until FB absorbed it, then ¯\_(ツ)_/¯), luis.ai (who at one point wouldn’t let us train a model on our whole dataset, so we had to keep retraining until one worked), api.ai – who later became dialogflow – and were incapable of API version control. Like the time they dumped in a load of stopwords without telling anyone. Even though we basically tortured their platform, it still wasn’t good enough. So we then tried spaCy, which was at least deterministic but just too inflexible.
And, though they seemed good at the time, they were (and probably still are) actually all terrible.
> The chatbot can only deliver on a single intent.
And here’s my first real question here, jumping ahead slightly, sorry. Data gathering and marking up that data are incredibly intensive processes – you have to get hundreds or thousands of representative inputs and label them for the ML training.. and getting training data for multiple intents (and marking them up) is even harder.. how do you expedite that, exactly, especially for multiple intents, which aren’t of course neatly parcelled up in sentences?
> Conversations with real people can be messy and full of nuance
100%. Different people who look at the same input (for labelling purposes) can interpret the input in quite different – and equally valid – ways: ambiguity in language has become an unexpected specialty of mine. Take for example these two phrases – _”get me a cab airport”_ – the airport could be your origin or destination. But _”get me a cab home”_ *unequivocally* means you want to go home. Language is (I have discovered) weird.
> and people often want multiple things at once
Which isn’t a really big problem, but as above, getting training data becomes exponentially more difficult with the number of inputs (let’s call them ‘slots’) you want.
> Ideally, a chatbot should be able to ingest a user’s query, understand what the user is trying to achieve, and then help the user achieve their objective—either by taking action or by generating a helpful, human-sounding response.
I respectfully disagree here, in a small way – even if the chatbot _”help[s] the user achieve their objective .. by taking action “_ it *still* has to give a response.
> Today, there are better, “intentless” ways to design chatbots. They rely on advances in AI, Machine Learning (ML), and NLP fields such as information retrieval, question answering, natural language understanding (NLU), and natural language generation (NLG).
Agreed, with caveats.
> Soon, chatbots will leverage these advances to deliver a customer experience that far exceeds today’s rudimentary Q&A bots. Imagine a chatbot that can:
So we already can:
– [x] Understand complex queries with all the messy nuance of human speech (_with caveats, obviously_)
– [x] Generate human-like answers to complex queries by drawing from a knowledge base.
– [x] Use natural language to query structured tables, such as a flight information database. (_though I’m not really sure what you mean here? Flight information APIs don’t support natural language queries?_)
– [x] Generate human-sounding phrases that match a specific dialect or brand tone. (_Our responses are bespoke and handcrafted in code, not by ML. And I would argue that they have to be, but that’s a ‘Strong Opinion, Weakly Held’ as per SO’s founder._)
A bit more about responses – there’s a constant tension between _”responses must be human-sounding and idiomatic”_ and _*”customers will want to edit responses”*_ – I wonder how you deal with the latter?
[Skip]
> Build a chatbot from the ground up to incorporate today’s superior technologies.
> This is the classic “build or buy” dilemma, but in this case, the answer is simpler than you might think.
> For a smaller development team with limited resources, building a chatbot from scratch to incorporate the latest AI, ML, and NLP concepts requires great talent and a lot of work. Skills in these areas are hard (and expensive) to come by, and most developers would prefer not to spend years acquiring them.
There are ten people in our startup. And we have all of this, and much more which I probably shouldn’t discuss here.
> What about development teams at larger organizations with resources to hire data scientists and AI/ML/NLP specialists? I believe it still likely isn’t worthwhile to build from scratch.
If I was assessing you as a provider I would be asking questions about data gathering and labelling – this is the hard stuff. Applying hyper-parameters to a model (OK, I’m out of my comfort zone here) seems easier. Kick the model until it works – *once you’ve got the data to train it with*.
> Imagine a big bank with a dedicated team working on its latest chatbot, including five data scientists working on a custom NLP pipeline. The project takes perhaps 18 months to produce a usable chatbot—but by that time, advances in open-source tooling and resources have already caught up with anything new the team has built. As a result, there’s no discernible ROI from the project compared to working with a commercially available toolset.
So, you take the labelled training data and throw it at the new tooling, and kick it until it works. It’s not my side of things, but I’m not convinced by this argument. Sorry to bang on, but once you’ve decided on your ontology and labelled the training data then you can (I would think) jump onto any toolset.
> Worse, because the chatbot relies on a custom NLP pipeline, there’s no simple way to incorporate further advances in NLP or related technologies. Doing so will require considerable effort, further reducing the project’s ROI.
As above.
> A toolset like Botpress can abstract the NLP side of things and provide an IDE for developers to build chatbots without hiring or learning new skills—or building the tooling they need from scratch. This can provide a series of benefits for chatbot projects:
I looked at your demo page, and quite honestly it took me back to the old techs above – wit, luis, dialogflow. I’m sure under the hood it’s better but I would look at how it handles things like this:
– Followups. Sometimes in my code I have to persist two different states – the previous state, and the state from the new input – merge them for the response, then decide what to keep. Here’s an example – _”how much to fly from london to paris on tuesday?_” followed either by _”what about wednesday”_ or _”what if I went from Luton?”_
– Disambiguation. _”Is that Paris (France) or Paris (Texas)_”?
– Deixis. _”What other flights do you have **on that day**?”_
Maybe you can handle all of these with your drag-and-drop-low-code interface.. but I somehow doubt it. And these are all problems we have successfully addressed. Now we just have to convince someone it’s actually true.. (for followups please contact burningcows@gmail.com – and yes, that is a Mars Attacks reference)
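The follow-up problem this commenter describes, keeping the previous turn’s state and overlaying whatever the new partial utterance fills in, can be sketched as a simple slot merge. The slot names and extracted values below are invented for illustration; a real system would first have to extract them from the utterances:

```python
# Follow-up handling as state merge: keep the previous turn's slots,
# overlay whatever the new utterance filled in.

def merge_state(previous: dict, new_slots: dict) -> dict:
    state = dict(previous)
    state.update(new_slots)  # new information wins; old slots persist
    return state

# Turn 1: "how much to fly from london to paris on tuesday?"
turn1 = {"origin": "london", "destination": "paris", "day": "tuesday"}

# Turn 2: "what about wednesday" only fills the day slot...
turn2 = merge_state(turn1, {"day": "wednesday"})
print(turn2)  # origin and destination carry over

# ...and "what if I went from Luton?" only fills origin.
turn3 = merge_state(turn2, {"origin": "luton"})
print(turn3)
```

The merge itself is trivial; as the commenter implies, the hard parts are deciding which old slots to keep, resolving deixis like “on that day”, and extracting the slots in the first place.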