
Human or Chatbot?

Monday, July 15, 2019
By Andrew Chen, PhD

The Turing Test was one of the original thought experiments in the philosophy of artificial intelligence. Here's the scenario: you're sitting at a computer having two conversations, typing into two different boxes on your screen. In one conversation, a real person is responding to your messages. In the other, a machine "designed to generate human-like responses" is responding. Can you tell the difference between the two, and correctly identify which is the human and which is the machine?

The Turing Test has become a benchmark for AI developers - if they can fool human users into thinking that their chatbots are other humans, then they’ve succeeded. This may sound eerily familiar, with modern-day chatbots making it really hard to know whether we are talking to a person or an algorithm. The most common place where this happens is in customer support systems on the internet. On millions of websites around the world, you can ask companies about their products or how their websites work, often through a chat window that looks like an instant messaging system at the bottom-right corner of the screen.

But how can you tell if you’re talking to a human or a chatbot? There are actually three types of systems that are commonly used by these websites:

  • Human-only: where real humans are responding to your messages
  • Chatbot-only: a fully automated system with no human intervention
  • Hybrid: using a chatbot as a triage tool for simple questions, and passing on more complex or difficult questions to a human operator

To understand how we might be able to figure out who or what is responding to you, we should understand how chatbots work first. There are a couple of pieces of technology that get packaged together:

  1. Your message goes through Natural Language Processing (NLP). This stage unpacks the text and tries to convert it from plain English into a structure that the computer can process and understand. For example, it will try to separate the nouns, verbs, and adjectives in your message, so that it can figure out the key subject of interest.
  2. The key terms go into a decision system, where the algorithm tries to figure out what sort of response it should give. For example, if you ask about the shipping cost for a particular item, then the algorithm goes through a database of information that the company has, and grabs the price. This part of the system is also responsible for maintaining a conversation and keeping track of previous messages - for example, if you asked about the weather in Auckland, and then asked about the weather next weekend, the chatbot should be able to remember both elements and localise the response correctly.
  3. The chatbot responds with some text. The really sophisticated chatbots automatically generate their responses in real time based on the information in the decision system (give Cleverbot a go). However, the vast majority of chatbots currently use pre-written messages, where humans have tried to guess what users might ask and have crafted responses for the chatbot in advance. This is similar to the script a call centre operator might use to ensure a consistent and comprehensive experience across customers.
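The three stages above can be sketched as a tiny pipeline. This is a purely hypothetical, keyword-based toy - real systems use trained NLP models rather than word lists, and the answers here are invented for illustration:

```python
import re

# A minimal, hypothetical sketch of the three chatbot stages.
# The keyword rules and answers are made up - not from any real product.

def nlp_stage(message):
    """Crude stand-in for NLP: lowercase the text and pull out words."""
    return re.findall(r"[a-z']+", message.lower())

def decision_stage(tokens, knowledge_base):
    """Match key terms against the company's knowledge base."""
    for key_term, answer in knowledge_base.items():
        if key_term in tokens:
            return answer
    return None  # nothing matched - the chatbot has no knowledge of this

def response_stage(answer):
    """Return the looked-up answer, or a canned fallback."""
    return answer if answer else "Sorry, I didn't understand that."

knowledge_base = {
    "shipping": "Shipping is a $5 flat rate within New Zealand.",
    "returns": "You can return items within 30 days of purchase.",
}

def chatbot(message):
    return response_stage(decision_stage(nlp_stage(message), knowledge_base))

print(chatbot("How much is shipping?"))  # matches the "shipping" key term
print(chatbot("Tell me a joke"))         # no match, canned fallback
```

Each stage is a point where things can go wrong, which is exactly what the failure modes below exploit.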

Understanding this process gives us a couple of clues about how we might be able to figure out if we are talking to a human or a chatbot. Each of these automated stages can suffer from some sort of failure, where the algorithm doesn’t quite operate the way we’d like it to.

At the first stage, if the NLP system fails, the chatbot can reply with something completely irrelevant to what you asked. For example, you might ask a chatbot about the company's privacy policy, and it tells you about public policy instead. Maybe the chatbot only saw the word "policy", and didn't realise that "privacy" was also an important part of that noun phrase. It's the sort of mistake you wouldn't expect a human to make, unless they were really incompetent (in which case, good luck getting help from them anyway!). Thankfully, these errors are less common nowadays, because NLP has become quite sophisticated and is pretty accurate (so this tends to only happen in older, outdated systems).
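A toy illustration of that "privacy policy" vs "public policy" mix-up: this hypothetical matcher only looks at single keywords, so both questions collapse to "policy". The rules and replies are invented for illustration:

```python
# Hypothetical single-keyword matcher - "privacy policy" and
# "public policy" both collapse to the keyword "policy".
responses = {
    "policy": "You can read our public policy submissions on our About page.",
    "shipping": "Shipping costs $5.",
}

def keyword_reply(message):
    for word in message.lower().split():
        word = word.strip("?.!,")
        if word in responses:
            return responses[word]
    return "Sorry, I didn't understand that."

print(keyword_reply("Where can I find your privacy policy?"))
# Only "policy" matched, so the bot answers about public policy instead.
```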

[Screenshot: From Sam]
The other type of error at this stage is where NLP can't figure out what you're talking about at all, often when your message is very short. This is most obvious when the chatbot says "sorry, I didn't understand that" and leaves users frustrated because they have to try to rephrase their question in a different way.

The second stage (decision system) is where things get interesting. A common scenario is where you ask about something that the chatbot has no knowledge about - whereas a human might say "give me a moment while I try to find out", a fully automated chatbot may not have a human to ask, and can only say "sorry, I don't know the answer to that." If you ask something even slightly outside the chatbot's body of knowledge, it will often just give up. Another failure is where the chatbot loses track of the conversation, and forgets things that you literally just talked about five seconds ago:
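Keeping track of the conversation is usually done with some explicit state. Here's a hypothetical sketch of the weather example from earlier - the chatbot remembers the last city mentioned, so a follow-up question can reuse it (the city list and replies are invented):

```python
# Hypothetical conversation-state sketch: remember the last city the
# user mentioned so a follow-up like "what about next weekend?" works.

class ConversationContext:
    def __init__(self):
        self.location = None  # last city the user mentioned

KNOWN_CITIES = ["Auckland", "Wellington", "Christchurch"]

def weather_reply(message, ctx):
    for city in KNOWN_CITIES:
        if city.lower() in message.lower():
            ctx.location = city  # update the remembered location
    if ctx.location is None:
        return "Which city do you mean?"
    return f"Here's the weather for {ctx.location}."

ctx = ConversationContext()
print(weather_reply("What's the weather in Auckland?", ctx))
print(weather_reply("What about next weekend?", ctx))  # Auckland remembered
```

A chatbot that "forgets" is one where this state was never stored, or was wiped between messages.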


[Screenshot: From chatbot.fail]

In the third stage, most chatbots currently use pre-written, canned responses. This means that if a chatbot starts repeating itself, you can be pretty sure that it’s not a human. If the answer doesn’t actually answer your question and just directs you to somewhere else, that can also be a sign that a pre-written response is being used. It can be very time-consuming and laborious to write all the possible responses, so often developers just don’t bother. It’s hard to have a conversation with someone when they only know how to say a few pre-written responses!
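The repetition tell is easy to see in a sketch: in this hypothetical canned-response table, several different questions all funnel into the same pre-written reply, which also just redirects you elsewhere rather than answering:

```python
# Hypothetical canned-response table - many triggers, one pre-written
# reply that redirects rather than answers.
CANNED = "Please see our Returns page for help with that."
canned_triggers = {"refund": CANNED, "return": CANNED, "broken": CANNED}

def canned_reply(message):
    for word in message.lower().split():
        word = word.strip("?.!,")
        if word in canned_triggers:
            return canned_triggers[word]
    return "Sorry, I didn't understand that."

print(canned_reply("My item arrived broken"))
print(canned_reply("How do I get a refund?"))  # the exact same canned answer
```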


[Screenshot: Oscar from Air New Zealand]

Grammatical errors, like replying in the wrong tense for the question, can also be a dead giveaway, because a pre-written response can't dynamically adapt to subtle changes in the question. However, if a system responds with spelling errors, that's probably a human making typos in real time! Spelling errors were one of the signs that exposed Zach, an AI that was supposedly able to interpret ECG results and patient notes, but turned out, very likely, to be a 25-year-old in Christchurch with a penchant for deceptive marketing.

Lastly, the most obvious tell-tale sign is the response time - if it takes more than a couple of seconds for the system to respond to your message, then there’s probably a human behind it running a live chat operation. Systems like Intercom just provide an interface for human customer support agents, and can tell you what the average response time is for that company (minutes, hours, or days). This is actually a good thing - human responses tend to be more accurate even if they are slower.

For now, chatbots still make many errors. While many vendors claim to have systems that can beat the Turing Test, in the real world there are many contextual factors that give the game away. Fully automated chatbots struggle to maintain contextual conversations with users, and unless they know when and how to escalate conversations to human operators, they frustrate customers and lead to lost sales. We still have a long way to go before we can hand over all of our customer interactions to the chatbots. The key lesson here is that humans still need to be kept "in-the-loop" - don't move to full automation too quickly!

Keep an eye out on our blog, as we will continue to cover chatbots in more depth, including the top providers in the market right now and how to pick one for your business. We will also be writing about other forms of artificial intelligence and machine learning, focusing on practical and real-world applications of this exciting technology.
