AI Hallucinations Explained: Why It's Not a Bug but a Feature
Talk
In this episode, Ian Griffiths delves into the concept of AI hallucinations, explaining how it’s commonly misunderstood in the industry.
AI hallucinations refer to the phenomenon where artificial intelligence systems generate seemingly plausible but factually incorrect outputs. Using examples from the legal and software development sectors, Ian argues that this behaviour should not be seen as a bug but rather as a feature that signifies AI's ability to contextualize language. AI, according to Griffiths, excels at creating conceptual frameworks to understand sentences, even when those frameworks describe events that never occurred.
Mislabelling this as hallucination leads to unproductive attempts to correct a behaviour that is integral to AI's functionality. By accepting and working with this aspect of AI, systems can be designed more effectively to harness AI's true capabilities.
- 00:00 Understanding AI Hallucination
- 00:27 Examples of AI Hallucination
- 01:55 The Misconception of AI Hallucination
- 04:08 Human Perception vs. AI Reality
- 05:43 AI's Contextualization Power
- 06:16 The Jimmy White Example
- 14:35 AI's Cultural Knowledge
- 17:04 Practical Implications and Conclusion
Transcript
You have misunderstood hallucination in AI. It's a phenomenon our industry just gets wrong, and I'm gonna explain what it is that we get wrong and why this is stopping you from getting the most value out of AI today. To begin with, what do we mean by hallucination? It's the term that we use when an AI seems to just make stuff up. So there are some quite well-known cases of this. There was the law firm that asked a chatbot of some kind to find some case history that was relevant to the case they were working on, and it produced a load of what looked like perfectly plausible and applicable precedents. And the slight problem was that they hadn't actually ever happened. These were completely fictional, completely fabricated cases, and the lawyers got into a certain amount of trouble for not bothering to check whether the things the AI had found were real at all. So it just made it up. It looked plausible, but it made it up. Another example: if you are a software developer, you've probably been using AI coding assistants, something like GitHub Copilot, and you will almost certainly have had the experience where you ask it to write some code for you.
It emits something that looks great. You have a look over it and go, "Yeah, that seems to be doing exactly what I want," only it doesn't work because it's tried to use an API that doesn't actually exist. What it's built is fantasy code - the code you would like to write if only the library worked the way you want it to work, rather than code that actually works. Now, this doesn't always happen, but it certainly can happen. So the broad theme here is one of plausible fabrication. The AI will sometimes just come up with something that seems completely reasonable, that looks absolutely realistic, but which it turns out to have made up, and that is what we describe as hallucination.
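As a concrete illustration (my own hypothetical example, not one from the talk), here is the shape this often takes in Python: the assistant emits a call that looks like it belongs to the real requests library but doesn't exist, shown alongside the call that actually does.

```python
# A hypothetical illustration of "fantasy code": the commented-out call looks
# plausible, but the real requests library has no get_json() function. The
# working version underneath is the code that actually exists.
import requests

url = "https://api.example.com/cases"   # made-up URL, purely for illustration

# cases = requests.get_json(url)        # plausible-looking, but this API does not exist
cases = requests.get(url).json()        # the call the library really provides

print(cases)
```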
Now it's a colourful term, but I think it's problematic because it's actually harming our ability to exploit AI properly. 'Cause the problem with calling it hallucination is that it positions this as a bug, as a problem. And I'm gonna argue that if you fully understand what is happening when AIs hallucinate, you'll realize that's not what they're doing. And actually that behaviour is the main value that AIs today have to offer. Now that might seem like a bit of a daft statement. Why on earth would I be saying the bogus fabrication of information that's just not real - how's that useful? How can I be calling that a feature, not a bug? I think the key is to understand why it does what it does and why that is actually just the manner in which the value the AI has to offer emerges. And the insight that you get from this is helpful when it comes to understanding how to use AI. If you're designing AI-based systems, or even if you're just using AI, it's useful to know what it really does so that you don't try and use it in ways that aren't gonna work that well. So the problem in particular with failing to recognize this behaviour - with calling it hallucination and saying it's a problem - is that we then try and respond by trying to fix it. And in practice that really means we end up trying to bludgeon the AI into submission. We try and bend its behaviour until it looks like what we want it to look like.
We end up trying to do things like ground it in objective reality as though that's actually possible. And for reasons I'm going to explain, you can't really do that. You can try and twist its behaviour and get something that sort of looks like that. You can certainly improve matters. You can certainly make it less likely to give you made-up answers, but I think your life will be better if you accept what it is that AIs do. 'Cause then you'll design better solutions that are able to work with it rather than trying to work against its fundamental nature. And the fundamental error is this: we as humans misidentify what today's artificial intelligences really are. We tend to project characteristics onto them that they don't actually have, because as humans we are predisposed to see other people. We are social beings. We naturally tend to exist around other people, and so many of the entities we interact with really are other people. And we tend to see people-like behaviour even when it isn't there. We have a bias built in. We tend to assume something is acting with agency, with intelligence, when it might not actually be doing that. Anyone who has used a computer will know this. It can sometimes be very hard to remember that the computer is not, in fact, a vindictive little git that's out to ruin your day. It's just a machine following a deterministic series of instructions. But as humans we tend to project human characteristics onto it, because that's just how our brains operate.
So we do this big time with AI, because the way we interact with 'em is to have what resembles a conversation, and that just basically breaks our brains. It takes incredible effort not to perceive it as an intelligence. We're interacting with it in ways that we normally only interact with other humans, and so it's natural for us to perceive them as being humans, but they really aren't. The current AIs we have really are a quite specific cognitive function. They're not really fully autonomous entities in the way that we perceive them. And the specific function that they are able to perform well - and this is actually the big leap forward, this is the thing that today's AIs enable that was just completely out of reach half a decade ago - is their ability to contextualize language. They can take a sentence and work out what each of the words in that sentence is getting at within the overall context. So let me give you an example of what I mean by this. I'm gonna give you a sentence. So my sentence is: "What year did Jimmy White win the World Championship?" We're gonna look at this sentence for a little while, so I'll say it again: "What year did Jimmy White win the World Championship?" That's our text for today. Some of you will know what that means.
I've split my audience. Some of you will know exactly what this refers to and will know precisely what I'm getting at. And some of you will go, "Who on earth is Jimmy White?" Now, whichever of those two camps you happen to be in, I would suggest that you try and imagine what it's like to be in the other half. If you know who Jimmy White is, just think about what people who don't know who he is make of this sentence. And if you don't know who he is, think about what your experience of this sentence is and how that differs from someone who actually has the full information available to them. Whichever of these groups you're in, you'll have no trouble parsing certain bits of this sentence. So for example, if we look at the phrase "World Championship," you don't need to know who Jimmy White is to get the idea that it probably refers to some kind of sports competition. Or maybe it's not technically a sport. Maybe it's chess, but it's something that works like a sports competition. It's gonna be some sort of tournament or championship where people meet and compete and there are winners and there are losers, and eventually there's an overall winner. So that's an idea. That's a conceptual framework that pretty much all sports competitions fall into, and other kinds of competitions as well.
And the phrase "World Championship" lights that idea up in our heads. That's part of what we do when we understand this sentence, but there are other words in here that you could also pick up on like "year" and "win." And these are gonna tell you that actually it's not just a sports competition we're talking about, but we're actually talking about a yearly sports competition. And also this sentence specifically concerns the winner of that competition in some particular year. So we've enriched the context that we got from the phrase "World Championship." So we've gone from that basic idea of sports competition and said, "Okay, there's some more we can say about that because of these two words here." And obviously those words in isolation wouldn't have those meanings at all. It's the connection between those words and this other phrase in the sentence that gives us the full context. This is what we do when we try to understand language. Now, even if you don't know who Jimmy White is, you will still have enough context here to go, "Presumably he's the competitor, right?" So whatever competition this is, Jimmy White presumably is the competitor, has to be the competitor. There's no other way to make sense of this sentence other than to say that he is a competitor in some yearly competition, and we're asking what year he won that competition. Might not know what the competition is or what his game is, or what his activity is, but we can still work all of that out.
Now, those of you who do know who Jimmy White is will have some additional context. You'll say, "Okay, he's a snooker player. Obviously he's one of the best snooker players in the world." And that in turn says we're talking about the World Championship. You may have spotted, I've misnamed it - that's not quite the right name for it, but you know what I'm talking about. If you're a snooker fan, you know perfectly well that what I mean is the World Snooker Championship, the event that takes place every year in a venue called The Crucible, which as you very well know if you're a snooker fan, is in Sheffield, which is a city in England. So that's all additional context that will be available to you if you happen to know who Jimmy White is.
If you follow the snooker, you'll get all of this too. And so armed with that information, you know that this question concerns the year in which the British snooker player Jimmy White won the World Snooker Championship at The Crucible in Sheffield in England. And you'll know one more thing: you'll know that never happened. You will know that the subject of this question is a thing that has never occurred. Jimmy White is sometimes referred to as the most successful snooker player never to win the World Snooker Championship. He has come second on many occasions, he's won other championships, but he has never actually won the World Snooker Championship and has now conceded that he probably never will.
This concerns a thing that didn't happen. It's a plausible concept. The concept of Jimmy White winning the World Championship is absolutely plausible. Anyone who comes second many times in a row is definitely a contender to win, and certainly he was working towards that. The idea of there being a year in which Jimmy White won the World Championship was certainly foremost in Jimmy White's mind as he worked to try and achieve that. So is it fair to describe this understanding process as hallucination? If a snooker fan reads the sentence and goes, "Oh, it's a trick question. He never won it," obviously, they understood the question well enough to formulate the concept of Jimmy White winning the World Championship. Are they hallucinating when they do that?
I don't think they are. I think they're just understanding the question. I would say that it is absolutely necessary to fabricate a conceptual construct of a thing that didn't happen. You are building an idea in your head of a thing that never happened, and you have to do that in order to correctly understand this question. And if you haven't done that - if you haven't done this thing where you fabricated a plausible concept of something that actually never happened - then you haven't understood the question. And yet this is exactly the process that we describe as hallucination when our artificial intelligences do it.
So why do we think it's something wrong? Why do we think it's a failure? Why do we accuse our AIs of hallucinating when they're just doing the same thing that a human would do when faced with a sentence? They just construct a conceptual framework that represents the ideas in the sentence. Part of it is just because of how AIs are set up to interact. They are trained to receive questions and answer them, so they've been trained to basically output the conceptual framework they create in the form of an answer to a question. But they've actually really been trained to make stuff up. The training is all about training them to predict a plausible next word in a sentence. So these things essentially chunk up one word at a time, or sometimes a piece of a word at a time, and they're trying to work out the most plausible-looking next piece of the sentence based on the sentence so far. And they've been trained on numerous examples, and over time they've absorbed information about what a sports competition is, who Jimmy White is, what snooker is, where snooker competitions are held, and all these kinds of ideas get embodied in the model. And then it just emits that idea in sentence form, because that's what it's been trained to do. And so this is what we've made them do. So there's no surprise that they do it. The value here is all in the contextualization. The value is not in the actual answers they give, per se. The answer's just a way of packaging up the work that the AI has done.
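To make that word-at-a-time behaviour concrete, here is a minimal sketch of the prediction loop, assuming the Hugging Face transformers and torch packages and the small GPT-2 model (my choice for illustration, not anything used in the talk). At every step the model just scores possible next tokens, and we append the most plausible one.

```python
# A minimal sketch of next-token prediction using GPT-2 via Hugging Face
# transformers (an illustrative choice, not the model discussed in the talk).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "What year did Jimmy White win the"
ids = tokenizer(text, return_tensors="pt").input_ids

for _ in range(10):                        # extend the sentence by ten tokens
    with torch.no_grad():
        logits = model(ids).logits         # a score for every token in the vocabulary
    next_id = logits[0, -1].argmax()       # pick the most plausible next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))            # a plausible continuation, true or not
```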
So what is my point with all of this? The first thing is that AIs are amazing at contextualizing words. The great leap forward that today's modern AIs have made is the ability to take a bunch of words in a sentence and construct a conceptual framework around them. Now, exactly how they do it is buried in the guts of the neural network that makes up the AI. But to deny that they're doing that seems to be flying in the face of the evidence. It seems very clear that is exactly what they're doing. They're able to build some sort of representation - something we might call comprehension of what the individual words mean in the context in which they appear.
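As a small, concrete illustration of what contextualizing a word can look like in practice, the sketch below (my own example, using the freely available BERT model via Hugging Face transformers, not something referenced in the talk) compares the representation of the word "white" when it appears as part of a snooker player's name with its representation when it names a colour.

```python
# A minimal sketch of contextual word representations, assuming the Hugging Face
# "transformers" and "torch" packages. The same word gets a different embedding
# depending on the sentence around it.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual embedding of the first occurrence of `word`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]      # one vector per token
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

surname = embedding_of("What year did Jimmy White win the World Championship?", "white")
colour = embedding_of("She painted the fence white.", "white")

# The two vectors differ noticeably because the surrounding context differs.
print(torch.cosine_similarity(surname, colour, dim=0).item())
```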
They do this contextualization hugely better than anything we had before the current crop of AI - the large language models and everything that has followed from them - came along. So that's the first thing. The second thing is that to be able to do this, these AIs have to embody cultural knowledge; they have to have general knowledge. This was always the hard problem to solve with natural language. It was the sort of "you have to boil the ocean" type of problem: in order to understand a sentence, you need a great deal of general knowledge and a great deal of cultural understanding of how words are used and how their meaning shifts as you introduce different concepts into the sentence.
And that is the thing these modern language models actually do for us. Embodied in all of their weights and parameters, in quite an obscure form, is a great deal of general knowledge. And here comes the critical point: that knowledge - those facts, that cultural information that informs its ability to parse the sentence - is the input, not the output. So in processing a sentence, the AI uses the cultural knowledge that's baked into the model, but that's basically an input to the process of producing an answer. It's not trying to produce facts in the output. Facts are part of the input, but the output is just a contextualized sentence.
It may have been phrased in terms of a response to a question. It may look like it's stating facts. But that isn't actually what it's doing. It's giving you a representation in natural language form of the construct that it's made, that represents the meaning of the sentence that you gave it. And it may well do that in the form of an answer to a question, just because that's how it's been trained to respond. So hallucination - this thing we call hallucination - it isn't. It's not hallucination. It is the necessary construction of context that you have to do to understand a sentence. When you parse the sentence "What year did Jimmy White win the World Championship?" you have to construct the idea of "there was a year in which Jimmy White won the World Snooker Championship," even though that never happened. The construction of that plausible but false thing is necessary simply to understand the question. You are not hallucinating when you do that. You are just understanding the question, and that is what AIs are doing. So what do we do with this insight? What practical value is this change in opinion gonna give us? So the first thing to say is that we don't have a general intelligence, a general artificial intelligence today. There's lots of people who really want this to be true, and today it just isn't. So handing everything off to a large language model and hoping that it will somehow apply its general intelligence and do the right thing is not likely to yield good results in general.
At best, it may fool you into thinking it's doing that, because it's been trained up on enough examples that the thing you give it resembles something else it has internalized, and it can regurgitate what appears to be intelligence without really understanding it. Now it is worth saying that a lot of the more recent enhancements in functionality actually take advantage of the point of view that I'm putting forward to give you an impression of general intelligence. They'll often have multiple phases of processing. They might use a language model to understand, in some sense, the input you've given it, then run some different process to build a plan of how to respond to that, and then take the output of that and feed it through another model.
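The sketch below shows one hypothetical way that multi-phase shape can be wired together (my own illustration, not a description of any particular product); `call_llm` is a placeholder standing in for whichever chat-completion client you happen to use.

```python
# A hypothetical multi-phase pipeline: understand, then plan, then respond.
# `call_llm` is a placeholder for whatever language model client you use.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a chat-completion style language model."""
    raise NotImplementedError

def respond(user_request: str) -> str:
    # Phase 1: use the model to contextualize the request into a clear intent.
    intent = call_llm("Restate the user's goal in one sentence: " + user_request)

    # Phase 2: a separate pass turns that intent into a step-by-step plan.
    plan = call_llm("List the steps needed to achieve this goal: " + intent)

    # Phase 3: a final pass works through the plan and phrases the reply.
    return call_llm("Follow these steps and report the outcome: " + plan)
```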
Often things have gone through several different stages, each of which plays to its own strengths, and together they come closer to resembling proper intelligence or agency or intent. But an individual LLM - the individual models that we get to interact with - really doesn't have general intelligence today. Plenty of people believe that if you throw enough carbon emissions at it, you will eventually get one big enough that it will work. Maybe, maybe not, but today we don't have that. The next thing to be aware of is that this thing that they do - plausible fabrication - is essentially their job. So if you call it a bug when they do that - well, this is just what they do. The main thing that a large language model, or one of its successors today, gives you is the ability to fabricate a plausible conceptual understanding of the sentence it's been given.
And they may phrase that back to you in the form of an answer to your question, but essentially what they're actually giving you is a plausible fabrication, and that's what they're built to do. And critically, maybe that's fine. That may actually be all that you need. If you understand that's what they're doing, then maybe you can design your systems on the understanding that's what they do, rather than trying to build a system where you try and deny that this is true. You just go, "That's what we've got. That's a useful thing. Let's build our systems to work this way." So for example, maybe one way to build a system is to use a model to adapt natural language input into some structured form that you then feed into a service that isn't AI-driven, that just uses conventional code to do things that you know how to do, and then maybe feed the results of that through another language model to interpret it back to the user.
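As a sketch of that shape (again my own hypothetical example, with `call_llm` standing in for your model client and a made-up booking service as the conventional code), the language model only translates between natural language and structure, while the real work happens in ordinary code:

```python
# A hypothetical "LLM as adapter" sketch: one model call turns free text into a
# structured request, plain deterministic code does the real work, and a second
# call phrases the known-good result back for the user.
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a call to whichever language model you use."""
    raise NotImplementedError

def lookup_booking(booking_id: str) -> dict:
    """Conventional, non-AI code that does something you already know how to do."""
    return {"id": booking_id, "status": "confirmed", "date": "2024-06-01"}

def handle(user_message: str) -> str:
    # 1. Use the model purely to turn natural language into a structured request.
    structured = json.loads(call_llm(
        'Extract JSON of the form {"action": ..., "booking_id": ...} from: '
        + user_message))

    # 2. Do the actual work with ordinary, deterministic code.
    result = lookup_booking(structured["booking_id"])

    # 3. Use the model again only to phrase the result in natural language.
    return call_llm("Summarise this booking for the user: " + json.dumps(result))
```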
So in this world, language models are really acting as adapters. They take a user's request in natural language, often mediated by your natural-language description of what your structured service has to offer. Most modern AIs have ways of making calls out to these structured services, and then you can feed the results into something else. So that's understanding the AI as an adapter, whose job is to pull out information, put it somewhere else, and then do more things with it. That's more likely to be successful than just hoping that the AI will magically solve your problems for you. Hallucination's a bad description, 'cause it's actually describing the very thing that AI does best, which is constructing something that represents the meaning of the language it's been handed. That's literally what they're really good at. That is the thing we've advanced on over the last half decade: the thing we can do really well now that we couldn't do before is contextualizing language. And hallucination, as we call it, is just an artifact of how we present the results of that. It's literally the thing AI does well, and you will get the most value out of AI if you understand that.
My name's Ian Griffiths. Thanks for listening.