Tech Overflow

AI, Without The Hype: ChatGPT and LLMs. Part #2

Hannah Clayton-Langton and Hugh Williams Season 1 Episode 7

Finally, a podcast that explains how AI, LLMs, and ChatGPT work without any hype, fluff, or hyperbole. This episode is aimed at smart people who aren’t in tech and just want to be able to understand the basics. Join host Hannah Clayton-Langton as she discusses the topic with former Google VP and OG AI expert, Hugh Williams.

We start by separating AI, machine learning, and LLMs, then explain why generative systems are not search. Instead of retrieving pages, an LLM synthesises new text using patterns learned from trillions of tokens. That leap was unlocked by transformers, the architecture that parallelises processing and models relationships between words through attention. Add weeks of GPU-heavy training in massive data centres and you get astonishing next-word prediction with long-range context.

Then comes the human layer. We talk through reinforcement from human feedback that nudges models toward helpful, safe behaviour, and the safety heuristics that block harmful queries or intercept trivial ones. We also get candid about limits: hallucinations that produce confident nonsense, bias from data and raters, weak arithmetic unless the system calls an external tool, and uneven image generation that’s improving fast. Along the way we share practical tips: how to compare outputs across models, when to fact-check with a second system, and why grounding responses in reliable sources matters.

If you’ve heard about trillion-token training runs, NVIDIA GPUs, and “stochastic parrots” but want a clear, human explanation, this one’s for you. You’ll learn how LLMs actually work, why they feel so capable, and how to use them at work like a fast intern whose drafts still need your judgement. Enjoy the deep dive, and if it helps you explain AI to a friend, subscribe, leave a review, and share your favourite takeaway with us.

Like, Subscribe, and Follow the Tech Overflow Podcast by visiting this link: https://linktr.ee/Techoverflowpodcast

Hannah Clayton-Langton:

Hello world and welcome to the Tech Overflow podcast. I'm Hannah Clayton-Langton, and after several years as a non-engineer working in tech companies, I decided that I wanted to understand more about what was going on under the hood.

Hugh Williams:

Yeah, and my name is Hugh Williams. I'm a former Google vice president, also a vice president at eBay, and a pretty senior engineer at Microsoft. And my job here in the podcast is to help demystify tech for the smart listeners out there, and I guess also for you.

Hannah Clayton-Langton:

Exactly. So we're the podcast that explains technical concepts to smart people.

Hugh Williams:

So we're here in person, finally, again, Hannah.

Hannah Clayton-Langton:

Yeah, so Hugh has made it to London. He's looking pretty good for the jet lag. And we're recording an in-person episode today, which is something we love to do but don't get to do enough.

Hugh Williams:

Yeah, and we're gonna get a chance again on uh Friday as well. So two in one week. It's gonna be pretty exciting.

Hannah Clayton-Langton:

It's awesome, and I'm especially excited because today, I found out, is Ada Lovelace Day. Those who listened to our first episode on computer engineering will know that Ada Lovelace is often known as the first computer programmer, despite being around in the 1800s. And today is a day celebrating and commemorating all of the women in STEM, their achievements, and all the amazing things I'm sure they will bring to the industry in future.

Hugh Williams:

Fabulous. And we're going to dig into a deep computer science topic today, so I think it's quite appropriate given it's Ada Lovelace Day. Large language models: part two of our AI series, Hannah.

Hannah Clayton-Langton:

Large language models, AI. I think before we go into the detail of LLMs, as I'll be referring to them, let's just remember what we took away from our last episode.

Hugh Williams:

So we spoke a lot about the field of artificial intelligence being around since the 1950s. Very, very broad field, but I think we spent most of the episode last time talking about machine learning.

Hannah Clayton-Langton:

Yeah, and one thing that I think is super important to lay out is that when we talk about AI, most people are actually talking about large language models or LLMs, AKA ChatGPT, which is one of the consumer products; there are a few others like Claude out there. I took away last week that artificial intelligence is much broader than that: plenty of applications that will be in use in all of the technology that we use day-to-day. And when we're talking about ChatGPT or adjacent products, that's when we're really just talking about LLMs.

Hugh Williams:

Yeah, exactly. And LLMs, Hannah, as you know, are part of machine learning. So we've got artificial intelligence, that big broad field, machine learning's part of that. Large language models are a type of machine learning. So that's our topic for today.

Hannah Clayton-Langton:

Awesome. And another point that I find particularly neat here when we talk about AI and LLMs is that LLMs are the first time that AI isn't hidden away under layers of computing. It's exposed, with a user interface, directly to consumers and folks to use as they see fit, which is a bit of a revolution. We talked about the iPhone being sort of an adjacent revolution in our last episode.

Hugh Williams:

Yeah, I think the smartphone was really the first time everybody had a computer in their pocket, and I think these large language models are a bit the same, as the first time consumers have had access to AI. Because you're right, Hannah: before large language models, AI was really something that large corporations used to achieve tasks. Whether it's doing fraud detection on your credit card, ranking at Google, recommending the clothes you should buy, sending you emails, whatever it is, those things were AI systems built by large corporations to do one specific task. This is the first time that consumers have had access to AI in their pocket. And it's also a very different kind of AI, right? Because it's a generalized AI. This is something that can carry out lots of tasks, probably even tasks that the people who built it didn't design it for.

Hannah Clayton-Langton:

And that generalized point is interesting, and I think it's a good segue into some of the technical detail, because most folks are using ChatGPT or equivalents as sort of a replacement for Google. But one thing I took away from some of our prep for this episode was that the way this technology works is fundamentally different to the way Google or another search engine works. So maybe we start with that as our segue into the technical stuff.

Hugh Williams:

Yeah, spot on. I think that's a really good point, Hannah, because if you're using something like Google, what it's doing is processing the query that you give it, and then retrieving documents that might be great answers for your query. So you're actually getting back documents that have already been created and organized in an index. It's a little like Google's taking you around the library and showing you the books. These LLMs are completely different. The answers you're getting to the questions you're asking are synthesized text, created on the fly by the LLM. So this isn't text that already exists; it's much more like having a conversation with a smart analyst or a smart associate or a smart intern, and that quote-unquote person you're talking to is actually creating the text in response to your query. That's a very, very different experience to Google.
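
A toy contrast of the two approaches, with both halves as illustrative stand-ins: the index lookup returns documents that already exist, while the generate function stands in for a model that synthesises new text one probable token at a time.

```python
# Toy contrast: search retrieves documents that already exist; an LLM
# synthesises new text. Both halves here are illustrative stand-ins.
INDEX = {
    "florence": ["Florence travel guide", "Top 10 sights in Florence"],
}

def search(query: str) -> list[str]:
    # A search engine looks the query up in a pre-built index and
    # returns documents somebody already wrote.
    return INDEX.get(query.lower(), [])

def generate(prompt: str) -> str:
    # Stand-in for an LLM: a real model emits one probable token at a
    # time, producing text that never existed anywhere before.
    return "Here's a three-day Florence itinerary for you: ..."

print(search("florence"))                    # existing documents, retrieved
print(generate("Plan my trip to Florence"))  # new text, synthesised on the fly
```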

Hannah Clayton-Langton:

And something else I've picked up as sort of a key differentiator, in our previous episode and some of our prep work, is the scale of effort that's required to create one of these things. As a layperson just downloading the cool new app that everyone's talking about, in this case ChatGPT, I don't think you naturally give credit to what it takes to build one of these things. Can you sort of scale that for the listeners?

Hugh Williams:

Yeah, look, we talked about sort of, you know, quote-unquote old-school AI last time. We were really talking about AI models that are trained using a few computers, perhaps in a few hours or a couple of days, and they probably take millions of examples to come up with the model that detects credit card fraud or ranks in the search engine. The scale of LLMs is just so incredibly different. I'll use the word token a little bit later on, but you can just think words for now. These LLMs are trained on hundreds of billions or maybe even trillions of words to be able to generate the text that they generate. I've heard estimates that say OpenAI's latest GPT-4, so the ChatGPT that you're using today, was trained on about 13 trillion tokens, about nine trillion words, which is a lot of zeros. It's a bit like reading every single article, book, and post on the internet many times over. I did a little bit of a back-of-the-envelope calculation, actually, Hannah, as I was coming in. I think it's a little bit like reading one book every second for about 15,000 years.
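
To make those numbers concrete, here's that back-of-the-envelope in Python. The words-per-token ratio and the book length are rough illustrative assumptions, not OpenAI's published figures.

```python
# Rough scale of an LLM training corpus. All figures are illustrative
# assumptions, not OpenAI's published numbers.
tokens = 13e12            # ~13 trillion tokens (the estimate quoted above)
words_per_token = 0.7     # rough words-per-token ratio for English (assumption)
words_per_book = 80_000   # a typical full-length book (assumption)

words = tokens * words_per_token
books = words / words_per_book
print(f"{words:.1e} words")               # ~9.1e+12: about nine trillion words
print(f"{books/1e6:,.0f} million books")  # ~114 million book-equivalents
```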

Hannah Clayton-Langton:

Okay, so it's trained on quite a lot of information.

Hugh Williams:

Yeah, and that training takes a very, very long time, weeks or months, and costs an enormous amount of money. To create one of these models that we're using all day long, probably 50 to 100 million dollars gets spent on infrastructure and electricity just to create this model that can generate this text. So this is a vastly different scale to the AI we were talking about last time.

Hannah Clayton-Langton:

Okay, so huge amount of cost involved, huge amount of effort, huge amount of humans involved, right? In terms of building some of this.

Hugh Williams:

Yeah, absolutely. And you wouldn't want to get it wrong, right? If you're going to invest that amount of money, you'd want to be pretty sure that you're going to get the outcome you want from the training process, because investing in one of these new models costs as much as making the next James Bond movie.

Hannah Clayton-Langton:

While we're on the topic of the training process, is that deep learning, if I remember correctly from the last episode? Is that really what we mean when we talk about deep learning?

Hugh Williams:

Yeah, that's really the heart, if you like, of large language models. It's about feeding in trillions of words or tokens, and then discovering patterns within that text, and we call the things that capture those patterns parameters. These systems have billions of parameters. They're called neural networks, and when the neural network has lots and lots of layers, we refer to it as being quite deep; that's where this idea of deep learning comes from. So it's been around a while, but really what it's about is taking this vast amount of data, these trillions of words or tokens, learning in a very sophisticated way the patterns that occur within that data, and representing those patterns in a model.
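
To put a number on "parameters", here's a tiny sketch that counts the weights and biases in a toy layered network. The layer sizes are made up; the point is simply that parameters are the learned numbers inside the layers, and GPT-class models have billions of them.

```python
# A toy "deep" network: each layer is a weight matrix plus biases.
# The learned numbers in those matrices are the model's parameters.
layer_sizes = [512, 1024, 1024, 1024, 512]  # toy sizes; real LLMs are vastly bigger

params = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    params += n_in * n_out + n_out  # weights + biases for one layer

print(f"{params:,} parameters")  # ~3.1 million here; GPT-class models have billions
```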

Hannah Clayton-Langton:

Okay, so one follow-up question and then one observation. Is all of that data that you mentioned going in, is that when we talk about unstructured input?

Hugh Williams:

Yeah, I think that's a good way to think about it, Hannah. If we think about the AI we talked about in our last episode, I'd say it's a lot more structured. You remember talking about things like free shipping and shipping cost at eBay as being one possible parameter that could go into our model. With these large language models, we're really just giving it unstructured text. We're giving it all of the text that we could possibly find, so all of the World Wide Web, all the books we can find, the source code, whatever it is, and we're asking the deep learning system to discover the patterns all by itself. So the data's quite unstructured, and the system goes about discovering the patterns within the data. We're not telling it what the parameters are; we're letting it discover those parameters itself.

Hannah Clayton-Langton:

Well, this was my observation, and I'd be interested to know your engineering take on it. I've taken away that deep learning consumes a huge amount of information and then basically figures out on its own what the patterns are.

Hugh Williams:

That's exactly right.

Hannah Clayton-Langton:

And I've heard a few engineers, I think yourself included, describe it as a black box and be super excited about the magic of the black box, which I agree is really exciting, but I find it kind of counterintuitive. Every engineer I've ever met won't sleep until they understand exactly how everything works in a super intricate way. And then when it comes to LLMs, they're sort of just like, oh, this is amazing, we have no idea what's going on, which I find to be very un-engineery.

Hugh Williams:

Yeah. If I go back to my time at Microsoft, we used to have all these diagnostic tools for Bing. If we had a query and we were wondering why the results we were seeing showed up, we could go and look at this tool, and it would diagnose for us the likely signals that were causing a particular result to show up. Often we'd get an executive who would email us and say, I ran my query, which was my name, very common, we've all searched for our name, let's be honest, and this bogus result turned up as the first result, what's going on? We used to have these diagnostic tools that we could use to understand what our ranker was doing, and then we'd explain back, oh, it's really, really sensible, and here's what actually happened. But today, with these large language models, that's pretty much impossible to do. In fact, somebody said to me the other day, how does it summarize a document? And I said, well, nobody knows, really. When you say, please shorten this document or summarize this document or turn this document into bullet points, it's just seen enough examples of that in the vast amount of text it's seen that it's able to carry out that task. It's seen examples of a long document shortened to a shorter document, it's seen an essay turned into PowerPoint slides, whatever it is; it's seen enough examples in those trillions of words that it's able to do that. So you give it a simple instruction like summarize or shorten, and it can take the following content and know what to do with it. It's a little bit like having an intern, right? If you sent the intern an instruction and said, look, can you please summarize this document for me, after they've done that a couple of times and you've given them a little bit of feedback, you can give them a third document and they'll do a pretty good job. And that's exactly what's going on with this large language model: it's just able to do it, and nobody's really able to explain exactly how or why.

Hannah Clayton-Langton:

I think there's an important distinction, which maybe we'll talk about later, which is that it recognizes patterns, but it doesn't necessarily apply comprehension. We'll talk about some of its fallibilities maybe further down in the episode, but it's just super good at identifying patterns. You feed it enough source and training data and you get something that feels like you're talking to a human, but actually it's just good pattern identification, fed with a bunch of data, that can sort of preempt the answer that you might want.

Hugh Williams:

Yeah, exactly. In tech circles we say it's a really advanced stochastic parrot, which is another way of saying it's a statistical parrot. It's like the old monkeys with typewriters: if you have enough monkeys with typewriters and they hit the keys, they eventually end up with the works of Shakespeare. These are highly tuned statistical monkeys that are capable of pressing the right keys at the right time and churning out output that seems to make sense, but it's just generating data based off the patterns that it's seen.

Hannah Clayton-Langton:

And how are we making sure that the monkey that types out Shakespeare is the one that we're listening to? Like, how are we feeding back to this model to check that it's identifying the right patterns?

Hugh Williams:

Yeah, great question. We'll come back and talk about transformers in a second, because that's an important piece of technology that sits within the field of deep learning and has made all of this possible. But once this training process is done, so we've done our deep learning with this transformer technology, what actually happens inside these large companies is that the system is trained one more time, and it's trained using human feedback. We spoke a little bit about that in our last episode, but let's imagine you're working in one of these large companies, you're at OpenAI or Anthropic or Microsoft, wherever you are, and you've finished this training process; it took weeks or months and cost an enormous amount of money. What you'll then do is have a series of questions that you ask the model, and the model will churn out multiple answers to those queries. You'll give those answers to human judges, and you'll ask the human judges lots of different questions. One question might be: rank these from the best answer to the worst answer. Another might be: identify any safety issues that you see in any of the answers. You might say: which style do you prefer? So you can ask humans all sorts of questions about the output, and then you can collate all of the output that comes from the humans and use that to adjust the model, if you like. The model gets a little bit of feedback about the preferred things it should be doing, that adjusts the weights within the model, and the model becomes more polite, more friendly, safer, all those kinds of things, and starts to produce answers that humans like a lot more. So you can't just deploy these models; you actually have to train them with humans a little bit afterwards.
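
A hedged sketch of that feedback-collection step: judges rank a few candidate answers per prompt, and each ranking yields "preferred over" pairs that later become the training signal that nudges the model's weights. The structure here is illustrative, not any lab's actual pipeline.

```python
# Minimal sketch of collecting human preference data (hypothetical structure,
# not any lab's real pipeline). A judge ranks candidate answers; every ranked
# pair becomes a (chosen, rejected) example for adjusting the model.
from itertools import combinations

def collect_preferences(prompt, answers, judge_ranking):
    """judge_ranking: answer indices ordered best -> worst by a human judge."""
    pairs = []
    for better, worse in combinations(judge_ranking, 2):
        pairs.append({
            "prompt": prompt,
            "chosen": answers[better],
            "rejected": answers[worse],
        })
    return pairs

answers = ["Answer A...", "Answer B...", "Answer C..."]
pairs = collect_preferences("Explain transformers simply.",
                            answers, judge_ranking=[2, 0, 1])
print(len(pairs), "preference pairs from one judged prompt")  # 3 pairs
```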

Hannah Clayton-Langton:

Okay, and that essentially becomes another pattern recognition that feeds the smarts inside of it.

Hugh Williams:

Yeah, exactly. These systems, without that piece of human feedback at the end, are really just giant pattern-generating machines; they'll just generate text. But with this additional step of providing human feedback, they learn how to adapt to human style and human preferences: they get more chatty, they get safer, they get more reliable in answering questions. There's an enormous human effort that goes on at the end to adjust the weights within the model to get them to do the things they do today. That's really what happens in the consumer product, if you like. When you're using something like ChatGPT, you're not just using the original model; you're using something that's been adjusted and put through scenarios that make it a lot more human-like.

Hannah Clayton-Langton:

That is not something that I was expecting to form part of this whole process. So I think that's a really cool insight. And I imagine that that will be news to a lot of the listeners as well, even those who are using ChatGPT quite a lot.

Hugh Williams:

The other thing I'd say, while we're on that topic, is there's a lot of what we call heuristics on top of these systems, which are basically human-written rules that stop the system or cause it to behave in a certain way. So you can't today successfully ask these systems to do something illegal. You can't say, hey, teach me how to make a bomb. You can't ask them for self-harm information, things that are illegal within your jurisdiction, whatever it is; it'll just say, sorry, I can't do that. And most of that is done with handwritten rules. There are handwritten rules looking for certain keywords and certain patterns, and when that happens, the question you're asking is intercepted and you get a standard canned answer back. So there's an enormous number of rules sitting on top of these systems, in addition to the human feedback that teaches it to be more human-like. There's a lot of human effort that goes into getting one of these systems from its wild state, if you like, into being a consumer product that we can put in your hands.
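
A toy example of such a handwritten rule, just to show the shape of it. Real products use far more sophisticated filters, and the call_llm function here is a hypothetical stand-in for the actual model.

```python
# A toy safety heuristic: handwritten rules intercept a request before the
# model is ever called. Real products use far more sophisticated filters.
BLOCKED_PATTERNS = ["make a bomb", "self-harm"]   # illustrative keyword rules

def call_llm(query: str) -> str:
    return f"(model's answer to: {query})"        # stand-in for the real model

def answer(query: str) -> str:
    if any(pattern in query.lower() for pattern in BLOCKED_PATTERNS):
        return "Sorry, I can't help with that."   # canned response, LLM never called
    return call_llm(query)

print(answer("Teach me how to make a bomb."))     # intercepted by the rule
print(answer("Plan me a weekend in Florence."))   # passed through to the model
```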

Hannah Clayton-Langton:

Okay, that is super interesting. And I feel like I jumped us ahead. So we were talking about training the model, which happens using deep learning. And then we talked about transformers, which I know from my prep is what the T in ChatGPT stands for. Do you want to walk us through that now?

Hugh Williams:

So there was a landmark paper that came out of Google in 2017 called Attention Is All You Need. I'm sure some of our listeners will have heard of it; it's one of the most famous papers in modern computer science. And it's really a landmark paper because it made deep learning, architecturally, something that could be used to process trillions of words, identify billions of parameters, and build things like ChatGPT, Claude, Gemini, and so on. Deep learning itself was a very important field, but it was really used for things like natural language processing and image generation. It was sort of sitting in a corner of computer science doing really interesting things, but it was never able to be scaled to web scale to build a product like this. This transformer idea blew that up and made it possible to build the products we use today.

Hannah Clayton-Langton:

Yeah, okay. So just to play back so far, transformers are one of the key aspects that underpin the application of deep learning to the scale that is required for an LLM.

Hugh Williams:

Yeah, you've got it.

Hannah Clayton-Langton:

Okay, and deep learning existed in other, possibly smaller applications, sort of deep inside smart bits of technology, but it had never been scaled to the point of costing as much as a blockbuster movie to train.

Hugh Williams:

Yeah, exactly. And so for people like me, you know, I was working on search at eBay and whatever else, deep learning was just an intellectual curiosity sitting in research. It was doing things like identifying objects in images using lots of processing, but it was not an idea that was capable of being used at the scale we were working at. With this transformer breakthrough, that all massively changed.

Hannah Clayton-Langton:

And before we get into the detail of transformers: that's a bit of a pattern I've seen when we talk about different technical concepts. A lot of them start as intellectual theories that people get excited about, then slowly the thinking develops, and then you might find use cases that interface with modern life. Is that fair?

Hugh Williams:

Yeah, I think that's absolutely right. You know, suddenly computers could be miniaturized enough that you could build a smartphone and put it in your pocket. At some point computers were just an abstract concept that sat in giant rooms in defense installations and whatever else. So you need these kinds of breakthroughs for these kinds of things to happen, for sure.

Hannah Clayton-Langton:

Okay, now tell me more about transformers.

Hugh Williams:

Okay, I'll do my best, Hannah. A couple of things about transformers. The first is that a transformer can understand the relationships between the tokens or words it's processing in a way that's pretty cool. I'll make up a really simple sentence: the cat ran around the room chasing the mouse, and then she sat on the mat. If you just had regular deep learning, there are a couple of problems. The first is it can only process a word at a time through our sentence. That's annoying for us computer scientists; we want to do things in parallel, so we want to be able to throw lots of computers at the problem and process all the words at once. We don't want to be going left to right, that takes forever. Deep learning had that problem. Transformers completely changed that and allowed many computers to be thrown at the problem, and each computer could process a word. So suddenly we could process all the words in that sentence at the same time, not just one word at a time throughout the sentence. But the really big breakthrough in transformers, besides this speeding-up aspect, was that the transformer could consider the relationships between the words. In old deep learning, by the time we got to "she sat on the mat", we'd have forgotten who "she" was. With transformers, we're able to understand which words in the sentence influence which other words. So by the time we get to "she", we can say, oh, "she" means the cat. It's like having a highlighter, if you like: you can go back through all of the text, a vast amount of text, and highlight all of the words that are related to the current concept you're trying to process. And that allows you, when you're actually generating text later on, to generate more plausible text, because you understand the context of each word. Those words don't just exist in isolation.
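
To make the attention idea concrete, here's a minimal numpy sketch of scaled dot-product attention, the mechanism from that paper. The random vectors stand in for learned word representations; in a real model, the queries, keys, and values come from trained projections of the word embeddings.

```python
import numpy as np

# Scaled dot-product attention in miniature: every word scores its relationship
# to every other word, so "she" can put most of its weight on "cat".
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # pairwise relationship scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: each row sums to 1
    return weights @ V                                 # each word: weighted mix of the others

n_words, dim = 10, 8                                   # a toy sentence of 10 words
rng = np.random.default_rng(0)
Q = rng.normal(size=(n_words, dim))                    # in a real transformer these come
K = rng.normal(size=(n_words, dim))                    # from learned projections of the
V = rng.normal(size=(n_words, dim))                    # word embeddings
print(attention(Q, K, V).shape)                        # (10, 8): a context-aware vector per word
```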

Hannah Clayton-Langton:

And what does that mean in real terms for a ChatGPT user or an LLM user? Is that what makes it sound human and credible?

Hugh Williams:

Yeah, it's improved the accuracy of the text generation enormously, because it's able to understand context much better. It's not just like the autocomplete we're used to on our phone, or maybe in Google, or when we're using Microsoft Word. It's not just generating the next most probable word given the last word that you've typed; it's able to use all of the context of the essay you're writing to generate the next word. So the chances of it generating a really plausible word go up enormously, because it's able to understand all of the relationships between the words.

Hannah Clayton-Langton:

And for the avoidance of doubt, this is not because the computer can understand anything, it's because it's using an enormous amount of information gone before to recognize patterns and therefore have a better guess at what word you'd want to come next.

Hugh Williams:

Yeah, that's it. What these systems are doing is trying to generate text, and given the set of words generated so far, they're trying to generate the next most probable word. If it's got enough context and an understanding of the relationships between the words, it can do a much better job of generating the next word. So it's a very, very advanced autocomplete, if you like, that understands all of the context of hundreds, thousands, perhaps tens of thousands of words when it generates the next word, rather than a simple autocomplete, which is really just looking at the last word or two.
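
As a toy illustration of that "advanced autocomplete", here's a minimal next-word loop. The probability function is a fake stand-in; in a real LLM it would be the trained network scoring its entire vocabulary given all of the context so far.

```python
# Toy next-word loop: at each step, score every word in the vocabulary given
# ALL the context so far, then take the most probable one. The scoring
# function here is a fake stand-in for a real trained model.
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def next_word_probs(context):
    """Stand-in for a trained LLM: returns a probability per vocabulary word."""
    random.seed(" ".join(context))          # fake: deterministic per context
    weights = [random.random() for _ in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

context = ["the", "cat"]
for _ in range(4):
    probs = next_word_probs(context)
    context.append(VOCAB[probs.index(max(probs))])  # greedy: pick the top word
print(" ".join(context))
```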

Hannah Clayton-Langton:

Okay, two questions then. Is the magic of an LLM basically the power and smarts of the deep learning combined with the user relevance of the transformer's output? So like it takes something that's super smart at patterns and it combines it with something that can speak to people in a way that feels relevant. Is that sort of deep learning plus transformer equals LLM?

Hugh Williams:

I think that's right, Hannah. The other thing I'd throw in there is the vast amount of data that went into training this. So it's really big data, plus, as you say, deep learning, plus this transformer technology, and then what we talked about earlier on, which is the heuristics and rules and the feedback that teaches the model how to be a little bit more human-like. There are really a few components that go into it, but all of that put together gives you the consumer products like ChatGPT and Claude that are in our pockets today.

Hannah Clayton-Langton:

And what's kind of reassuring for me there is that it feels like the linchpin, the thing that really enables these models, is actually human input. Like the heuristics and the human feedback, which I didn't know of. It makes me feel a little bit less like computers are fully going to take over the world.

Hugh Williams:

Yeah, absolutely. The human-in-the-loop piece is really, really important, because we have to teach it how to be more human-like, not just generate text based on all the text it's seen. But we should also worry about the bias that comes with that: we're not just getting the bias that comes from the text that goes into it, we're now also getting the bias from the humans who are giving it feedback and teaching it to be more human.

Hannah Clayton-Langton:

Okay, and potentially a basic question, but the humans that are giving it feedback, are they the software engineers that are building it, or are they a random, or maybe not random but a conscious, cross-section of people basically off the street who you ask to review the models' outputs?

Hugh Williams:

Yeah, absolutely the latter, Hannah. I can't imagine the software engineers wanting to do that job, you know.

Hannah Clayton-Langton:

Well, there would be some biases definitely in there if they were.

Hugh Williams:

They'd be like, can we just get this over and done with? I want to get back to writing code. But no, it's a hired workforce, typically, paid per task or per hour. And the trick here is you've got to write down the task you want them to do really, really carefully. You've got to train those people so they can perform the task, and then you've got to provide tools so they can actually provide their input, collate that, and take it back to the system. And managing the quality of the workforce itself, paying it, it's a big, big project in itself. But this is happening at an industrial scale, huh?

Hannah Clayton-Langton:

Huge operation. And then the people who write the heuristics, are they like the product managers?

Hugh Williams:

I would say, and I'm guessing a little bit, there'll certainly be product managers who are thinking about the why and what of the heuristics. But the heuristics are probably written, in a language built for the purpose, by people who are trained in writing heuristics. Because you want to have a safety team looking at safety issues who understand the safety issues, and they're probably trained to use these tools to write the heuristics, maintain the heuristics, and stay on top of the kinds of issues that come up as laws move, as people try to hack these systems, and as new models come out with new issues. So you'd probably have a specialist safety team that's writing heuristics related to safety, for example.

Hannah Clayton-Langton:

That sounds like a super interesting job. Okay, and the T in ChatGPT stands for transformer, is that right?

Hugh Williams:

That's right.

Hannah Clayton-Langton:

Okay, what about the G and the P?

Hugh Williams:

So G is generative, which basically means it generates text, and the P is pre-trained, which basically means that a training process happens. It's worth mentioning, though, that GPT is a trademarked proprietary term of OpenAI, but you can just think of it as meaning large language models. The only folks who are using GPT as a term are the folks at OpenAI; everybody else just says large language model.

Hannah Clayton-Langton:

And all large language models are generative predictive transformers.

Hugh Williams:

Yeah, yeah. I think OpenAI would say that. Generative pre-trained transformers.

Hannah Clayton-Langton:

What did I say? Predictive. Generative pre-trained transformers.

Hugh Williams:

Yeah, it's a common mistake, actually. People say predictive.

Hannah Clayton-Langton:

So an LLM is like the generic version of that, and then ChatGPT, I presume, will purport to have loads of really cool proprietary smarts that make it unique, just as a car is generic and then Mercedes has a brand.

Hugh Williams:

Yeah, that's right.

Hannah Clayton-Langton:

Okay, so we've covered the training behind the model, and we've covered how it creates human-like output. But what about as a user, when I'm asking it to plan my holiday to Florence? What's going on there?

Hugh Williams:

Holiday to Florence sounds good.

Hannah Clayton-Langton:

Yeah, I should book one. I'd better get one booked.

Hugh Williams:

You probably need a holiday.

Hannah Clayton-Langton:

Yeah.

Hugh Williams:

Yeah, yeah. One thing I did want to mention along the way is how this training happens using this transformer technology. We should talk about GPUs and data centers for a second. So it costs 50 to 100 million dollars to do this training process, and one of the big costs in there is the hardware itself that's used in the training. I'm assuming you've heard of GPUs.

Hannah Clayton-Langton:

So GPUs are what NVIDIA produces, is that right?

Hugh Williams:

That's right.

Hannah Clayton-Langton:

Okay. Yeah. So tell me why they're so valuable because I hear a lot about NVIDIA, but I am not super aware of what's so special.

Hugh Williams:

I feel like NVIDIA is one of the companies that's really hit the jackpot; it sort of owned the gold-mining technology when suddenly gold became popular, or something. GPUs are graphics processing units. A graphics processing unit, traditionally, was a card that you put into a high-end personal computer under your desk when you had an application where you needed high-powered graphics. So if you were a gamer and wanted to play games with high-speed graphics on your desktop computer, you'd put a GPU card, a graphics card, in there. Architects, those kinds of people, used to use GPUs too, for CAD and those kinds of things.

Hannah Clayton-Langton:

And, just so I can check my understanding, is that because there's a good amount of computing power required to create high-quality images versus, like, text or something?

Hugh Williams:

Yeah, graphics is a very unique thing where there are lots of things done in parallel, lots of things done at the same time, and it requires very specific kinds of maths to do things like rotate shapes and have shapes move in front of each other. Graphics requires vector arithmetic: things that help one shape move in front of another, or spin shapes in real time. So it's fast matrix maths, and that was really, really important in graphics. It turns out, though, that the same maths is incredibly important in the transformer technology.

Hannah Clayton-Langton:

So they use the same type of maths?

Hugh Williams:

Yep. To do the transformer computation at scale, you want parallelisation, and you actually have to parallelise this vector maths. It turns out GPUs are perfect for that. So these very expensive graphics cards turn out to be the gold-mining shovels, if you like, for training large language models.
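
A small numpy example of that shared maths: the big matrix multiplication at the heart of transformer training, and the same kind of vector arithmetic that rotates a shape in graphics. Numpy runs this on a CPU; the point is the operation itself, which a GPU spreads across thousands of cores in parallel.

```python
import numpy as np

# The workhorse operation behind both 3D graphics and transformers:
# multiplying big matrices, full of independent multiply-adds.
a = np.random.rand(1024, 1024)
b = np.random.rand(1024, 1024)
c = a @ b   # ~a billion multiply-adds; a GPU does these in parallel

# The same maths rotates a shape in graphics: a rotation matrix times a vertex.
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
vertex = np.array([1.0, 0.0])
print(rotation @ vertex)   # the point rotated 45 degrees
```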

Hannah Clayton-Langton:

Okay, and just a bit of a throwback to our product episode. Slack, the program that people use for communications at work, didn't that also come out of a gaming company?

Hugh Williams:

Yeah, yeah. The legendary story is that those folks were a gaming company, and they built a tool on the side to allow themselves to communicate better within the company. The gaming side of the business didn't go so well, they realized that the chat tool was pretty cool, and the rest is history, I suppose.

Hannah Clayton-Langton:

So there's some pretty valuable accidental offshoots coming out of gaming companies.

Hugh Williams:

Yeah, absolutely. Gaming is pretty hard maths, that's sort of the essence of it, and GPUs are really the tool that's used for that hard maths. It turns out that maths is also pretty useful in large language models.

Hannah Clayton-Langton:

So you have a bunch of GPUs which cost like 40 grand or something.

Hugh Williams:

Yeah, of that order.

Hannah Clayton-Langton:

In USD.

Hugh Williams:

Yeah, absolutely. You know, 20 to 40,000 bucks per GPU card, and these data centers are absolutely full of them for this training process.

Hannah Clayton-Langton:

Just for the training process.

Hugh Williams:

Yeah, they're used in the evaluation as well. So when you want to plan your holiday to Florence, or whatever it is you're asking about, help me cook a recipe, those kinds of things, the GPUs are used in that as well. But you need vastly more infrastructure for the training than you do for the inference, which is what we'd call the question-asking piece of this.

Hannah Clayton-Langton:

Okay, because there is a whole topic of conversation around like the compute power required by LLMs and like the environmental cost of it and the financial cost of it, but the main consumption of that compute happens in the training phase.

Hugh Williams:

Correct.

Hannah Clayton-Langton:

Okay.

Hugh Williams:

So training takes weeks or months, probably costs 50 to 100 million dollars to do. When you type a question, like, you know, help me plan my itinerary to Florence, that probably costs a very small fraction of a cent.

Hannah Clayton-Langton:

Okay, because sometimes I feel guilty asking ChatGPT stuff, but it sounds like that's not where I should be feeling guilty.

Hugh Williams:

No, though I was doing a little bit of maths coming into the episode about how much saying hello and thank you costs when people do that all the time. Because people feel like they have to be polite to this thing.

Hannah Clayton-Langton:

I had heard this, that it generates a massive environmental impact, people just not wanting to be rude to a computer.

Hugh Williams:

Well, it doesn't actually now, but I'll do the back-of-the-envelope maths and then we can talk about why it doesn't now. I figured out that if there were a hundred million people a day saying please and thank you, that would work out to be a few million dollars a year in compute cost for any of these companies. So it's not trivial. But what I've also figured out is that the companies are now intercepting those queries in some way and just generating pre-canned responses. And they can do that in one of two ways, or maybe even three. One is you could just do it in the app, and we talked about apps in one of our episodes. You could just say, Hannah typed in thank you, and you could respond within the app and say no problem at all, or give a thumbs-up sign, without ever actually sending it to an LLM.

Hannah Clayton-Langton:

That sounds sensible.

Hugh Williams:

Yeah. The second thing you could do is do that in the cloud. You could do the same thing as soon as the request arrives: inspect it for some common words and just turn it around and send it straight back. The third thing you could do is have a different model, a cheaper model if you like, that's capable of doing really trivial tasks, and you only send the hard tasks to the larger model. So there are lots of different ways of intercepting those kinds of trivial tasks.
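
A hedged sketch of both halves of that: the back-of-the-envelope politeness cost, and a toy router that intercepts trivial messages. Every number and rule here is an illustrative assumption, and the two model functions are hypothetical stand-ins, not real APIs.

```python
# Back-of-the-envelope: what do "please" and "thank you" cost at scale?
# All figures are illustrative assumptions, not any provider's real numbers.
polite_messages_per_day = 100e6        # 100 million polite messages daily
cost_per_llm_call = 0.0001             # a hundredth of a cent per trivial query
print(f"${polite_messages_per_day * cost_per_llm_call * 365:,.0f} per year")
# -> $3,650,000 per year: a few million dollars, as discussed above

# A toy router: answer trivial messages with a canned reply, hand short ones
# to a hypothetical cheap model, and only send real work to the big model.
CANNED = {"thank you": "You're welcome!", "hello": "Hi there!", "thanks": "Any time!"}

def cheap_model(msg): return f"(small model answers: {msg})"   # stand-ins for
def big_model(msg):   return f"(large model answers: {msg})"   # real model APIs

def route(message: str) -> str:
    text = message.strip().lower()
    if text in CANNED:
        return CANNED[text]            # no model call at all
    if len(text.split()) < 4:
        return cheap_model(message)    # trivial task: small, cheap model
    return big_model(message)          # hard task: flagship model

print(route("Thank you"))
print(route("Plan my three-day itinerary for Florence"))
```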

Hannah Clayton-Langton:

But regardless, you're not like calling out to the huge model to come up with a response to something that's basically like an afterthought and inconsequential.

Hugh Williams:

Yeah, exactly. We should make it a little bit like Marvin the Paranoid Android, for those people who've watched Hitchhiker's Guide to the Galaxy.

Hannah Clayton-Langton:

Well, that's over my head, but I was gonna ask you, is that an example of heuristics then or not?

Hugh Williams:

Yeah, it's a perfect example of a heuristic.

Hannah Clayton-Langton:

Okay, there we go. Yeah, yeah, love it. Okay.

Hugh Williams:

Hey, this series is working.

Hannah Clayton-Langton:

So that makes LLMs sound pretty awesome, which we know they are, but there's also a bunch of stuff that they're not so good at.

Hugh Williams:

Yeah, so they hallucinate, which I guess is probably their biggest problem. Hallucinate, I guess, is a term that's got into the public consciousness. What it really means is that, with great confidence, they make up things that aren't true.

Hannah Clayton-Langton:

I know some people that do that. Yeah, humans that do that. But is that because it's found a pattern, or thinks it's found a pattern, and it's generating an answer based on pattern recognition but not comprehension of what it's saying?

Hugh Williams:

Yeah, exactly. It has no ability to go and fact-check the things it's producing; it's just producing things that are highly probable given the data it's seen. So when it confidently says that Winston Churchill invented the internet, it's just doing that because that seemed like a plausible pattern to generate.

Hannah Clayton-Langton:

And could you get an LLM to fact-check itself, or does that get a bit meta?

Hugh Williams:

I think that's getting a bit meta, but I could see it happening in the future. We're probably not too far away from having some other technology fact-check the LLM. But one thing our listeners can try, if you've got access to multiple LLMs, is to take the output of one LLM and give it to another LLM and ask it to fact-check it. That's a way of using the LLMs to keep an eye on the LLMs, and often you can get rid of the falsehoods that come out of the LLMs by doing that.
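
For listeners who want to try that cross-checking trick programmatically, this is roughly the shape of it. The two ask_* functions are hypothetical placeholders, not real vendor APIs; wire in whichever model clients you actually use.

```python
# Sketch of cross-checking one model with another. The two ask_* functions
# are hypothetical stand-ins, not real vendor APIs.
def ask_model_a(prompt: str) -> str:
    return "(model A's answer)"        # replace with a real client call

def ask_model_b(prompt: str) -> str:
    return "(model B's verdict)"       # replace with a different model's client

def fact_check(question: str) -> str:
    draft = ask_model_a(question)      # first model drafts an answer
    review_prompt = (
        "Fact-check the following answer. List any claims that look wrong "
        f"or unsupported.\n\nQuestion: {question}\n\nAnswer: {draft}"
    )
    return ask_model_b(review_prompt)  # second model reviews the first

print(fact_check("Who invented the internet?"))
```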

Hannah Clayton-Langton:

And they do all say, somewhere in the user interface, that you must check the output of these models. Like, you can't necessarily take it at face value.

Hugh Williams:

Yeah, absolutely. In fact, Deloitte, one of the big consulting companies that I'm sure all of our listeners have heard of, is in a lot of trouble in Australia right now, because for $440,000 they authored a report, and an Australian academic read the report and figured out that large slabs of it were generated with AI, including a bunch of citations that were made up. There were citations in the back that didn't actually exist, that just seemed like plausible-sounding citations, and the same with some of the footnotes. It's all over the news, certainly a big story in Australia right now. So if you're using these tools in a professional environment, or even a non-professional environment, you should take great care to make sure the output is actually true, if that's important to you.

Hannah Clayton-Langton:

That is super interesting, and this is more of a tease for a future episode: I was talking to some engineers at work about how they use LLMs to write code and whether or not it saves them time. And they were like, at this point, it saves us time writing code, but it generates more work reviewing code. I guess that's maybe why people often liken it to an intern or a grad: they can do a bunch of the work for you, but you can't take it as read, and you've got to spend some time checking that it's correct and good and factual.

Hugh Williams:

Yeah, that's right. I've been coding with Claude Code quite a lot at home. I've certainly learned to pause every hour or so and really holistically look through the code, try to understand what I've created and where it can be cleaned up, and carry out a lot of manual intervention. So I'm certainly saving time writing the code, but I think your engineers are right: there's more time now in inspecting the output. Which I suppose is a little bit like generating text, writing an essay or an email or whatever else; it's saving you an enormous amount of typing, but you've still got to go and review it pretty carefully. And you're probably spending more time reviewing it than you would reviewing your own text.

Hannah Clayton-Langton:

Okay, so they're not good, or they have this sort of propensity to hallucinate. What else are they limited in, LLMs?

Hugh Williams:

They're not great at math, which I guess is is not surprising.

Hannah Clayton-Langton:

See, to me as a non-engineer, that is surprising because I thought computers were really good at math. So tell me why LLMs aren't good at math.

Hugh Williams:

So maybe let's go back to why computers are good at maths. If we go back to our first episode, which I hope most of our listeners have had the chance to listen to, we were talking about programming: writing deterministic kinds of steps and logically breaking things down. If we're writing code like that in Python or whatever programming language we choose, that's when the computer is going to be great at maths, because we can write down logical steps and ask it to do particular mathematical things.

Hannah Clayton-Langton:

Ah, I see. And then if an LLM is recognizing patterns, it's making an inference, and that inference could be wrong.

Hugh Williams:

Yeah, absolutely. When you say to the LLM, multiply these two numbers together, it's not really multiplying those two numbers together; it's producing the text that's most probable given you've given it those two numbers with a multiplication sign in the middle. So it can be off, wildly off sometimes. But what's happening now, and OpenAI have built this into ChatGPT, is it's got a maths mode, and it can detect when you're trying to do things like that. So if you try to multiply two large numbers together, their consumer product will say, hey, I think this user's trying to multiply numbers together, and it'll actually go and run some Python. Instead of using the LLM, it'll go off and execute a maths module, and that maths module will do the maths for you and give you back the answer. So it's not actually the LLM doing it; they're just intercepting the query and sending it to a different module.
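
Here's a toy version of that interception idea: spot a multiplication in the prompt with a handwritten rule and compute it exactly in Python, rather than letting the model guess at the digits. The pattern and routing are illustrative assumptions, not OpenAI's actual mechanism, and the llm function is a stand-in.

```python
import re

# Toy "maths mode": a rule spots a multiplication in the prompt and computes
# it exactly, instead of letting the LLM pattern-guess the digits.
MULTIPLY = re.compile(r"(\d+)\s*[x*]\s*(\d+)")

def llm(prompt: str) -> str:
    return f"(model answer to: {prompt})"   # stand-in for the language model

def answer(prompt: str) -> str:
    match = MULTIPLY.search(prompt)
    if match:
        a, b = int(match.group(1)), int(match.group(2))
        return f"{a} x {b} = {a * b}"       # exact arithmetic, no guessing
    return llm(prompt)                      # everything else goes to the model

print(answer("What's 123456 x 789012?"))    # -> 123456 x 789012 = 97408265472
```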

Hannah Clayton-Langton:

Okay, and I've also noticed that it's not always great at image generation, or at least, I use a few different LLMs and some of them are better than others. But why is that?

Hugh Williams:

So image generation is a slightly different problem, but it uses the same technology. The conversational LLMs we're talking to, that's one big model trained off tokens or words, and we've spoken a lot about that in this episode. There's also a separate model that's built for image generation. Same kinds of principles: you feed in enough images, it treats those as pixels and tries to find patterns in them, and when you ask it to generate an image, it's able to take the patterns from all of the images it's seen and generate a plausible image. But it's technically a separate system to the system that generates the text. And it's a field that's evolving really quickly. I'm sure most of our listeners have tried it with ChatGPT. If they tried it a year ago, it would generate all sorts of weird and wonderful stuff; it certainly couldn't render text within images very well. Today it's a lot better, able to generate that text a lot more accurately. So it's definitely a field that they're investing in, and it's getting better and better every week.

Hannah Clayton-Langton:

Well, our logo, was it ChatGPT we used?

Hugh Williams:

It was, Hannah. Yeah. I'm not an artist who could produce a logo, so I took your headshot photo and my headshot photo, and told it to produce a logo in the style of Tintin.

Hannah Clayton-Langton:

Tintin! I can see it now. I never knew that. Okay, and actually, here's a little peek into how this podcast works: we leverage LLMs a lot for this podcast.

Hugh Williams:

Oh, we certainly do.

Hannah Clayton-Langton:

And it's kind of meta, because I was reviewing something that an LLM had made for us, based on some pre-work we'd done, and it was saying, don't always trust the outputs of LLMs because they're not always right. And I was thinking, oh my god, the LLM is telling me.

Hugh Williams:

So you don't have to trust the LLM for the LLM episode. Yeah, yeah, yeah.

Hannah Clayton-Langton:

All right, so that's probably a good place to stop on LLMs.

Hugh Williams:

Yeah, I reckon we've covered a lot today, Hannah. It was a really, really fun chat. I'm glad we got a chance to do this one in person.

Hannah Clayton-Langton:

Yeah, this would have been hard at like 7 a.m. and 7 p.m. virtually. So definitely a good one to cover in person. And I actually feel like I did Ada Lovelace proud.

Hugh Williams:

Absolutely. I think we certainly got a long way into a pretty tough topic, but I hope very much that our listeners got something out of it and can go and explain LLMs to friends and family.

Hannah Clayton-Langton:

Yeah, exactly. Like super technical, but so relevant that I think it's worth going on the journey to understand it in a little bit more detail. I'm sure there's plenty more we could be going after in that space.

Hugh Williams:

There's a ton more we could do. We should do an episode on AI and coding. We could talk about ethics, the morals of AI, what's going to happen next, personalised AI. There's a ton we could do, maybe later in series one, or even in series two if we get to a series two.

Hannah Clayton-Langton:

Yeah, okay, let's do that. So, listeners, if you like what you've heard today, you can like and subscribe, and of course, leave a review wherever you get your podcasts.

Hugh Williams:

Yeah, that would be great. And if you want to learn more about the show, you can head to techoverflowpodcast.com. And I'm busy posting on LinkedIn, Hannah. I think I might be beating you on X and Instagram.

Hannah Clayton-Langton:

But we are on there too. I need to think about how I can leverage an LLM to do some of that work for me.

Hugh Williams:

Yeah, absolutely. We'll build an agent to do that. That's another topic we could talk about: agents, agentic AI. Yeah, we definitely need to do that.

Hannah Clayton-Langton:

Okay, definitely a third episode coming. But for today, this has been the Tech Overflow podcast. I'm Hannah.

Hugh Williams:

And I'm Hugh.

Hannah Clayton-Langton:

Thanks for listening, and we'll see you next time. Yeah, bye. Bye.