Tech Overflow

You’re Not Searching the Web (How Google Search Really Works)

Hannah Clayton-Langton and Hugh Williams Season 1 Episode 7

Google Search feels like magic because it is solving an impossible problem on your behalf: you show up with a complex information need, type a couple of words, and expect a great answer almost instantly. We unpack what’s really happening in that split second, from the early days of cluttered 90s search engines to why Google’s clean interface, speed, and relevance changed everything.

We walk through the core machinery that makes web search work: crawling (and why it has to be “polite” to websites), building and refreshing a copy of the web, and using an inverted index so results can appear in around 200 milliseconds. From there we get into ranking, including PageRank and why links became a proxy for credibility, plus the constant battle against spam and how SEO sits in a tricky grey zone between good practice and gaming the system.

Then we zoom out to what’s changing now. Human judges still evaluate search results at industrial scale to improve quality and train machine learning systems, while query understanding rewrites and repairs your input so the ranker has a fighting chance. Finally, we tackle the shift away from the ten blue links era as AI summaries and LLMs like ChatGPT reduce clicks, introduce new trust issues, and force new monetisation choices that could reshape search again.

If you enjoyed this deep dive, subscribe wherever you get your podcasts, share the episode with a curious friend, and leave us a review so more people can find the show.

Like, Subscribe, and Follow the Tech Overflow Podcast by visiting this link: https://linktr.ee/Techoverflowpodcast

Hugh Williams:

You have what we call an information need.

Hannah Clayton-Langton:

Don't show me the list of links, just send me to the most relevant link that you've got.

Hugh Williams:

When you search at Google, you're not actually searching the web, you're searching a copy of the web.

Hannah Clayton-Langton:

I have no idea. Hello world and welcome to the Tech Overflow Podcast. I'm Hannah Clayton-Langton.

Hugh Williams:

And I'm Hugh Williams, and we are the podcast that explains technology to curious people.

Hannah Clayton-Langton:

Curious people indeed. How are you doing today, Hugh?

Hugh Williams:

I am well, Hannah. I am well. How about you?

Hannah Clayton-Langton:

I am looking forward to my ski trip next week. I'm still into this idea. We've been talking about it offline of like sport overflow, or like you guys tell us what sectors or industries do you want on? Because you could once you start thinking of it, you're like, ooh, I could do everything of a sport overflow would be good. Or vinyl record overflow, maybe. Well, I'm not too educated in either. I would know less about sports than I do about tech. Give me another one for the list. Speaking of tech, we've got a super cool topic today, and one of which I think you are pretty much a master, which is search.

Hugh Williams:

I can't wait to talk about this. How does Google search actually work? And I think this will be fun.

Hannah Clayton-Langton:

I think it's gonna be great. And just to remind us of your credentials. So you worked for Google, although mostly on Google Maps, but you also spent a lot of time at Microsoft on Bing, which is a competitor search engine to Google Search. Is that right?

Hugh Williams:

I'll be very honest. I mean, I haven't really used Bing since I left Microsoft. I think Google's a better search engine. Um, we can talk about why later on. Yeah, we definitely should. But um, the set of people who built Google and are still at Google, and the set of people who built Bing and are still at Bing are, you know, the same crowd, and people move between the two companies. So uh, you know, lots of the same things happening, but one big victor in Google search.

Hannah Clayton-Langton:

And not to take us too far off track at the beginning, but is there like a geographical component? Like, is Bing bigger in some geographies?

Hugh Williams:

Uh I don't think so.

Hannah Clayton-Langton:

Okay. So Google just has like 90% market share everywhere.

Hugh Williams:

Yeah, at least.

Hannah Clayton-Langton:

For now. At least we can talk about that later.

Hugh Williams:

Yeah. Yeah.

Hannah Clayton-Langton:

How ChatGPT is changing the game. Okay. And um, I guess Ask Jeeves isn't around that much anymore. I think I'm kind of aging myself. But I remember back in the 90s, at least in the US where I was living, Ask Jeeves was another competitor web search engine.

Hugh Williams:

Yeah, there were a lot in the 90s. Um, if you sort of wind back the clock, I mean, there was AltaVista, Infoseek, Excite, Ask Jeeves, Smart. I mean, there was a long list of, you know, mid to late 90s search engines, and they pretty much all faded away. Of course, Yahoo had their own search engine. Oh, yes, yeah, yeah, yeah. Um, pretty much all faded away when Google took away all their market share. And obviously, you know, a little bit later on after Google was founded, they came up with their incredible ads business and became the first company to really make money out of search. And, you know, the rest, as they say, is history.

Hannah Clayton-Langton:

Well, I've been reflecting a little bit on the history of the internet as prep for some of this season. And I remember when the internet was called the information superhighway. Like, I don't know that it's really called that anymore, but there was a sign up at my school library citing the internet as the information superhighway, because before we were online for like literally everything and we had apps for everything, it was a place where you went to seek out information.

Hugh Williams:

Yeah, absolutely. You used to go to Yahoo's directory and follow all those links to find web pages you were interested in. The search was pretty reasonable, and then there were a lot of very, very popular sites like nasa.gov that we all used to go to and, uh, marvel at the pictures that Voyager 1 and Voyager 2 had taken. And if we take a flashback to, uh, episode one of season 1.

Hannah Clayton-Langton:

Yeah, okay, yeah. I mean, I can't say that I was intellectual enough in the 90s. To be fair, guys, I was, uh, between the ages of nought and nine in the 90s, but I was definitely not on nasa.gov, whatever you just said. But you were definitely smart enough to be using that. Okay, and throwback even further, before we had web search, we had encyclopedias. So it was a pretty big advent of new technology. I'm sure the quality of the results, I mean, the improvement over time and the continued improvement is massive, but that was a huge game changer. Like, type in anything and get back information. Blue links.

Hugh Williams:

Yeah, absolutely. And you know, get back 10 relevant blue links in, you know, 200 milliseconds or so, you know, one fifth of a second. Lots of infrastructure powering, you know, pretty relevant, very, very fast search results.

Hannah Clayton-Langton:

Pretty cool. Okay, so let's start at the basics. Like, talk me through how you as an expert would describe a search engine, or like the why, if we take a product lens on search.

Hugh Williams:

Yeah, good place to start, Hannah. So I think probably the best way to think about it is maybe just think about how you think about using search, right? So I think you walk up to your favorite search engine, and you have in mind something that you want to figure out. So you've got what we'd call, in the field of information retrieval, which is what this broad field's called, an information need. So, uh, I know you've been talking about getting to Florence a couple of times.

Hannah Clayton-Langton:

I've never actually been, but I'll manifest it.

Hugh Williams:

Yeah, maybe you'll do this on ChatGPT now, but let's say you're doing it on, um, Google. You've got in mind a wonderful holiday, the amount of time you want to go for, who you want to go with, where you want to stay, the time of year you're thinking about going. I mean, you've got a big complex information need in your head, you've got a price point, all kinds of stuff. And you walk up to Google, you type something into the box. The median query, so if you take all the queries and sort them from the shortest query to the longest query, and you look at the one that's sitting there in the middle, is two words. So you're probably typing Florence holidays.

Hannah Clayton-Langton:

Yeah, cheap hotels or I don't know, fancy hotels or

Hugh Williams:

Yeah, users are learning to type longer queries now. I think probably because of things like ChatGPT and because they're starting to ask more questions with Google and get answers. But the queries are pretty short, right? So you've got this complex need in your head, you turn up with two words, maybe, maybe three, and you press enter, and then 200 milliseconds later, what Google is trying to do is come up with answers to this complex information need that you've got in your head, right? And that's a hard project, right? So you've only given it two or three words for this very complex idea that you've got, and you're giving it, you know, one fifth of a second to come up with answers that it thinks might be reasonable. But the problem it's trying to solve is you've got an information need, it's got a vast repository of resources it could show you, and what it's trying to do is read your mind.

Hannah Clayton-Langton:

Yeah.

Hugh Williams:

Really try and figure out what is that information need that sits under this two-word query and do its best, very, very quickly, to give you answers that you can click on, or see on the page these days, there's a lot of that going on, that meet your information need and help you progress with this, uh, task that you're carrying out.

Hannah Clayton-Langton:

I'm surprised that the median search is only two words. That really, that really surprises me.

Hugh Williams:

Yeah, and the average, the mean average would be, you know, uh still under three.

Hannah Clayton-Langton:

Wow. Okay. Yeah. It's it's mad though, the vast array of different things that I might just pop into Google. Like, especially nowadays, we all have smartphones. You're like fact-checking stuff that comes up in conversation or like literally Googling something as someone's talking about it. So yeah, it can literally be anything.

Hugh Williams:

Yeah. And look, a typical user probably runs four to five queries a day. So people are, you know, using it pretty frequently, probably not as frequently as they're using, say, their social media, but you know, they're using it pretty frequently. And you know, Google's probably getting, uh, let's say, close to 100,000 queries a second.

Hannah Clayton-Langton:

Wow.

Hugh Williams:

From across the world, you know, many, many trillions of queries a year. So there's lots of people out there. Most people on the planet use it, most people on the planet use it several times a day, and uh, you know, all sorts of queries getting poured into it.

Hannah Clayton-Langton:

Well, 90% market share is crazy for anything, right?

Hugh Williams:

Yeah, absolutely. One day in the future, maybe it's in season four or so, we should talk about monopolies in tech and, uh, those things. And certainly there's a lot of people who, you know, are concerned about big tech and monopolistic kind of behavior. But Google's won the game, right?

Hannah Clayton-Langton:

But they've won it for like 20 years, more, nearly 30, right? I think we were talking the other day and they've been around since 1998.

Hugh Williams:

Yeah, exactly.

Hannah Clayton-Langton:

And they've been dominant for surely most of that period.

Hugh Williams:

Yeah. In search. All of that period, really. They took off, um, and they've never been challenged really. I mean, we had a real shot at it at Microsoft, um, and they're still trying. Um, but uh, you know, they're a tough competitor and very, very good at what they do. And maybe if we just go back to that launch date, you know, way back in the late 90s, you might sort of go, well, what did they actually release and why did it sort of crush everybody else? And I'd say, look, everybody remembers the super clean interface. Yeah. The interfaces of the search engines that preceded it were cluttered and ugly. They had banner ads at the top, you know, those were big, horrible interfaces. Google launched something that was mostly white. They had the cool Google logo, the multicolor logo, and it was clean. It was also very, very fast. And so, you know, you were astounded at the time how fast it could answer your questions. And then they did a killer thing, which I think is being lost to history a little bit, which is they would summarize the pages that were the answers and just show you where the words you typed in existed within the document, which we call query-biased summaries, if you want to get into the technical terms, but nobody else before them had done that. So what the old search engines used to do was they'd just show you the first paragraph or so from whatever web page it was. And so you'd look at this blue link and you'd wonder, why is this here? Why is this relevant to me? And then Google was smart enough to show you the parts of the document that actually mattered. And so suddenly their page, their search results page, was just super, super relevant to the user. And you're like, hey, this is a way better experience. It's cleaner, it's faster, the page looks really relevant to me. And they didn't put banner ads all over the thing.

Hannah Clayton-Langton:

They also did that fun thing where, well, they still do it, of course, where on certain days of the year, whether historical or like for the Olympics, they'll have like a fun take on the Google logo or something. That used to be like, at work, you'd be like, oh, have you seen it? Maybe less so now. Maybe I'm doing more meaningful work than I was when I

Hugh Williams:

Yeah, and we don't go to the Google homepage like we used to. Yeah. You know, we always used to go to Google.com to start our searches, but most of the time these days, we just go to our browser, and in the address bar we type in a query and press enter. So you're kind of bypassing the Google homepage these days, and then obviously it's often embedded into other products you might be using. So you kind of miss seeing the logo.

Hannah Clayton-Langton:

Yeah, because I'm in Chrome, I'm a huge Google Enterprise user. We don't need to go into that now, but I will give them some marketing. Um, when I'm punching stuff into my Chrome tab search bar, I assume that that's linked to Google because it's a Google product.

Hugh Williams:

Yeah, absolutely. We should do an episode on Chrome at some point. I've got a couple of friends who worked on Chrome in the early days, and we could, uh, talk a lot about Chrome.

Hannah Clayton-Langton:

Okay, I can give them my effusive praise. And um, Google have axed the I'm feeling lucky button, right? Haven't they?

Hugh Williams:

Yeah, they have. It's, uh, long gone now, I think. It was a cool idea at the time, just sort of send you to a, uh, web page.

Hannah Clayton-Langton:

So the idea was like, don't show me the list of links, just send me to the most relevant link that you've got. And I mean, the name is self-evident. It's like putting your trust in them to direct your search to the most relevant place. Yeah.

Hugh Williams:

And I think if you really believed, if you were working at Google, that you had the most relevant search engine and you were more often than not getting it right, then I think you'd have the courage to put that on the home page and give people, uh, a way to shortcut past the search results page.

Hannah Clayton-Langton:

And did you, I mean, they must have been forgoing ad revenue or something. Is that why they removed it?

Hugh Williams:

Uh, I think they did it before they had an ads business. And then, you know, it was a pretty cute, popular feature. And I'm guessing at some point they removed it because either A, it wasn't being used, or B, they were forgoing ad revenue. And I think they've become more and more about revenue over the years. But certainly when they started it, they were real purists. I mean, they didn't have an ads business and they just wanted to deliver a great search product.

Hannah Clayton-Langton:

Fair enough. Okay, so let's talk about the great search product and maybe come back to ads later if we have time. Can I just ask about the original search? So I'm like again belying my age range here very, very much. But um, I used to get taught at school like how you Google with quotation marks and then like a plus sign and the other thing you want in the results. And sometimes you could put like a minus sign in front of the words you didn't want. Does that work? Or are we past those days?

Hugh Williams:

That all still works. Um, so you can certainly use these as what we call advanced query operators in Google. You've come up with some good examples: quotation marks, the plus sign, the minus sign, all still work. I'd say though, today, and it's probably been true for the history of Google, you know, only the very advanced users know about those features.

Hannah Clayton-Langton:

Oh, well, that was me in sixth grade. So I've obviously been advanced in my tech use for a yeah.

Hugh Williams:

But you're definitely a top one percent. You're definitely a top one percent. I'd say, you know, the history of me working on search is, you know, I work with these young, smart product managers, and they'll have some advanced idea like that, and I'll sort of look at them and I'll say, look, you're gonna ship this thing out to customers, and way less than one percent of the customers are gonna use it. And they look at me and say, no, no, no, no, no, no, this is a fantastic feature. I'm gonna do thumbs up, thumbs down on the results, and get all this feedback. And then they usually come back six months later and say, you were kind of right, nobody uses advanced features. So I think people just expect to be able to type the words into the search box, press enter and get amazing results in a fifth of a second.

Hannah Clayton-Langton:

Okay, so we're not looking at developing our use through new features. We just want the same thing we've wanted since the beginning of Google's arrival, which is relevant results.

Hugh Williams:

Yep. Okay. That's it. And look, you know, there's certainly some features that they've added over time, like autosuggest, for example. Yes. Um, I was kind of part of the autosuggest wars between Yahoo, Bing, and Google, and we were all innovating in that. But you know, it's a very standard feature these days, you'll find it in most search engines. And obviously that helps you a lot. You know, you might type in a couple of words and you might see in the drop-down a query that's better than the query you were going to type, and you click on that, or it's a fast way to complete the query. And so there's some advanced features like that that I think save users typing and therefore they will use them, or help them express their information need in a better way, and therefore they will use them. But people don't like plus, minus, and quotation marks.

Hannah Clayton-Langton:

No, but one thing that makes me very lazy in how I type, on my phone, is did you mean? Is that what it's called? Did you mean where you put in like an egregious typo and it knows what you meant to type?

Hugh Williams:

Yeah, absolutely. And they're getting better and better and better at it. So these days it's much better at just quietly correcting mistakes that you've made, given the context it has and the vast experience they have and the teams that work on that. And so these days, you know, you don't see as much "did you mean" as probably you did 15 years ago. But it's an incredibly cool feature.

Hannah Clayton-Langton:

Yeah, it is, although it does allow me to be very lazy. So how is it working under the hood if I am typing in holiday to Florence? What's it pulling from?

Hugh Williams:

Yeah, good question. So the first thing, and I think lots of our listeners will know this, Hannah, but when you search at Google, you're not actually searching the web, you're searching a copy of the web. Right. So Google doesn't take your query and go out and try and find websites that might match the query and wait for those to respond or however you might think about it. But what Google is doing is what we call crawling the web. So it's going out and it's bringing back a copy of every resource that it can find on the web, and it's bringing that back and storing that in its vast data centers. And so Google has a copy of the web. And when you go to Google and run a query, what you're doing is searching that copy of the web. So they're building this copy in advance and they're deciding what pages are interesting and what pages aren't. They're storing this in a vast data center, and then they're building what we'd call an index on top of that. We can kind of talk about that in a second, that allows the search process to take place. So we need this thing called an index to be able to find pages that are relevant within their data center because you don't want to be, for every query, you know, you type Florence, you don't want to be looking through every web page in their copy that's stored in the data centers. You need some way to kind of shortcut that.

Hannah Clayton-Langton:

Okay, I had no idea that that's how that's working. I feel like you made it sound like everybody knew that. Maybe everyone except me knows that. But I had no idea. Okay, and so how often are they taking a copy of the web?

Hugh Williams:

Yeah, so that's a super good question, Hannah. So the answer is it depends.

Hannah Clayton-Langton:

So that's always the answer.

unknown:

Yeah.

Hugh Williams:

Let's pick a web page. So let's say we mentioned at the top of the show nasa.gov.

Hannah Clayton-Langton:

Yeah.

Hugh Williams:

Right. So let's imagine the Google crawler, as it's known, goes out and gets a copy of the home page of nasa.gov and brings that back. So we've got our first copy of it. Yeah. It's got to then decide when should I go and get another copy of that? Sure. And make some decision. And then if it goes and gets that new copy and brings it back and it's exactly the same as the copy that it had, then we know we've wasted resources and wasted time because we didn't get anything new. We should have gone and got some other page because we've got a fixed amount of crawling resources available to us.

Hannah Clayton-Langton:

And crawling, that's a technical term.

Hugh Williams:

Yeah, yeah. Basically, it just means going and fetching stuff from across the web.

Hannah Clayton-Langton:

Okay. Um sounds slow though, which is weird, but okay.

Hugh Williams:

Yeah, it is, because websites don't like the Googlebot kind of pounding the website, right? So you can consume an enormous amount of resources at a website if the Googlebot turns up and tries to fetch, you know, lots of pages in a short amount of time. So there's a politeness that these crawlers exhibit. So they'll only fetch at a certain pace, and that's kind of an agreed pace that's reasonable for all websites. Oh wow.

Hannah Clayton-Langton:

Okay.

Hugh Williams:

So they kind of crawl along. But the word crawl actually comes from this idea that if you go and fetch nasa.gov, what you'll find in the nasa.gov page is links to other pages, right? So you'll see links into pages about different space missions, you know, the team at NASA. Maybe you'll see some links that go off to, you know, some other space organization. And so those links are also links that the crawler could go and fetch. And so it will queue those up at some point and decide to go fetch those. And that's why they call it crawling, because it's kind of, you know, retrieving things and then going out and fetching more things based on what it's discovered.
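The fetch-then-follow-links loop Hugh describes can be sketched roughly as below. This is a minimal illustration over a made-up in-memory link graph, not a description of Google's crawler: the TOY_WEB dictionary, the page names, and the crawl function are all hypothetical, and a real crawler would fetch over the network and pause between requests to the same site to stay polite.

```python
from collections import deque

# Hypothetical toy "web": each page maps to the pages it links to.
TOY_WEB = {
    "nasa.gov": ["nasa.gov/missions", "nasa.gov/team", "space-agency.example"],
    "nasa.gov/missions": ["nasa.gov", "nasa.gov/voyager"],
    "nasa.gov/team": ["nasa.gov"],
    "nasa.gov/voyager": [],
    "space-agency.example": ["nasa.gov"],
}

def crawl(seed, max_pages=100):
    """Fetch pages breadth-first, queueing newly discovered links."""
    frontier = deque([seed])  # pages waiting to be fetched
    seen = {seed}             # pages already queued, so nothing is fetched twice
    fetched = []
    while frontier and len(fetched) < max_pages:
        page = frontier.popleft()
        fetched.append(page)  # stand-in for a real network fetch
        for link in TOY_WEB.get(page, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched

order = crawl("nasa.gov")
print(order)
```

Starting from a single seed page, the queue grows with every link that is discovered, which is the sense in which the crawler "crawls" outward. The max_pages cap is an assumed stand-in for the fixed crawling budget mentioned earlier.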

Hannah Clayton-Langton:

Okay, so there's an element of dynamism in that they would crawl a news website more frequently than a home shopping site?

Hugh Williams:

Yeah. Just back on the nasa.gov example, right? So we fetched it once, we've gone back and we fetched it again. It's either changed or it hasn't. If it hasn't changed, then why did we do that? We should take more time now before we go and get another copy. So we'll slow down how frequently we're getting it. If, however, we went and got it and it had changed, we'll say, oh, maybe we should get this a little bit more often. And so we'll probably speed up how often we go and get it. And so we're really adjusting what pages we visit based on the behavior of those pages. And as you say, if it's a news site, you probably want to be hitting that pretty frequently, looking for the news just in. If it's, uh, my home page, which I don't think I've updated in a year.

Hannah Clayton-Langton:

hughwilliams.com, is that right?

Hugh Williams:

hughewilliams.com.

Hannah Clayton-Langton:

hughewilliams.com, guys, check it out.

Hugh Williams:

It's pretty boring. It doesn't change very often. Um, then you wouldn't want to be fetching that too often, because that's a waste of resources.
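The slow-down and speed-up behaviour described here can be sketched as a simple rule for choosing the next revisit interval. This is only an illustration under stated assumptions: the next_interval function, the halving and doubling factors, and the one-hour and thirty-day bounds are all invented for the example, not figures from the episode.

```python
def next_interval(current_hours, changed, min_hours=1.0, max_hours=720.0):
    """Return the next wait before refetching a page.

    Halve the wait if the page had changed since the last fetch,
    double it if it had not, and keep the result within the bounds.
    """
    factor = 0.5 if changed else 2.0
    return max(min_hours, min(max_hours, current_hours * factor))

# A page that was unchanged twice and then changed once.
interval = 24.0
for changed in [False, False, True]:
    interval = next_interval(interval, changed)
print(interval)
```

Under this rule a busy news site drifts toward the minimum interval, while a rarely updated personal home page drifts toward the maximum, which frees crawling resources for pages that actually change.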

Hannah Clayton-Langton:

And was there like a very early version of Google that wasn't this dynamic and just did like a once a day crawl and fetch, and then over time it's gotten smarter?

Hugh Williams:

Yeah, look, the crawling team's been working hard since the late 90s and it's a big team and it's still innovating a lot. I'll give you a couple of examples of things that are hard, right? So let's imagine that, uh, there's a website with a calendar on it, lots of websites with calendars. I want to book a doctor's appointment, you know, and you can press next. If the crawler goes and gets the home page of the calendar, it's gonna see that next link and it's gonna go, oh, there's a next link. I should go and explore that. And it goes and explores that and gets the next month, and it gets back another link to the next month, and we can end up here for infinity, right? Because we're just gonna keep pressing the next button and end up in the year 2047 or something.

Hannah Clayton-Langton:

Yeah, yeah.

Hugh Williams:

And so at some level, you have to make a decision, you have to try and understand what the page is, what's worth crawling, is it a dynamic page? These kinds of things. And so there's hundreds of problems like this when it comes to crawling the web. You know, dynamism is one aspect of it, but how the pages are structured, what's on the pages, how useful is the page, all these kinds of things go into the equation of crawling the web. It's a super hard problem.

Hannah Clayton-Langton:

Okay, I have a couple of questions before we move on. So do they store the old copies of the web and like for how long?

Hugh Williams:

Yeah, they do. They do. I mean, they're not the Internet Archive, so they're not trying to store them for all time for posterity's sake, but they do, and that allows you to compute things like how fast the pages are changing. So if you've got a history of the changes to a page, say the homepage, then you can compute some kind of trajectory of how much it's changing. And so there's lots of benefits to keeping all the pages.

Hannah Clayton-Langton:

That makes sense. And then is that kind of like regression modeling, basically? Yeah, that kind of thing. And then does more crawling generate more compute and cost more? Like as the world's gotten faster than it was in 1998, is that a more expensive thing for Google to be doing?

Hugh Williams:

Uh no, it's probably got cheaper just because the internet has got cheaper. And um, you know, if you sort of think about the computing they had in 1998, I mean, that sort of scale of computing is now almost free. So the technology's got a lot better, the internet's got a lot faster, the ability to process things has got a lot better. So I'd say overall costs have gone down enormously, but on the other side of the ledger, the web's grown enormously. Yes. You know, I was in a lecture in 1994 with this guy Ross Wilkinson, this, um, professor from the university I was at. And uh, he got up and he said, in 1994, one day there will be over a million pages on the World Wide Web. And you could have heard a pin drop. Everyone's like, a million pages on the World Wide Web. And you know, of course, now many, many, many billions, yeah, many, many billions of real pages. And then obviously with these calendar type examples, you have an infinity of pages that aren't meaningful.

Hannah Clayton-Langton:

Wow. Okay, so Google isn't searching the web, it's searching a recent copy of the web, but it is live processing my search right? That happens in response to my query.

Hugh Williams:

Yeah, exactly. Exactly. And so we mentioned this idea of an index earlier on. So there's this structure that's built inside Google, which is called an inverted index, not so important the name, but it's a lot like the index you find at the back of a book. So if I'm looking for, um, I don't know, I'm looking in a guide, I've got a Europe Lonely Planet book and I want to find something about Florence, then obviously I flip to the back, I do a bit of flipping around until I find F, I scan down until I find the word Florence, and then it's got some page numbers that I could go to to learn about Florence. So I might say, yeah, I go to page 93, and then I just go to page 93 and I start reading about Florence. The Google index is pretty much that. And so they've got all of the words you could possibly search for, and then where you would find those words in the copy of the World Wide Web. So anytime you type a query, you're not scanning through all of the pages that they've copied from the web. You're looking in the index and you're saying Florence. And it says, okay, well, you need to go to this document, this document, or that document. They're the ones that mention Florence.

Hannah Clayton-Langton:

Okay. And then if you've searched Florence Holiday, it will look up anywhere that the words holiday and Florence are coming up?

Hugh Williams:

Yeah, exactly. And that's the first step. That's great. I mean, the first step that Google carries out is it only considers documents that have both the word Florence and the word holiday in them. And then its job is to what we call rank those. So to try and find the best one of those and put that at the top, the one that's going to meet your information need, and put the worst one at the bottom.
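The back-of-the-book lookup and the "both words" first step can be sketched in a few lines. This is a toy illustration, not Google's index: the four documents, their ids, and the function names are all made up for the example, and ranking is left out entirely.

```python
import re
from collections import defaultdict

# Hypothetical miniature "copy of the web": document id -> text.
DOCS = {
    1: "Florence holiday deals and cheap hotels",
    2: "A history of Florence in the Renaissance",
    3: "Holiday ideas for families",
    4: "Plan your Florence holiday: when to go",
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search_all(index, query):
    """Return ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings))

index = build_index(DOCS)
matches = search_all(index, "Florence holiday")
print(matches)
```

The index is built once, ahead of time. At query time only the short lists for "florence" and "holiday" are read and intersected, so no document text is scanned, and a ranker would then order the surviving candidates.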

Hannah Clayton-Langton:

And that was what Google was so great at first, right? Which was just relevancy of results.

Hugh Williams:

Exactly. The Google founders, so Larry and Sergey, were students at Stanford. They were PhD students at Stanford. And they were, you know, either the first folks or one of the first folks to come up with this idea called PageRank, which I'm happy to explain if you're interested. But uh, it was a pretty interesting idea that really helped the relevance of search. It's what the original Google search engine was built around. It turns out over time this has become less important, but it was certainly the center of the Google IP.

Hannah Clayton-Langton:

Okay, so you've teased PageRank, so now you have to tell us what that means.

Hugh Williams:

So I'll try and put it in simple terms for our, uh, smart listeners, Hannah. Let me have a shot at it. So let's pick on my homepage. It's not very exciting, but we'll pick on my homepage. Um, if my homepage has links on that page that go to credible resources, that will improve the quality of my page. And so I'm likely to rank a little bit higher on Google. And then similarly, if there's pages out there that point to my page, so imagine, you know, I used to work at RMIT University. Imagine there's a page at RMIT University, it's a credible site, that links to my homepage and says, you know, former academic, you know, Hugh Williams, that will give some credibility to my homepage. And so there's this score that's computed for every page that gives it a credibility score. And, you know, all other things being equal, more credible pages will rank more highly than less credible pages. And so that's the fundamental idea of PageRank.

Hannah Clayton-Langton:

Okay, so two questions. One, when you say credible links, we really just mean volume of links, right? Like it's not testing if it's a credible website based on whether it's a .gov or something. It's just saying more links mean more credibility.

Hugh Williams:

No, it's both. It's both quantity and certainly quality. So there's a score that something like a nasa.gov would have that's incredibly high. And then if there was, you know, my something, it's gonna have a very, very low credibility score.

Hannah Clayton-Langton:

So there's somehow filtering for the type of website. And then this can be gamed, right? Like, can this be gamed in some way? Like, even if it's not the most credible resource, if you go for a volume play and I create a website, hannahclaytonlangton.com, and then I just make another thousand websites and point them all at it, will I rank higher?

Hugh Williams:

In the old days, Hannah, that would have worked fantastically well, but that will absolutely not work today. And that's because the Google team has what we call a spam team, and the spam team's job is to detect these kinds of things. And they're a very, very busy team trying to figure this kind of stuff out. But one thing that would happen here in this case is you'd register a whole bunch of domains at the same time. They'd know that.

Hannah Clayton-Langton:

Okay.

Hugh Williams:

You'd generate these pages at the same time, all these links would suddenly appear that were all pointing to your page, they'd all be a bit similar in these kinds of ways. And so they would sort of see that using their algorithms and they would, you know, basically crush that out of their process. But these teams are very, very smart. Again, they've probably been working as long as the crawling team. So, you know, go back to the late 90s, big team, really smart people, and their whole job is to make sure that the results aren't spammed, and so that you're really seeing great results, not results that have been artificially boosted.
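
One of the signals mentioned here, lots of linking domains registered at the same time and all pointing at one page, can be turned into a toy heuristic. This is only a sketch under invented assumptions: the record format, the seven-day window and the five-domain threshold are hypothetical, and real spam detection combines many more signals than this.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (linking_domain, registration_date, target_page).
LINKS = [
    ("farm1.example", date(2024, 1, 1), "hannah.example"),
    ("farm2.example", date(2024, 1, 1), "hannah.example"),
    ("farm3.example", date(2024, 1, 2), "hannah.example"),
    ("farm4.example", date(2024, 1, 2), "hannah.example"),
    ("farm5.example", date(2024, 1, 3), "hannah.example"),
    ("farm6.example", date(2024, 1, 3), "hannah.example"),
    ("uni.example", date(1998, 5, 4), "nasa.example"),
    ("news.example", date(2005, 9, 12), "nasa.example"),
    ("museum.example", date(2011, 3, 30), "nasa.example"),
]

def suspicious_targets(links, window_days=7, min_domains=5):
    """Flag targets whose linking domains were registered in a burst.

    A target is flagged when at least min_domains of the domains that
    link to it were registered within window_days of each other.
    """
    by_target = defaultdict(dict)
    for domain, registered, target in links:
        by_target[target][domain] = registered

    flagged = []
    for target, domains in by_target.items():
        dates = sorted(domains.values())
        start = 0
        for end in range(len(dates)):
            # Shrink the window until it spans at most window_days.
            while (dates[end] - dates[start]).days > window_days:
                start += 1
            if end - start + 1 >= min_domains:
                flagged.append(target)
                break
    return sorted(flagged)

print(suspicious_targets(LINKS))
```

The page with links from domains registered years apart is left alone; the one with six linking domains registered in three days is flagged.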

Hannah Clayton-Langton:

Okay, so there is such a thing as spamming, and I won't make a website of my name and point a thousand websites to it. But there is something called search engine optimization, known as SEO. And that is completely fair game, right? And as I've understood, that is understanding how Google or the ranking is working behind the scenes. And like I have a friend, that's her whole job, she advises companies how to optimize their SEO because they want to be relevant in search results for potential customers. And that's all fair game, right?

Hugh Williams:

That's all fair game. And if you go to the Google website, they have webmaster guidelines, and those guidelines are, you know, the set of things that are good practice that you ought to do in order to make sure your web page will rank highly. It says all kinds of sensible stuff like make sure your page is really fast.

Hannah Clayton-Langton:

How do you make sure your page is really fast?

Hugh Williams:

Well, now there's a... Okay, we don't have to, it'd be a good episode one day: how to make the web fast. But SEO often is in that gray area. So there's a set of things you should do that are like make your website really fast. And then there's a set of things like you should mention lots of keywords that you'd like to get searched for. And what you find often is that people stuff all those keywords into the footer of the page. And, you know, Google would probably tell you that's not best practice because you're just trying to game search, not actually really trying to deliver great content. And so there's a pretty gray line between what's good and what you'll be punished for.

Hannah Clayton-Langton:

Okay, and if I um call back to our LLM episode from last season, I imagine that there's like a human in the loop validation of the rankings, or at least there was at some point. Like, how does that all fit in?

Hugh Williams:

Yeah, good question. Yeah, all of these companies, not just Google Search, but, you know, all the companies that are running search do human evaluation of their search results. And so what they'll typically do is they'll sample their query logs. So they'll go and look in all the queries that, you know, 100,000 queries a second, go and randomly sample some of those queries, different geographies, different languages, these kinds of things. And then they'll check what results the customer saw, and they'll send that query and those results to a human judge, and they'll ask the human judge to assess those results and give them some feedback on those results. And they typically do it in two ways. So one is result by result. So this was the first result for holidays in Florence. Is this any good or not? And they'll give it a score, and then they'll look at the second result and so on. And then they'll also look at the whole page and they'll say, is this whole page relevant to this query? And they do this on an industrial scale. So we're talking millions of queries, hundreds, if not thousands, of humans all over the world, you know, working full-time, feeding this data back into the system. And it's used for two things. You know, one is sort of diagnosing issues. So what are customers struggling with? What should we work on? So we might hand out a whole bunch of queries to these judges and we might learn that we're not very good at chemistry queries. And then we might say, hey, I'm a PM on search, I think we should be better at chemistry queries, and we might start a workstream. So just, you know, diagnostics of what we're good and bad at is one thing. And then the second thing is it makes valuable training data. So if you're going to train your machine learned models, you know, we talked about that in season one, then you need training data. And so this is a great way to harvest huge amounts of data.
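
The evaluation loop described here (sample the query log, collect per-result and whole-page grades from judges, then look for weak categories and reuse the grades as training data) might be sketched as follows. Everything in it is an assumption for illustration: the log entries, the 0 to 3 grading scale, the category labels and the function names are invented, not taken from any search engine's tooling.

```python
import random
from collections import defaultdict

# Hypothetical query log entries: (query, category).
QUERY_LOG = [
    ("holidays in florence", "travel"),
    ("bus", "local"),
    ("molar mass of benzene", "chemistry"),
    ("balance redox equation", "chemistry"),
    ("flights to vienna", "travel"),
]

def sample_queries(log, k, seed=0):
    """Randomly sample k logged queries to send to human judges."""
    return random.Random(seed).sample(log, k)

# Hypothetical judgments: one grade per result (0 = bad, 3 = great),
# in rank order, plus one grade for the page as a whole.
JUDGMENTS = [
    {"query": "holidays in florence", "category": "travel",
     "result_grades": [3, 2, 2], "page_grade": 3},
    {"query": "flights to vienna", "category": "travel",
     "result_grades": [3, 3, 1], "page_grade": 2},
    {"query": "molar mass of benzene", "category": "chemistry",
     "result_grades": [1, 0, 1], "page_grade": 1},
    {"query": "balance redox equation", "category": "chemistry",
     "result_grades": [0, 1, 0], "page_grade": 0},
]

def mean_grade_by_category(judgments):
    """Diagnostics: average per-result grade for each query category."""
    per_category = defaultdict(list)
    for j in judgments:
        grades = j["result_grades"]
        per_category[j["category"]].append(sum(grades) / len(grades))
    return {c: sum(v) / len(v) for c, v in per_category.items()}

def weakest_category(judgments):
    """The category judges rated lowest, a candidate for new work."""
    means = mean_grade_by_category(judgments)
    return min(means, key=means.get)

def training_examples(judgments):
    """Training data: one (query, rank position, grade) row per result."""
    return [(j["query"], position, grade)
            for j in judgments
            for position, grade in enumerate(j["result_grades"], start=1)]

print(sample_queries(QUERY_LOG, 3))
print(weakest_category(JUDGMENTS))
print(len(training_examples(JUDGMENTS)))
```

With these made-up grades the diagnostic points at chemistry queries, and the same judgments yield twelve labelled rows that a ranker could be trained on.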

Hannah Clayton-Langton:

Okay, I have at least two, maybe three questions. So the first one is they're still doing that today? Like they're still doing this industrial scale human evaluation of the results?

Hugh Williams:

Yeah, absolutely.

Hannah Clayton-Langton:

Okay. And are they looking at, like, results where no one clicked? That might be a flag that the results aren't so relevant. Or a search where someone clicked, like, the sixth link? I assume you want it to be that for like 80% of your results they click on the first, and then maybe for the next 15% they click the second, or something like that.

Hugh Williams:

Yeah, look, a couple of things there. So I'd say by and large, if the user doesn't click, we call it query abandonment. That's an interesting thing to go and study. But there's also queries that users will abandon and be happy. So if I say, you know, time in San Francisco is my query, and I press enter and I get the time, I'm probably really, really happy I didn't have to click on anything. So there's certainly a class of queries where no further action is happiness. But by and large, for most standard queries like your trip to Florence, if I don't click on anything, that's probably a sign that I'm not happy and, you know, definitely worth some PM at Google studying.
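
Query abandonment can be measured directly from session logs, as long as the queries that are answered on the results page itself are set aside first. A minimal sketch, assuming a hypothetical session record with a click count and an `answered_on_page` flag (both invented for this example):

```python
# Hypothetical session records, invented for illustration.
SESSIONS = [
    {"query": "time in san francisco", "clicks": 0, "answered_on_page": True},
    {"query": "trip to florence", "clicks": 1, "answered_on_page": False},
    {"query": "trip to florence", "clicks": 0, "answered_on_page": False},
    {"query": "bus", "clicks": 2, "answered_on_page": False},
]

def abandonment_rate(sessions):
    """Share of sessions with no click, ignoring 'happy' abandonment.

    Sessions whose answer was shown on the results page are excluded,
    because no click is the expected outcome for those.
    """
    relevant = [s for s in sessions if not s["answered_on_page"]]
    if not relevant:
        return 0.0
    abandoned = sum(1 for s in relevant if s["clicks"] == 0)
    return abandoned / len(relevant)

print(round(abandonment_rate(SESSIONS), 3))
```

Of the three sessions that needed a click, one had none, so the rate here is one third; the time query does not count against the engine.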

Hannah Clayton-Langton:

Okay, and we mentioned machine learning. So the Google search algorithm will be machine learned. Is that right?

Hugh Williams:

Yep, that's right. And look, we talked about PageRank. So that's what we call one feature or factor that's in the giant ranking algorithm. So we talked about this, you know, back in season one in our, I think our sixth episode, Hannah, our first one on But yeah, it's machine learned and there's lots of factors or features. You know, something like PageRank is one of those features, but there's literally thousands of features that are used when you type in a query and you get this answer back in 200 milliseconds. And the ranking algorithm itself is machine learned.
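
To make "features" and "machine learned" a little more concrete, here is a deliberately tiny sketch: each document is described by two made-up features (a PageRank-like score and a text-match score), and a linear model learns weights for them from judged examples by gradient descent. The feature names, the numbers and the choice of a linear model are all assumptions; production rankers use thousands of features and far more capable models.

```python
def score(weights, features):
    """Ranking score: a weighted sum of a document's features."""
    return sum(w * x for w, x in zip(weights, features))

def train_linear_ranker(examples, epochs=500, learning_rate=0.1):
    """Fit one weight per feature by gradient descent on squared error.

    examples is a list of (feature_vector, judged_grade) pairs.
    """
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for features, grade in examples:
            error = score(weights, features) - grade
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
    return weights

# Hypothetical training data: ([pagerank_like, text_match], judged grade).
EXAMPLES = [
    ([0.9, 0.8], 1.0),
    ([0.2, 0.9], 0.6),
    ([0.8, 0.1], 0.4),
    ([0.1, 0.1], 0.0),
]

weights = train_linear_ranker(EXAMPLES)

# Rank two unseen candidate documents with the learned weights.
candidates = {"strong_page": [0.9, 0.9], "weak_page": [0.2, 0.1]}
ordered = sorted(candidates, key=lambda d: score(weights, candidates[d]),
                 reverse=True)
print(ordered)
```

The point of the sketch is the division of labour: people supply judged examples, the training step decides how much each feature should count, and the ranker just applies the learned weights at query time.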

Hannah Clayton-Langton:

And nowadays you also get the summary, the AI-generated summary at the top. I don't know whether that's a rabbit hole for now, but I presume that's massively changed the desired user behavior because you'll get what you need from that summary, which I presume comes from those web pages.

Hugh Williams:

Yeah, exactly. And look, I think, you know, Google's doing some super stuff in doing that. And that's really hard to do at the industrial scale that they're operating at. So they're operating at an incredibly large scale and trying to return those AI summaries in a time that's similar to search results. It takes a little bit longer. You can kind of see it bubbling in and it's a little bit slower. But boy, is that an industrial scale, hard, hard problem to work on. But, you know, it has some negative consequences. So, first of all, people are clicking on the blue links less because they're finding the answers there and the blue links are pushed down the page. So websites are getting less traffic, yeah, which obviously people care about. So if you work at a business where you're counting on traffic from Google, you're now going to get less traffic. So that's a problem. These AI generated summaries, as we well know, we talked about that back in season one, hallucinate. And so what you are getting back at the top there is likely to be reasonable, but definitely fallible. Yeah.

Hannah Clayton-Langton:

Okay. So the ranking and some of the machine learning in there is there to meet my information need, to use the official term. Nice, nice. But what else? Is there anything else going on behind the scenes that we might not recognize at surface level?

Hugh Williams:

Yeah, absolutely. There's a lot going on. We talked about autosuggest earlier on, that's a big deal. But there's this broader field called query understanding that's pretty interesting. You know, when I was working on Bing in the sort of, you know, 2005 to 2010 kind of period, we created a team that did query understanding and it became actually as important to helping users with their information needs as the core ranking. So probably equivalent. I'd say probably the experience at Google is this query understanding thing is just as important as the machine learning.

Hannah Clayton-Langton:

And is that because we're giving it two words and we want it to know exactly the context in our heads? But actually those two words could hold very different meanings in different contexts, if I'm to guess. Okay.

Hugh Williams:

Exactly. Let's pick a couple of examples, right? So if you turn up to Google and you type bus and press enter, I'd say like the 90% case is you want a bus timetable. You're out and about on your phone, you type bus, you're standing near a bus station, or you want to find out how to catch a bus, right? So the results that you'd want are going to be pretty different to somebody who isn't in London. Somebody's in Barcelona, different set of search results. Somebody who's in Mumbai, different set of search results. And so the query understanding team's job is to say, this is a local query. Yeah. This isn't a general query. This isn't Taylor Swift, right? This is a query that's local. So then they have to develop techniques to make sure that you get local answers, right? You're not getting the best answers for bus from across the web. You're getting the best answers for bus when you're in London.

Hannah Clayton-Langton:

Okay, and that's an example where the location is the relevant, like, attribute for that search. But I might be searching for, like, jeans, and then it might want to pull on other attributes it knows about me, such as my age and gender rather than my location. So is query understanding basically knowing what to pull to make the query relevant based on what it is you're searching?

Hugh Williams:

Yeah, exactly. And it could be trying to figure out phrases within the query. So knowing that when you're searching for jeans that you mean a particular brand of jeans, so some kind of phrase that's pulled from your query. It could be understanding your query's local, it could be fixing spelling mistakes in the query. And of course, they've got a lot more common as people use phones. Yeah, yeah. So, you know, back in the day, you could pretty much count on when people were using a laptop that they'd always get the first letter right. Yeah. And then they might mess the word up. Today you can't even count on that. Spaces are missing. All kinds of things. And Google's magic at that. Yeah. So the query understanding team, if you like, sits on top of the ranking team, and their job is to try and turn the query into a better query. They can even add words to the query, they can take words out of the query, they can put little clues next to the query, these kinds of things that make it a lot easier for the Google search ranker to do a better job of answering the query.
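
A query understanding step that repairs spelling, adds a word and attaches "little clues" for the ranker could look roughly like this. It is a sketch under stated assumptions: the word lists, the `AnnotatedQuery` type, the `understand` function and the hint names are all hypothetical and hard-coded, whereas a real system would learn these from data.

```python
from dataclasses import dataclass, field

# Hypothetical lookup tables, invented for illustration.
LOCAL_INTENT_WORDS = {"bus", "pizza", "pharmacy"}
SPELLING_FIXES = {"florance": "florence", "holliday": "holiday"}

@dataclass
class AnnotatedQuery:
    original: str                      # what the user typed
    terms: list                        # rewritten terms sent to the ranker
    hints: dict = field(default_factory=dict)  # clues attached to the query

def understand(query, user_city=None):
    """Rewrite a raw query into a better one for the ranker."""
    # Repair known misspellings word by word.
    terms = [SPELLING_FIXES.get(t, t) for t in query.lower().split()]
    hints = {}
    # Local intent: annotate the query and add the user's city as a term.
    if user_city and any(t in LOCAL_INTENT_WORDS for t in terms):
        hints["local"] = True
        hints["city"] = user_city
        terms.append(user_city.lower())
    return AnnotatedQuery(original=query, terms=terms, hints=hints)

print(understand("bus", user_city="London"))
print(understand("holliday florance"))
```

The same word "bus" becomes a different query for a user in London than for one in Barcelona, while a general query is only spell-corrected and passed through.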

Hannah Clayton-Langton:

Can we do a few more examples? Because I think they're quite helpful.

Hugh Williams:

Yeah. When I was working at eBay, different search problem, Apple released the iPad. Okay. Super cool, right? So one day there's no iPad. Next day, suddenly this new word exists. And boy, did we screw that up. If you went to eBay and you typed in iPad, we would say, here are some results for the query iPod.

Hannah Clayton-Langton:

Oh no.

Hugh Williams:

Because we were really, really sure that what you meant was iPod. Right. So there's probably enough users out there who in history accidentally put an A in instead of an O. And so there was no way we were going to show you iPads.

Hannah Clayton-Langton:

Because that wasn't a thing.

Hugh Williams:

Wasn't a thing. That was a typo. Yep. We were really sure that you meant iPod. And so, you know, I got a call from the boss's boss, who kind of said, hey, there's all these customers who want an iPad, and we're showing them iPods. Like, what are you guys doing? Please fix. Yeah. And so we learned, you know, the really hard way that we had to have some overrides in the system. You couldn't just rely on history to do the query correction. You have to have some overrides where you can kind of deploy the idea that this word should not be corrected. And then all of a sudden, we started to show iPads to customers who wanted to buy an iPad.
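
The iPad story maps neatly onto a small sketch: a corrector that trusts historical query counts will rewrite a brand-new word into its popular neighbour, unless an override list tells it to leave the word alone. The counts, the one-letter-substitution rule, the 100x popularity ratio and the function names are all assumptions for illustration, not eBay's actual system.

```python
# Hypothetical historical query counts, invented for illustration.
QUERY_COUNTS = {"ipod": 100000, "ipad": 40, "iphone": 90000}

def one_substitution_apart(a, b):
    """True if a and b have equal length and differ in exactly one letter."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def correct(term, counts, do_not_correct, ratio=100):
    """Return the term to search for, possibly 'corrected' from history.

    A term is rewritten to a neighbouring term when that neighbour has
    been searched at least `ratio` times more often, unless the term is
    on the override list.
    """
    if term in do_not_correct:
        return term
    own_count = counts.get(term, 0)
    best, best_count = term, own_count
    for candidate, count in counts.items():
        if (one_substitution_apart(term, candidate)
                and count >= ratio * max(own_count, 1)
                and count > best_count):
            best, best_count = candidate, count
    return best

# Launch day: history says "ipad" is almost always a typo for "ipod".
print(correct("ipad", QUERY_COUNTS, do_not_correct=set()))
# After the fix: an override says this word must not be corrected.
print(correct("ipad", QUERY_COUNTS, do_not_correct={"ipad"}))
```

Keeping the override as data rather than code is what would let a team add a new word in seconds instead of shipping an emergency fix.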

Hannah Clayton-Langton:

it comes out? Someone who missed it at the store. And then it's okay.

Hugh Williams:

So, you know, people are, you know, it's like ticket scalping, I suppose. People would go and buy a bunch of these things and then list them on eBay at a premium, and people were happy to pay because they were such a big deal when they came out.

Hannah Clayton-Langton:

So, how quickly were you able to fix the did you mean iPod?

Hugh Williams:

I think it was a long night for the team because we'd never needed this feature before. This had never happened to us. And so it was a long night, probably in the order of six to eight hours before we had a fix out. And then we had to build a feature that allowed us to do this anytime, so we could do it in seconds in the future when a new word appeared. But again, you know, this is part of the query understanding team's job, to make sure that the queries are annotated in the right way, these kinds of things.

Hannah Clayton-Langton:

So there is some commonality between how Google works and how, like, local proprietary search will work at some of the big companies. So like Instagram, Amazon, eBay, et cetera, ASOS, have their own search engines and those will all have similar features to what we just described.

Hugh Williams:

Yeah, absolutely. And look, you know, there's probably a pool of, well, call it 500 people out there around the world who've worked on search at an industrial scale. And those people move between companies and they're pretty sought after, right? So if you're TikTok and you're looking to build search, you probably go and recruit some of the people who've worked at Google or on one of the other large search problems. You get them over to TikTok. And of course, they're going to do, you know, many things similar to the things that they'd done previously.

Hannah Clayton-Langton:

Makes sense. And then there are, and this is a cool thing that listeners in the industry might not know, there are also products that provide search engines. So like Algolia is an example of that, where a smaller company might just use Algolia's search rather than building their own.

Hugh Williams:

Yeah, and look, one of the big mistakes that I think companies make is they say, let's start our own search team and build our own search engine. And that's not a very good idea. I think the only folks who should be building a search engine from scratch are these absolute giants where search is a competitive advantage and you really need to win in search. So, you know, I'm thinking the likes of, you know, Meta with Instagram and Facebook, Amazon with their search, TikTok, all these kinds of folks ought to hire a search team and build one. I think everybody else ought to just buy one. And Algolia is a popular commercial product. There's another product called Elasticsearch. So you can certainly go and buy off the shelf.

Hannah Clayton-Langton:

Okay, and we shouldn't today, but I would be interested in the information query needs of like an Instagram user or a user versus an Amazon user or indeed a Google user. Because it probably is quite subtle if you're only giving it like two or three words, but the needs are fairly distinct.

Hugh Williams:

Yeah, absolutely. And Google's a, you know, a general search engine that's there for searching the web, right? People have thought about the types of queries that are run at a search engine. And really, if you zoom out enough, there's really three kinds of things that are going on. So first is what we call navigational queries. So that's where you type Microsoft into Google because you want to go to the Microsoft homepage, right? That's easy. Navigational queries. There's a second kind of query called informational queries, where you want to find out information about something. And then there's a third set of queries, which are what we call transactional queries, which is where you want to carry out some kind of action like buy something. And they all roughly fall into those kinds of categories. But then if you pick on something like an Amazon, obviously it's pretty much all transactional, right? There's a little bit of informational. I'm trying to read some reviews about things, but it's not a great place to do that. So most of the time, you know, it's just transactional. People aren't trying to navigate the web when they go to Amazon. So different search engines in different companies are, you know, tuned, if you like, to do different tasks, which may not be as general as the Google search engine.
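
The three-way split into navigational, informational and transactional queries can be illustrated with a rule-based classifier. This is only a sketch: the site list, the trigger words and the `classify_intent` function are hypothetical, and a production system would more plausibly learn this from click behaviour than from hand-written rules.

```python
# Hypothetical word lists, invented for illustration.
KNOWN_SITES = {"microsoft", "facebook", "amazon"}
TRANSACTIONAL_WORDS = {"buy", "book", "order", "cheap", "price"}

def classify_intent(query):
    """Label a query navigational, transactional or informational."""
    terms = query.lower().split()
    # A bare site name usually means "take me to that site".
    if len(terms) == 1 and terms[0] in KNOWN_SITES:
        return "navigational"
    # Words about purchasing or booking suggest the user wants to act.
    if any(t in TRANSACTIONAL_WORDS for t in terms):
        return "transactional"
    # Everything else is treated as a request for information.
    return "informational"

for q in ["Microsoft", "buy ipad", "holidays in florence"]:
    print(q, "->", classify_intent(q))
```

A shopping site could drop the navigational branch entirely, which is one way to read the remark that different search engines are tuned to different tasks.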

Hannah Clayton-Langton:

Okay, makes sense. But like Bing will be very similar to Google Web.

Hugh Williams:

Yeah.

Hannah Clayton-Langton:

And back to information queries. I imagine a lot of this has been disrupted by the likes of ChatGPT and Claude, et cetera. And like, is it controversial? But surely Google search has peaked because I'm taking, I'd say, like 70% of my, let's call them information needs, to ChatGPT now. Sorry, Google, but I am.

Hugh Williams:

Yeah, absolutely. And I think we're maybe leaving that 10 blue links era. It's an era we've all grown up with, but I think we're past that now. And I think the wonderful thing about the large language models, you know, whether it's ChatGPT or Claude or whatever it is our listeners are using, is that they have so much context about you. And we've talked about that in other episodes, right? So they're saving memories about you. And so you can really have a dialogue about the need that you have, and you can kind of go back and forth and kind of refine the information need you've got and get, you know, fantastic answers for your trip to Florence. Whereas Google's a bit one and done, right? Like you go there, you type in your query, you press enter, a fifth of a second later, you've got some blue links, maybe an okay AI summary, but it's just based on that query. And that's not as good as what the LLMs are offering today.

Hannah Clayton-Langton:

But I also would wonder if we're at, like, peak LLM, because they're not yet bombarding me with ads, but at some point I assume they need to make money. And so they're going to be putting those in. So I'm getting like the best service from the LLM for free now, basically, right? And so it's pulling me off of Google. But if that starts to be disrupted by other things because they need to monetize it, then maybe I will end up back at Google for certain searches.

Hugh Williams:

Yeah, absolutely. And one of the nice things about Google is that they clearly separate what's an ad from what's not an ad.

Hannah Clayton-Langton:

Yes, yes.

Hugh Williams:

Right. So they're very clearly labeled. I'd say they've dialed down the separation a little bit, maybe not quite as pure as they once were, but it's still clear what's an ad and what's not. And certainly you can't buy placement in the Google search results, right? Like you just have to do the right kinds of things and let the algorithm look after it. And there's a bit of a church and state separation at Google between the ads team and the search team. So the ads team really has no influence over how the search team does ranking. And the search team, you know, I guess kind of just puts up with the fact that the ads team is going to put ads in part of the Google search results page.

Hannah Clayton-Langton:

But I also like that on Google you can navigate specifically to the shopping page and then look specifically and compare products from different websites. I'd be interested to see how the LLMs answer that need in how they build out their search results, because it's pretty bog standard how you get your response from the LLM right now. Like sometimes they'll offer me a table, which can be helpful if I'm looking to compare, but I don't know how they'd work in shopping. And it's all trying to be like your friend. And so that whole idea of like a friendly LLM trying to sell something feels like it will be, like, psychologically quite odd, whereas Google's pretty cut and dry, right? Like here's where you get your photos, and here's where you get your shopping results or Google Flights or whatever it is. So yeah, I'll be interested to see how that evolves.

Hugh Williams:

Yeah, absolutely. And you know, the LLMs today have really nailed that query, right? So navigational queries, informational queries, transactional queries, they've really nailed the informational queries. I think it's gonna be pretty interesting to see what they do in the transactional queries when it comes to things like booking trips, these kinds of things. I'm not really clear exactly how that'll work. And again, as you said earlier, you know, it's not exactly clear how they'll delineate between ads, what's an ad and what's not an ad, but it's gonna be one heck of a business model when they turn it on.

Hannah Clayton-Langton:

Yes, although I would say that my innate trust in Google is high. I don't know whether that's a good thing or not, or naive, but I have pretty strong trust in Google and the robustness and the veracity of the results I get. And I wouldn't say the same thing about ChatGPT, at least not right now. Like I think its friendliness almost puts me off. Maybe that's me being very British. But like I'm not sure. Yeah, I feel like it could lose my trust pretty quickly if I felt like they were trying to sell me something without admitting they were trying to sell me something. And I just like the way that Google's very clear: if you want to buy, here's where you buy, and it's very clearly an ad.

Hugh Williams:

I think that's spot on. And you know, props to Google, right, for always keeping their search team separate from their ads team. And so there really is no influence from the commercial part of the organization on what the search team does. And so the search team is very, very pure in how it goes about solving the problems. Back in the old days, if you go back into the 2000s, we were certainly the same at Bing, and Bing is still the same. So the search team isn't influenced by the commercial side. But Yahoo definitely went into some gray areas. So, you know, there was lots of rumors on the street that you could kind of buy placement in the algorithmic results. And of course, that didn't end well for Yahoo. They shut down their search team, they got out of the game. One of the biggest mistakes in tech history, by the way, getting out of the search game.

Hannah Clayton-Langton:

So we should definitely do an episode on the biggest mistakes in tech history. But I also feel like we could do an episode on how Google ads work, because in a similar way to how I think I said in our "is your phone listening to you" episode that I quite like that Instagram serves me relevant ads, I also do quite like that Google is clearly processing a lot of information from myself and users like myself and serving me relevant content. And that would be an interesting thing to break down that could be an episode in its own right.

Hugh Williams:

Yeah, absolutely. It'd be fun to do a history of ads and the current state of ads and, yeah, ads is like 70% of Google's revenue or something.

Hannah Clayton-Langton:

Is that right? Yeah, that's right. Okay, yeah, so it's pretty big. Okay, so before we wrap it up, let's call back to, I think you did a PhD in search, is that right? Yeah. That's pretty cool. And then you were at Microsoft, what, like the early part of the 2000s? Is that fair?

Hugh Williams:

Yeah, mid to late. So 2004, five to around 2010.

Hannah Clayton-Langton:

Okay, so it's been a fair while since you were working in search. Are you keeping an eye out like outside in on what's changing? You know, I mean, I guess you've got mates that probably still work in the industry, but like what has changed since then?

Hugh Williams:

Great question, Hannah. I think the thing that's changed the most is that we've moved on from matching words. So when you type in a query like, you know, holidays in Florence, we're not literally today just taking holidays in Florence and looking for documents that match. What they've really started doing now is really understand the documents and represent those documents as a point of understanding, and then understand your query, represent that as a point of understanding, and try and find documents that might match. So, for example, let's say there's a document out there about holidays in Florence, you might distill that out to, you know, tasteful cultural holidays in Europe. And that might also be near some other page, like say a page on holidays in Vienna. You might say, well, that's also a tasteful cultural holiday in Europe. And so those documents would be sort of represented as being pretty similar. And then when you type holidays in Florence or cultural holidays in Europe, we've moved away really from that idea of matching words to now trying to really understand a query and understand a document, sort of distill the essence of those and try and match that.
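
Representing documents and queries as "points of understanding" is commonly done with vectors, where nearby vectors mean similar content. The sketch below uses hand-made three-number vectors purely for illustration (the dimensions, the values and the helper names are assumptions); in practice the vectors come from a trained model and have hundreds of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors with made-up dimensions: (culture, europe, beach).
DOC_VECTORS = {
    "holidays in florence": [0.9, 0.8, 0.1],
    "holidays in vienna": [0.85, 0.9, 0.0],
    "surf camps in bali": [0.1, 0.0, 0.95],
}

def nearest(query_vector, doc_vectors, k=2):
    """Return the k documents whose vectors are closest to the query."""
    return sorted(doc_vectors,
                  key=lambda d: cosine(query_vector, doc_vectors[d]),
                  reverse=True)[:k]

# Hypothetical vector for the query "cultural holidays in Europe".
query = [0.9, 0.9, 0.0]
print(nearest(query, DOC_VECTORS))
```

Neither matching page needs to contain the words "cultural" or "Europe"; they are returned because their vectors sit close to the query's, which is the shift away from literal word matching.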

Hannah Clayton-Langton:

Okay, so there's like a nuance in understanding the need, and then there's a richness in the results that's not cause and effect, search for this, get that, and it's more context-driven.

Hugh Williams:

Yeah, that's it. So more about sort of intent and understanding and less literal matching of words.

Hannah Clayton-Langton:

Okay, and where do you think search is going? Is this just some of what we mentioned before about the advent of the LLMs and how they'll disrupt?

Hugh Williams:

I think that's a big part of it. Look, you know, search on mobile is going to be a thing for a very long time. Do you really want to use an LLM when you're in a hurry to do something on mobile, buy something, book something, whatever it is? Probably not. And so I think, you know, search has got a long, long way to run. You know, Google's still getting 100,000 searches a second. But I think we are heading towards a world where maybe less typed, more voice, more video, you know, more of a blurring of the lines between LLMs and blue links.

Hannah Clayton-Langton:

Yeah, okay, that's fair. And to take it back to Google before we close out the episode: in some of my research, the word Google actually holds a meaning, or a very similar word does, googol, which I think is what Google is named after, which isn't a type of bird. I've made it sound like it's a type of bird, but it's like a one with a hundred zeros or something, like ten to the power of a hundred or something.

Hugh Williams:

Yeah, one with a hundred zeros.

Hannah Clayton-Langton:

Yeah, exactly. So there's your fun fact. I know we said we wouldn't do any more tech trivia, but I did think that was pretty cool.

Hugh Williams:

Can I wrap by telling you um that I got a Google job offer in 1999 and turned it down? Did you know that story? Oh my god.

Hannah Clayton-Langton:

Is that your biggest regret?

Hugh Williams:

I don't really have regrets. You know, I'm very uh I'm very happy with my life the way it is and might have gone down a different path. But um certainly the piece of paper from that job offer probably would have made me a billionaire.

Hannah Clayton-Langton:

Oh my wait, so do you still you don't still have it or do you?

Hugh Williams:

Yeah, I do. I think it's in uh in in Selena's office in a filing cabinet somewhere. It looks like it was almost typed on a typewriter.

Hannah Clayton-Langton:

Well, hey, the podcast could still make you a billionaire. You never know.

Hugh Williams:

Oh, you never know.

Hannah Clayton-Langton:

And on that note, guys, if you've liked what you've heard, you can find us at TechOverflowpodcast.com. You can listen to more episodes wherever you get your podcasts. And we are on LinkedIn, Instagram, X, and I believe now

Hugh Williams:

Yeah, and YouTube Shorts.

Hannah Clayton-Langton:

Oh, that's new. Okay, great. Like, subscribe, leave us a review. We love your feedback. And please do us a favor and share us with your friends and family if you think they'll be interested, because we've got to make you a billionaire.

Hugh Williams:

And uh pay for the podcast at least.

Hannah Clayton-Langton:

Yeah, oh, and a trip to Florence would be nice. Yeah, yeah, yeah.

Hugh Williams:

We keep using that example. I like it. It's making me want to go to Europe.

unknown:

Yeah.

Hannah Clayton-Langton:

Okay, great. Well, thanks so much, guys. We will see you next time.

Hugh Williams:

Thanks, Hannah. See ya. Bye.

Hannah Clayton-Langton:

Bye.