
Search engine bots crawled so AI bots could run

Ryan hosts Akamai data scientist Robert Lester on the show to discuss how the growth of AI bots affects internet traffic, the ways these AI bots differ from the original search engine optimization ones, and why you might not want to mitigate AI bots on your websites.


Akamai is a CDN, full-stack cloud computing, and cybersecurity company that keeps experiences closer to users and threats further away using the world’s most distributed compute platform.

Connect with Robert on LinkedIn and check out his AI Pulse blogs.

Today’s shoutout goes to user Evan Phoenix for winning a Populist badge for their answer to llvm ir back to human-readable source language?.


TRANSCRIPT

[Intro Music]

Ryan Donovan: Hello everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. My name is Ryan Donovan, and today we're talking about all the AI bots, and the traffic, and the effects that it has on the internet. And my guest for that is Akamai Data Scientist, Robert Lester. So, welcome to the show, Robert.

Robert Lester: Thank you, Ryan. Happy to be here. Been a big fan of Stack Overflow and the work done there for a long time. Very exciting.

Ryan Donovan: We love to hear that. And as a site that is concerned about the traffic we receive on the internet, this is a topic close to our heart. But before we get to that, [we] would like to get to know you. How did you get into software and technology?

Robert Lester: I did not start my academic and professional career here. I actually started in ancient languages. That was a big interest of mine for a long time, and studied that, and it led me to a natural evolution towards language and logic problems that got me into computer science and engineering; and it's led me towards data science where I get to do a blend of engineering and problem solving, but also data storytelling and crafting. So, I really like both of that.

Ryan Donovan: So, what was your favorite ancient language?

Robert Lester: I spent a lot of time reading ancient Greek and Latin poetry, primarily.

Ryan Donovan: The classics. There you go. So, today we're gonna be talking about the AI bots on the internet, and we've always had bots crawling the internet for search indexing, and such. But from what I've heard, it seems like the bots that the AI companies have sent out are sort of another level of traffic. Can you give us a sort of overview of the research that you did on this?

Robert Lester: If we back up, it kind of starts with classification and taking a look at where we are in the evolution of all this stuff. So, when we think about the tech giants, the traditional ones, especially those that have already scraped the internet for a large majority of its data, like Google or Amazon, those who already have these sorts of products built out, they've already got these massive internal repositories of data, as well as the infrastructure in place to be scraping the internet daily, updating their indexes and all of that. So, from a training presence, we really only classify as AI bots in this space those kind of adjunct research bots, like you might see from, for example, the Google Vertex lab. It's really difficult to engage with a customer sometimes and say, Googlebot, traditionally, you want to rank high in search rankings. This is something that has been going on for 15 years on the internet, but then, at the same time, the same data is getting mixed with their AI training data. So, it's difficult to draw that line. But then, from another perspective, getting away from the bots that are primarily for training data, we get over towards user-driven activity, like 'fetchers,' is what we classify them as. And these are invocations of external fetching when users are using the model. And someone like OpenAI or Anthropic doesn't have that search index already entirely built. And then we see with something like Google AI Overviews, they're able to make internal fetches towards their already indexed results. So, it's a question of presence and categorization rather than these companies taking over.

Ryan Donovan: The fetching is like when you do a research query, or it pulls in just-in-time data it does inference on the fly, right?

Robert Lester: Exactly. Yeah. That's how we're classifying them.
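
As a rough illustration of the categories described above (training crawlers, user-driven fetchers, and traditional search crawlers), here is a minimal Python sketch that buckets requests by the bot token in the User-Agent header. The tokens are ones the operators publicly document, but the mapping and the `classify_user_agent` helper are assumptions made for this example, not Akamai's taxonomy or detection logic. As the conversation notes, a crawler like Googlebot can feed both search and AI products, so a single label is a simplification.

```python
# Illustrative mapping only: an assumption for this sketch,
# not Akamai's classification scheme.
CATEGORY_BY_TOKEN = {
    "GPTBot": "training crawler",
    "ClaudeBot": "training crawler",
    "ChatGPT-User": "user-driven fetcher",
    "Claude-User": "user-driven fetcher",
    "Perplexity-User": "user-driven fetcher",
    "Googlebot": "search crawler",  # dual-use in practice; simplified here
    "bingbot": "search crawler",
}


def classify_user_agent(user_agent: str) -> str:
    """Return a coarse category from a case-insensitive substring match."""
    lowered = user_agent.lower()
    for token, category in CATEGORY_BY_TOKEN.items():
        if token.lower() in lowered:
            return category
    # Self-identification is voluntary, so many requests land here.
    return "unclassified"
```

A User-Agent header is trivially spoofed, so a lookup like this is only a starting point; it says what a request claims to be, not what it is.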

Ryan Donovan: So, my sense was that the AI bots are putting a lot more bot traffic on pages. Is that borne out by your research?

Robert Lester: So, it's not a massive needle mover yet, but the growth rate is what we're more interested in. As far as the raw numbers, we're still only looking at this stuff as about a percent of all of the validated bot traffic that we see on a daily basis. But this is a massive growth over what we were seeing at the beginning of the year, or last year at this time, where we've gone up, I think, 400% across all industries. So, it's been a pretty incredible increase and something that we're definitely keeping our eye on.

Ryan Donovan: Yeah. The way that AI companies have used data has changed in the last year. The beginning was just all for training data, and now it is that sort of reasoning model chain of thought, like agentic stuff. Do you see the agentic stuff sort of increasing that traffic load even more?

Robert Lester: So, part of the problem here is drawing lines between what's a bot and what's not a bot. If you think about what an agent is, it's automated like a bot, but it's reasoning in a more intelligent manner than a bot, and it's non-deterministic in that fashion, a lot of the time. And so, the behavior isn't quite the same. Similar to classifying these user-driven fetchers, it's hard to draw that line. And so, what we're kind of moving towards is more of identification and intent of these bots, or whatever you wanna call them, these entities that your online products are interacting with, and moving away from 'bot or not.' 'Cause that is, seemingly, a less important question at this point.

Ryan Donovan: I think I remember that site on the early internet, Bot or Not.

Robert Lester: Yeah. It's very different. It's rapidly evolving, and it's pretty cool.

Ryan Donovan: Yeah, because when you have an AI agent, it's almost like giving everybody their own sort of bot.

Robert Lester: In a way. Absolutely. Or it's something that these large language models are doing as well, is increasing access for people. So, while we're seeing this rise in AI bots, there's also been increased internet activity across the board.

Ryan Donovan: So, you'd said the big majors have everything already indexed. Basically, they have a copy of the internet on their servers.

Robert Lester: Something like that. And I won't speak for them necessarily, but they have a lot of data at their disposal. And the key thing is also they're reusing infrastructure in a lot of cases to where if you wanna classify it as an AI bot, sure, you absolutely can. And in some cases, that makes sense, but it also makes sense to classify it as a traditional search engine optimization bot.

Ryan Donovan: Do you think the other AI companies will start doing this? Should they do this? Is there a reason that they don't?

Robert Lester: I think that they're probably working on it. We do see very large amounts of training activity from some of the bigger names in the space, as you'd expect. The leaders in the space are definitely the ones making more waves on the internet. I assume that in each of these training runs for their new model releases, they are collecting more and more of the internet's data, and trying to do better and better.

Ryan Donovan: You know, we've seen some mitigation efforts against these bots, whether to reduce traffic or to protect the content of these websites. Things like, you know, different licensing schemes, a closing of the internet. Do you think those are effective? Do you see any part of that making an effect?

Robert Lester: The question I think is most important first, though, is what is your business model? What does it rely on, and what posture makes the most sense for you? Something that we've done at Akamai that I think is a pretty responsible approach when it comes to this stuff is being nuanced in our approach. We're approaching this as a management problem, not necessarily as a threat vector, because these bots can be beneficial to people in some industries while being detrimental to others. For example, someone in hospitality or retail is going to be more inclined to increase their LLM retrieval optimization. You know, they wanna be the first-ranked page. You want your hotel room up there first. You want your sneakers coming to the top of the search results. But at the same time, digital media companies, news publishers, people in that industry, they don't want their content aggregated. You know, that hurts their referrals, hurts their click-through rate, and that's in many cases, bad for business. You know, mitigation isn't the only number that we're going for, though. We have seen a rise in the number of customers that are mitigating these AI bots, and on a case-by-case basis. But yeah, we're seeing a lot of varied approaches across the board, and I think that's pretty healthy for the space.
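
To make the idea of a per-business posture concrete, here is a hedged Python sketch using robots.txt, a voluntary signal that is a different and much weaker mechanism than the CDN-level bot management discussed here. The `PUBLISHER_ROBOTS_TXT` policy and the `allowed` helper are hypothetical: the policy asks one training crawler to stay out while leaving other agents alone, the kind of stance a news publisher might take. The standard library's `urllib.robotparser` evaluates it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical publisher policy: opt out of one training crawler,
# leave search crawlers and user-driven fetchers alone.
PUBLISHER_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A retailer that wants to be surfaced by assistants might publish the opposite policy and allow everything. Either way, robots.txt only expresses a preference; it relies on the bot choosing to honor it.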

Ryan Donovan: It seems like the difference you're pointing out is whether the content that you're putting out supports the business or it is the business. What is it when you say you've measured the bot traffic, how do you get the data? I mean, I know you all are a big infrastructure company, but how does that work on the backend?

Robert Lester: So, there are different data feeds that we rely on. Obviously, we can't catch every single thing that comes in across the internet or else we'd be absolutely drowning. We rely on research feeds from what we're seeing across our customer base, for both threat research and larger analytics purposes. We're able to look at both attack traffic and non-attack traffic, and so this really helps inform a lot of our research, our model building, and things of that nature. It's a large amount of data, and it's often like looking for a piece of hay in a haystack, so we do our best there. We rely on a lot of different feature data that we're able to gather from our different products.

Ryan Donovan: Do you end up using any AI to sort out the haystack data?

Robert Lester: We're constantly innovating at Akamai, and even on my team, we work heavily in threat research, and a lot of other places. But a lot of what we do starts with ground-level analytics and trying to take a look at the space at large, and then applying more advanced research methods, and getting towards model building as a final result. We're aimed towards enhancement of a lot of products. Our product backbones are still very fundamental. And we do our best to increase the effectiveness. With these kind of newer concepts, we leverage large language models on our own, we leverage neural networks, and it's certainly something where we're always trying to improve.

Ryan Donovan: Did you see the bot traffic evenly distributed across sites, or was it very strongly targeted towards larger sites? Were there winners or losers?

Robert Lester: There are definitely winners and losers in this game. If you were to guess what the top industries were going to be targeted by these AI bots, what would you say?

Ryan Donovan: I'd imagine it's probably somewhere in the tech industry, right?

Robert Lester: It's actually commerce. So, the most targeted industries are commerce, which kind of encompasses retail, hospitality, things of this nature, different online brands. But really what's happening is the most requests are coming from these bots that need to be constantly updated for spaces that need to be constantly updated. You're gonna see a lot of fetcher requests towards hotel providers or companies because they're always changing rates on rooms. People are always trying to get the best deal. And so, it's interesting that is where this is funneling, but it makes a lot of sense, as far as market dynamics go.

Ryan Donovan: Do you have a sense of what percent of these bots are the front-ends creating alternative commerce marketplaces, or are they researching prices?

Robert Lester: It's actually really interesting. We're just starting to take a look at a report that was released by the National Economic Bureau [National Bureau of Economic Research], and some OpenAI and Harvard researchers, and it said that ChatGPT user-driven traffic is moving away from work and towards non-work activities, and we're seeing a lot more of this 'doing' than in the past, where people are asking models to do things for them, rather than just asking questions. I think that is probably in large part due to the fact that we're starting to see them engaging more with external resources, whether that be through agents, fetchers, these different search triggers, search bots. I imagine in many cases, there are a lot of these wrappers out there that are just an API call to one of OpenAI's models, and trying to build the most effective hotel fetcher. But at the same time, we are seeing a lot of organic user-driven traffic, as well.

Ryan Donovan: Also, with the unevenness of distribution, it's not equally driven by all of the bots, right? There's certain standout ones.

Robert Lester: And it's constantly changing, which is crazy. But, you know, we're looking at this stuff, and every week something new is happening. Like, we published a blog on this earlier in September, I believe. But it was talking about OpenAI, and after their GPT-5 release, a lot of stuff went just insane. Their numbers were going up and down like crazy. They had released this new model, and when you would make a search request, we would see a lot more results in the search request. And we were able to at least see request growth in ChatGPT-User, which is the user agent for that bot. But yeah, it went insane, and then it seemed later that they dialed it back, and then we're crawling through dev forums, and we're seeing that people are reporting a lot of ghost requests made by ChatGPT. And then, soon after, there was a new release and seemingly a fix to that. And so, that is something else that stands out about these AI native companies–the ones that have popped up in the past five years–is they're not afraid to build in public, and they're not afraid to move fast, and break things, and put them back together, and they're having a lot of success doing it, but it's something that is definitely bearing out in what we see from them.

Ryan Donovan: Yeah, I mean, in this case though, the things that they're breaking may be the rest of the internet.

Robert Lester: We hope not. So far, it's pretty benign, but yeah, it's definitely worth watching.

Ryan Donovan: When you say the bot behavior was insane, is that a product of just like the sort of fluctuating, constantly changing behavior, or were there things where you're like, 'what is this guy doing?'

Robert Lester: Oh, no. I'd say it's definitely the prior, and it's relative to what we know, right? When we see these traditional search crawlers, a lot of them behave in a very predictable sense. We see seemingly circadian patterns that, you know, might relate to load shifting between clusters, or something like that, where we're able to see these; if something makes a massive change, then we look into it. They've made an infrastructure change, and this is the new norm. We haven't really been able to establish a lot of norms for these bots, and that's partially due to just the new nature of them, but it's also due to the fact that they're growing very fast from a popularity standpoint, but also from an infrastructure standpoint where they're getting better at collecting data, they're getting better at letting their models loose on the internet, and it's cool to watch.
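
The idea of a predictable daily rhythm, with large departures prompting a closer look, can be sketched as a per-hour baseline. This is a generic z-score check written for illustration, not a description of Akamai's models; the function names, the input shape, and the threshold of 3 standard deviations are all assumptions.

```python
from statistics import mean, stdev


def hourly_baseline(history):
    """Build {hour: (mean, stdev)} from (hour_of_day, request_count) pairs.

    Hours with fewer than two samples are skipped because a standard
    deviation cannot be estimated from a single observation.
    """
    by_hour = {}
    for hour, count in history:
        by_hour.setdefault(hour, []).append(count)
    return {h: (mean(c), stdev(c)) for h, c in by_hour.items() if len(c) >= 2}


def is_anomalous(baseline, hour, count, threshold=3.0):
    """Flag a count more than `threshold` standard deviations from the norm."""
    if hour not in baseline:
        return False  # no established norm yet, so nothing to compare against
    mu, sigma = baseline[hour]
    if sigma == 0:
        return count != mu
    return abs(count - mu) / sigma > threshold
```

The early return for hours with no baseline mirrors the point made above: a bot that is new, or whose infrastructure keeps changing, never settles into a norm to measure against.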

Ryan Donovan: And the nature of what AI does and can do changes, too. It's fascinating.

Robert Lester: Yeah, absolutely. Some of the more bleeding-edge stuff that we're looking at now is really interesting. We're starting to see agents interacting at point of sale, which is something that we're not entirely sure how the public is going to react to, or if it's necessarily a super viable future, but it's a really interesting concept that these agents are actually exchanging money, and they're buying products, and what does that do to the customer who is optimizing for sales from anyone? Not just a human, but maybe you need to learn how to sell to agents now, which is a totally different question, maybe.

Ryan Donovan: There's so many weird things with that. First of all, are you comfortable having your agents spend your money?

Robert Lester: These things are pretty good, and they're getting better, but they're not perfect. And so, it introduces a pretty interesting question both on the client side, but the customer side as well, both from a sales perspective, and also a security perspective, because we don't know exactly how they're gonna interact. And that's why it gets back to that question of not 'bot or not', but intent and identification.

Ryan Donovan: Have you seen any data on the sort of difference in behavior between sort of simulating a browser and taking entire webpages, or any of them just calling low-level APIs directly?

Robert Lester: We haven't seen a lot of differences. I mean, there are some though, in how these AI companies present themselves. You know, for example, some of these places are really cooperative. They're doing their best to be good participants in the online universe. They're doing their best at self-identification, helping us verify that they are who they say they are, and making sure they don't get blamed for anything that wasn't them, right? Which is really positive. And so, we see that, for example, one company that does this, they use a certain identifying feature for a lot of requests that come from this bot that comes from interaction in their browser. However, when people are going through the API and making calls from there, they aren't efficient enough, or whatever the case may be, they're not including the same signal, and so despite best efforts and the fact that they are still identifying in some respect, it makes it a trickier question where we're having to rely more on behavioral signals than self-identification entirely.
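
One widely used form of the verification mentioned above is checking whether a request that claims a bot identity actually originates from address ranges the operator has published. The transcript does not say which signals are used, so the Python sketch below is an assumption-laden illustration: `ExampleBot`, `PUBLISHED_RANGES`, and `is_verified` are hypothetical, and the prefixes are RFC 5737 documentation ranges standing in for a real operator's list.

```python
import ipaddress

# Hypothetical published ranges; these are RFC 5737 documentation
# prefixes, not any real operator's addresses.
PUBLISHED_RANGES = {
    "ExampleBot": ["192.0.2.0/24", "198.51.100.0/24"],
}


def is_verified(claimed_bot: str, source_ip: str) -> bool:
    """Check that source_ip falls inside a range published for claimed_bot."""
    networks = PUBLISHED_RANGES.get(claimed_bot, [])
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in networks)
```

A request that claims to be `ExampleBot` but arrives from outside those ranges fails the check, which is the situation where behavioral signals have to carry the decision instead.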

Ryan Donovan: For those bots that don't self-identify, what are the sort of behavioral signals that you use to spot them?

Robert Lester: Can't give away everything, but we do factor in a lot of features, whether it be something like network telemetry, whether we're starting to look at the actual behavior of how these things are working, which is something that we've been working on building models for a little while, which has been just an awesome and super interesting process trying to identify what these bots actually behave like online, which is awesome. But realistically, we're looking at everything. We're looking at self-identifying features, we're looking at telemetry. We're looking at different signals across the board and a lot more feature data. But yeah, can't get too deep into it.

Ryan Donovan: Don't wanna spill the secret sauce.

Robert Lester: Exactly. No. It's still moving fast. And some of these partners are more cooperative than others, and it's still the wild west out there.

Ryan Donovan: Are there things you are nervous about or hopeful about in the increasing AI bot storm?

Robert Lester: I'm hopeful that customers are going to be able to engage with these bots in the most effective manner for them. I think that while it is a changing landscape, and it's a little bit intimidating when this stuff is moving so fast, and you have to make plans around it, but there's also massive opportunity here. The first people who are able to game this in their favor are going to be massive winners. If what we've seen to this point indicates anything to the future as far as growth and the trajectory of where this is all heading, this will be an important part of the new online economy, and the Internet of things. So, it's a really interesting proposition. I mean, in the short term, we're still working through all the seasons with this stuff. We've been looking at it for a while now, but the traffic today and the way that these models are used today is way different than it was a year ago, and even than it was six months ago. So, we're looking at Cyber Week coming up. We're really excited to take a dive into what exactly we see. Some of these companies with agents, for example, interactive POS. They didn't exist at this time last year, so we're excited to see what they do. And bots always go crazy during the holiday season. Everyone's familiar with Grinch bots, and all of these more traditional threat vectors as the holidays approach, but this is an entirely new ball game.

Ryan Donovan: You know, Black Friday, Cyber Monday, and then Bot Tuesday, maybe?

Robert Lester: Yeah, something like that. Keep an eye out. We'll definitely be putting some stuff out about that and talking about what we see. For sure. I think the best message to take away from it all is just how wide open this arena is right now. There are so many options from a customer standpoint. There are so many different factors going into the equation right now, that being able to manage these bots and being able to see them is probably the most important thing. You wanna be ahead of the curve on this thing, and it is something that I think we do really well at Akamai, as far as being able to provide this service. Being prepared is the best possible step forward, so get in touch with your bot.

Ryan Donovan: It's that time of the show again where we shout out somebody who came on to Stack Overflow, dropped some knowledge, shared some curiosity, and earned themselves a badge. Today, we're shouting out a Populist Badge winner, somebody who dropped an answer that was so good, it outscored the accepted answer. So, congrats to Evan Phoenix for answering 'llvm ir back to human-readable source language?' So, if you're curious about that, we have an answer for you in the show notes. I am Ryan Donovan. I edit the blog, host the podcast here, at Stack Overflow. If you have topics, questions, concerns, comments, please email me at podcast@stackoverflow.com. And if you wanna reach out to me directly, you can find me on LinkedIn.

Robert Lester: And I'm Robert Lester. You can find me doing Akamai AI Pulse blogs, or yeah, if you wanna reach out, I'll be on LinkedIn as well.

Ryan Donovan: All right. Thank you for listening, everyone, and we'll talk to you next time.
