New Relic is a full-stack observability platform that helps engineers plan, build, deploy, and run software. Read their 2025 observability forecast.
Connect with Nic on LinkedIn or email him at nic@newrelic.com.
Congratulations to user Yochai Timmer for winning a Populist badge on their answer to Reader/Writer Locks in C++.
TRANSCRIPT
[Intro Music]
Ryan Donovan: Hello and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I'm Ryan Donovan, your host, and today we are talking about the complexity crisis. We'll be getting into what the complexity crisis is, possible solutions, and confounders with my guest today, Nic Benders, who's the Chief Technical Strategist at New Relic. So, welcome to the show, Nic.
Nic Benders: Thanks, Ryan. It's great to be here.
Ryan Donovan: Before we get into the complexity crisis, how did you get into software and technology?
Nic Benders: I always talk to people, and they have these fabulous, meandering careers where they touch on all these things, and mine was kind of like a straight line. When I was a kid, I had this crazy TI personal computer. It had a whole 24 kilobytes of RAM on it, and I spent all my time with that until I could get bigger and bigger and bigger computers. Along the way, I discovered that administering systems is its own kind of joy, in addition to writing software. I went to college for computer science, and I actually spent a large portion of my career as an operations and systems architecture person, just kind of running big sets of servers, which is endless fun. And then, you know, it was pretty natural. You'd run into all these problems, and you say, 'well, nothing actually does what it's supposed to do.' And so, you'd cobble together a little bit here, some duct tape there, a bunch of Perl scripts, and I got this opportunity to come to work at New Relic, and you know, I was like, 'I use New Relic. It's like I need that visibility. I'm going to go in there and get right onto that.' So, in many ways, like I said, my career is this funnel that goes straight to this point: working in observability.
Ryan Donovan: There are higher orders of sysadmin.
Nic Benders: Yeah. Yeah.
Ryan Donovan: You spend a lot of time thinking about computer systems, network systems, how they work. What is the complexity crisis that we're facing now?
Nic Benders: Well, when I go back and I look over my career, the increasingly large quantity of it– when I started in the 90s, we had lots of people and few systems. You know, the systems staff at the university was half a dozen people, and we had probably three or four computers that really needed attention. And so, everyone was working around them, and you knew every piece of them, and you treated them as very special. As life went on, we got more and more, and you'd say, 'well, I can't name my servers anymore. I'm gonna have to just adopt some automatic naming scheme.' I mean, server naming was a lot of fun. You had some kind of punchy theme. But the complexity grew, and we said, 'well, we're gonna need software to keep track of our hardware because we now have so much hardware. We're going to need software to manage our software deployments because we're actually deploying a lot of stuff.' We moved to microservices. I remember the first person who told me about microservices, and we were like, 'okay, so we're gonna need a couple of dozen different services for this architecture.' He's like, 'oh, I think it'll be a couple hundred,' and it was over 1,000 within three years. It's just this explosion that occurred, and it brought good stuff. You know, we were suddenly able to really reduce the scope of what we thought about when writing a single piece of software. But it also brought a lot of bad stuff. It made debugging into this elaborate murder mystery where you're not sure who did what, or what the pieces are. You couldn't just trace through it, stepping with your debugger. And again and again, we've done this thing. We've gone to microservices. We've gone to the cloud, so I can't just go to the data center and kick the server with my foot, or listen to it and see if it's swapping. We've gone to containers, so there's not even an operating system to attach to. We've gone to orchestration, where I don't even know necessarily where my containers are running. They're happening somewhere; the next minute, they might be happening somewhere else. Each one of these steps has been great. It's this really powerful layer that we build on, and we couldn't build the software that we have today without those layers, but at every step, we've given up a little bit, too; and that complexity is what we give up. And so, the last 30 years, I think, have been a continuous move up the stack, but that pyramid is just getting really, really wide with all the pieces that are inside it.
Ryan Donovan: We're going up these levels and levels of abstraction, right? But you're saying the complexity comes from the fact that we were supposed to not have to worry about all the other stuff underneath, right?
Nic Benders: Well, it's always a lie.
Ryan Donovan: Yeah. Do you still code in assembly? Is that you?
Nic Benders: No, but I think that you still have to know what it is. You have to know that assembly exists. Maybe it's not quite the, you know, drop into machine language to optimize my loops that we used to do, or like, you know, I used to be suspicious of the compiler constantly. You're like, 'that compiler is messing with me.' But especially in the operations space, everybody has hit a time when the tool you're using is this beautiful abstraction, and it's just not accurate enough. Like, what is happening behind that screen? And if you don't know, 'oh, well, this is actually just issuing these commands to Kubernetes,' then what's Kubernetes doing? Kubernetes is actually just manipulating these operating system concepts. Those operating system concepts are really operating this hypervisor, or they're doing this down in the hardware, this on the network. You kind of have to be able to picture that whole thing because sometimes those abstractions do leak. It gets kind of weird, and most of the time it's great, and I almost feel like the more often it's great, the worse it is when it's not great.
Ryan Donovan: Yeah. Well, you talked about optimizing your loops with assembly back in the day; it seems like computers and computing have gotten almost too powerful, where you don't feel the pinch of that lousy loop, or whatever it is you have there. You're not getting bit by it.
Nic Benders: Yeah, no, I think that's absolutely true. So, we live in such an abundance of resources–
Ryan Donovan: Right.
Nic Benders: That you're like, 'oh, I'll just spin up a bigger instance. It'll be fine.' And, you know, I don't know, I'm sure that the people whose beards were gray already when I started would say the same thing about me. It's that, you know, I'm not punching my own cards. I didn't have to pull the tape back and forth, and like, 'oh, well, you've got all this RAM. You've got RAM in megabytes. You're going soft.'
Ryan Donovan: We used to have to code uphill both ways in the snow.
Nic Benders: Exactly.
Ryan Donovan: So, this complexity behind the scenes: how do we make it visible and understandable, and get people to get in there?
Nic Benders: You hit on the two key concepts immediately in your first question, which are visibility and understandability, because these are closely related, but they're not the same. And I think a lot of people conflate them. I work in the observability industry; that's what we call ourselves. It's kind of a dumb name. Observability is the tool that we make, and it's an important tool, but it's not everything you need. So, the first thing we've done is we've tried to say, 'hey, when something does something that you didn't expect, say your application is getting crushed for resources, you ask, well, why did that happen?' The first thing you have to do is make your system observable. So, you have to build each piece so that you can see what's happening inside it when you need to, and you need to be sending that data somewhere that you can compare it over history, so you can go back and say, 'yes, this is a problem. Is this a new problem, or is this just a thing that I wasn't paying attention to?' So, all of that observability gives us the ability to travel backwards in time and to look inside the system, to take the side off the server, or the side off the piece of software, and say, 'well, this application is slow, why?' But that next step is, can you understand it? Understandability is kind of what I think we should be called, 'cause that's the missing piece. In a complex system, you can observe every piece of it, and I think a lot of us do. When we sit down at our consoles, maybe you've got a dashboard with 500 widgets on it. Maybe you've got metrics for a million different time series. So, you've observed the system, but that's not the same as understanding it. And we talk about understandability especially right now, in this AI moment, because AI is feeding this problem and making it worse, and also giving us opportunities to make it better. It's a possible way out of this loop, because we have a chance for the first time to really bring some extra human-style thinking to problems that we were never gonna have enough people for. And so, we've come to this really interesting moment where we don't know whether we're all gonna just get absolutely drowned by the extra complexity from AI, or whether it's finally gonna give us something that lets us float up above the system.
Ryan Donovan: Yeah, I mean, there are a couple of things that, I think, work in different directions: it's good at processing large amounts of data and making it sort of understandable, but then it's also itself not very observable. Right?
Nic Benders: Yeah, AI has kind of got three places where it touches our industry right now. So, one is this idea of people using AI-powered tools to write software. They use Copilot, or they're using Claude Code or Cursor, any one of these systems out there, going through and saying, 'oh, I can write software faster now because I have this coding assistant. I can, you know, maybe do vibe coding, like a low-code or no-code approach to building a piece of software that otherwise I'd never be able to build, or it would take me a huge amount of time to build.' So, that's awesome. These are powerful new things. Again, I think back to, you know, the software engineers of the past, perhaps chiding us: 'well, compilers, you'll lose the feel for the machine.' At the same time, if I hired a new employee and they wrote a piece of software, and then it wasn't working the way I thought it was supposed to work in production, I just get on Slack and I ask them, I say, 'hey, could you take a look at this? Is this doing what you expect? Could you explain to me what your thinking was? Why is it written this way and not that way? Like, is there something I just don't understand?' If I use a coding assistant, or if I use a vibe coding system and it writes a piece of software, or it writes a software module, and it goes into production, and it doesn't do what we expect, who are you gonna ask? Like, I can't go to Copilot and be like, 'hey, Copilot, could you explain what you were thinking here?' Because it doesn't know what it was thinking. It only knows what it built. It doesn't have that internal history of the decisions: 'oh, I did this instead of this, although I was worried about this other thing.' It's just gonna tell you, 'well, I can read the code same as you,' and that's a problem, because so much of debugging is about matching your expectations of the universe against its realities. So, that's one thing that AI is doing. It's creating this new problem, both because it generates lots of software, and also because it doesn't understand the software it's written. The second is that AI is a new technology, just like mobile, or cloud, or anything else, and every company's deploying it. You know, we do this observability forecast every year to help understand our market, and people are putting a lot of AI software into production, so then you have to be able to look into that. Of the companies we talked to, last year it was 40-some percent that had observability for their production AI systems. This year it's 50-some percent. That's great. That's moving in the right direction. It's the first time we've seen that result come in over half, but that still means almost half of AI systems out there are not yet being observed by anything. This isn't just New Relic customers, this is people in the industry in general. Okay, well, that's kind of scary. There's a lot of software out there that's doing things, and no one's keeping an eye on it.
Ryan Donovan: When you talk about observability for these systems, do you mean some sort of explainability viewpoint into the LLM itself, or is it around the sort of inputs-outputs of a model?
Nic Benders: Mostly the inputs and outputs. We see a lot of customers and a lot of other companies in the space relying very heavily on these cloud-delivered AI solutions, especially for LLMs. There are some other, you know, smaller statistical tricks and things like that that are running locally, and with new open-weight models, maybe we'll see that grow again. For privacy reasons, people will be pushing that, but OpenAI, and Anthropic, and, you know, Google, they're just pushing the state of the art so fast on their cloud-hosted models–
Ryan Donovan: Right.
Nic Benders: That most people just turn to those. So, what happens inside the model? Hmm. That's really very hard for us to see. But what we can see, we can observe: that interaction. We can say, 'you sent this message to this system. It answered back with this other message. It cost you this many tokens. It took this much time.' These are pretty basic measurements, but they're really important because they let you find somebody who had a weird customer interaction later and say, 'hey, what did the model say to them? Let's go back in time and see what was said.' You could also use AI to monitor AI and do simple things like a sentiment analysis over all of the responses. 'Hey, did our chatbot swear at anybody yesterday? Was it really angry or depressed?' Let's check into those types of things. And of course, cost. Just like the early days of the cloud, we handed everybody this box of paints, and they're pretty expensive, and we shouldn't be surprised if somebody tries to do the exterior of their house in oil paints, because we haven't told them what to do. So, you gotta keep an eye on those basics; they apply here just as much as anywhere.
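To make that input/output observability concrete, here is a minimal sketch in Python. Everything in it is illustrative: call_llm is a hypothetical stand-in for whatever provider SDK you actually use, and the keyword list is a toy substitute for real sentiment analysis. The point is simply recording the prompt, the response, token counts, latency, and a flag you can query later.

```python
import time

FLAG_WORDS = {"angry", "hate", "stupid"}  # toy stand-in for a real sentiment model

def call_llm(prompt: str) -> dict:
    # Hypothetical stand-in for a real provider call (OpenAI, Anthropic, etc.).
    text = f"echo: {prompt}"
    return {"text": text,
            "input_tokens": len(prompt.split()),
            "output_tokens": len(text.split())}

def observed_llm_call(prompt: str, telemetry: list) -> str:
    started = time.time()
    resp = call_llm(prompt)
    telemetry.append({
        "prompt": prompt,
        "response": resp["text"],
        "input_tokens": resp["input_tokens"],
        "output_tokens": resp["output_tokens"],
        "latency_s": round(time.time() - started, 3),
        # "Did our chatbot swear at anybody yesterday?" -- a crude flag to query later.
        "flagged": any(w in resp["text"].lower() for w in FLAG_WORDS),
    })
    return resp["text"]

events = []
observed_llm_call("Why is my order late?", events)
print(events[0])
```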
Ryan Donovan: I can imagine folks looking at saving the prompts and responses for every interaction as like, you know, the problems with logging blown up to the nth degree, right? You just have so much storage that you're using for these three-paragraph responses. And then are you leaking sensitive information?
Nic Benders: It's definitely something that we and every other tool in the market have to be very mindful of, especially the sensitive information part. On the storage size, compared to a lot of what happens at machine scale, where one service is calling another service and it's generating log lines for like a 10-paragraph exception stack, the amount of data that goes back and forth to the LLMs is relatively small, and it's manageable 'cause it happens at human scale. Some person did a thing, and the number of human interactions is so much smaller than the number of machine interactions that it really keeps it under control. But you have to apply controls for masking. You have to say, 'let's mask out any known sensitive data.' You've gotta apply sampling controls and say, 'hey, if something were to happen where the volume exploded, we actually don't need to capture every one of these. How do we know which ones to look for? Let's look for ones that have errors,' et cetera. A lot of the same practices that we've applied over the last 10 years to analyzing distributed trace trees, to figuring out which database calls are the ones we want to capture, we can take those same techniques and apply them here. When it all boils down to it, it's just another service call. Like, we're still reaching out over the network, and we're talking to another system, and the protocol's a little weird, and it's odd that it talks to me in human language instead of, like, you know, in SQL, but it's fundamentally the same.
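As a rough illustration of the masking and sampling controls Nic describes, here's a short Python sketch. The regex patterns and the 10 percent sample rate are assumptions for the example, not recommendations; the idea is just that errored interactions are always kept while the rest are sampled, and known-sensitive fields are masked before anything is stored.

```python
import random
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")  # naive card-number pattern, illustration only

def mask(text: str) -> str:
    # Mask out known sensitive data before the prompt/response is stored anywhere.
    return CARD.sub("[CARD]", EMAIL.sub("[EMAIL]", text))

def should_record(interaction: dict, sample_rate: float = 0.1) -> bool:
    # Always keep errors; sample the rest so a volume spike can't explode storage.
    return bool(interaction.get("error")) or random.random() < sample_rate

interaction = {"prompt": "My email is jane@example.com and my card is 4111 1111 1111 1111",
               "error": True}
if should_record(interaction):
    interaction["prompt"] = mask(interaction["prompt"])
    print(interaction)
```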
Ryan Donovan: Right. It's just an API call at the end of the day. So, we've been talking a lot about how AI kind of complicates things, makes things difficult. How does it help?
Nic Benders: The way it helps is that third piece. I almost wanna take a step back. We talk so much about AI right now. Like, you know, what is AI? Is it artificial general intelligence? All these kinds of moving goalposts. What I wanna talk about here is specifically two things: one is LLMs. LLMs are those chatbots and everything that most people are talking about right now when they use the word AI. And the other is neural net-based prediction systems, not for language. So it's beyond a statistical model, where you might have a static baseline system; it isn't talking to you, but it's intelligently making inferences. The thing that these two systems have in common that's super valuable for our industry is that they do a good job of dealing with unstructured and semi-structured data. And it turns out we have a lot of semi-structured data in the world. We talked about those log lines. I've got log lines where I know which host they're from, but the host name maybe is in a kind of funky pattern where sometimes it resolves and sometimes it doesn't. I know the timestamp, but then the message of a log line: is it 100 lines of stack traces? Is it a message? Is it kind of the same message, but it was worded by somebody on a different team, and they just didn't quite phrase it the same way, or it has a typo in it? So, it's got this semi-structure, and so much of our world is pretty easy for a human to look at. We would say, 'oh, that's an interesting error. Oh, that's the same thing. Oh, you know what? This message reminds me of a message I saw six months ago when we were working an incident. Huh. That's interesting. I should go back and look at that incident now.' Now let's go to the LLM world. The LLM can do all of this in almost real time. It can say, 'oh, here's a log line. It's an error, or it's a new burst. I see a significant excursion from standard rates. I look at the log message. It's got some interesting words in it. Those interesting words map against my perfect artificial memory of every incident that we've ever seen before, and anything that was said in those incidents. Oh, that's in these RCAs; let's pull up those RCAs. Do these RCAs have anything to do with any of the current problems that you're seeing now? Maybe yes. I'll show them to you. Or maybe I'll even just propose them to a human,' not as a final answer, but as a, 'hey, let me save you some legwork; maybe you should look at this old retro.' I think this possibility, combining the prediction characteristics of figuring out whether something is anomalous or not with the merging of structured and semi-structured data, between log data, span data, host names, application names, error stacks, retros, you know, chat messages: you can start to stitch all this together and do something at scale that would just have been infeasible to do otherwise. And you don't have to be a superintelligence to do this. You just have to be good at navigating the kind of weird way that engineers talk and type, and the way that concepts are related. I think this is a huge promise. So, that's the understanding engine that I was talking about earlier, where we go from just 'seeing' to 'understanding.' And then from there, can you go into 'acting'? Can you see what's happening, understand what's happening, and take action to deal with it? You know, this is something that we talk about as an industry as if it's a holy grail, but if I look backwards, automated remediation is real. It's every day for us.
It's 'white bonnet' or whatever was restarting our daemons, you know, 20 years ago. Kubernetes–
Ryan Donovan: Turning it off and on again, right?
Nic Benders: Right. Turn it off and on again, at scale. Kubernetes, when I lose a node, it just reschedules somewhere else. Like, you know, there are so many parts of our system today. Auto-scaling: 'oh, we got extra load, just spin up more instances.' This is auto-remediation, but it's all done in a very, very clear way. This happens, then this happens. I have to build each playbook, and we could try to build playbooks for every conceivable task, and we'll never keep up. So, can we use that same kind of structured and semi-structured interface to take those playbooks and make them work without having to spend my entire life writing every case for one app?
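A toy example of the "understanding engine" idea from a moment ago: take a new alert or log line and surface the past incident write-ups that look most similar, as a suggestion for a human rather than a final answer. A real system would use embeddings or an LLM for the matching; the plain word-overlap similarity and the invented RCA entries below are only there to show the shape of the idea.

```python
# Toy sketch: match a new error message against past incident write-ups (RCAs)
# and propose the closest ones to a human. The RCA text here is invented, and
# word-overlap similarity stands in for embeddings or an LLM.

PAST_RCAS = {
    "RCA-101": "checkout latency spike caused by connection pool exhaustion in the payments database",
    "RCA-205": "login errors after the certificate expired on the auth gateway",
}

def similarity(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def suggest_related_incidents(log_line: str, top_n: int = 2):
    # Rank past incidents by similarity to the new message; a human decides what to do next.
    ranked = sorted(PAST_RCAS.items(),
                    key=lambda item: similarity(log_line, item[1]),
                    reverse=True)
    return [(rca_id, round(similarity(log_line, text), 2)) for rca_id, text in ranked[:top_n]]

print(suggest_related_incidents("payments database connection pool exhausted, checkout latency rising"))
```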
Ryan Donovan: DevOps is all about automating a lot of the incident and infrastructure and production stuff, right? Like, just make it run smoother. But I think there are a lot of folks who are also like, 'I don't understand the thought processes of this AI,' right? Just having Kubernetes spin up a new node, you understand that thought process. There's not a lot of thought to go in there. But when it's like, 'all right, how do I fix this incident?' You want something whose thoughts you can sort of understand, right?
Nic Benders: Yeah, I think that's true, and I think what we'll find is that that line moves: the line between 'this is so clear and understandable, machines just do it' and 'this needs a human.' Kubernetes doesn't page a human to say, 'hey, I lost a node. Should I move the stuff?' You're like, 'yes, obviously move the stuff.' It's considered to be that simple. What's that next layer? What are things that we might've said required a human five years ago that today maybe don't? Maybe, 'oh, well, if I see this message, that's actually a really bad sign, trigger a rollback. The rollback mostly worked, but this one node got stuck. I need you to kill that node and then move it through.' I'm not asking it to necessarily write a poem for me, although I guess we've seen LLMs are really good at that, just to think a little bit past the boundaries of a static runbook. And I think we could probably gain a lot just from looking at how we push on that boundary, and then, you know, three years from now, five years from now, if we look at the problem again, maybe we think differently about what is the machine's job versus the human's job. I think there always is a human job: build the software to accomplish goals, things like that. But, you know, just like you asked me earlier, am I still writing any assembly language? I'm like, 'no, I'm not.' Maybe I'll feel the same way in five years about some of the stuff that we do today.
Ryan Donovan: Right. You're not combing through stack traces anymore, or something. How do we get an LLM that understands all this context, all these possibilities of a system, and have enough guardrails on it so when it does act or suggest actions, it's not breaking anything vital?
Nic Benders: I think that we have to build trust, and you've gotta build context. So, trust comes from– as humans, we're gonna have to see into the system. So, we need to see what's happening in the applications. What's normal, what's not, what happened after, what were the actions? We wanna be in that centaur mode where the AI is powering us and letting us do this, but we're still seeing all the steps and making the decisions at this point. And then, I think that what we'll find is there are some parts of that where we can really understand the edge and say, 'okay, well, we can put a strong box around this. The AI is allowed to restart nodes and to escalate and do this and this, but it's not allowed to just, like, you know, delete all my AWS resources.' That's the kind of thing I don't need it doing. We've gotta look to slowly move that up. In order to get to this point, we're gonna need to settle some things that today are pretty fragmented in the industry. I think that we need to settle on the AI tools themselves, but also on the visibility: what do we need to do for observability? Right now, a lot of AI monitoring is done just by whoever wrote the AI system, and I think that's fine for those individual technology platforms, but it's not good for consumers, and for the customer who needs to see across multiple platforms. So, we're really hoping to see OpenTelemetry adoption in the AI systems. We put a big bet on OTel a number of years ago because we could see already, and this was even before AI was a news item, that you're never gonna be able to install your own software everywhere. You are always gonna be working in a partnership mode. I have this cloud provider, I have this SaaS provider, I have this tool here that this team loves, and it answers this question, but I want access to some of this data; I wanna pull it all together to do my job. You know, I can't convince every cloud provider to go out and install my own company's software everywhere. So instead, we work with the OpenTelemetry group and get them to settle out, 'here's how we can do these problems,' and we're seeing this pay off. We saw Amazon embrace it years ago. Google is now embracing it. If we can get OpenTelemetry to also be embraced in the AI world, that will, I think, add that key ingredient, which is interoperable visibility into what the AI systems are doing.
Ryan Donovan: That's interesting. So, getting OpenTelemetry into the decision-making process of the A– or sort of behind the API?
Nic Benders: I'm thinking behind the APIs, like, around all the things. When I think about the strength of OTel, there's a wire protocol and things like that, but those are actually almost kind of boring. The strength is that it's a set of common decisions, like concepts, nouns and verbs, things we can talk about so that we know, 'oh, this is a span. This is what a metric looks like. Here's how a distribution works.' That's what we need. We think it's gonna be OpenTelemetry, maybe it's something else, and we'll have to adapt to that, but you've gotta have this common conceptual framework for monitoring what these things are doing. And then that provides not only output from the AI systems into your supervision of them, but it also can provide the input because we're gonna use AI to help solve this complexity problem caused by AI. And so, being able to stack that is super valuable for understanding if all of these agents, or tools, or whatever else we're calling them this week, if they are doing what I want them to do, and then being able to go back in time and understand where they went wrong when they don't.
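For a sense of what that common conceptual framework could look like in practice, here is a hedged Python sketch of an OpenTelemetry span wrapped around an LLM call. It assumes the opentelemetry-api package is installed; the attribute names are modeled loosely on OTel's emerging GenAI semantic conventions, and both the keys and the stubbed model call are illustrative rather than a settled standard.

```python
from opentelemetry import trace

tracer = trace.get_tracer("demo.llm")  # no-op tracer unless an OTel SDK/exporter is configured

def traced_llm_call(prompt: str) -> str:
    # Wrap the model call in a span so any OTel-compatible backend can see the interaction.
    with tracer.start_as_current_span("llm.chat") as span:
        span.set_attribute("gen_ai.request.model", "example-model")  # assumed attribute keys
        span.set_attribute("gen_ai.prompt.length", len(prompt))
        response = "stubbed response"  # stand-in for the real provider call
        span.set_attribute("gen_ai.usage.output_tokens", len(response.split()))
        return response

print(traced_llm_call("Summarize yesterday's deploy failures."))
```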
Ryan Donovan: I mean, I could see a little resistance to that in the AI companies. There's been a little bit of magic and smoke and mirrors. I remember there was some state-of-the-art LLM that was actually eight different models or something, and to get OpenTelemetry in there, it would be like, 'here's how it all works. Here's Oz behind the curtain.'
Nic Benders: It's probably too much to ask for it to be all the way in the back, but a little bit forward of that, just being able to see those interactions. A great example of this would be all the different frameworks that you can use for doing vector searches, and inference, and writing agents. A common instrumentation for how we build AI agents, I think, would go a long way towards being able to trust that our agents are doing what we want them to do, and it almost doesn't matter what is behind the curtain. The great and powerful Oz can be left in peace, but I wanna know what is happening in Munchkin Land.
Ryan Donovan: A lot of companies have moved to open source in general just 'cause it's a more visible ecosystem. Would having sort of open-source, or at least open-weight, AI help get that visibility?
Nic Benders: I think open weights are interesting. I play around with a lot of open-weight models. I think open-weight models are really good for understanding the world of the possible, like, 'hey, I've got some experiments I wanna run. Maybe I have some proprietary data I don't feel comfortable sending over some opaque API to a possible competitor. So, I wanna play around with this.' And I think it's also just a chance for people to understand that there isn't a great and powerful Oz back there. It's like, 'oh, you know what, this is actually mostly just Python and some GPUs,' and that demystifies things, even if it's not explicitly telling me, 'oh, well, this is what's happening exactly in my production environment.' Yeah, open weights are the perfect tool for understanding, 'oh, it does this, it goes through a router, and then it's being transformed, and it comes out and spits something out,' and just building that intuition, a feel for the system, that makes them less scary.
Ryan Donovan: Maybe we need a great demystifying overall. I think, like we talked about earlier, people are losing visibility into the sort of nuts and bolts of what's going on underneath as these layers of abstraction pile on top. Maybe we need to take the magic out of AI and software engineering again.
Nic Benders: Yeah, I think that's absolutely true, because the grumpy software engineers were always here to take the magic out of it. But you know, it's kind of true: a lot of mistakes were made by people who thought the JVM was magic, and then it turns out the JVM just runs on a computer. And then, 'oh, it's a cloud. The cloud can do anything.' A cloud is just somebody else's data center. 'Oh, AI will produce all these miracles.' You know? It's really just a statistical model. Some Python. It's got tokens in, it's got tokens out. It's complicated. It's kind of amazing-seeming. It is still weird and wondrous, but in the end, it's just software, and when you run it yourself, yeah, maybe that's a healthy dose of removing some of that magic.
Ryan Donovan: Well, it's that time of the show, folks, where we shout out somebody who came onto Stack Overflow, dropped a little knowledge, shared some curiosity, and earned themselves a badge. Today, we're shouting out the winner of a Populist badge: somebody who dropped an answer on a question that was so good, it outscored the accepted answer. And congrats to Yochai Timmer for answering 'Reader/Writer Locks in C++.' If you're curious about that, we'll have a link to the answer for you in the show notes. I'm Ryan Donovan. I edit the blog and host the podcast here at Stack Overflow. If you have topics, comments, et cetera, you can email me at podcast@stackoverflow.com. And if you wanna reach out to me directly, you can find me on LinkedIn.
Nic Benders: I'm Nic Benders, New Relic, and when I'm not observing the systems and trying to understand them, you can find me on LinkedIn, or shoot me an email nic@newrelic.com. I'm always happy to hear from you.
Ryan Donovan: All right, everyone. Thank you for listening, and we'll talk to you next time.
