“You can't vibe code scale”: What the AI hype gets wrong about software engineering

In some quarters, there’s a sense that AI has democratized software creation to the point where deep engineering expertise is somehow becoming optional. Vibe coding, in theory at least, lets anyone describe what they want and watch AI build it. It’s not that there’s nothing to this. Prototyping is faster, junior engineering tasks are more accessible to folks without extensive formal training, and iteration cycles have been compressed. Again, in theory, this should give developers more time and mental energy for complex, higher-order work.

But (you knew there would be a “but”), there’s a big difference between building software and running software at scale, and AI hasn’t closed that gap. As Braze cofounder and CTO Jon Hyman and Stack Overflow CPTO Jody Bailey discussed on last week’s episode of Leaders of Code, the AI explosion actually makes senior engineering judgment more valuable, not less, because someone still has to own the consequences of what gets built and whether it can function at scale.

What vibe coding gets wrong (and right)

Let’s give the Pollyanna case its due. AI really has lowered the barrier to building functional software. Product managers can spin up interactive mockups to inform their thinking and communicate the functionality they need to engineers. Designers can resolve UX issues quickly and independently. Small teams that couldn’t afford to staff for moonshot projects can start building right now. All of these breakthroughs are real and worth noting. But as useful as vibe coding can be, it has a ceiling.

Building software and operating software are two different disciplines. That distinction gets lost in most conversations about AI and productivity, and it's where the vibe coding narrative starts to break down.

A prototype doesn't have users, traffic spikes, cascading failures, or data pipelines that degrade quietly under load before anyone notices. It doesn't have the accumulated weight of three years of architectural decisions, some brilliant and some regrettable, all of which constrain what you can do next. The prototype is the easy part. What comes after (running software reliably, at scale, for real customers with real expectations) is where engineering judgment becomes indispensable. As AI takes on more of the execution layer, the gap between what a model can generate and what it can understand becomes ever more consequential.

Scale has specific, unforgiving demands. Distributed systems fail in ways that are rarely obvious and almost never reproducible in a local environment. Latency compounds across service boundaries in ways that don't show up until the stakes are uncomfortably high. A database schema decision made in week two becomes a migration nightmare in year three. An architectural pattern that works elegantly for ten thousand users can collapse like a house of cards at ten million. These challenges are the day-to-day reality of engineering teams running production systems. Solving for them demands knowledge that’s deeply contextual, hard-won through experience, and distinctly human.
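To make "latency compounds across service boundaries" concrete, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that each downstream call independently has a 1% chance of being slow (say, exceeding its own p99), and computes how often a request that must wait on N such calls is slowed by at least one of them. Real services are rarely independent, so treat the numbers as intuition, not as a model of any particular system.

```python
def prob_request_is_slow(per_call_slow_prob: float, fan_out: int) -> float:
    """Chance that at least one of `fan_out` calls is slow.

    Assumes calls are independent and each is slow with the same probability.
    """
    if not 0.0 <= per_call_slow_prob <= 1.0:
        raise ValueError("per_call_slow_prob must be in [0, 1]")
    if fan_out < 0:
        raise ValueError("fan_out must be non-negative")
    # P(no call is slow) = (1 - p)^N, so P(at least one is slow) is its complement.
    return 1.0 - (1.0 - per_call_slow_prob) ** fan_out


if __name__ == "__main__":
    # A 1-in-100 slow call looks harmless when you test one service in isolation.
    for n in (1, 10, 100):
        share = prob_request_is_slow(0.01, n)
        print(f"fan-out {n:>3}: {share:.1%} of requests hit at least one slow call")
```

Under these assumptions, roughly 1%, 9.6%, and 63.4% of requests are affected at fan-outs of 1, 10, and 100. A tail that is invisible when you exercise one service locally ends up dominating once a request depends on many services.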

Jon Hyman, CTO of Braze, put it plainly: "You can't vibe code scale... Being able to run that at high complexity, high scale, high down use cases is something that requires a deep understanding of what you're doing, the business problem that you're solving, and then how all the systems work together."

AI models are getting remarkably good at reading code, but that’s not the same as understanding a system. Even with a million-token context window, a model doesn't contain your business processes, customer use cases, organizational constraints, or the reasoning behind decisions made long ago. AI sees the what; it rarely has access to the why. (That’s where Stack Internal comes in!)

The competitive fallacy

A new technology that makes everyone more productive? Of course some people are going to look at it as a cost-cutting opportunity. If your engineers can do twice as much, you need half as many engineers. Right?

Not so fast. Think about how competitive advantage actually works. AI productivity gains aren't proprietary. Every company in your market got access to the same models, the same tools, and roughly the same multiplier on engineering output at roughly the same time. So if you use that multiplier to reduce headcount and hold output steady, you haven't improved your competitive position. You've just spent less money to stay in exactly the same place. Meanwhile, your competitors are using their multiplier to build more and ship it faster.

As Jon framed it: "Everyone instantly, globally, got this stepwise increase in productivity... if you had 100 engineers and all of a sudden now you have the output of, let's call it 180 engineers, is the first thing you do, is it to go and build Salesforce? Because all of your competitors also went from having 100 engineers of output to 180 engineers of output. And they're working on their roadmaps."
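Jon's arithmetic can be made explicit with a toy model. The sketch below assumes, as the quote does, that every company's output per engineer rises by the same 1.8× multiplier; the two-company market and the "output share" measure are illustrative simplifications, not data.

```python
MULTIPLIER = 1.8  # assumed uniform productivity gain, from the "100 -> 180 engineers" example


def output(engineers: float, multiplier: float = 1.0) -> float:
    """Engineering output measured in 'pre-AI engineer' units."""
    return engineers * multiplier


def share(ours: float, theirs: float) -> float:
    """Our fraction of the combined output in a toy two-company market."""
    return ours / (ours + theirs)


# Before AI: both companies have 100 engineers and equal output.
before = share(output(100), output(100))

# After AI, option A: cut headcount just enough to keep output flat at 100 units,
# while the competitor keeps 100 engineers and uses the multiplier.
headcount_to_hold_steady = output(100) / MULTIPLIER
cut = share(output(headcount_to_hold_steady, MULTIPLIER), output(100, MULTIPLIER))

# After AI, option B: keep the team and spend the multiplier on the roadmap.
invest = share(output(100, MULTIPLIER), output(100, MULTIPLIER))

print(f"engineers needed to hold output steady: {headcount_to_hold_steady:.1f}")
print(f"share before AI: {before:.0%}, after cutting: {cut:.0%}, after investing: {invest:.0%}")
```

Holding output flat takes about 56 engineers, and absolute output is unchanged. But against a competitor now producing 180 units, the cost-cutter's share of the combined output drops from 50% to roughly 36%.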

In fast-moving markets, the ability to ship meaningful features consistently—and faster than the competition—is itself a differentiator. Customers notice; deals are won and lost on it. Cutting engineering capacity at the exact moment that execution velocity becomes more achievable and more competitively important is a strange way to deploy a productivity windfall.

AI didn't give everyone an advantage; it raised the floor for everybody. What you build on top of that floor—how ambitiously you use the newly available capacity, how clearly you prioritize your roadmap, how well your engineering culture is positioned to move fast without breaking things—is still a human decision. That’s not (just) a limitation of the technology, I’d argue: It’s a good touch-grass reminder that strategy isn’t technology’s job in the first place.

What does AI actually change about senior engineering roles?

The gist is that AI doesn’t make senior engineers redundant. But it can change, and probably already has changed, what they spend their time on. If you’re one of those engineers, you probably spend less time on boilerplate, scaffolding, and repetitive work, which we hope gives you more time for system design, architectural decision-making, and other context-dependent problems for which your human brain is an unconditional requirement.

This raises the ceiling on what a small but experienced engineering team can do. As autonomous agents take over more of the execution involved in engineering, a new responsibility emerges that sits squarely with senior engineers: codifying what they know.

Agents can only work effectively within the boundaries of what they've been given. Right now, much of that knowledge lives exclusively between experienced engineers' ears. Getting it out of those heads and into a form that agents can actually use will be among the more consequential engineering tasks of the next few years—because it determines how much of your AI investment actually compounds over time.

What should engineering leaders do differently?

If you understand that AI can increase the value of human judgment while absorbing low-complexity execution, then the implications for how you should manage your team, assess your budget, and set your expectations are significant. Here are a few places to start.

Raise your expectations, explicitly

AI should be handling work that doesn't require the human judgment of senior engineers. If that’s not the case, you have a leadership conversation, not a technology problem. Engineering managers should be actively identifying the categories of work that can be offloaded (e.g., boilerplate, routine testing, basic debugging) and setting a clear expectation for where engineers should spend their time going forward. The bar for what a team can deliver in a sprint has risen.

Resist the urge to measure ROI through headcount reduction

Headcount reduction is a low-hanging-fruit metric: easy to focus on, but probably the least useful one for a team with genuine roadmap ambitions. Some better questions are:

What can we build now that we couldn't before?
How has our cycle time on features changed?
Do our engineers report less burnout and frustration at work? Are they excited about the new things they can build?
Are we resolving more UX debt, shipping more experiments, or responding faster to customer feedback?
What's the ratio of inference cost to meaningful output? Is it improving?

Start codifying institutional knowledge now

Codifying institutional knowledge is where most organizations are furthest behind, and where they will feel the consequences soonest. If your agents are producing generic, pattern-inconsistent output, it's usually because they don't have access to the context your senior engineers carry in their heads. Fixing that means documenting coding standards, testing expectations, and architectural patterns in a form models can actually use. It also means building a process for capturing decisions and the reasoning behind them (not just what was decided, but why). For this work to happen, leadership must assign ownership of it to senior engineers rather than waiting for it to happen organically.
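One lightweight way to start is to record each decision alongside its rationale in a structured form that can be rendered into an agent's context. The sketch below is hypothetical: the `DecisionRecord` fields, the `to_agent_context` helper, and the sample entry are assumptions for illustration, loosely modeled on architecture decision records (ADRs), not a feature of any particular tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecisionRecord:
    """Hypothetical record that captures what was decided and, crucially, why."""
    title: str
    decision: str
    rationale: str
    constraints: List[str] = field(default_factory=list)


def to_agent_context(records: List[DecisionRecord]) -> str:
    """Render records as plain text that could be included in an agent's context."""
    lines: List[str] = []
    for record in records:
        lines.append(f"## {record.title}")
        lines.append(f"Decision: {record.decision}")
        lines.append(f"Why: {record.rationale}")
        for constraint in record.constraints:
            lines.append(f"- Constraint: {constraint}")
        lines.append("")  # blank line separates records
    return "\n".join(lines).rstrip()


# Hypothetical example entry.
records = [
    DecisionRecord(
        title="Schema changes",
        decision="Ship schema changes as backward-compatible, two-step migrations.",
        rationale="Services deploy independently, so old and new code read the same tables during a rollout.",
        constraints=["Never drop a column in the same release that stops writing to it."],
    )
]
print(to_agent_context(records))
```

The field that matters most here is `rationale`: an agent that sees only the decision can follow the rule, while one that also sees the reasoning at least has a chance of recognizing when a new situation falls outside it.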

Watch inference costs before they become a budget crisis

At the risk of stating the obvious, running models at scale across an engineering organization gets expensive fast. Leaders who aren't already tracking inference spend per engineer, let alone thinking about what efficient usage actually looks like, are likely to face an uncomfortable budget conversation in the next planning cycle. You can get ahead of it by building cost awareness into how the team discusses and measures AI usage.
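Tracking doesn't have to be elaborate to be useful. The sketch below assumes your AI tooling can export per-engineer inference spend by period; the `UsageRecord` shape, field names, and sample figures are hypothetical. It computes spend per engineer and, paired with a count of shipped work the team considers meaningful, the inference-cost-to-output ratio raised in the questions above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class UsageRecord:
    """Hypothetical export row: one engineer's inference spend in one period."""
    engineer: str
    period: str  # e.g. "Q1"; any consistent label works
    spend_usd: float


def spend_per_engineer(records: List[UsageRecord], period: str) -> Dict[str, float]:
    """Sum inference spend for each engineer within a single period."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        if r.period == period:
            totals[r.engineer] += r.spend_usd
    return dict(totals)


def cost_per_meaningful_output(
    records: List[UsageRecord], period: str, meaningful_items: int
) -> Optional[float]:
    """Total spend in a period divided by the items the team judged meaningful.

    Returns None when nothing meaningful shipped, rather than dividing by zero.
    """
    if meaningful_items <= 0:
        return None
    total = sum(r.spend_usd for r in records if r.period == period)
    return total / meaningful_items


records = [
    UsageRecord("ana", "Q1", 120.0),
    UsageRecord("ana", "Q1", 30.0),
    UsageRecord("ben", "Q1", 90.0),
    UsageRecord("ben", "Q2", 200.0),
]
print(spend_per_engineer(records, "Q1"))  # {'ana': 150.0, 'ben': 90.0}
print(cost_per_meaningful_output(records, "Q1", meaningful_items=12))  # 20.0
```

What counts as a "meaningful" item (a shipped feature, a resolved customer issue, a completed experiment) is a team decision. The useful signal is whether the ratio improves from period to period, not how it compares across teams that define output differently.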

Push back on pressure to standardize too early, but know when to stop experimenting

Letting engineers explore different tools and workflows is how organizations learn what actually works. But at some point the variation becomes its own inefficiency: inconsistent patterns, duplicated effort, and agents operating without shared context are what you get when everybody plays in the sandbox at once. You should be moving from experimental to intentional: keeping what's working, discarding what isn't, and building the shared infrastructure that lets the whole team benefit.

The question worth asking

The vibe coding narrative is seductive for a simple reason: it focuses on inputs. Look how easy it is to generate code. Look how fast a prototype comes together. Look how much a single engineer can produce in an afternoon with the right model. These things matter. At the same time, we have to recognize that they're measurements of what goes in, not what comes out. What comes out is still software that has to run reliably, scale under pressure, serve real customers, and hold up against competitors who are moving just as fast as you are.

The tools have gotten better; the execution layer has gotten faster and cheaper. But the human judgment required to build systems that actually work at scale, over time, in intensely competitive markets, is as important as ever. Given the volume of output that judgment now has to oversee, you might even say it's more valuable than ever.

The question for engineering leaders isn’t, “How many engineers do we need?” It’s, “What kind of engineering culture do we want to build?” Do you want an engineering culture that uses AI to do the same stuff, only faster and cheaper? Or do you want engineers who use AI to do things that weren’t previously possible?
