When I went to the first HumanX conference in January 2025, agents were vaguely defined frontier tech. It was the first time I heard the letters MCP. The big conversations were around inference, hallucinations, and retrieval augmented generation. The tech felt new—Tomasz Tunguz of Theory Ventures called it the “bottom of the first inning.” Every company was running AI experiments all the time.
Since then, companies have played a few innings in the AI game. As Anish Agarwal, CEO at Traversal, told me, “More companies have gone through a renewal cycle with customers. They've understood what it takes to actually win a contract.” LLMs no longer run raw call-and-response games in company chatbots. We’ve added tooling, implemented automation, attached evals, and formalized these as agents, usually with the word “claw” in the name. They—and their customers—need to justify the ballooning token spend with real results.
I started saying that we’re in the “find out” stage of AI: we’re past the experimental phase and entering one where AI products need to work and provide real value. HumanX validated that notion, with almost everyone who gave comments referring to “an inflection point,” “a second phase of AI,” and “the conversation shifting.” Here are a few places the conversation is shifting to.
The dream machine grows up
In the early days of AI, much of the chatter I heard was about all the cool new things AI could do. There was a lot of talk about emergent behavior—things like getting an AI to guess a movie based on emojis or draw a unicorn. It was a source of wonder and surprise, cool tech that wowed the folks exploring it.
The promise of AI grew and large enterprises started figuring out how to implement AI features into their software and business processes. Enterprises, traditionally, are where wonder and surprise lead to lost customers and lawsuits. Sectors like healthcare, law, and energy have real consequences for errors. “In these environments, mistakes aren’t just technical—they can be fatal,” said Radha Basu, CEO and Founder of iMerit. “That changes the mindset entirely. It forces a more careful, purposeful approach to how we build and deploy these systems.”
For a couple of years, AI has been a story of better and better models trained on more and more data. But as Ravindra Mistri, founding operator at Better Auth, said, “The next phase of AI adoption won’t be limited by model performance—it will be limited by trust.” As HumanX CEO Stefan Weitz said in his opening keynote, “Without trust, all we're doing is building a high-tech house of cards and hoping no one coughs too hard.”
To get that trust from your AI, you need it to be reliable. “Model intelligence has been advancing rapidly, but reliability hasn’t kept up,” said Dan Klein, co-founder and CTO at Scaled Cognition. “You need to hit a high bar on reliability to deploy these systems confidently. You can’t ship a system that’s making up policies as it goes or lying to you about your account balance.”
A lot of this shift can be attributed to how AI is being used now. With chatbots, you could call BS on their output. In the agentic paradigm, those ponies run until the race is finished. They autonomously break a problem down into multiple steps, call a bunch of tools to achieve an outcome, and hopefully do all this without deleting your database or inventing information out of whole stochastic cloth. As Basu said, “AI is becoming less about static answers and more about taking the right action in complex, ambiguous environments. That shift demands accountability, judgment, and a culture that values questioning the model.”
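That plan-act-observe cycle is easy to caricature in a few lines of code. The sketch below is purely illustrative—`plan_next_step` and `TOOLS` are hypothetical stand-ins, not any vendor's API—but it shows the shape of the loop, including the hard step budget that keeps those ponies from running forever:

```python
# Minimal sketch of an agentic loop: plan -> act -> observe -> repeat.
# All names here (plan_next_step, TOOLS, run_agent) are hypothetical.

def plan_next_step(goal, history):
    # Stand-in for a model call that decides the next action.
    # A canned plan keeps the sketch runnable without a model.
    steps = ["search_docs", "summarize", "done"]
    return steps[len(history)] if len(history) < len(steps) else "done"

TOOLS = {
    "search_docs": lambda goal: f"3 documents found for {goal!r}",
    "summarize": lambda goal: f"summary of findings on {goal!r}",
}

def run_agent(goal, max_steps=10):
    history = []
    for _ in range(max_steps):  # hard step budget: no unbounded loops
        action = plan_next_step(goal, history)
        if action == "done":
            break
        observation = TOOLS[action](goal)  # tool call + observed result
        history.append((action, observation))
    return history

print(run_agent("billing policy"))
```

Everything interesting in a real agent lives inside `plan_next_step`—and that opacity is exactly where the trust problem comes from.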
As for how folks are thinking about solving the trust and reliability issues, conversations fell into a few different buckets:
- Is it true? - The hallucination problem is still prevalent, despite everyone running RAGs left, right, and center. New solutions for ensuring agents have true information include better context, agentic memory, and other inference-time data access solutions.
- Should the agent do this? - A number of people and organizations looked at trust from an identity and user access paradigm. That included tying agentic actions to a human user, just-in-time and ephemeral auth controls, and zero-trust permissioning systems. The context problem creates a new issue in this domain—with all that data an agent has, who’s to say it isn’t leaking it?
- Can I prove and audit it? - Trust but verify at scale. Lots of folks are trying to build agentic trust with visibility and data. Observability companies were all around, as were AI SRE companies. But this is also a conversation about activity trails, automated and human-in-the-loop evals, and traceability.
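The activity-trail idea in that last bucket is simple enough to sketch. This is a hypothetical wrapper, assuming nothing about any particular observability product: every tool call an agent makes gets recorded with an identity, a timestamp, inputs, and the result, so a human (or an eval) can audit it later.

```python
import json
import time

# Hypothetical audit-trail wrapper: each tool call an agent makes is
# recorded (who, what, when, with what result) for later review.

AUDIT_LOG = []

def audited(tool_name, fn, actor="agent-42"):
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        AUDIT_LOG.append({
            "ts": time.time(),
            "actor": actor,       # tie the action back to an identity
            "tool": tool_name,
            "args": args,
            "result": result,
        })
        return result
    return wrapper

# A sensitive tool wrapped so its use is always on the record:
refund = audited("issue_refund",
                 lambda order_id, amount: f"refunded {amount} on {order_id}")
refund("ORD-1", 25)
print(json.dumps(AUDIT_LOG[-1], default=str))
```

Real systems ship these entries to an immutable store and gate the riskiest tools behind human approval, but the principle is the same: trust but verify, at scale.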
AI’m a business, man
At a conference like this, obviously there will be a lot of people trying to sell you their products. Most of them are in the AI technology space in some way, and are both providers and consumers of AI. I saw plenty of returning logos on the floor, and as the opening quote about renewal cycles indicates, people are starting to eye the tech with a business lens. That is, how do I make more money and spend less money? “Every single person I talked to was thinking about how to change their monetization model, how to monetize AI products,” said Cosmo Wolf, CTO of Metronome. “No one's figured it out yet.”
I heard from plenty of folks that token spend is the new cloud compute bill. Corey Quinn’s genie joke needs a fifth rule: you can’t spend it on AI tokens. Grizzled DevOps engineers used to tell war stories about blowing six figures in a weekend over misconfigured SQS, but recently plenty of folks have started seeing their token spend ramping up as usage explodes. This comes as per-token pricing has dropped about 200x in under three years, open-source and small models perform very well, and competition is fierce.
So what’s the rub? There are a few things at play. The trust and reliability issues mean people are stuffing more and more into context windows. Agents and agentic skills rely a lot on a clever system prompt and context window (plus tools and other harness functions). While input tokens are generally cheaper than output tokens, these can add up—someone I talked to mentioned $1 in context per agent per session. For large enterprises with lots of AI-assisted engineers or customer-facing agents, these costs accumulate quickly. Context windows are also limited, so if you need to change something, that’s a pile more tokens to send.
The agentic paradigm also burns more tokens than the old way of prompt and response chatbots. They break a problem down into steps, call tools and receive responses from them, and run evals and loops. Some agents run tasks overnight, chewing up delicious tokens on their complex (and often opaque) thought processes.
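The back-of-envelope math makes the pain obvious. The prices below are illustrative placeholders, not any vendor's actual rates; the key dynamic is that an agent re-sends (much of) its context on every step, so input cost scales with the number of steps:

```python
# Back-of-envelope token cost estimate. Prices are illustrative
# placeholders (assumed for this sketch), not real vendor rates.

PRICE_PER_M_INPUT = 3.00    # $ per million input tokens (assumed)
PRICE_PER_M_OUTPUT = 15.00  # $ per million output tokens (assumed)

def session_cost(context_tokens, output_tokens, steps):
    # Input cost multiplies by steps because the context is re-sent
    # each time the agent goes back to the model.
    input_cost = context_tokens * steps * PRICE_PER_M_INPUT / 1_000_000
    output_cost = output_tokens * PRICE_PER_M_OUTPUT / 1_000_000
    return input_cost + output_cost

# A 50k-token context over 20 agent steps, with 10k tokens of output:
print(f"${session_cost(50_000, 10_000, 20):.2f} per session")
```

Even with input tokens priced at a fifth of output tokens, the re-sent context dominates the bill once the step count climbs—which is exactly what happens with overnight agent runs.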
Where costs start multiplying more and more is when you have multiple agents working together, so-called agent swarms. Miranda Nash, Group VP at Oracle AI, talked a lot about multiple agents working alongside people in her Future of Work presentation. This future is already here in some places (not just Gastown) and spending tokens like a kid in Chuck E. Cheese.
While some folks are saying that coding agents have made code essentially free (they haven’t; ask around), reviewing and running code has grown decidedly more expensive. There’s increased load on code review, security, and production operations. This seems like a place where organizations are looking to tooling for help. “There’s a growing gap between how fast teams can generate and ship code and how well they can operate it once it’s in production,” said Spiros Xanthos, founder and CEO at Resolve.ai. “Should they build, buy, or wait and see? These aren’t new questions, but AI is amplifying them to a point where it’s harder to wait and costlier to make a wrong decision.”
As for monetization and profitability, nobody’s quite got an answer there. Even the big dogs of Anthropic and OpenAI don’t expect to be profitable until 2028 and 2030, respectively.
Anxiety among the inference class
Besides the concerns about implementing and monetizing AI applications, there was a fair bit of chatter about the social effects of AI. Many folks felt that, despite or because of how powerful the tech is, the world outside of the tech industry could face some harsh changes. A lot of this is speculation based on headlines, mind you, because AI in its current form has only been around for a little over three years, and the effects have yet to shake out and be studied. Heck, we’re still coming to grips with how social media affects us, and that’s been around for over 20 years.
Most of this talk was off-the-record, casually dropped during happy hours. But Dr. Danielle Schlosser, co-founder and chief business officer at mpathic, went into greater detail:
“The technical capabilities are accelerating quickly, but our frameworks for evaluating impact—especially on people—are still catching up. Much of today’s AI is optimized around human preference signals—what people like in the moment—rather than what actually supports long-term well-being. Optimizing for engagement or validation can lead to unintended consequences, like reinforcing bias or reducing critical thinking.”
These concerns were fresh in my mind after researching some of the psychologically distorting and disempowering aspects of AI. I will admit I brought it up plenty in conversation, mostly to see whether people in this industry were looking at the issue. Fortunately, I wasn’t the first person to mention it to them; most had already heard about it. And most were hopeful that being forewarned made us forearmed.
There was some concern for the economic effects of AI, but less so. Former Vice President Al Gore talked about the need to prepare for possible disruption and retraining now before we get caught out. While some agreed that AI would lead to job losses (and all the layoffs blamed on AI bear that out), folks at the conference agreed that AI would enhance human abilities, not supplant them. For other markets, though, prices will certainly go up. I asked some leaders in the hardware space about power and hardware prices, and those will likely continue to rise as demand for data centers does.
Others worried about the software industry as a whole, and their place in it, thanks to the models of one single company: Anthropic. “Some people are very excited by it and the implications it has for them in terms of improving their own product,” said Agarwal. “Some people are really nervous as to how they're gonna get eaten up.” This is even before they announced Mythos and the security issues it found in nearly every piece of software underlying the Internet.
Most people, though, acknowledged that human agency and thriving were important (despite some of the spicy taglines on the booths). The world is growing increasingly aware of the power and danger of AI, and the industry seems to be taking that to heart. “There’s real momentum and thoughtfulness emerging around responsible AI, but we still have work to do to move from theory to practice,” said Dr. Schlosser. “To me, human-centered AI can’t just mean aligning to preferences—it has to mean evaluating impact over time, understanding psychological effects, and ultimately asking: does this technology leave people better off?”
Finding out fast
If you’re feeling like you’re struggling to keep up, you’re not alone. Everyone at the conference had a sense that they were playing catch up to where the industry is moving. The tech stack is shifting in real time, models still catch industries off guard, and agents gain new skills daily. Context and multi-agent orchestration are bubbling up as big issues, but there’s so much else whispered in the wind—world models, chips built for inference, science-focused models, and more.
Ultimately, AI and tech are part of the business world and are conforming to business concerns. How do we get new customers, keep existing customers, and do so while not scaling ourselves into bankruptcy? It’s still the wild west out there, but trains are moving through and the cowboys are working for P.T. Barnum. That is, the real work is in the infrastructure moving product around safely, and those who seem like they’re outside the law end up working for a bigger show.
