Agents here, there and nowhere
We’ve all heard the relentless hype around AI agents. They will be so potent they will come for our jobs, our software, our workflows. Agentic workforces will force an all-encompassing overhaul of how we do business. Wall Street has bought into this vision so completely that it is starting to question the value of many SaaS stocks.
At this point, it's clear that AI is highly proficient at tasks like coding, even if our current approaches to automated coding could be flawed. But that’s not the real issue. The harder question is whether enterprises can overcome the inherent complexity of making such a sweeping change inside large organizations.
AI agents might indeed be coming, but getting real value will require coordinated, organization-wide changes, and as we’ve seen with past technological shifts, large enterprises often struggle to pull all the pieces together.
For agents to work right, we will need clean data, rock-solid security, clear observability, strong guardrails and smooth integrations with other tools. We will also need humans writing crystal clear prompts, and the agents they produce communicating with one another as they interact and move across systems. While the Agent2Agent and MCP protocols are supposed to solve the communication problems, many of these requirements are what my former boldstart colleague Ed Sim called the hard last-mile problems. These won't be resolved easily, no matter how much hype the industry throws at us.
My agentic adventure
As I often do in this space, I want to tell a personal story about working with this technology. I realize I'm just one person, but if I have trouble building agents within my narrow use case, it speaks to the difficulties of scaling inside a large company.
I spent a couple of frustrating days trying to get Claude to build a fairly modest agent, one I've built before with other tools, so it was a good test. Each morning the agent needs to deliver the latest tech headlines to my Gmail inbox. It involves building a scheduler, integrating with Gmail, searching for headlines and understanding what constitutes news. To simplify matters, I asked the Claude chatbot to write me the prompt to use with Claude Cowork, the agent builder. It still took a lot of frustrating back and forth before it finally built what I wanted.
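The moving parts are easy to enumerate even when wiring them up through prompts is not. Here is a rough Python skeleton of the same pipeline, just to show how small the deterministic core is. This is my own sketch, not anything the agent produced: the fetch step is stubbed, and real code would call a news API for headlines and authenticate to Gmail (for example with an app password over SMTP, or OAuth), with the scheduling handled by something like cron.

```python
import smtplib
from email.message import EmailMessage


def fetch_headlines():
    # Stub: a real agent would query a news API or a search tool here.
    return [("TechCrunch", "Example headline"), ("The Verge", "Another headline")]


def build_digest(headlines):
    # Format (source, title) pairs into a plain-text email body.
    lines = [f"- {title} ({source})" for source, title in headlines]
    return "Today's tech headlines:\n" + "\n".join(lines)


def send_digest(body, sender, recipient, app_password):
    # Gmail accepts SMTP over SSL with an app password; OAuth is the
    # other route. Credentials are assumed to exist and are not shown.
    msg = EmailMessage()
    msg["Subject"] = "Morning tech headlines"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(sender, app_password)
        server.send_message(msg)


if __name__ == "__main__":
    # A cron entry like `0 7 * * *` would run this each morning.
    print(build_digest(fetch_headlines()))
```

The point is not that you should hand-code the agent, but that the hard part is not this plumbing; it is the "understanding what constitutes news" step, which is exactly what the prompt wrangling was about.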

Even then, I found that it was concentrating too much on one publication, or wasn't casting its search widely enough. I had to tell it not to use more than two headlines from any single publication, and to cast a wider net by pulling in additional outlets it wasn't including. This kind of tweaking continued, and I still don't have quite what I want.
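That two-per-publication cap is the kind of constraint that is easy to state in a prompt but often simpler to enforce in plain code than to hope the model respects it. A minimal sketch (the function name is mine, not from any agent framework):

```python
from collections import Counter


def cap_per_source(headlines, max_per_source=2):
    """Keep at most `max_per_source` headlines from any one publication.

    `headlines` is a list of (source, title) pairs, already ordered by
    relevance, so we keep the first items we see from each source.
    """
    counts = Counter()
    kept = []
    for source, title in headlines:
        if counts[source] < max_per_source:
            counts[source] += 1
            kept.append((source, title))
    return kept
```

A deterministic filter like this never "drifts" the way a prompt instruction can, which is one reason real agent systems end up mixing model calls with ordinary code.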
Now take this small example, and imagine you are trying to build an agent for a complex enterprise workflow. The problems you are likely to encounter just describing the set of tasks are going to multiply, and it will grow increasingly difficult to get the agent doing exactly what you envision. There will be some off-the-shelf tools like coding agents, but the more customization you need, the harder it's going to be to build an agent that can smoothly complete your tasks.
The multi-agent conundrum
Now imagine you are trying to get 10 agents working in tandem, all within an organization mired in technical debt, with messy data spread across numerous systems, and with no standard way yet of securing and governing these tools. The challenges dwarf my simple experiment, which is to say the hype is great, but the reality is clearly another matter.
Let's say you get a complex agent doing what you want. The next challenge is getting it to work with other agents, and figuring out what to do when the workflow breaks or throws a curveball, something humans inherently understand. We adapt and build work-arounds. Will an agent smoothly fix a problem, kick it out to a human, go into a loop (costing valuable tokens) or stop working altogether?
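In practice that question gets answered with guardrail code around the agent, not by the agent itself. A hypothetical sketch of the pattern, where `step` stands in for one unit of agent work: retry a bounded number of times, then escalate to a human instead of looping and burning tokens. Real agent frameworks expose richer hooks than this, but the shape is the same.

```python
def run_with_guardrails(step, max_attempts=3):
    """Run one agent step with bounded retries, then escalate.

    `step` is any callable representing a unit of agent work.
    Raises RuntimeError once retries are exhausted, signaling a
    human handoff rather than an infinite retry loop.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:
            last_error = exc  # record and retry up to the cap
    raise RuntimeError(
        f"Escalating to a human after {max_attempts} attempts: {last_error}"
    )
```

The hard design work is deciding, per workflow, which failures are retryable and which should page a person, and that judgment still sits with humans.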

Then think about a project involving humans. Jeff Bezos popularized the “two‑pizza rule” at Amazon, meaning a team should be small enough to be fed with just two pizzas. He recognized that once a team grows beyond a certain size, bureaucracy creeps in and the group becomes less efficient and less effective.
It stands to reason that the two-pizza rule could apply to agents too, even though they don't eat, sleep or stop working. Research suggests that the more agents you add, the worse they perform. A recent study found multi‑agent groups actually did worse than a single agent as coordination overhead mounted, mirroring what happens when human teams get too large.
None of this is to say you should just throw up your hands and give up, but you should be aware that the hype doesn't necessarily match the reality yet. In this case, the emperor may actually have clothes, but you’re going to have to sew them yourself and make sure they fit correctly.
~Ron