Sam and Dario's not-so-excellent AI adventure

Silhouette of a person standing before a large wall of glowing digital data streams.
Featured image by Chris Yang on Unsplash.

As the drama unfolded last week over OpenAI’s sudden partnership with the DoD, the department was also threatening to treat Anthropic as a supply chain risk (a designation that has since been made official). One of the key sticking points was whether these models should be used to conduct mass surveillance of U.S. citizens.

Chances are that neither of these companies' chief executives is truly looking out for the best interests of the masses, but the debate itself hinges on what these models are actually capable of.

I'll start with a research story that shows how difficult it is to use these tools to find specific information across multiple sources when there is no clear source of truth. I was trying to compile a list of all the panels and fireside chats I've done in my career, and there have been quite a few. The exercise pushed the AI models to their limits as they tried to build a list from agendas, preview stories, event coverage and other sources.

No matter which model I used, whether Claude, ChatGPT or Gemini, they all failed to find many of the events I participated in and, worse, made up interviews or conflated written interviews with on-stage ones. Even when I corrected them or supplied additional information, they universally did a poor job of compiling a reasonable list.

Surveillance view of people walking across a plaza with targeting overlays tracking individuals.
Featured image by Resource Database for Unsplash+.

I tell this story because mass surveillance requires analyzing enormous amounts of data to answer questions or identify the actions of individuals and groups. While the government clearly has far more powerful computers than I do, along with access to many other data sources like satellite imagery and location data, the underlying limitation remains the same: these models still struggle to answer questions accurately across fragmented sources. I had to continually point out things I remembered as I went through the exercise, and if I'm feeding the model the very information I hoped it would find, what's the point of relying on it at all?

None of this means the current generation of AI couldn't be misused or prove dangerous in the wrong hands. But we should be honest about what these systems can actually do because it's entirely possible the DoD is reacting to hype more than to the real capabilities of the current technology.

Revving up the hype machine

If you pay attention to the statements of OpenAI CEO Sam Altman and Anthropic CEO Dario Amodei, you'll notice a pattern of exaggeration about what their models can do, along with plenty of scary rhetoric about what they might become capable of. But based on how these models actually perform today, these lofty statements are more about marketing than reality.

Consider these statements from the pair of CEOs:

  • In an essay called 'Reflections,' published last year, Altman wrote: "We are now confident we know how to build AGI as we have traditionally understood it." There is no universal definition of Artificial General Intelligence, but when we hear the term, the implication is human-like intelligence, however you define that. Well-respected AI researchers like Andrew Ng say it could be years before anyone actually achieves it, yet Altman's comment suggests it is imminent.
  • In an interview last year, reported by Axios, Amodei predicted that AI could wipe out up to half of entry-level white-collar jobs within the next five years. It certainly sounds scary, but given my experience with this technology, a timeline that fast feels, let's just say, very unlikely.

A common example of a supposedly doomed job is that of the analyst, but a good analyst talks to companies and customers every single day, compiling data and building a broad understanding of a market. All AI can do is comb the internet for information that already exists. It can't surface anything new because it isn't capable of actual research, only regurgitation.

Keeping humans involved in AI decision-making

Perhaps even more frightening, given the limitations of the current generation of AI, is the idea of handing bombing missions over to these models. Beyond mass surveillance, Anthropic also specifically objected to launching any automated attacks without human control. They are correct, of course. The possibility of things going sideways is far too great to trust something as high-stakes as a bombing mission to a large language model in its current state.

MIT professor Bryan Reimer, co-author of the book 'How To Make AI Useful,' says nobody should be entrusting this software with anything critical right now. "We need to be careful how we deploy AI in safety-critical situations such as defense, healthcare, transport, etc.," Reimer told FastForward. 

Reimer is pro-AI, but he believes it requires a human being involved in any meaningful decisions. "Given the well-discussed limitations, there are areas where processing large amounts of data makes sense (e.g., looking for patterns in health care data), but until we can prove the reliability of these systems, human oversight as a co-pilot will be critical."

While AI clearly has real utility, it's not ready to operate alone in high-stakes situations. The real issue is the gap between CEO hype and the flawed reality many of us encounter regularly when using these tools. Until that gap closes, humans need to remain in the loop for decisions where the outcome actually matters.

~Ron