
The Deployment Era of AI: Why the Next Bottleneck Is Compute, Cloud, and Managing Digital Workers


From Building Models to Deploying Them

The last two years of AI spending went into the buildout: bigger, better models and the hundreds of billions of dollars in infrastructure needed to train them. That phase is now giving way to a different question. The models are already capable enough to do remarkable work. What matters now is whether a company can deploy them correctly.

Two kinds of infrastructure are at play, and people tend to conflate them. One was designed for training models. The other exists for deploying those models, or the AI agents built on top of them, so they can actually finish a job. The industry is moving into that second stage right now.

The Bottleneck Keeps Moving

For an infrastructure company that needs CPUs, memory, hard disks, and to a lesser degree GPUs, the binding constraint keeps shifting. It started with GPUs. Then it moved to memory. Now the squeeze is on CPUs. Every company is scrambling, often in creative ways, to secure these resources so it can keep training and deploying. That is the backdrop to the recent headlines: Meta's moves, reports that Apple has been talking to the administration about finding cheaper memory alternatives in China, and the news around OpenAI. All of these companies are in the business of bottlenecks, trying to solve the same supply problem from different angles.

Who Actually Wins: Labs or Hyperscalers?

There is a useful way to frame the split between AI labs and hyperscalers. The labs are the car makers. The hyperscalers are the toll booths. Picture the car makers downgrading their product from a Ferrari to a 12-year-old Honda while the cars still cover just as many miles as before. The toll booths keep collecting money regardless of which car passes through, because they do not care what you drive. As the labs push price cuts and cheaper models, this framing suggests the hyperscalers are the ones who capture the value from all that token optimization.

My answer to that thesis is yes and no, because there is more to it. The hyperscalers have been growing extremely well and will keep growing. But deploying these models opens room for entirely new types of cloud providers. I believe those net-new providers will grow faster than the hyperscalers can, precisely because the hyperscalers were built for a previous era. The hyperscalers will benefit to a degree. The faster growth, though, will come from cloud providers built for this moment, and those providers sit outside the foundational model companies rather than being the model companies themselves.

Agents as Digital Humans

Where are companies succeeding, and where are they failing? It is hard to generalize across every company, but the ones that do well share a pattern. First, they set expectations and understand what the agent can actually do. Second, they build the infrastructure to run these agents.

The mental model that works is treating agents as digital humans. An agent needs the same kind of support a human does: computers, accounts, guardrails. Think about a new employee arriving at the office. Whatever tools you have to hand them to get the job done are the same tools you have to give an AI agent. Provide those with today's models, and the amount of work an agent can complete end-to-end will surprise you. Give it a computer (a Windows machine, for example), access to the same tools, and a login, and its success rate on full tasks is higher than most people expect.

How Enterprises Should Size Their AI Use

Deciding how much AI an organization needs falls to the enterprise and its own employees, and it is a genuinely individual decision. Some enterprises have already blown through their AI budgets in the span of three or four months. AI is replacing parts of jobs and making workloads more efficient at the same time, which is good news, but it forces a real question about scale.

The method I would use, and the one we use internally, starts with efficiency. Identify the problems an AI agent can successfully handle and that carry high value to the business. Push your token budget into that area first. Once you have proven success there, move the budget to the next problem. There is no universal prescription for what each enterprise should do, but there is a great deal that agents can already handle end-to-end today.
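As a rough illustration of that prioritization, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Candidate` fields, the `next_focus` helper, the 0.8 success threshold, and the sample problems and numbers are hypothetical, not figures from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A business problem that might be handed to an agent (hypothetical fields)."""
    name: str
    success_rate: float    # observed fraction of tasks the agent completes end-to-end, 0..1
    business_value: float  # relative value to the business, in arbitrary units
    proven: bool = False   # set to True once the agent has demonstrated success here


def next_focus(candidates, min_success=0.8):
    """Return the highest-value unproven problem the agent handles reliably, or None.

    The 0.8 threshold is an illustrative assumption, not a recommended value.
    """
    viable = [c for c in candidates if not c.proven and c.success_rate >= min_success]
    return max(viable, key=lambda c: c.business_value, default=None)


# Made-up sample data for the sketch.
candidates = [
    Candidate("invoice triage", success_rate=0.92, business_value=50.0),
    Candidate("contract drafting", success_rate=0.55, business_value=90.0),
    Candidate("ticket routing", success_rate=0.88, business_value=30.0),
]

first = next_focus(candidates)   # where the token budget goes first
first.proven = True              # success demonstrated, so move on
second = next_focus(candidates)  # the next problem to fund
```

In this sketch the highest-value problem is skipped because the agent does not yet handle it reliably, which mirrors the advice to start where the agent already succeeds and only then move the budget on.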

Oversight, Permissions, and Security

A real first-year employee shows up, gets a laptop, signs in with a brand-new email, and operates under oversight: a boss, a manager, restrictions on the machine, an auditing process. How does that oversight and security model translate to a digital human, and are companies actually prepared for the difference between a person holding a laptop and a computer holding one?

My view is that it is essentially the same. An agent has a boss too, and that boss is the person who employed the agent. The single most valuable skill a person can develop right now is learning to manage their own agents the way they would manage employees. The new-hire picture is the correct one. The agent arrives, gets its computer, its login, and its email, all connected back to your central active directory. From there you can see and set its permissions, defining what it can and cannot do.

It works like bringing in a new hire on day one. You extend the same limited trust, because you do not fully trust a new employee yet, which is exactly why the guardrails, firewalls, security controls, and audit logs exist. All of those mechanisms already exist today, and they apply just as well to agents as they do to humans.
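To make the parallel concrete, the sketch below shows one way an agent's permissions and audit trail could be modeled. It is an illustration under stated assumptions, not a description of any particular directory or security product: `AgentAccount`, `AuditLog`, `authorize`, the action names, and the example identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentAccount:
    """An agent's identity, analogous to a new hire's account (hypothetical model)."""
    agent_id: str
    manager: str                # the person responsible for the agent
    allowed_actions: frozenset  # explicit allow-list: anything not listed is denied


@dataclass
class AuditLog:
    """Append-only record of every authorization decision."""
    entries: list = field(default_factory=list)

    def record(self, agent_id, action, allowed):
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent_id,
            "action": action,
            "allowed": allowed,
        })


def authorize(account, action, log):
    """Check an action against the allow-list and log the decision either way."""
    allowed = action in account.allowed_actions
    log.record(account.agent_id, action, allowed)
    return allowed


# Illustrative identifiers only.
log = AuditLog()
agent = AgentAccount(
    agent_id="agent-001",
    manager="alice@example.com",
    allowed_actions=frozenset({"read_email", "create_ticket"}),
)

ok = authorize(agent, "create_ticket", log)       # permitted by the allow-list
denied = authorize(agent, "delete_mailbox", log)  # not listed, so denied
```

Denied attempts are logged as well as permitted ones, so the manager can review what the agent tried to do, not only what it was allowed to do.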
