Annual Research — 2024

From Pilot Theater to P&L Impact

McKinsey's and Deloitte's 2024 outlooks both confirmed what operators had already learned the hard way: the AI dividend was real, but it lived inside the operating model, not the model.

Q4 2024 · 11 min read · 2 primary sources
$300B+

Hyperscaler AI infrastructure capex commitments for 2025

65%

Of organizations regularly using GenAI in at least one function

2-3×

Productivity delta between AI-mature and AI-curious teams

01 What changed between 2023 and 2024

The 2024 reports read differently in tone from their 2023 predecessors. The breathlessness was gone. The 'will this work?' question had been replaced by 'who's already doing it well, and what's the gap?'

McKinsey's 2024 outlook tracked the same broad arenas as 2023 — applied AI, generative AI, future of mobility, electrification, advanced connectivity, immersive reality, future of bioengineering, future of space, sustainable energy, climate technologies, quantum, and more — but added a layer of investment and adoption data that hadn't been mature enough to publish a year earlier. The headline: GenAI adoption nearly doubled in twelve months. The undercurrent: value capture was still concentrated in a small number of operators.

Deloitte's TMT Outlook focused on the supply side of the AI economy: the data center build-out, the energy demand, the semiconductor cycle, and the talent shortage. Their report was, in effect, a warning that the infrastructure layer was about to consume an extraordinary amount of capital — and the buyers had to be operationally ready to extract value from it or risk financing a curve they couldn't ride.

02 Where the value actually showed up

By mid-2024 there was enough public data to draw a real conclusion: the productivity delta between AI-mature teams and AI-curious teams had become measurable. McKinsey put it at roughly 2-3× in the functions where GenAI was applied with discipline: software engineering, customer operations, sales productivity, and content workflows.

The differentiator was almost never the model choice. It was the operating model around the model: prompts maintained as a documented, versioned asset; evaluation harnesses run weekly; prompt-and-output review treated as a defined role; and feedback loops that routed what reviewers learned back into the underlying workflow.

"By 2024 the AI question stopped being 'which model.' It became 'who owns the prompt library, and when do we evaluate it.' That's an operating-model question, not a technology one."

03 The infrastructure paradox

Deloitte's report surfaced a tension that defined the second half of 2024: AI infrastructure spend was being pulled forward at a pace that assumed near-universal enterprise adoption. But adoption was still concentrated in a relatively small set of high-maturity operators.

For mid-market companies, this created a strange opportunity. The cost of a model API call dropped roughly 80% over the year, while the cost of becoming operationally ready to use one was never a capital cost at all; it was the willingness to do the workflow design work first. The 'AI gap' wasn't a budget gap anymore. It was an operational-design gap.

How this maps to the work

Our 2024 work shifted decisively toward production-grade integration. The pilot conversations were over. Clients who had spent 2023 testing were now asking: 'how do we make this part of how the team runs?'

The pattern was consistent: take a workflow that had reached pilot in 2023, rebuild it as a production system with monitoring, ownership, and an evaluation cadence, then layer a second and third workflow onto the same operating muscle. By the second engagement most clients didn't need us for the third, which is the point.
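At the code level, 'production-grade' mostly means instrumentation the pilot never had. A minimal sketch, assuming a generic call_model stand-in for whatever API the team actually uses; the logging fields and workflow name are illustrative.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_workflow")

def call_model(prompt: str) -> str:
    """Stub standing in for the team's actual model API call."""
    return "stubbed output"

def production_call(workflow: str, prompt: str) -> str:
    """The same call the pilot made, wrapped in what pilots usually skip:
    latency tracking, structured logging, and a loud failure path that a
    named owner actually watches."""
    start = time.monotonic()
    try:
        output = call_model(prompt)
        log.info("workflow=%s status=ok latency=%.2fs",
                 workflow, time.monotonic() - start)
        return output
    except Exception:
        log.exception("workflow=%s status=error latency=%.2fs",
                      workflow, time.monotonic() - start)
        raise

if __name__ == "__main__":
    production_call("ticket-triage", "Classify this ticket: refund not received")
```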

Four engagements we run against this thesis.

None of these require a multi-year transformation. Each is scoped to land specific operating-model improvements with a measurable result.

01

Pilot-to-production conversion

We take the AI work that's stuck at 'cool demo' and convert it into production: real ownership, real monitoring, real evaluation, real integration into the team's daily work. Most of the engineering effort is, predictably, not the model.

02

Prompt and workflow asset libraries

We treat the prompts, the evaluation sets, and the workflow definitions as company assets: versioned, owned, reviewed. This is what separates the 2-3× teams from the merely curious; a minimal version of the mechanism is sketched after this list.

03

Operating cadence around AI

We install a weekly or bi-weekly evaluation rhythm so the team knows whether the current version of the workflow still beats the last accepted one, as sketched after this list. Without that rhythm, regression is invisible until it's catastrophic.

04

Second-and-third workflow patterning

The first AI workflow is expensive. The second should cost about 40% of that, the third about 20%. We design the first one so the curve keeps falling, and we leave the team able to build the fourth without us.
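Engagements 02 and 03 meet in one mechanism: a scheduled evaluation run whose pass rate is compared against the last accepted baseline. A minimal sketch, assuming pass rates produced by a harness like the one sketched earlier in this report; the 2% tolerance is an assumption, not a recommendation.

```python
def regression_gate(current_rate: float, baseline_rate: float,
                    tolerance: float = 0.02) -> bool:
    """Return False (and flag loudly) if the current prompt version
    under-performs the accepted baseline by more than the tolerance.
    All numbers here are illustrative."""
    if current_rate + tolerance < baseline_rate:
        print(f"REGRESSION: {current_rate:.0%} vs baseline {baseline_rate:.0%}")
        return False
    print(f"OK: {current_rate:.0%} against baseline {baseline_rate:.0%}")
    return True

if __name__ == "__main__":
    # In the weekly cadence these rates come from the scheduled eval run.
    regression_gate(current_rate=0.88, baseline_rate=0.93)  # flags a regression
    regression_gate(current_rate=0.94, baseline_rate=0.93)  # passes
```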

If this maps to what you're carrying — let's talk.

Most engagements start with a 30-minute conversation about the specific operating-model question on your desk this quarter.