AI made everyone faster. Your company is still exactly as slow.

What research studies actually say: task gains die at three gates before they reach the P&L. Here's the test for yours.

Jun 11, 2026

Mark Zuckerberg may have found the perfect AI story for Meta, and the worst possible AI story for everyone else.

If you only have 5 minutes:

“AI makes people faster, so you need fewer people” is a category mistake. Task speed and company productivity are different things — the gap between them is where most AI initiatives die. Meta might be the exception. You are probably not Meta.
I pulled 59 sources apart into 145 concrete claims; the strongest pattern is a warning: most “X% more productive” claims are task numbers sold as company numbers.
A task gain becomes a company gain only if it passes three gates: it sits on a blocked value path, it survives judgment, and it survives incentives. A 50% gain times zero is still zero.
The cleanest documented win is boring: eBay attached machine translation to one blocked value path and cross-border trade rose 10.9%. No transformation program.
The gen-AI tutoring studies show how much the setup matters: the same model produced three different outcomes in well-run experiments. The key: in every study, the workflow redesign was (implicitly) the entire difference.

The story goes: AI makes everyone more productive, so you need fewer people. Same company, fewer humans. It’s clean, it’s quotable, and it’s being copied into strategy decks at companies that look nothing like Meta.

Here’s the problem. That story makes a category mistake feel like common sense. It treats “people doing tasks faster” and “the company producing more” as the same thing. They are not. And the gap between them is where most AI initiatives quietly die.

To be fair to Zuckerberg: he might be right about Meta. Maybe some teams that used to need 50 or 100 people really will need 10. Meta is a software company whose product, process, and output are all made of the same stuff, digital stuff, 0s and 1s. For a company like that, “make the humans faster” and “make the company more productive” might genuinely be close to the same sentence.

But your company is probably not Meta. Most companies have customers to serve, contracts to fulfill, assets that break, invoices to collect, a pharma plant, a logistics network, a hospital group, a bank, or just a regular company where some boring office process ends with a customer waiting for something. That company does not get more productive because people summarize documents faster. It gets more productive when the system performs better.

And that’s why the current AI conversation feels so strangely small. We keep asking how much faster each individual can go, the developer, the marketer, the support agent, the manager. Those aren’t fake questions; I use these tools every day and I’ve had the “this just saved me 80%” moments. But the real gains almost never come from sprinkling AI on an old task. They come when the task itself changes, the handoff changes, the workflow changes or the company stops doing the work at all.

If that sounds familiar, it should. Swap “AI” for “electricity” and the whole thing rhymes.

Image: “How Did Factories Get Power to Their Machines Before Electricity?” (Core77). Soo many belts!

Early factories ran on one central engine, usually one fat steam engine, and power crawled through the building on shafts, belts, and pulleys. Machines were placed wherever the power could reach. When electricity arrived, most factories just swapped the central engine for an electric one. Same layout, same logic, slightly cleaner power. A nice, small improvement.

The transformation came later, when companies realized that each machine could have its own motor and that they could tear out the shafts and rebuild the entire floor around the work. That redesign was expensive and slow, which is exactly why upstarts pulled ahead. They had no factory to gut, no belts to rip out, no maintenance guy who liked the old way.

Electricity didn’t transform the factory because every worker got a little more electric. It transformed the factory when companies stopped designing around the old constraint and redesigned the system. AI has to do the same.

So before I wasted six months betting on that, I went looking for what the research actually says. I pulled 59 sources apart into 145 concrete claims, grouped them into 22 patterns, and looked for where the evidence converged. What emerged was more like a framework than a clear answer: three levels of productivity that the whole public conversation keeps collapsing into one:

Task productivity: one person does the same task faster. I answer emails faster with AI. True, measurable, real.
Workflow productivity: the process changes because the task changed. My marketing manager used to listen to my customer calls in full and write our LinkedIn posts. Now I produce a draft from a few insights and she makes it excellent. The task got faster and the handoff was redesigned around it.
Company productivity: the business gets more output, revenue, or resilience from the same or fewer inputs. More sales. Lower cost. Better decisions. The thing on the P&L. There isn’t really an example, and that’s the whole point!

Everyone feels the first level. Almost nobody can point to the third. And here’s the thing: if my marketing manager and I both get faster and redesign our whole workflow, but we’re selling the wrong thing, our very elaborate “obvious productivity gain” moves the needle exactly 0%. We’re both still on payroll and nothing more gets sold, so the company productivity gain from AI is zero.

That’s the conversion problem.

The public AI productivity story treats task gains like they automatically add up to company gains.

But the hidden math is uglier: a task gain gets multiplied by everything around it, and any one of those factors can be zero.

This is not meant to be precise economics. It is meant to show the trap: task gains do not simply add up. They depend on where the task sits, what changes around it, and whether the gain survives the company.

A task gain becomes a company gain only if it passes three gates:

it sits on a blocked value path,
it survives judgment,
and it survives the incentives of everyone who has to carry it downstream.

A 50% task gain multiplied by zero is still zero.
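If you want the multiplication spelled out, here is a minimal Python sketch. It is an illustration under stated assumptions, not a model taken from the research: the function name `company_gain` and the idea of scoring each gate as a factor between 0 and 1 are hypothetical choices made for this example, and so are the numbers.

```python
def company_gain(task_gain: float, path: float, judgment: float, incentives: float) -> float:
    """Toy conversion model: a task gain scaled by three gate factors.

    Each gate factor is an assumed score between 0.0 (the gain dies here)
    and 1.0 (the gain passes untouched). Illustrative only.
    """
    gates = (("path", path), ("judgment", judgment), ("incentives", incentives))
    for name, factor in gates:
        if not 0.0 <= factor <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {factor}")
    return task_gain * path * judgment * incentives


# Faster emails: a real 50% task gain that sits off the blocked path.
emails = company_gain(0.50, path=0.0, judgment=1.0, incentives=1.0)

# Hypothetical case where every gate lets only half of the gain through.
partial = company_gain(0.50, path=0.5, judgment=0.5, incentives=0.5)

print(emails)   # 0.0
print(partial)  # 0.0625
```

The second case is the less obvious one: no gate kills the gain outright, and still a 50% task gain arrives as roughly 6%.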

That is why “AI made this task faster” is not the answer. The real question is: where does this task sit in the value path, and what has to change for the gain to survive?

The rest of this piece is about that conversion rate.

Real gains show up where value was blocked, eBay’s 10.9%

Azeem Azhar and Nathan Warren made a similar point: individual gains don’t automatically compound into firm-level ROI, because firms stay stuck in old organizing logics. Directionally right. My question is narrower: when you actually go to the research, what evidence is there that task gains convert into company results?

I didn’t want this to stay at the level of metaphor. “AI is electricity” is useful, but only up to a point. At some point, you have to ask whether the research actually supports the analogy, or whether it just sounds good in a strategy deck.

And that is kind of the point of ThDPTh. Data and AI hype comes in, and before anyone wastes six months on it, we try to find the part that is actually true.

I went through 59 sources, broke them into 145 concrete claims, and grouped those into 22 patterns (and yes, Claude is my best buddy in doing that efficiently). Then I threw most of it out, because almost none of the claims answered the only question that matters: did the company get more productive, or did one task just get faster? Most of the literature measures the second and quietly hopes you’ll read it as the first.

What’s left after you apply that filter is small. But it’s still load-bearing. The literature is much better at killing bad productivity claims than at giving you a clean productivity recipe: AI productivity does not usually show up because people use AI. It shows up when a specific AI capability is attached to a specific value path, and the company changes enough around it for the gain to travel.

That sounds less exciting than “AI will make everyone 30% more productive.” But it is much closer to what the research can actually support. The first thing this rules out is the usual task speed story.

Across the sources I reviewed, the strongest pattern I found was not “AI makes developers productive.” It was a warning: when a study shows one task getting faster, stop there. You do not yet know whether the company became more productive. This is how AI productivity bullshit gets manufactured. A study measures something nearby: pull requests completed, support tickets resolved, drafts written, documents summarized, tasks finished. Fair enough. Those things are measurable. Sometimes they even matter.

But then the finding starts to travel. The paper becomes a white paper. The white paper becomes a framework. The framework becomes a conference slide. And suddenly someone says, “Research shows AI makes teams 30% more productive.”

No. It usually doesn’t.

It shows that one activity got faster under specific conditions. Before you treat that as productivity, you still have to ask: did anything downstream improve?

Local speed is not the same thing as system performance.

The cleaner firm-level cases look different. They do not start with “where can we make people faster?” They start with “where is value blocked?”

eBay is the cleanest example.

In Brynjolfsson, Hui, and Liu’s eBay study, neural machine translation increased cross-border trade volume on the platform by about 10.9%. Nice number. But the number is not why I care.

The interesting part is where the gain showed up.

Trade rose most in the language pairs where translation quality had been worst before. The bigger the translation quality improvement, the bigger the trade response. That is what makes the finding useful. It is not just “we deployed AI and good things happened.” It is a mechanism you can point at.

Language was blocking buyer and seller matches. Translation improved. The blockage (the bottleneck) got smaller. More trades happened.

That is not the Meta story. Nobody became generally “more productive.” eBay removed one specific friction from one specific value path.

Babina and co-authors find a different version of the same idea in firm data across the US economy. AI-investing firms grew faster on sales, employment, and market value. But the channel was not measured total factor productivity. It was not “same work with fewer people.” It was product innovation: broader product ranges, more new products.

Again, the mechanism matters.

In one case, AI removed a friction. In the other, AI helped firms make new things. Neither looks like “sprinkle AI across the workforce and wait for productivity.”

And that is where the research starts to rhyme with the electricity story.

The strongest general pattern I found is not “give people AI access.” It is what economists would call complements: process redesign, organizational change, skill investment, new business processes. Brynjolfsson, Rock, and Syverson use this logic in the productivity J-curve: the technology arrives first, but the productivity gains wait for companies to build the missing organizational pieces around it.

Electricity did not transform factories because workers became individually more electric. It transformed factories when companies redesigned the system around the new capability. The empirical AI literature points in the same direction.

So the answer to the section question is not:

AI productivity shows up when people use AI.

It is closer to:

AI productivity shows up when a specific AI capability is attached to a specific value path, and the company changes enough around it for the gain to travel.

That is a much smaller claim than the usual one. But it is also much more useful. So if task-level gains are real, but company productivity often does not move, why don’t those gains convert?

Task gains die at three gates: path, judgment, incentives

The task gain is not the hard part. Writing all my mails with AI is the easiest thing I can do.

The hard part is getting the gain out of the task.

A worker gets faster. Great. But now that gain has to survive the company. It has to sit where value is actually blocked. It has to pass someone’s judgment. And it has to survive the incentives of everyone who touches it on the way up.

Three gates. The research shows gains dying at each one.

The first gate is the path.

AI is easiest to add to text-shaped, individual, digital tasks. Emails. Tickets. Code. Documents. Slides. Summaries. Meeting notes. So that is where it goes first, not because those tasks matter most, but because they are reachable.

Reachable is not the same as blocked.

My mails are the perfect example. AI made them faster. Nothing downstream moved, because nothing downstream was waiting on my mail speed. Now compare that to eBay: translation was the same kind of task, text in, text out, but it sat exactly where value was blocked, between buyers and sellers who wanted to trade and could not. Same task shape. Completely different position in the system.

A gain that lands off the blocked path multiplies by zero. The gain is real. It just landed where nothing was stuck.

The second gate is judgment.

AI is not self-installing into useful work. Someone has to know where it belongs. Someone has to know when to trust it. Someone has to know how to wrap it into the process. Someone has to know when the output is good enough, when it is dangerous, and when it just looks good because the text is confident.

That is a lot to ask from a worker who was just told: “Here, use AI, become more productive.”

There are really two sub-problems here.

First, workers do not always know where AI could help.

The frontier is jagged. A task that looks perfect for AI may sit outside the model’s reliable capability. Another task that looks boring may be exactly where AI shines. From the outside, they can look almost the same.

That makes adoption weird.

People may avoid AI where it would help because they do not see the use case. Or they may use it where it quietly makes the work worse because the output looks plausible.

Second, even when workers find a good task, they often do not know how to integrate the gain into the actual work.

Writing one draft faster is easy. Changing the workflow around faster drafting is harder.

Who reviews it? Who owns the quality? Does the AI update the knowledge base? Does the next person in the process get a better input or just more inputs? Does the task disappear, or did we simply add a new AI step to the old process?

Once I grouped the studies, this became hard to unsee: the strongest examples were rarely just “person plus model.”

The call center assistant worked partly because it encoded high-performer behavior. The AI tutor worked because someone had scaffolded the learning process. eBay worked because translation was embedded directly into the buyer-seller path.

“Give everyone ChatGPT” sounds like empowerment. In reality, it often turns every worker into user, judge, workflow designer, and quality gate at the same time.

Some people will be good at that. Many will not. And even the good ones may only improve their own corner of the system.

The third gate is incentives.

If you give people AI and tell them to “find use cases,” they will mostly find the places where AI makes their own work easier.

Of course they will. I do the same. Everyone does. The problem is that making my task easier is not the same thing as making the company more productive.

A developer has an incentive to write code faster. A salesperson has an incentive to write proposals faster. A manager has an incentive to summarize meetings faster. None of them is naturally optimizing for the full value path across product, legal, support, finance, customer success, operations, and whoever else has to absorb the output.

And even a real gain needs carriers. The freed hour becomes invisible slack unless someone reclaims it. The faster drafts become someone else’s inbox unless someone redesigns the handoff. Nobody in the chain is paid to make my gain travel.

So bottom-up AI adoption creates a predictable pattern: lots of local relief, lots of impressive anecdotes, and very little guarantee that the actual bottleneck moved.

And here is why nobody notices the corpses: measurement.

We measure the task because the path is hard to instrument. Pull requests are easy to count. Support tickets are easy to count. Documents summarized, drafts written, hours “saved.” The three gates are not.

So gains that die at a gate die invisibly. And gains that never mattered look real, because nobody followed them downstream.

This was one of the most irritating things I noticed in the review: the workflow layer was often right there. In a Copilot study, the development workflow has absorbed a new tool. In a tutoring study, someone has built a custom tutor and changed how the student interacts with the material. In a call center study, someone has trained a system on high-performer behavior and inserted it into the agent’s work.

That is not just “AI makes a task faster.” That is a process change. And then the study still measures the nearby task. I get why. Clean measures publish better than messy workflows. But it means even the research sometimes looks away from the exact layer where the conversion problem lives.

That is why task gains usually fail to become company productivity.

They have to pass three gates (path, judgment, incentives), and nobody is measuring at the gates.

A task can get faster while the company stays exactly as slow as before.

Match the fix to the friction: surgical embed or workflow rebuild

How much of the company do you actually have to change? Less than the AI transformation people say, and more than the “just give everyone ChatGPT” people think.

Both camps are stuck on the same wrong question. “Should we redesign the company around AI?” is too big. It makes everyone either excited in a stupid way or tired in a reasonable way.

Once you accept the conversion problem, the better question is:

What kind of conversion problem do we have?

Let’s look at eBay. Or rather: at what eBay didn’t do.

No transformation program, no new operating model, no twenty-person steering committee for the future of work. They attached one capability to one visible blockage on a value path that already existed, and cross-border trade went up 10.9%.

Nobody gets invited to Davos for saying, “Have you considered translating product listings?” But the mechanism is beautiful precisely because it is boring.

Clean friction has a recognizable anatomy: it is narrow, it sits on a value path that already works, the result is verifiable at the point of use, and nobody downstream has to change how they work.

Buyers and sellers wanted to trade. Language was in the way. A trade happens or it does not.

You can imagine that same shape all over a normal company: a model that routes the right ticket to the right team, a forecast that changes inventory before the warehouse gets stupid, a classifier that catches defects before they become returns, a search system that helps a technician find the right fix instead of calling three people and ruining everyone’s afternoon.

These wins are not necessarily small.

But they are narrow.

The workflow already knows what value is. AI is not inventing a new operating model. It is removing a blockage from an existing one.

That is the first kind of conversion problem: clean friction. It comes with a precise fix: the surgical embed.

If the bottleneck is clean, don’t redesign the factory. Remove the bottleneck.

Now look at what most companies actually do with AI.

Writing a draft faster does not automatically change the work. Summarizing a meeting faster does not automatically change the decision. Writing code faster does not change the release cycle, because shipping never depended only on typing speed. It depends on review, product decisions, architecture, release cadence, each one a handoff owned by someone who just inherited 50% more incoming work.

The gain is real. It just does not sit on the value path.

Teaching is where the research caught this on camera.

Hand students a chatbot to practice with, and something ugly happens: practice scores go up, learning goes down. In one of the cleanest experiments we have, students with plain AI access did worse on the exam than students who never had AI at all. They had outsourced exactly the part that produces the learning.

Layering AI on the old workflow did not just fail to help. It made things worse.

In the same experiment, a second group got the same model wrapped in guardrails: a tutor prompted to give hints instead of answers. The harm disappeared.

And in another study, someone went all the way. They built a custom tutor around the actual course content, with real pedagogy baked in: what to teach, in what order, never just handing over the solution. Those students learned more than their classmates in the regular session, and in less time.

Same model in every arm. Three different outcomes.

Raw access: worse than nothing. Guardrails: harm contained. A rebuilt learning workflow: real gains. The more of the work somebody redesigned around the capability, the more of the gain converted.

Because the bottleneck in learning was never access to answers. Answers have been free since the textbook. The bottleneck is getting the right content in front of the right student at the right moment, while protecting the struggle that produces the learning. The chatbot removed the wrong friction. It removed the effort, and the effort was the product.

That is what makes this kind of friction expensive: inside a workflow, friction and value are entangled. Somebody has to separate them. Somebody has to decide what the tutor teaches, when it refuses to answer, how progress gets checked. A quality gate, an owner, a measurement, a decision rule. These are the “complements” from the J-curve research, translated from economics into actual work.

In the papers, that design work lives in the methods section. In the results, it is the entire difference.

The model cannot just be available. It has to be placed. Access is handing the class a chatbot. Design is the custom tutor.

That is the second kind of conversion problem: workflow friction. The blockage is not in one task. It is woven into the handoffs, the decisions, and the work itself. And some of that friction is load-bearing. Its fix is the expensive kind: the workflow rebuild, done by someone who can tell the friction from the work.

If the bottleneck is the workflow, adding AI to one task just makes the old workflow louder.
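The sorting rule from this section fits in a few lines of Python. Treat it as a sketch under assumptions: the `Friction` class, its field names, and `recommended_fix` are hypothetical names made up for this example, the all-or-nothing rule is a simplification of the anatomy described above, and the two scored cases are my reading of them, not values from the studies.

```python
from dataclasses import dataclass


@dataclass
class Friction:
    narrow: bool                      # one specific blockage, not a whole process
    path_already_works: bool          # the value path exists and produces value
    verifiable_at_point_of_use: bool  # you can see on the spot whether it worked
    downstream_unchanged: bool        # nobody downstream has to change how they work


def recommended_fix(friction: Friction) -> str:
    """Clean friction (all four traits) gets a surgical embed; anything else a rebuild."""
    is_clean = all((
        friction.narrow,
        friction.path_already_works,
        friction.verifiable_at_point_of_use,
        friction.downstream_unchanged,
    ))
    return "surgical embed" if is_clean else "workflow rebuild"


# Assumed scoring of the two cases in this piece, for illustration only.
listing_translation = Friction(True, True, True, True)
chatbot_for_students = Friction(
    narrow=False,
    path_already_works=True,
    verifiable_at_point_of_use=False,  # practice scores rose while learning fell
    downstream_unchanged=False,
)

print(recommended_fix(listing_translation))   # surgical embed
print(recommended_fix(chatbot_for_students))  # workflow rebuild
```

The strict `all(...)` is deliberate: one missing trait is enough to mean somebody has to do design work before the gain can travel.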

Five questions that sort every AI productivity claim

How do you read the next claim without getting fooled? Not with a grand theory of AI productivity. With something more useful: a way to sort the claims.

The research, for all its value, is lopsided. It sees tasks very well. It sometimes sees firm-level outcomes. It barely sees the workflow layer in between, where operators actually live.

The literature can warn you where not to overclaim. It cannot redesign your workflow for you. And nobody is coming with a study about your company. The measurement is on you.

Meanwhile, the claims keep coming. The hype crowd points at surgical wins. The disappointed crowd points at flat rollouts. Both may be reading real numbers. Both are skipping the same question.

Do not ask whether the number is impressive.

Ask where the gain went.

Here is the test: the level, the three gates, the result.

What level was measured: task, workflow, or company? Pull requests, tickets, and drafts are task metrics, however impressive the number.
Which blockage does this sit on, and is it clean or woven in? Imagine the friction gone: does value flow without anyone else changing how they work?
Who judges the output, and how cheap is it to catch wrongness at the point of use?
Who has to carry the gain downstream, and what is their reason to do it?
Did the bottleneck move? Name the downstream result that changed, not the corner that got more comfortable.
And even if the authors do not spell it out, ask: what did they change about the workflow? What did they have to do to make the study work? What would you have to do, step by step, to repeat it from the ground up?

And if the claim is someone else’s number: would any of this survive outside their study, their platform, their company?
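If you would rather run the test than remember it, here is one way to sketch it in Python. Everything in it is an assumption for illustration: the `Claim` class, its field names, and `open_questions` are hypothetical, and the check only flags what a claim leaves unanswered. It does not judge whether an answer is any good.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ("task", "workflow", "company")


@dataclass
class Claim:
    headline: str
    level: str                               # what level was measured
    blockage: Optional[str] = None           # which blockage it sits on
    judge: Optional[str] = None              # who judges the output
    carrier: Optional[str] = None            # who carries the gain downstream
    downstream_result: Optional[str] = None  # what changed downstream


def open_questions(claim: Claim) -> list[str]:
    """Return the parts of the test that the claim leaves unanswered."""
    if claim.level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}")
    gaps = []
    if claim.level == "task":
        gaps.append("Measured at task level: not yet a company number.")
    if claim.blockage is None:
        gaps.append("Which blockage does this sit on?")
    if claim.judge is None:
        gaps.append("Who judges the output?")
    if claim.carrier is None:
        gaps.append("Who carries the gain downstream?")
    if claim.downstream_result is None:
        gaps.append("Did the bottleneck move?")
    return gaps


# The conference-slide version of a finding, with nothing else filled in.
slide = Claim(headline="AI makes teams 30% more productive", level="task")
for gap in open_questions(slide):
    print("-", gap)
```

An empty list does not prove the claim. It only means the claim at least tried to say where the gain went.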

This works on vendor decks, research papers, and your own pilots.

Meta may be right about Meta and still be the wrong story for everyone else.

Maybe some teams really will shrink from fifty to ten. Maybe Meta is the rare company where the task layer and the company layer sit close enough together that faster humans really can mean a faster company.

Your company is probably not that company.

Three Data Point Thursday
