The news, 365 days behind, on purpose
Delayed live · replaying 2025

One Year Ago.AI

Remember how fast this is.

11 NOV 2024 · replayed one year on
research · OpenAI · Safe Superintelligence Inc. · Microsoft

AI Scaling Laws Show Diminishing Returns, Labs Turn to Test-Time Compute

Reports from several AI investors, founders, and CEOs indicate that simply adding more data and compute during pre-training is no longer yielding proportional gains, pushing researchers toward new approaches like test-time compute.

The AI industry’s long-held assumption that bigger models and more data inevitably yield better results is facing a reality check. Reports from multiple labs in recent weeks confirm that pre-training scaling laws are showing diminishing returns. Ilya Sutskever, co-founder of OpenAI and Safe Superintelligence Inc., told Reuters that everyone is looking for the next thing to scale their AI models.

The shift comes as industry leaders publicly acknowledge the limitations of the current paradigm. At Microsoft Ignite, Satya Nadella declared ‘we are seeing the emergence of a new scaling law,’ pointing to test-time compute — the technique behind OpenAI’s o1 model, which allows models to ‘think’ longer before answering. This method, which scales compute at inference rather than during training, is emerging as the leading candidate to drive future improvements.

While some caution that the old scaling laws were never guaranteed to hold, others see this as a natural evolution. The ways AI labs advance their models over the next five years likely won’t resemble those of the last five.

Ilya Sutskever

Sutskever told Reuters that everyone is looking for the next thing to scale their AI models.

Satya Nadella

Nadella said at Microsoft Ignite that we are seeing the emergence of a new scaling law, referring to test-time compute.

Anjney Midha

Midha said we are now in the second era of scaling laws, which is test-time scaling.

One year later — open only if you can handle spoilers

Test-time compute did become a major focus, with OpenAI's o1 and o3 models and DeepSeek's R1 showing strong results. However, the pre-training scaling debate continued, with some labs later reporting renewed gains from larger clusters and novel architectures.
