OpenAI launches Superalignment team, says the new team will have access to 20% of its compute

Chief scientist Ilya Sutskever and alignment lead Jan Leike will co-lead a new team tasked with solving the core technical challenges of steering and controlling AI that surpasses human intelligence, within four years.

OpenAI today announced the formation of a new “Superalignment” team, led by chief scientist and co-founder Ilya Sutskever and alignment team lead Jan Leike, dedicated to solving the problem of controlling superintelligent AI within four years.

In a blog post, Sutskever and Leike argue that AI exceeding human intelligence could arrive within the decade and that current alignment techniques, such as reinforcement learning from human feedback, will not scale to supervise systems much smarter than humans. “Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue,” they write.

The team will have access to 20% of the compute OpenAI has secured to date. Their approach involves building a “human-level automated alignment researcher” — using AI to help evaluate and improve other AI systems. The hope is that AI can eventually take over alignment research itself, working alongside humans to ensure successors remain aligned.

The record

One year later

Superalignment was OpenAI's marquee safety initiative, but the team was dissolved ten months later, after Sutskever, who had taken part in the board's ouster of Sam Altman, left the company. Jan Leike left OpenAI in May 2024, citing concerns about the company's safety culture. The 20% compute pledge remained unfulfilled.
