one year on
Cognition unveils Devin, 'the first AI software engineer' — autonomous coding, debugging, and deployment in a sandboxed environment
The startup shows a system that plans, codes, debugs, and deploys software on its own, scoring 13.86% on SWE-bench unassisted — far above the prior 1.96%
Cognition, an applied AI lab focused on reasoning, today introduced Devin, which it calls the first AI software engineer. Devin can autonomously plan and execute complex engineering tasks involving thousands of decisions, using a shell, code editor, and browser inside a sandboxed compute environment. It reports progress in real time, accepts feedback, and can fix mistakes.
The company evaluated Devin on SWE-bench, a benchmark that tasks agents with resolving real GitHub issues from projects like Django and scikit-learn. Tested on a random 25% subset of the dataset, Devin correctly resolved 13.86% of issues end-to-end without assistance, far surpassing the previous state of the art of 1.96% (which was achieved with assistance). Cognition says Devin can learn unfamiliar technologies, build and deploy apps, find and fix bugs, fine-tune AI models, and even complete real Upwork jobs.
Devin is currently in early access as Cognition ramps up capacity, and users can join the waitlist. Cognition says it is well funded, including a $21 million Series A led by Founders Fund, with backing from Patrick and John Collison, Elad Gil, Sarah Guo, and others.
The record
Cognition CEO Scott Wu described Devin as a 'tireless, skilled teammate' that can build alongside humans or independently complete tasks for review.
One year later
Within weeks, developers published debunking videos that showed Devin failing at tasks it had handled in its demos, and critics argued that the benchmark comparisons were misleading. The episode became a touchstone in the debate over AI hype versus real capability in software engineering.