OpenAI launches GPT-4o mini, slashing prices and replacing GPT-3.5 Turbo

The new small model costs just $0.15 per million input tokens and $0.60 per million output tokens, and OpenAI says it outperforms industry-leading small models on reasoning tasks.

OpenAI today released GPT-4o mini, a small AI model that the company claims outperforms industry-leading small models on reasoning while being dramatically cheaper. Priced at $0.15 per million input tokens and $0.60 per million output tokens, GPT-4o mini is more than 60% cheaper than GPT-3.5 Turbo, which it immediately replaces as OpenAI’s smallest offering. The model scores 82% on the MMLU benchmark, compared with 79% for Google’s Gemini 1.5 Flash and 75% for Anthropic’s Claude 3 Haiku.
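To put the per-million-token prices in concrete terms, here is a minimal sketch of the arithmetic in Python. It uses only the two prices quoted above. The function name `estimate_cost` and the example workload are hypothetical, chosen for illustration, and real bills may differ depending on how a provider counts tokens.

```python
from decimal import Decimal

# Prices quoted in the article, in US dollars per one million tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.15")
OUTPUT_PRICE_PER_MILLION = Decimal("0.60")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in US dollars for one workload.

    Hypothetical helper: it simply scales each token count by the
    quoted per-million price and adds the two parts together.
    """
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload: 10 million input tokens and 2 million output tokens.
# By hand: 10 * 0.15 + 2 * 0.60 = 1.50 + 1.20 = 2.70 dollars.
print(estimate_cost(10_000_000, 2_000_000))
```

Because output tokens cost four times as much as input tokens at these prices, workloads that generate long responses will be dominated by the output side of the sum.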

The model supports text and vision in the API today, with video and audio capabilities promised later. It features a 128,000-token context window and a knowledge cutoff of October 2023. OpenAI’s Olivier Godement told TechCrunch that the aggressive pricing is part of a push to make AI accessible globally. Developers and ChatGPT users get access immediately, with enterprise access rolling out next week.

The announcement comes alongside new Enterprise Compliance API tools for auditing ChatGPT Enterprise interactions, including time-stamped conversations and uploaded files.

The record

The room reacts, as it happened

Olivier Godement

OpenAI's head of Product API said that to empower every corner of the world with AI, models must be made much more affordable, and GPT-4o mini is a big step in that direction.

George Cameron

Co-Founder at Artificial Analysis said GPT-4o mini is very fast, with a median output speed of 202 tokens per second, more than 2X faster than GPT-4o and GPT-3.5 Turbo, and compelling for speed-dependent use cases.

One year later

GPT-4o mini became OpenAI's most popular API model within months, cementing the trend toward cheaper, smaller models for high-volume tasks. Rivals scrambled to match its pricing, but OpenAI's early lead in cost efficiency forced a market-wide repricing of inference.
