Groq's LPU demos go viral, drawing attention for speed

A chip startup's lightning-fast demos go viral, drawing attention to its LPU as an inference engine.

A chip startup called Groq has seized the AI conversation this weekend with viral demos of its Language Processing Unit (LPU) running large language models at lightning speed. The company’s website lets anyone test the speed firsthand, and third-party benchmarks from Artificial Analysis show Groq generating 247 tokens per second versus Microsoft’s 18. Founder and CEO Jonathan Ross, who previously co-founded Google’s AI chip division, claims the LPU eliminates two major bottlenecks: compute density and memory bandwidth.

The name collision has not gone unnoticed. Groq, which Ross says came first, in 2016, is frequently confused with Grok, the chatbot from Elon Musk’s xAI. Ross welcomed Musk with a blog post, “Welcome to Groq’s Galaxy, Elon,” that emphasized his company’s trademark priority. While Nvidia’s GPUs remain the industry standard, Groq’s demos suggest a viable alternative for inference workloads. The company is positioning itself as an inference engine, not a chatbot, aiming to accelerate existing models rather than build its own. Whether the LPU can scale beyond viral demos remains unproven, but Groq’s faster chips could jump-start the AI world.

The record

Gizmodo: Meet Groq, the AI Chip That Leaves Elon Musk's Grok in the Dust

One year later — open only if you can handle spoilers

Groq's LPU performance proved real and sustainable, but the company never captured the mass market it envisioned. By mid-2026 it remained a niche inference provider while Nvidia's GPU dominance held, though the speed discourse permanently shifted AI product expectations.
