On June 10, OpenAI slashed the list price of its flagship reasoning model, o3, by roughly 80%, from $10 per million input tokens and $40 per million output tokens to $2 and $8, respectively. API resellers reacted immediately: Cursor now counts one o3 request the same as a GPT-4o call, and Windsurf lowered the “o3-reasoning” tier to a single credit as well. For Cursor users, that’s a ten-fold price cut overnight.
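For a rough sense of what the new rates mean in practice, here is a back-of-the-envelope sketch; the per-token prices come from the announcement above, but the monthly token volumes are made-up figures purely for illustration.

```python
# Back-of-the-envelope cost comparison for o3 before and after the June 10 cut.
# Prices are USD per million tokens (from the announcement); the workload is hypothetical.

OLD_PRICE = {"input": 10.00, "output": 40.00}   # pre-cut o3 pricing
NEW_PRICE = {"input": 2.00, "output": 8.00}     # post-cut o3 pricing

def monthly_cost(price: dict, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given number of input and output tokens."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

# Hypothetical workload: 50M input tokens and 10M output tokens per month.
workload = (50_000_000, 10_000_000)

old = monthly_cost(OLD_PRICE, *workload)   # $900.00
new = monthly_cost(NEW_PRICE, *workload)   # $180.00
print(f"before: ${old:,.2f}  after: ${new:,.2f}  reduction: {1 - new / old:.0%}")
```

Because both the input and output rates dropped by the same factor, the total bill falls by 80% regardless of the input/output mix.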
Latency improved in parallel. OpenAI hasn’t published new latency metrics, and third-party dashboards still put time to first token (TTFT) in the 15s to 20s range for long prompts, but thanks to fresh Nvidia GB200 clusters and a revamped scheduler that shards long prompts across more GPUs, o3 feels snappier in real use. It’s still slower than lightweight models, but not coffee-break slow.
Claude 4 is quick but sloppy
Much of the community’s oxygen has gone to Claude 4. It’s undeniably fast, and its 200k context window feels luxurious. Yet in day-to-day coding, I, along with many Reddit and Discord posters, keep tripping over Claude’s action bias: it happily invents stubbed functions instead of real implementations, fakes unit tests, or rewrites mocks it was told to leave alone. The speed is great; the follow-through often isn’t.