OpenAI Releases O3 Model With High Performance and High Cost

News Link • Robots and Artificial Intelligence • 2024-12-21

o3 scores 75.7% on the ARC-AGI semi-private evaluation set in low-compute mode (about $20 per task in compute) and 87.5% in high-compute mode (thousands of dollars per task). It is very expensive, but it is not just brute force. These capabilities are new territory, and they demand serious scientific attention.

Benchmark Performance
ARC-AGI Benchmark
o3 has achieved a breakthrough score on the ARC-AGI benchmark, which is considered an indicator of progress toward artificial general intelligence:

o3 scored 75.7% in low-compute mode
With increased resources (high-compute mode), o3 reached an unprecedented 87.5%

The high-compute score surpasses the human-level threshold of 85% and represents a significant leap from its predecessor, o1, which scored only 32%.

Mathematics and Problem-Solving
o3 also posts strong results in mathematical reasoning and problem-solving:

Nearly perfect score (96.7%) on the 2024 American Invitational Mathematics Examination (AIME)
25.2% on Epoch AI's FrontierMath benchmark, far exceeding previous models, which could not break 2%

Coding and Software Engineering
In coding-related tasks, o3 shows substantial improvements:

SWE-bench Verified: 71.7%, which is 22.8 percentage points higher than o1
Codeforces: Achieved an Elo rating of 2,727

Other Notable Benchmarks

GPQA Diamond: 87.7%, compared to o1's 78%
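
The gaps between o3 and o1 quoted above can be checked with simple arithmetic. The following is only a sketch that uses the figures stated in this article; the o1 SWE-bench Verified score is not quoted directly and is inferred here from the stated 22.8-point gap, and the names `scores` and `point_gain` are illustrative.

```python
# Scores in percent, as quoted in this article: (o3, o1).
# Assumption: o1's SWE-bench Verified score is inferred from the stated
# 22.8-point gap rather than quoted directly.
scores = {
    "ARC-AGI (low compute)": (75.7, 32.0),
    "GPQA Diamond": (87.7, 78.0),
    "SWE-bench Verified": (71.7, round(71.7 - 22.8, 1)),
}


def point_gain(o3: float, o1: float) -> float:
    """Return the gain of o3 over o1 in percentage points."""
    return round(o3 - o1, 1)


gains = {name: point_gain(o3, o1) for name, (o3, o1) in scores.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

These are differences in percentage points, not relative improvements, so a gain on one benchmark is not directly comparable to a gain on another.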

Comparison with Gemini 2 and Other Models
While o3 demonstrates exceptional performance, Gemini 2 and other models also show strong capabilities:
Gemini 2.0 Flash

Outperforms its predecessor Gemini 1.5 Pro on key benchmarks
Excels in competition-level math problems, achieving state-of-the-art results on MATH and HiddenMath
Performs well in language and multimedia understanding, outperforming GPT-4o on MMLU-Pro

Reported By Freedomsphoenix Readerfour
