OpenAI Releases o3 Model With High Performance and High Cost
It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute) and 87.5% in high-compute mode (thousands of dollars per task). It's very expensive, but it is not just brute force. These capabilities are new territory, and they demand serious scientific attention.
Benchmark Performance
ARC-AGI Benchmark
o3 has achieved a breakthrough score on the ARC-AGI benchmark, which is considered an indicator of progress toward artificial general intelligence:
o3 scored 75.7% using standard computing power
With increased resources (high-compute mode), o3 reached an unprecedented 87.5%
The high-compute result surpasses the 85% threshold commonly treated as human-level, and both scores represent a significant leap from its predecessor, o1, which scored only 32%
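The threshold comparison above is simple arithmetic. The sketch below uses only the scores quoted in this article to check each o3 configuration against the 85% mark and to compute its gain over o1 in percentage points. The constant and function names are illustrative and do not come from any ARC Prize tooling.

```python
# ARC-AGI scores quoted in this article, in percent; names are illustrative.
HUMAN_THRESHOLD = 85.0
O1_SCORE = 32.0
o3_scores = {"low-compute": 75.7, "high-compute": 87.5}


def summarize(scores: dict[str, float]) -> dict[str, tuple[bool, float]]:
    """Per mode: does the score exceed the threshold, and how many points above o1 is it?"""
    return {
        mode: (score > HUMAN_THRESHOLD, round(score - O1_SCORE, 1))
        for mode, score in scores.items()
    }


for mode, (clears, gain) in summarize(o3_scores).items():
    print(f"{mode}: clears 85%? {clears}; +{gain} points over o1")
```

Only the high-compute run clears the threshold. The low-compute score of 75.7% still falls 9.3 points short, although it is 43.7 points above o1.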
Mathematics and Problem-Solving
o3 shows strong mathematical reasoning and problem-solving:
Nearly perfect score (96.7%) on the 2024 American Invitational Mathematics Examination (AIME)
25.2% on Epoch AI's FrontierMath benchmark, far exceeding previous models, which couldn't break 2%
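To put the FrontierMath jump in proportion: if every earlier model stayed below 2%, then 25.2% is more than 12.6 times the previous best. The sketch below computes that lower bound. It treats 2% as a ceiling on earlier scores, because the article only says they "couldn't break 2%" and gives no exact prior figure.

```python
def min_improvement_factor(new_score: float, previous_ceiling: float) -> float:
    """Lower bound on new/old, assuming every earlier score was below previous_ceiling."""
    if previous_ceiling <= 0:
        raise ValueError("previous_ceiling must be positive")
    return new_score / previous_ceiling


factor = min_improvement_factor(25.2, 2.0)
print(f"o3 scores more than {factor:.1f}x the prior best on FrontierMath")
```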
Coding and Software Engineering
In coding-related tasks, o3 shows substantial improvements:
SWE-Bench Verified: 71.7%, which is 22.8 points higher than o1
Codeforces: Achieved an Elo rating of 2,727
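Two quantities follow from the coding figures. First, the SWE-Bench gap implies that o1 scored about 48.9% on the same benchmark (71.7 - 22.8). Second, an Elo-style rating is easiest to interpret through the standard Elo expected-score formula. The sketch below applies that formula to a hypothetical 2,000-rated opponent. Codeforces uses its own rating system, so the formula is only an approximation here, and the opponent rating is an assumption chosen for illustration.

```python
def implied_baseline(score: float, gain: float) -> float:
    """Older model's score, given the newer score and the reported gain in points."""
    return round(score - gain, 1)


def elo_expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score: 1 / (1 + 10 ** ((opponent - rating) / 400))."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


o1_swe_bench = implied_baseline(71.7, 22.8)  # o1 on SWE-Bench Verified
win_prob = elo_expected_score(2727, 2000)    # 2000 is a hypothetical opponent rating
print(f"Implied o1 SWE-Bench Verified score: {o1_swe_bench}%")
print(f"Expected score vs. a 2,000-rated opponent: {win_prob:.3f}")
```

Under the Elo model, a 2,727 rating would be expected to score about 0.985 against a 2,000-rated player, meaning it would win the large majority of head-to-head contests.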
Other Notable Benchmarks
GPQA Diamond: 87.7%, compared to o1's 78%
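The GPQA Diamond gain looks modest in percentage points (87.7 vs. 78, a 9.7-point gap). Near the top of a scale, though, it can help to compare error rates instead. If you treat 100 minus accuracy as the error rate, o1 misses 22% of questions and o3 about 12.3%, so o3 makes roughly 44% fewer errors. The sketch below computes both views from the two figures quoted above.

```python
def point_gain(new: float, old: float) -> float:
    """Absolute improvement in percentage points."""
    return round(new - old, 1)


def relative_error_reduction(new: float, old: float) -> float:
    """Fraction by which the error rate (100 - accuracy) shrinks from old to new."""
    old_err, new_err = 100 - old, 100 - new
    return (old_err - new_err) / old_err


print(point_gain(87.7, 78.0))
print(f"{relative_error_reduction(87.7, 78.0):.0%}")
```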
Comparison with Gemini 2 and Other Models
While o3 demonstrates exceptional performance, Gemini 2 and other models also show strong capabilities:
Gemini 2.0 Flash
Outperforms its predecessor Gemini 1.5 Pro on key benchmarks [6]
Excels in competition-level math problems, achieving state-of-the-art results on MATH and HiddenMath [6]
Performs well in language and multimedia understanding, outperforming GPT-4o on MMLU-Pro [6]