AIME 2026 Results: Both Open and Closed Models Score Above 90%

The AIME 2026 (American Invitational Mathematics Examination) results are out, and both closed and open AI models are now scoring above 90% on this challenging mathematical reasoning benchmark.
Key Highlights
- Both proprietary (closed) and open-source models exceed 90% accuracy
- DeepSeek V3.2 can run the entire test for approximately bash.09 in API costs
- This represents a significant milestone in mathematical reasoning capabilities
What This Means
AIME is traditionally one of the most challenging high school mathematics competitions, featuring problems that require sophisticated mathematical reasoning. AI models achieving 90%+ accuracy demonstrates remarkable progress in complex reasoning abilities.
Cost Efficiency
The fact that DeepSeek V3.2 can achieve competitive results at just bash.09 for the entire test highlights the rapidly decreasing cost of advanced AI capabilities, making sophisticated reasoning more accessible.
Why This Matters
The achievement of over 90% accuracy by both closed and open AI models signifies a pivotal moment in the evolution of AI technologies. It showcases the potential for AI to assist not only in educational contexts but also in real-world applications where complex problem-solving is required. This advancement may encourage further investment and development in AI systems, particularly in areas that require high-level cognitive functions.
Key Takeaways
- The performance of AI models in AIME 2026 indicates a leap in their mathematical reasoning capabilities.
- Both proprietary and open-source models are reaching similar levels of accuracy, promoting healthy competition and innovation in the AI space.
- Cost-effective solutions like DeepSeek V3.2 are making advanced AI tools more accessible to a broader audience.
- This progress could inspire educational institutions to integrate AI tools into their curricula, enhancing learning experiences.
Getting Started
For those interested in leveraging AI for mathematical reasoning or other complex tasks, starting with tools like DeepSeek V3.2 is straightforward. Users can sign up for an API key on the DeepSeek website, enabling them to access the model's capabilities. Once registered, developers can integrate the API into their applications or use it for personal projects, allowing for experimentation with AI-driven problem-solving.
Full results: matharena.ai
📖 Read the full source: r/LocalLLaMA
👀 See Also

AI-generated code volume is overwhelming senior engineers, study shows
AI users merge 98% more pull requests with AI assistance, but senior engineers report increased cognitive load and burnout. Research shows defect detection drops from 87% for PRs under 100 lines to 28% for PRs over 1,000 lines.

NVIDIA announces NemoClaw with OpenShell security features
NVIDIA announced NemoClaw at GTC, building on OpenClaw to add enterprise-grade security through OpenShell, which enforces policy-based privacy and security guardrails for AI agents.

OpenAI Training Costs Projected to Exceed Anthropic's by 4-5 Times Annually
According to confidential financials reported by the Wall Street Journal, OpenAI expects to spend 4-5 times more on training than Anthropic each year for the next five years. The expense scale is described as mind-boggling.
