Open-Source LLMs Beat Claude Opus 4.6 in Trading Tasks

A Reddit user on r/LocalLLaMA compared 10 large language models on how well they generate trading strategies. The results challenge the assumption that the most expensive commercial model delivers the best performance.

Test methodology and models

The user ran 10 LLMs with the same prompt: "create the best trading strategy." The tested models included:

Claude Opus 4.6
Gemini 3, Gemini 3.1 Pro, and GPT-5.2
Gemini Flash 3, GPT-5-mini, Kimi K2.5, and Minimax 2.5

The test was run three times to verify consistency of results.
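The post does not describe how the strategies were scored or how the runs were combined, so the following is only a minimal sketch of one way to repeat a same-prompt comparison and check consistency. The function names, the `score_strategy` callback, and the dummy numbers are all hypothetical and are not taken from the source; only the prompt text and the three-run count come from the post.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List

PROMPT = "create the best trading strategy"  # prompt quoted in the post
N_RUNS = 3  # the post reports three runs


def rank_models(
    models: List[str],
    score_strategy: Callable[[str, str, int], float],
    n_runs: int = N_RUNS,
) -> List[Dict[str, object]]:
    """Score each model on the same prompt n_runs times, then rank by mean score.

    score_strategy(model, prompt, run_index) is a hypothetical callback that
    would call the model and evaluate the strategy it returns.
    """
    rows = []
    for name in models:
        scores = [score_strategy(name, PROMPT, run) for run in range(n_runs)]
        rows.append(
            {
                "model": name,
                "mean": mean(scores),
                "spread": pstdev(scores),  # low spread = consistent across runs
                "scores": scores,
            }
        )
    return sorted(rows, key=lambda row: row["mean"], reverse=True)


# Placeholder numbers for illustration only; these are NOT results from the post.
DUMMY_SCORES = {"model-a": [1.0, 2.0, 3.0], "model-b": [4.0, 4.0, 4.0]}


def dummy_score(model: str, prompt: str, run: int) -> float:
    return DUMMY_SCORES[model][run]


ranking = rank_models(list(DUMMY_SCORES), dummy_score)
for row in ranking:
    print(row["model"], row["mean"], round(row["spread"], 3))
```

Reporting the spread alongside the mean shows whether a ranking holds up across repeated runs or depends on one lucky result.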

Key findings

According to the source:

Minimax 2.5 and Gemini 3.1 topped the leaderboard
Anthropic's models (including Opus 4.6) were described as "lackluster" and didn't crack the top 4
Claude Opus 4.6 cost 10x more than competing models
Open-source models were much slower than Anthropic and Google models

The user noted initial skepticism about the results, stating: "Honestly, I didn't believe the results the first time I did this." After verification, they concluded: "The results are legit."

Practical implications

For developers using AI coding agents, this suggests that for some specialized tasks, such as trading strategy generation, open-source models may offer better performance at significantly lower cost. The main trade-off noted is speed: open-source models were described as "much slower" than commercial alternatives from Anthropic and Google.
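To make the cost side of that trade-off concrete, here is a small sketch of a cost-adjusted comparison. It assumes performance can be reduced to a single numeric score and cost to a multiple of a baseline model's cost. Both helper functions are hypothetical, and the example uses only the 10x cost multiple reported in the post, with a placeholder baseline score of 1.0.

```python
def score_per_cost(score: float, relative_cost: float) -> float:
    """Return score divided by cost, where cost is a multiple of a baseline."""
    if relative_cost <= 0:
        raise ValueError("relative_cost must be positive")
    return score / relative_cost


def break_even_score(baseline_score: float, cost_multiple: float) -> float:
    """Score a pricier model needs to match the baseline's score per unit cost."""
    return baseline_score * cost_multiple


# Placeholder baseline score of 1.0; only the 10x multiple comes from the post.
needed = break_even_score(baseline_score=1.0, cost_multiple=10.0)
print(needed)
print(score_per_cost(needed, 10.0))
```

On this simple measure, a model that costs 10x as much would need a 10x higher score just to break even, which is why a result outside the top 4 weighs heavily against it.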

The user's conclusion was direct: "other than that, there's not a great reason to use Opus or Sonnet for this task."

📖 Read the full source: r/LocalLLaMA