Nemotron 3 4B Underperforms Qwen 3.5 4B in Demanding Benchmarks

Benchmark Results: Qwen 3.5 4B Outperforms Nemotron 3 4B
A head-to-head comparison of Qwen 3.5 4B and Nemotron 3 4B, both at Q8 quantization, shows a clear gap in mathematical reasoning and structured output: Qwen completed the test prompt correctly, while Nemotron did not.
Test Methodology
The benchmark consisted of five demanding sub-tasks covering mathematical proof, modular arithmetic, algorithm design, and multilingual text generation, with all answers returned as a single valid JSON object. The exact prompt asked for:
- Definition and evaluation of S(n) = Σ_{k=0}^{n} (-1)^k C(n,k)/(k+1)^2: a closed form in terms of H_{n+1}, its value at n=2026, and an 8-line proof using integrals
- Computation of T = Σ[floor((17k+8)/29) - floor((17k-4)/29)] from k=1 to 2026 with modular justification
- Möbius + inclusion-exclusion algorithm for counting coprime pairs in a dynamic array with pseudocode in exactly 14 lines (variable names ≤8 characters)
- Computation of C(4052, 2026) mod 7 using Lucas's theorem, showing the base-7 conversion
- Portuguese paragraph of exactly 47 words containing "Möbius", "inclusão-exclusão" and "Lucas", ending with "fim."
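The Lucas's theorem sub-task is easy to check mechanically. The following is a minimal sketch, not the prompt's reference solution and not either model's output; the helper names `to_base` and `lucas_binom_mod` are made up for illustration. It writes both numbers in base 7 and multiplies the digit-wise binomials modulo 7.

```python
from math import comb


def to_base(n: int, p: int) -> list[int]:
    """Return the base-p digits of n, least significant digit first."""
    digits = []
    while n > 0:
        n, r = divmod(n, p)
        digits.append(r)
    return digits or [0]


def lucas_binom_mod(n: int, k: int, p: int) -> int:
    """C(n, k) mod p for a prime p, as a product of digit-wise binomials."""
    result = 1
    while n > 0 or k > 0:
        n, n_digit = divmod(n, p)
        k, k_digit = divmod(k, p)
        # math.comb returns 0 when k_digit > n_digit, which zeroes the product.
        result = result * comb(n_digit, k_digit) % p
    return result


# Digits printed most significant first, as one would write them by hand.
print(to_base(4052, 7)[::-1])
print(to_base(2026, 7)[::-1])
print(lucas_binom_mod(4052, 2026, 7))
```

In base 7, 4052 is 14546 and 2026 is 5623. Aligned from the right, the third digit pair is (5, 6), and C(5, 6) = 0, so the whole product vanishes and C(4052, 2026) is divisible by 7.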
Results Breakdown
Qwen 3.5 4B Q8 (correct): Produced valid JSON and completed every sub-task correctly. An excerpt of its answers to the first two sub-tasks:
{
  "A": {
    "closed_form": "H_{n+1}/(n+1)",
    "value_2026": "H_{2027}/2027",
    "proof_8_lines": "S(n) = sum (-1)^k C(n,k) int_0^1 x^k(-ln x)dx\nSwap sum and integral: -int_0^1 ln(x)(1-x)^n dx\nLet u=1-x: S(n) = -int_0^1 u^n ln(1-u) du\nUsing identity int_0^1 x^n ln(1-x)dx = -H_{n+1}/(n+1)\nWe find S(n) = H_{n+1}/(n+1) for all n>=0\nFor n=2026, value is H_{2027}/2027 exactly.\nQED."
  },
  "B": {
    "value": 838,
    "justification": "Difference is 1 iff (17k-4) mod 29 >= 17. gcd(17,29)=1 so residues cycle every 29 terms. Per period, 12 values qualify. 2026 = 69*29 + 25. Full periods: 69*12=828. Remainder k=1..25: residues for k=26..29 include two >= 17, so remainder adds 10. Total: 838."
  }
}
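Both answers in this excerpt can be checked independently with exact arithmetic. The sketch below is an illustrative check rather than part of the original benchmark, and the function names `s_direct`, `s_closed`, and `t_sum` are assumptions. It compares the closed form for S(n) against the direct sum on small n, and evaluates T by brute force.

```python
from fractions import Fraction
from math import comb


def s_direct(n: int) -> Fraction:
    """S(n) = sum_{k=0}^{n} (-1)^k C(n,k) / (k+1)^2, summed exactly."""
    return sum(Fraction((-1) ** k * comb(n, k), (k + 1) ** 2) for k in range(n + 1))


def s_closed(n: int) -> Fraction:
    """Claimed closed form H_{n+1} / (n+1)."""
    harmonic = sum(Fraction(1, j) for j in range(1, n + 2))
    return harmonic / (n + 1)


def t_sum(limit: int) -> int:
    """T = sum_{k=1}^{limit} floor((17k+8)/29) - floor((17k-4)/29)."""
    return sum((17 * k + 8) // 29 - (17 * k - 4) // 29 for k in range(1, limit + 1))


# Spot-check the closed form on small n; this is evidence, not a proof.
assert all(s_direct(n) == s_closed(n) for n in range(30))

print(t_sum(2026))  # 838
```

The brute-force total matches the value Qwen derived from the period-29 argument (69 full periods contributing 12 each, plus 10 from the partial period).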
Nemotron 3 4B Q8 (incorrect): Failed to produce correct reasoning and structured output. Issues included:
- Mangled integral steps with wrong substitutions in the mathematical proof
- Incomplete pseudocode padded with placeholder hash symbols: nine lines of code followed by seven "#" lines, 16 entries in total rather than the required 14
- Truncated Portuguese paragraph ending mid-sentence
{
  "C": {
    "pseudocodigo": [
      "read n",
      "a = []",
      "mu = [0]*(max+1)",
      "for i in range(n): a.append(int(input()))",
      "for x in a:",
      " for d in range(1,int(sqrt(x))):",
      " if x%d==0",
      " mu[d]+=1",
      " mu[x//d]-=1",
      "#",
      "#",
      "#",
      "#",
      "#",
      "#",
      "#"
    ],
    "complexidade": "O(n√max)"
  }
}
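For contrast, here is one way a Möbius-based counter for this sub-task could look. It is a hedged sketch under stated assumptions, not the answer either model gave and not constrained to the prompt's 14-line limit: values are assumed to lie between 1 and a known `max_value`, and the names `mobius_sieve` and `CoprimePairs` are hypothetical. The idea is that the number of stored values coprime to x equals the sum of μ(d)·cnt[d] over divisors d of x, where cnt[d] counts stored values divisible by d.

```python
def mobius_sieve(limit: int) -> list[int]:
    """Return mu[0..limit] computed with a linear sieve (mu[0] is unused)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_comp = [False] * (limit + 1)
    primes = []
    for i in range(2, limit + 1):
        if not is_comp[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > limit:
                break
            is_comp[i * p] = True
            if i % p == 0:
                mu[i * p] = 0  # p^2 divides i*p
                break
            mu[i * p] = -mu[i]
    return mu


class CoprimePairs:
    """Maintain the number of coprime pairs under insertions and removals."""

    def __init__(self, max_value: int) -> None:
        self.mu = mobius_sieve(max_value)
        # divs[x] holds the squarefree divisors of x (those with mu != 0).
        self.divs = [[] for _ in range(max_value + 1)]
        for d in range(1, max_value + 1):
            if self.mu[d] != 0:
                for m in range(d, max_value + 1, d):
                    self.divs[m].append(d)
        self.cnt = [0] * (max_value + 1)  # cnt[d] = stored values divisible by d
        self.pairs = 0

    def _coprime_with(self, x: int) -> int:
        # Inclusion-exclusion over the squarefree divisors of x.
        return sum(self.mu[d] * self.cnt[d] for d in self.divs[x])

    def add(self, x: int) -> None:
        self.pairs += self._coprime_with(x)
        for d in self.divs[x]:
            self.cnt[d] += 1

    def remove(self, x: int) -> None:
        # Assumes x is currently stored; drop it before counting its partners.
        for d in self.divs[x]:
            self.cnt[d] -= 1
        self.pairs -= self._coprime_with(x)


tracker = CoprimePairs(max_value=20)
for value in [6, 10, 15, 7]:
    tracker.add(value)
print(tracker.pairs)  # 3: only 7 is coprime to 6, 10 and 15
tracker.remove(7)
print(tracker.pairs)  # 0
```

Each update touches only the squarefree divisors of the inserted or removed value, so the cost per operation depends on the divisor count of x rather than on the array length.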
Key Finding
The result suggests that Nemotron 3 4B's architectural advantage, support for larger context windows, does not translate into better reasoning within that context. On this prompt, Nemotron failed at the complex mathematical reasoning and structured output generation that Qwen 3.5 4B handled correctly.
📖 Read the full source: r/LocalLLaMA