Claude System Prompt Compliance Degrades in Long Conversations

A Reddit user reports that Claude's system prompt compliance degrades significantly in long conversations, particularly affecting AI coding agents with specific formatting rules and constraints.
Problem Details
The user runs multiple Claude-based agents for internal tooling, each with system prompts containing specific rules about output format, topics to avoid, and edge case handling. While these work perfectly for the first 20-30 exchanges, compliance begins slipping around message 40-50.
Specific issues observed:
- Agents stop following formatting rules
- They become "helpful" in ways the system prompt explicitly prohibits
- They forget constraints that were clear at the start
The user notes this isn't a bug but rather how context windows work under pressure, with system prompts competing with 40+ messages of conversation history for attention weight.
Workarounds and Solutions
The user shares several practical approaches that have worked:
- Restate critical rules: Every 15-20 messages, restate the top 3 rules you can't afford to lose in condensed form (not the full system prompt)
- Keep conversations shorter: If a task requires more than 30 exchanges, start a new session with a summary of what happened
- Strategic prompt placement: Put your most important constraints at both the beginning AND end of the system prompt, as models pay more attention to both positions
- Test at scale: Test your agents at message 50, not just message 5, since happy path demos don't reveal this issue
The user emphasizes that this problem isn't discussed enough and invites others to share reliable patterns for maintaining instruction adherence in long-running sessions.
📖 Read the full source: r/ClaudeAI
👀 See Also

Qwen3.5-122B on Blackwell SM120: fp8 KV Cache Corruption Issue and Performance Findings
Testing Qwen3.5-122B on 8x RTX PRO 6000 Blackwell hardware revealed that fp8_e4m3 KV cache silently produces corrupt output without errors, requiring bf16 KV cache instead. MTP optimization provided a 2.75x single-request speedup while DeltaNet constraints blocked other optimizations.

User reports switching from Gemini Pro to Claude Max for academic project assistance
A user switched from Gemini Pro to Claude Max after experiencing frustration with Gemini's performance on practical tasks. They report Claude successfully reviewed their academic project, asked clarifying questions, and suggested logging learned information to a memory.md file.

Qwen 3.6-35B-A3B KV Cache Bench: f16 vs q8_0 vs Turbo3 vs Turbo4 on M5 Max Up to 1M Context
Benchmarks of TheTom's TurboQuant Metal fork on M5 Max show f16 and q8_0 OOM past 256K, while turbo3 hits 1M at 6.5 tok/s decode. Prefill and decode split favors turbo3 for prefill and turbo4 for decode on long contexts.

AI Is Too Expensive: Hyperscalers Need $3 Trillion to Break Even
Hyperscalers have invested over $800B in AI capex, with $1T more planned for 2027. Microsoft alone spent ~$100B on OpenAI infrastructure, yet AI revenue covers only ~20% of its capex.