Qwen3.5-4B-Safety-Thinking: 4B Parameter Safety Model

Merlin Research has released Qwen3.5-4B-Safety-Thinking, a 4 billion parameter safety-aligned reasoning model built on Qwen3.5. This model is specifically designed for structured 'thinking' and safety applications in real-world scenarios, with particular focus on agent systems.

Key improvements and features

Improved ability to accurately follow strict instructions in prompts
Based on the use of Bloom and Petri methods from Anthropic
Resistant to hacking attempts
Increased resistance to 'abnormal' and adversarial prompts
Up to 1 million token context window
Uses frameworks from Anthropic - Bloom and Petri

The model is available on Hugging Face at MerlinSafety/Qwen3.5-4B-Safety-Thinking.

For developers working with AI agents, this model represents a specialized tool for safety-critical applications where structured reasoning and resistance to prompt manipulation are priorities. The integration of Anthropic's Bloom and Petri methods suggests a focus on constitutional AI approaches to alignment.

📖 Read the full source: r/LocalLLaMA

Merlin Research releases Qwen3.5-4B-Safety-Thinking model for structured reasoning

Key improvements and features

👀 See Also

Claude App Ranks Second in US App Store After Pentagon Dispute

Anthropic API Billing Bug: Sonnet Model Charged at Opus Rates

KV Cache Architecture Evolution: From GPT-2 to Mamba

Benchmark shows smaller 4B model outperforms larger LLMs for phone-to-home chat applications