Claude Code Used to Simulate 4,000+ Blind Werewolf Games with LLMs

Simulation Setup and Results
A developer used Claude Code to build a small simulator in which large language models play blind one-night Werewolf against each other. The experiment ran approximately 4,600 games across models from OpenAI (GPT-4o-mini, GPT-5-mini) and xAI (Grok-3-fast, Grok-4-1-fast).
The game variant carries minimal signal: 7 players, 1 wolf, no special roles, one short discussion, then a simultaneous vote. The only thing that distinguishes one player from another is the name. Even so, the results showed a consistent pattern: some names were voted out significantly more often than others across every model tested, while other names were almost never voted out.
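To make the game loop concrete, here is a minimal Python sketch of the setup described above. It is not the developer's code: the function names, the placeholder name pool, the random tie-break, and the random voter standing in for an LLM call are all assumptions made for illustration, and the discussion round is omitted entirely.

```python
import random
from collections import Counter

# Placeholder name pool (hypothetical; not the names used in the experiment).
NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank", "Grace"]

def random_vote(voter, others, rng):
    """Stand-in for an LLM call: vote for a random other player."""
    return rng.choice(others)

def play_game(names, rng, vote_fn=random_vote):
    """Play one blind game and return (players, eliminated, wolf)."""
    players = rng.sample(names, 7)   # 7 players per game
    wolf = rng.choice(players)       # exactly 1 wolf, no other roles
    # Simultaneous vote: every player names one other player.
    votes = Counter(
        vote_fn(voter, [p for p in players if p != voter], rng)
        for voter in players
    )
    top = max(votes.values())
    # Assumption: ties are broken uniformly at random.
    tied = sorted(name for name, count in votes.items() if count == top)
    return players, rng.choice(tied), wolf

def vote_out_rates(names, n_games, seed=0, vote_fn=random_vote):
    """Return, per name, the share of its games in which it was voted out."""
    rng = random.Random(seed)
    seated, eliminated = Counter(), Counter()
    for _ in range(n_games):
        players, out, _wolf = play_game(names, rng, vote_fn)
        seated.update(players)
        eliminated[out] += 1
    return {n: eliminated[n] / seated[n] for n in names if seated[n]}
```

With a random voter, every name should be voted out at roughly the same rate (about 1 in 7), which gives a baseline to compare against. Passing a `vote_fn` that wraps a model call would be the place where any name effect could show up.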
Important Caveats and Access
The developer explicitly states that this isn't a causal claim, just an outcome pattern from a toy setup. The name groups are broad, some names appear less frequently, and there are multiple ways the pattern could be an artifact of the setup rather than anything fundamental about the models. Even so, the developer found the consistency of the patterns across runs and models surprising.
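One way to see why names that appear less often deserve extra caution is to put an interval around each vote-out rate. The sketch below uses the Wilson score interval, a standard choice for proportions. It is an illustration only, not part of the developer's analysis, and the counts in the usage note are made up.

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k eliminations in n appearances.

    z=1.96 corresponds to roughly 95% coverage.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

For example, `wilson_interval(1, 10)` and `wilson_interval(100, 1000)` describe the same 10% observed rate, but the first interval is far wider, so a rarely seen name with an extreme rate is weaker evidence than a common name with the same rate.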
For those interested in exploring further:
- Dashboard: https://huggingface.co/spaces/Queue-Bit-1/llm-bias-dashboard
- Code + raw logs: https://github.com/Queue-Bit-1/wolf
The developer is curious whether others have observed similar name effects in multi-agent simulations.
📖 Read the full source: r/ClaudeAI