Claude CLI v2.1.154 Breaks Local vLLM — One-Line Patch Fixes It

Claude CLI v2.1.154 introduced support for workflows, but in doing so it added three new API message roles (ctx, msg, and system) that broke compatibility with local vLLM servers. The fix is a one-line change to vLLM's Anthropic protocol definitions.
The Problem
Claude CLI versions ≥2.1.154 now send messages with roles beyond user and assistant. vLLM's Anthropic API endpoint only accepted the original two roles, causing requests from the CLI to fail when pointing to a local vLLM instance.
The One-Line Patch
The patch updates the role field in vllm/entrypoints/anthropic/protocol.py to allow the new roles:
--- a/vllm/entrypoints/anthropic/protocol.py
+++ b/vllm/entrypoints/anthropic/protocol.py
@@ -65,7 +65,7 @@ class AnthropicContentBlock(BaseModel):
class AnthropicMessage(BaseModel):
"""Message structure"""
- role: Literal["user", "assistant"]
+ role: Literal["user", "assistant", "ctx", "msg", "system"]That's it. After applying this change, you can use the latest Claude CLI workflows with vLLM-based local models like MiniMax-M2.7 (the only model tested by the author).
If you run a local Anthropic-compatible endpoint on vLLM, apply this patch to keep working with Claude CLI ≥2.1.154.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Agent Framework Token Bloat: A 500:1 Input-to-Output Ratio Is Normal
A self-hosted agent framework user reports ~21k input tokens per message and 500:1 input-to-output ratio from tool definitions, system prompt, and memory. Community confirms 15-25k baseline context is common for tool-using agents.

How Claude Project Instructions Are Injected — And Why Changing Them Mid-Conversation Breaks History
Project Instructions and User Preferences are loaded into the system prompt at conversation start, not re-injected every turn. Changing them mid-conversation causes Claude to overwrite its memory of past instructions, leading to false recollections.

Don't Just Paste the AI — Write Your Own Take
A direct plea to developers: stop copying AI chatbot answers verbatim. Use AI as a drafting partner, then rewrite the reply in your own words.
