antirez's DS4: Running DeepSeek V4 Flash with 1M Context on Mac Metal and DGX

Redis creator Salvatore Sanfilippo (antirez) just released a new project called DS4 on GitHub. The goal: get DeepSeek V4 Flash running with a 1M token context window on Apple Silicon (Metal) hardware. He also posted a video of it running on an NVIDIA DGX system.
What DS4 Does
DS4 uses new techniques to fit a 1M-token context window for DeepSeek V4 Flash on Apple Silicon Macs through Metal (M-series chips). It has also been demonstrated on a DGX, which suggests it could run on high-end GPUs such as NVIDIA's RTX Pro 6000, with a somewhat smaller context window but higher speed. AMD support is, for now, only speculation.
What's Included
- Server endpoints: The DS4 server already provides OpenAI- and Anthropic-compatible API endpoints, making it easy to plug into agentic coding tools like Cursor, Continue.dev, or custom agents.
- GitHub repo: https://github.com/antirez/ds4/ — check the README for setup instructions, which likely involve compiling with Metal support and downloading the DeepSeek V4 Flash weights.
- Video demo: antirez posted a video on X showing DS4 running on a DGX: https://x.com/antirez/status/2053381973226184749
Who It's For
Developers with high-end Macs (e.g., a MacBook Pro with a Max-class M-series chip, or a Mac Studio with a Max or Ultra chip) or NVIDIA GPUs who want to run a powerful local LLM with a very large context window for coding agents or research.
Community Call to Action
The Reddit poster encourages anyone with powerful hardware to check out the project and contribute, whether by testing, reporting bugs, or optimizing for AMD GPUs. The project is at an early stage, so community involvement could speed up hardware compatibility.
📖 Read the full source: r/LocalLLaMA