Weekly Multimodal AI Roundup: Holotron-12B, Nemotron Omni, GlyphPrinter, and More

Open Multimodal AI Developments
Here are the key open-source multimodal AI releases and projects from the past week, curated from r/LocalLLaMA.
Holotron-12B
Holotron-12B is an open computer-use agent model available on Hugging Face. It is optimized for throughput and long multi-image contexts, and it offers an open alternative to the closed APIs that currently dominate the computer-use agent ecosystem.
NVIDIA Nemotron Omni + Isaac GR00T N1.7
NVIDIA released open Nemotron 3 omni models that integrate language, vision, and voice in a single stack. Alongside them, Isaac GR00T N1.7 is a vision-language-action model designed specifically for robotics applications.
GlyphPrinter
GlyphPrinter tackles text rendering accuracy in AI image generators using Region-Grouped Direct Preference Optimization. The approach corrects localized spelling errors in generated images while balancing artistic styling against accurate text rendering. Open weights are available.
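Region-Grouped DPO builds on Direct Preference Optimization. The roundup does not describe GlyphPrinter's exact objective, so the sketch below only shows the standard DPO loss for one preference pair, plus a hypothetical "region-grouped" variant that averages that loss over per-region log-probabilities. The function names and the averaging scheme are assumptions for illustration, not GlyphPrinter's method.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are summed log-probabilities of the preferred ("chosen") and
    dispreferred ("rejected") outputs under the policy and a frozen reference.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def region_grouped_dpo_loss(region_logps, beta: float = 0.1) -> float:
    """Hypothetical region-grouped variant: mean DPO loss over text regions.

    region_logps is a list of (policy_chosen, policy_rejected,
    ref_chosen, ref_rejected) tuples, one per text region in the image.
    """
    losses = [dpo_loss(*region, beta=beta) for region in region_logps]
    return sum(losses) / len(losses)
```

When the policy and reference agree (a margin of zero), the loss is log 2; it falls as the policy favors the chosen output more than the reference does. Grouping by region would let a spelling error in one text box be penalized without washing out the signal from correctly rendered regions, which is the intuition the name suggests.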
SparkVSR
SparkVSR is a video super-resolution model from Google that uses AI to upscale video, improving its quality and clarity.
SegviGen
SegviGen enables 3D object segmentation via colorization by repurposing 3D image generators. The method frames segmentation as a colorization task and reportedly uses less than 1% of the training data required by older methods. The project includes open code and a demo.
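Framing segmentation as colorization means the generator paints each part of an object in a distinct color, and those colors are then read back as segment labels. The source does not say how SegviGen decodes its output, so the sketch below assumes a fixed palette (one color per segment ID) and assigns each predicted color to the nearest palette entry. The function name, palette, and nearest-color rule are all hypothetical.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def decode_colorization(colors: List[RGB], palette: Dict[int, RGB]) -> List[int]:
    """Map each predicted color (e.g. per point or surface sample) to the
    segment ID whose palette color is closest in squared RGB distance."""
    labels = []
    for c in colors:
        best_id = min(
            palette,
            key=lambda seg_id: sum((a - b) ** 2 for a, b in zip(c, palette[seg_id])),
        )
        labels.append(best_id)
    return labels


# Hypothetical palette: background black, part 1 red, part 2 blue.
palette = {0: (0, 0, 0), 1: (255, 0, 0), 2: (0, 0, 255)}
predicted = [(250, 10, 5), (3, 2, 1), (10, 0, 240)]
print(decode_colorization(predicted, palette))
```

Nearest-color matching tolerates the slightly off colors a generator produces, which is one reason a colorization framing can reuse a pretrained generator with little task-specific data.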
OpenMAIC
OpenMAIC (Multi-Agent Interactive Classroom) turns any topic or document into an interactive classroom with AI teachers and classmates. It uses multi-agent orchestration to generate slides, quizzes, simulations, and discussions.
SkillNet
SkillNet provides open infrastructure for creating, evaluating, and organizing AI agent skills at scale. Its goal is to help agents turn transient experience into durable mastery.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Why Anthropic's Activation Steering Struggles with Generating Valid JSON
Activation steering, a technique used in AI safety work, struggles to produce valid JSON: steered outputs were valid only 24.4% of the time, versus 86.8% for the base model without steering.
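In its simplest form, activation steering adds a scaled direction vector to a model's hidden state at some layer, and the figures above are validity rates: the fraction of outputs that parse as JSON. The sketch below shows both pieces with plain Python lists. It is an illustration under those assumptions, not the setup used in the linked analysis; `steer` and `json_validity_rate` are hypothetical helper names.

```python
import json
from typing import List


def steer(hidden: List[float], direction: List[float], alpha: float) -> List[float]:
    """Add alpha * direction to a hidden-state vector (simplest steering rule)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def json_validity_rate(outputs: List[str]) -> float:
    """Fraction of model outputs that parse as JSON."""
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            json.loads(text)
            valid += 1
        except json.JSONDecodeError:
            pass
    return valid / len(outputs)


print(steer([1.0, 2.0], [0.5, -1.0], alpha=2.0))
print(json_validity_rate(['{"a": 1}', '{"a": ', '[1, 2]', 'not json']))
```

Because the steering vector perturbs every token's hidden state, it can disturb the strict syntax JSON requires even when the semantic push it was designed for works as intended.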

Current State of Chinese LLMs: Market Leaders, Open Models, and Business Models
A Reddit analysis details the Chinese LLM landscape, identifying ByteDance's Doubao as the proprietary market leader and DeepSeek as the most innovative, while outlining the business models of major players and 'Six AI Small Tigers' focused on open-weight models.

Trading Strategy Benchmark: Cheaper AI Models Outperform Claude Opus 4.6
A benchmark tested 10 LLMs on developing trading strategies. Cheaper models such as Minimax 2.5 and Gemini 3.1 outperformed Claude Opus 4.6, even though Opus costs 10x more. The experiment was run three times with consistent results.

OpenClaw 2026.3.24: Bridge Config Removed, Heartbeat Token Savings, Loop Detection
OpenClaw 2026.3.24 removes the deprecated bridge configuration section from openclaw.json and adds an isolatedSession: true setting to the heartbeat config, which cuts token usage from ~100K to 2-5K tokens per run. New features include imageGenerationModel, tools.loopDetection, channels.modelByChannel, built-in model aliases, and pdfModel.
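To put the heartbeat savings in proportion, the quick calculation below uses the approximate figures quoted above (~100K tokens per run before, 2K to 5K after). The helper name is illustrative, and the inputs are the release note's rounded numbers, not measurements.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop when going from `before` to `after`."""
    return (before - after) / before * 100


BEFORE = 100_000  # ~100K tokens per heartbeat run (approximate)
for after in (2_000, 5_000):  # quoted range with isolatedSession: true
    print(f"{after} tokens/run: {percent_reduction(BEFORE, after):.0f}% fewer tokens")
```

That works out to roughly a 95% to 98% reduction per heartbeat run.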