Microsoft releases Phi-4-reasoning-vision-15B multimodal model with training insights

Model overview and availability
Phi-4-reasoning-vision-15B is a 15-billion-parameter open-weight multimodal reasoning model, available through Microsoft Foundry, Hugging Face, and GitHub. It is designed as a compact model that balances reasoning power, efficiency, and training data needs.
Capabilities and performance
The model handles a wide array of vision-language tasks, including image captioning, answering questions about images, reading documents and receipts, helping with homework, and inferring changes across sequences of images. It particularly excels at math and science reasoning and at understanding and grounding elements on computer and mobile screens.
In benchmarks, the model is competitive with slower models that require ten times or more compute time and tokens, and it is more accurate than similarly fast models on math and science reasoning. Benchmarks used include ChartQA_TEST, MathVista_MINI, MMMU_VAL, and ScreenSpot_v2.
Training approach and efficiency
The model was trained on just 200 billion tokens of multimodal data, building on Phi-4-reasoning (trained with 16 billion tokens), which is itself based on Phi-4 (400 billion unique tokens). By comparison, other multimodal models such as Qwen 2.5 VL, Qwen 3 VL, Kimi-VL, and Gemma 3 were trained on more than 1 trillion tokens.
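To put the quoted token counts side by side, here is a small arithmetic sketch. The figures are the ones stated above; summing the three stages into one total is an assumption made only for illustration, since the stages count different kinds of tokens, and the 1 trillion figure is a lower bound rather than an exact count.

```python
# Token counts in billions, as quoted in the article.
MULTIMODAL_TOKENS = 200        # Phi-4-reasoning-vision-15B multimodal training
REASONING_TOKENS = 16          # Phi-4-reasoning
BASE_UNIQUE_TOKENS = 400       # Phi-4 (unique tokens)
OTHER_MODELS_LOWER_BOUND = 1000  # "more than 1 trillion" for the compared models

# Assumption for illustration only: add the three stages together,
# even though they are not strictly comparable token counts.
cumulative_tokens = MULTIMODAL_TOKENS + REASONING_TOKENS + BASE_UNIQUE_TOKENS

# Ratio of the compared models' lower bound to the multimodal stage alone.
min_ratio_vs_multimodal = OTHER_MODELS_LOWER_BOUND / MULTIMODAL_TOKENS

print(f"Cumulative tokens across stages: {cumulative_tokens}B")
print(f"Other models use at least {min_ratio_vs_multimodal:.1f}x the multimodal tokens")
```

Even under the generous assumption that all three stages are added together, the total stays below the 1 trillion token lower bound quoted for the other models.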
Microsoft cites careful architecture choices, rigorous data curation, and a mixture of reasoning and non-reasoning data as the key lessons from training this model. The approach aims to push the Pareto frontier of the tradeoff between accuracy and compute cost.
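For readers unfamiliar with the term, a Pareto frontier here is the set of models that no other model beats on both accuracy and compute cost at once. The following is a minimal sketch of how such a frontier could be computed. The `Result` type, the `pareto_frontier` helper, and every number in `example_results` are hypothetical and are not benchmark results from Microsoft or anyone else.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    name: str
    cost: float      # compute cost, lower is better (hypothetical units)
    accuracy: float  # benchmark accuracy, higher is better


def pareto_frontier(results: List[Result]) -> List[Result]:
    """Return the results that no cheaper-or-equal result matches or beats on accuracy."""
    frontier: List[Result] = []
    best_accuracy = float("-inf")
    # Visit cheapest first; on a cost tie, visit the more accurate one first.
    for r in sorted(results, key=lambda r: (r.cost, -r.accuracy)):
        # Keep a point only if it is strictly more accurate than
        # everything that costs the same or less.
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


# Made-up data purely for illustration.
example_results = [
    Result("model-a", cost=1.0, accuracy=0.60),
    Result("model-b", cost=2.0, accuracy=0.55),  # dominated by model-a
    Result("model-c", cost=3.0, accuracy=0.80),
    Result("model-d", cost=10.0, accuracy=0.82),
]

frontier_names = [r.name for r in pareto_frontier(example_results)]
print(frontier_names)
```

In this picture, "pushing the frontier" means adding a model that is more accurate than anything at the same or lower cost, so that it joins the frontier and may knock existing points off it.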
Target use cases
The model is intended for resource-constrained or interactive settings where smaller, faster vision-language models are needed. It's lightweight enough to run on modest hardware while maintaining structured reasoning capabilities.
📖 Read the full source: HN AI Agents