MOOSE-Star 7B Model: 54.34% Inspiration Retrieval Accuracy for Hypothesis Discovery

MOOSE-Star is out: a 7B parameter model post-trained for scientific hypothesis discovery, plus the TOMATO-Star dataset of 108,717 NCBI papers. Accepted at ICML 2026. The models are fine-tuned from DeepSeek-R1-Distill-Qwen-7B and come in three variants: MS-IR-7B (inspiration retrieval), MS-HC-7B (hypothesis composition), and MS-7B (joint use).

Key Details

Dataset: TOMATO-Star – 108,717 papers from NCBI (biology, chemistry, medicine, medical imaging, psychology, cognitive science), each decomposed into (background, hypothesis, inspirations) with real citations. Built with ~38,400 A800 GPU-hours of preprocessing.
Temporal split: training papers date from September 2025 or earlier; test papers are from October 2025, after the base model's knowledge cutoff.
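As a rough illustration of the record layout and the temporal split described above, here is a minimal Python sketch. The class name `PaperRecord`, its field names, and the `published` date field are assumptions made for this example; the actual TOMATO-Star schema may differ, so check the dataset card before relying on any of these names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PaperRecord:
    """One decomposed paper. Field names are hypothetical, not the dataset's real schema."""
    background: str
    hypothesis: str
    inspirations: list[str]
    published: date


# Boundaries taken from the post: train <= Sep 2025, test = Oct 2025.
TRAIN_CUTOFF = date(2025, 9, 30)
TEST_START = date(2025, 10, 1)
TEST_END = date(2025, 10, 31)


def temporal_split(records):
    """Split records by publication date; anything after October 2025 is dropped."""
    train = [r for r in records if r.published <= TRAIN_CUTOFF]
    test = [r for r in records if TEST_START <= r.published <= TEST_END]
    return train, test
```

Splitting on date rather than at random is what keeps the test papers outside the period the base model could have seen.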
Inspiration retrieval accuracy benchmarks:
- Random Selection: 6.70%
- DeepSeek-R1-Distill-Qwen-7B (base): 28.42%
- Claude Sonnet 4.6: 45.02%
- DeepSeek-R1: 45.11%
- Gemini-3 Flash: 51.44%
- GPT-5.4: 51.50%
- MS-7B (7B, joint IR + HC): 54.34%
- MS-IR-7B (7B, IR-only): 54.37%
- Gemini-3 Pro: 54.89%
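To make the numbers above easier to compare, the sketch below computes accuracy as the fraction of questions where the selected inspiration matches the reference, and derives the gaps between a few of the listed scores. The function names are hypothetical, and the post does not state how many candidates each question offers. The 6.70% random baseline is close to 1/15, which would be consistent with roughly 15 candidates per question, but that is an inference, not a figure from the source.

```python
# Accuracy figures (percent) copied from the benchmark list in the post.
SCORES = {
    "random": 6.70,
    "base": 28.42,
    "ms_ir_7b": 54.37,
    "gemini_3_pro": 54.89,
}


def retrieval_accuracy(predicted, gold):
    """Fraction of questions where the predicted inspiration equals the gold one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def uniform_random_accuracy(num_candidates):
    """Expected accuracy when picking one candidate uniformly at random."""
    return 1.0 / num_candidates


# Percentage-point differences derived from the listed scores.
gain_over_base = round(SCORES["ms_ir_7b"] - SCORES["base"], 2)
gap_to_gemini_pro = round(SCORES["gemini_3_pro"] - SCORES["ms_ir_7b"], 2)
```

On these figures, the IR-only fine-tune gains 25.95 points over its base model and trails Gemini-3 Pro by 0.52 points.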
Model size & deployment: Standard DeepSeek-R1-Distill-Qwen-7B fine-tune, ~14GB at fp16, runs on a single 24GB GPU. Compatible with llama.cpp, vLLM, SGLang.
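The ~14GB figure follows from simple arithmetic: parameters times bytes per parameter. The sketch below assumes a nominal 7 billion parameters and decimal gigabytes; the real checkpoint may be somewhat larger, and the KV cache and activations need memory beyond the weights.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


NOMINAL_PARAMS = 7e9  # assumption: "7B" taken at face value

fp16_gb = weight_memory_gb(NOMINAL_PARAMS, 2)  # 2 bytes per fp16 weight
headroom_gb = 24 - fp16_gb  # left over on a 24GB card for KV cache and activations
```

Under these assumptions the weights take about 14GB, leaving about 10GB of headroom on a 24GB GPU.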
Licenses: Apache-2.0 for code, CC-BY-4.0 for data.

Paper: arxiv.org/abs/2603.03756 | GitHub: github.com/ZonglinY/MOOSE-Star | Hugging Face collection: huggingface.co/collections/ZonglinY/moose-star-models-and-data

Stress-test it. Disclosure: posted by the MiroMind community team.

📖 Read the full source: r/LocalLLaMA
