TRELLIS.2 Image-to-3D Ported to Run Natively on Apple Silicon

What This Is
A port of Microsoft's TRELLIS.2 image-to-3D model that runs natively on Apple Silicon via PyTorch MPS, replacing CUDA-only dependencies with pure-PyTorch alternatives.
Key Details
The original TRELLIS.2 requires CUDA with flash_attn, nvdiffrast, and custom sparse convolution kernels that don't work on Mac. This port replaces those with:
- A gather-scatter sparse 3D convolution implementation (backends/conv_none.py)
- SDPA attention for sparse transformers using PyTorch's scaled_dot_product_attention
- Python-based mesh extraction replacing CUDA hashmap operations (backends/mesh_extract.py)
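As an illustration of the last item, here is a minimal sketch of how a plain Python dictionary can stand in for a GPU hashmap when stitching a dual-grid mesh. It is not the code in backends/mesh_extract.py: the function name `dual_grid_quads`, the `edge_flags` layout, and the rule "one vertex per active voxel, one quad per crossed edge" are assumptions based on generic dual-grid meshing.

```python
def dual_grid_quads(coords, edge_flags):
    """Build quads from a sparse voxel set (hypothetical helper).

    coords:     list of (x, y, z) integer voxel coordinates; voxel i owns vertex i.
    edge_flags: one (fx, fy, fz) tuple of booleans per voxel, saying whether the
                surface crosses the edge that leaves the voxel's minimum corner
                along the x, y, or z axis (assumed input format).
    Returns a list of quads, each a tuple of four vertex indices.
    """
    # The dictionary plays the role of the hashmap: voxel coordinate -> vertex index.
    index_of = {c: i for i, c in enumerate(coords)}
    quads = []
    for i, voxel in enumerate(coords):
        for axis in range(3):
            if not edge_flags[i][axis]:
                continue
            # The two axes perpendicular to the crossed edge.
            b, c = [a for a in range(3) if a != axis]
            ring = []
            # The four voxels sharing this edge, visited in order around it.
            for db, dc in ((0, 0), (1, 0), (1, 1), (0, 1)):
                neighbor = list(voxel)
                neighbor[b] -= db
                neighbor[c] -= dc
                ring.append(index_of.get(tuple(neighbor)))
            # Emit a quad only if all four voxels are active.
            if None not in ring:
                quads.append(tuple(ring))
    return quads


coords = [(1, 1, 0), (0, 1, 0), (0, 0, 0), (1, 0, 0)]
flags = [(False, False, True)] + [(False, False, False)] * 3
quads = dual_grid_quads(coords, flags)
```

The sketch ignores quad orientation and vertex placement, both of which a real extractor has to handle. It only shows the lookup pattern that the dictionary replaces.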
Total changes are a few hundred lines across 9 files. All hardcoded .cuda() calls were patched to use the active device instead.
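A minimal sketch of what "use the active device" can look like in practice. The helper name `pick_device` and the CUDA, then MPS, then CPU preference order are assumptions for illustration, not taken from the port. The two availability checks are standard PyTorch calls.

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Return a backend name, preferring CUDA, then MPS, then CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


try:
    import torch

    DEVICE = pick_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
except (ImportError, AttributeError):
    # PyTorch missing, or too old to expose the MPS backend: fall back to CPU.
    DEVICE = "cpu"

# With PyTorch installed, a hardcoded `tensor.cuda()` would then become
# `tensor.to(DEVICE)`, which works on any of the three backends.
```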
Performance & Requirements
On an M4 Pro (24GB), the port generates meshes of roughly 400K vertices from a single photo in about 3.5 minutes. Unified memory usage peaks at around 18GB during generation.
Requirements:
- macOS on Apple Silicon (M1 or later)
- Python 3.11+
- 24GB+ unified memory recommended
- ~15GB disk space for model weights
Setup & Usage
Quick start:
git clone https://github.com/shivampkumar/trellis-mac.git
cd trellis-mac
hf auth login
bash setup.sh
source .venv/bin/activate
python generate.py path/to/image.png
You need to request access to gated models on HuggingFace: facebook/dinov3-vitl16-pretrain-lvd1689m and briaai/RMBG-2.0.
Basic usage:
python generate.py photo.png
python generate.py photo.png --seed 123 --output my_model --pipeline-type 512
Limitations
- No texture export (meshes export with vertex colors only)
- Hole filling disabled (meshes may have small holes)
- Slower than CUDA (~10x slower for sparse convolution)
- Inference only, no training support
Technical Implementation
The sparse 3D convolution builds a spatial hash of active voxels, gathers neighbor features for each kernel position, applies weights via matrix multiplication, and scatter-adds results back. Mesh extraction reimplements flexible_dual_grid_to_mesh using Python dictionaries instead of CUDA hashmap operations.
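The four convolution steps can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the code in backends/conv_none.py: the function name `sparse_conv3d`, the per-offset `weights` dictionary, and the choice to produce outputs only at already-active voxels are all hypothetical. A real implementation would do the gather, matrix multiply, and scatter-add with batched tensor operations rather than Python loops.

```python
from itertools import product


def sparse_conv3d(coords, feats, weights, kernel_size=3):
    """Gather-scatter sparse 3D convolution on plain lists (hypothetical sketch).

    coords:  list of (x, y, z) integer coordinates of active voxels.
    feats:   list of input feature vectors (length C_in), one per voxel.
    weights: dict mapping a kernel offset (dx, dy, dz) to a C_in x C_out
             matrix given as a list of rows; missing offsets are skipped.
    Returns one output feature vector (length C_out) per active voxel.
    """
    # Step 1: spatial hash of active voxels (coordinate -> row index).
    index_of = {c: i for i, c in enumerate(coords)}
    c_out = len(next(iter(weights.values()))[0])
    out = [[0.0] * c_out for _ in coords]
    r = kernel_size // 2

    for offset in product(range(-r, r + 1), repeat=3):
        w = weights.get(offset)
        if w is None:
            continue
        # Step 2: gather (output row, input row) pairs for this kernel position.
        pairs = []
        for i, (x, y, z) in enumerate(coords):
            j = index_of.get((x + offset[0], y + offset[1], z + offset[2]))
            if j is not None:
                pairs.append((i, j))
        # Steps 3 and 4: multiply gathered features by the weights for this
        # offset and scatter-add the result into the output rows.
        for i, j in pairs:
            f = feats[j]
            for o in range(c_out):
                out[i][o] += sum(f[c] * w[c][o] for c in range(len(f)))
    return out


coords = [(0, 0, 0), (1, 0, 0)]
feats = [[3.0], [5.0]]
weights = {(0, 0, 0): [[1.0]], (1, 0, 0): [[2.0]]}
result = sparse_conv3d(coords, feats, weights)
```

In the example, the first voxel receives its own feature times the center weight plus its +x neighbor's feature times the offset weight (3 + 10 = 13), while the second voxel has no active +x neighbor and keeps only the center term.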
Benchmarks on M4 Pro (24GB), pipeline type 512:
- Model loading: ~45s
- Image preprocessing: ~5s
- Sparse structure sampling: ~15s
- Shape SLat sampling: ~90s
- Texture SLat sampling: ~50s
- Mesh decoding: ~30s
- Total: ~3.5 min
📖 Read the full source: HN LLM Tools