TRELLIS.2 Ported to Apple Silicon: 3D from One Photo in 3.5 Min

What This Is

A port of Microsoft's TRELLIS.2 image-to-3D model that runs natively on Apple Silicon via PyTorch MPS, replacing CUDA-only dependencies with pure-PyTorch alternatives.

Key Details

The original TRELLIS.2 requires CUDA and depends on flash_attn, nvdiffrast, and custom sparse convolution kernels, none of which run on a Mac. This port replaces them with:

A gather-scatter sparse 3D convolution implementation (backends/conv_none.py)
SDPA attention for sparse transformers using PyTorch's scaled_dot_product_attention
Python-based mesh extraction replacing CUDA hashmap operations (backends/mesh_extract.py)
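
As a rough illustration of the SDPA item above, the sketch below runs PyTorch's scaled_dot_product_attention over variable-length token sequences packed into one tensor, one sample at a time. The function name varlen_sdpa and the packed [tokens, heads, head_dim] layout are assumptions made for illustration, not the port's actual code.

```python
import torch
import torch.nn.functional as F


def varlen_sdpa(q, k, v, seq_lens):
    """Attention over packed variable-length sequences (hypothetical helper).

    q, k, v: [total_tokens, heads, head_dim], tokens of all samples concatenated.
    seq_lens: number of tokens belonging to each sample, in order.
    """
    outputs = []
    start = 0
    for n in seq_lens:
        # Slice one sample and move to SDPA's [batch, heads, tokens, head_dim] layout.
        qs, ks, vs = (
            t[start:start + n].transpose(0, 1).unsqueeze(0) for t in (q, k, v)
        )
        out = F.scaled_dot_product_attention(qs, ks, vs)  # [1, heads, n, head_dim]
        # Back to [n, heads, head_dim] so the samples can be re-packed.
        outputs.append(out.squeeze(0).transpose(0, 1))
        start += n
    return torch.cat(outputs, dim=0)
```

Looping per sample keeps tokens of different samples from attending to each other without building a large mask, at the cost of one SDPA call per sample.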

Total changes are a few hundred lines across 9 files. All hardcoded .cuda() calls were patched to use the active device instead.
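
The device patch described above might look like the following minimal sketch. pick_device is a hypothetical helper, not a function from the port; torch.backends.mps.is_available() and torch.cuda.is_available() are standard PyTorch calls.

```python
import torch


def pick_device() -> torch.device:
    """Prefer Apple's MPS backend, then CUDA, then fall back to CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device()

# Before: torch.zeros(4).cuda()  (raises on a machine without CUDA)
# After: create the tensor on whichever device is active.
tensor = torch.zeros(4, device=device)

# Existing tensors and modules move with .to(device) instead of .cuda().
moved = torch.ones(4).to(device)
```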

Performance & Requirements

On an M4 Pro (24GB), the port generates meshes of ~400K vertices from a single photo in about 3.5 minutes. Unified memory usage peaks at around 18GB during generation.

Requirements:

macOS on Apple Silicon (M1 or later)
Python 3.11+
24GB+ unified memory recommended
~15GB disk space for model weights
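
A quick way to check the requirements above before installing is a small preflight script. This is a sketch using only the Python standard library; the preflight function and its thresholds are illustrative and are not part of the repository. Unified memory is not checked here because the standard library has no portable call for it.

```python
import platform
import shutil
import sys

MIN_PYTHON = (3, 11)
MIN_FREE_DISK_GB = 15  # approximate size of the model weights


def preflight(path: str = ".") -> list[str]:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    # A native Apple Silicon Python reports 'darwin' and 'arm64'.
    if sys.platform != "darwin" or platform.machine() != "arm64":
        problems.append("not running natively on macOS / Apple Silicon")
    if sys.version_info < MIN_PYTHON:
        problems.append(f"Python {platform.python_version()} is older than 3.11")
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(f"only {free_gb:.1f} GB free disk space, ~15 GB needed")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```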

Setup & Usage

Quick start:

git clone https://github.com/shivampkumar/trellis-mac.git
cd trellis-mac
hf auth login
bash setup.sh
source .venv/bin/activate
python generate.py path/to/image.png

The pipeline depends on two gated models on Hugging Face, so you need to request access to both: facebook/dinov3-vitl16-pretrain-lvd1689m and briaai/RMBG-2.0.

Basic usage:

python generate.py photo.png
python generate.py photo.png --seed 123 --output my_model --pipeline-type 512
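
To run the same command over a folder of images, a small wrapper along these lines could work. It is a sketch that uses only the flags shown above (--seed, --output, --pipeline-type); build_command and generate_all are hypothetical helpers, and using the image's file stem as the output name is an assumption.

```python
import subprocess
import sys
from pathlib import Path


def build_command(image, seed=None, output=None, pipeline_type=None):
    """Build the generate.py command line for one image."""
    cmd = [sys.executable, "generate.py", str(image)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if output is not None:
        cmd += ["--output", str(output)]
    if pipeline_type is not None:
        cmd += ["--pipeline-type", str(pipeline_type)]
    return cmd


def generate_all(image_dir, seed=None, pipeline_type=None):
    """Run generate.py once per PNG in image_dir, stopping on the first failure."""
    for image in sorted(Path(image_dir).glob("*.png")):
        cmd = build_command(
            image, seed=seed, output=image.stem, pipeline_type=pipeline_type
        )
        subprocess.run(cmd, check=True)
```

Run it from the repository root with the virtual environment active, for example generate_all("photos", seed=123, pipeline_type=512).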

Limitations

No texture export (meshes export with vertex colors only)
Hole filling disabled (meshes may have small holes)
Slower than CUDA (~10x slower for sparse convolution)
Inference only, no training support

Technical Implementation

The sparse 3D convolution builds a spatial hash of active voxels, gathers neighbor features for each kernel position, applies weights via matrix multiplication, and scatter-adds results back. Mesh extraction reimplements flexible_dual_grid_to_mesh using Python dictionaries instead of CUDA hashmap operations.
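
The gather-scatter steps described above can be sketched in a few lines of PyTorch. This is a simplified illustration, not the code in backends/conv_none.py: the function name, the [K, K, K, C_in, C_out] weight layout, and the choice to produce outputs only at already-active voxels are all assumptions. A Python dictionary stands in for the spatial hash.

```python
import itertools

import torch


def sparse_conv3d(coords, feats, weight):
    """Gather-scatter sparse 3D convolution (illustrative sketch).

    coords: [N, 3] integer tensor of active voxel coordinates
    feats:  [N, C_in] features of those voxels
    weight: [K, K, K, C_in, C_out] kernel with odd K
    Returns [N, C_out] features on the same active voxels.
    """
    k = weight.shape[0]
    r = k // 2
    # Spatial hash: voxel coordinate -> row index into feats.
    table = {tuple(c): i for i, c in enumerate(coords.tolist())}
    out = feats.new_zeros(feats.shape[0], weight.shape[-1])

    for dx, dy, dz in itertools.product(range(-r, r + 1), repeat=3):
        # Find every (output voxel, neighbor voxel) pair for this kernel offset.
        src, dst = [], []
        for (x, y, z), i in table.items():
            j = table.get((x + dx, y + dy, z + dz))
            if j is not None:
                dst.append(i)
                src.append(j)
        if not src:
            continue
        src_idx = torch.tensor(src, device=feats.device)
        dst_idx = torch.tensor(dst, device=feats.device)
        gathered = feats[src_idx]                            # gather
        contrib = gathered @ weight[dx + r, dy + r, dz + r]  # apply weights
        out.index_add_(0, dst_idx, contrib)                  # scatter-add
    return out
```

With an all-ones 3x3x3 kernel and one channel, each output is simply the sum of the features of the active voxels in its 3x3x3 neighborhood, which makes the behavior easy to verify by hand.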

Benchmarks on M4 Pro (24GB), pipeline type 512:

Model loading: ~45s
Image preprocessing: ~5s
Sparse structure sampling: ~15s
Shape SLat sampling: ~90s
Texture SLat sampling: ~50s
Mesh decoding: ~30s
Total: ~3.5 min
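
As a quick sanity check on the figures above, the sketch below sums the listed stage times. The stage figures are rounded estimates, so the sum is only expected to land near the reported total rather than match it exactly.

```python
# Approximate per-stage times in seconds, copied from the list above.
stage_seconds = {
    "Model loading": 45,
    "Image preprocessing": 5,
    "Sparse structure sampling": 15,
    "Shape SLat sampling": 90,
    "Texture SLat sampling": 50,
    "Mesh decoding": 30,
}

total_seconds = sum(stage_seconds.values())
generation_seconds = total_seconds - stage_seconds["Model loading"]

print(f"All stages: {total_seconds} s = {total_seconds / 60:.1f} min")
# prints "All stages: 235 s = 3.9 min"
print(f"Without model loading: {generation_seconds} s = {generation_seconds / 60:.1f} min")
# prints "Without model loading: 190 s = 3.2 min"
```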

📖 Read the full source: HN LLM Tools
