Trellis 2 Successfully Running on ROCm 7.11 with AMD RX 9070 XT

Getting Trellis 2 Running on AMD Hardware
A developer has successfully run Trellis 2 on an AMD RX 9070 XT GPU using ROCm 7.11 on Linux Mint 22.3. This addresses common issues where users encountered geometry cutoff, preview failures, and other errors when attempting to run Trellis 2 on AMD hardware.
Key Issues and Solutions
The developer identified two main problems that were causing most failures:
1. ROCm Instability with High N Tensors
ROCm operations become unstable on tensors with a very large first dimension N (the number of rows in feats), causing overflows or NaN values. The original code in linear.py in the sparse folder used:
def forward(self, input: VarLenTensor) -> VarLenTensor:
    return input.replace(super().forward(input.feats))

The fix implements chunked processing to avoid ROCm issues:
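import torch
import torch.nn.functional as F

# Maximum number of rows passed to a single F.linear call.
ROCM_SAFE_CHUNK = 524_288

def rocm_safe_linear(feats: torch.Tensor, weight: torch.Tensor, bias=None) -> torch.Tensor:
    """F.linear with ROCm large-N chunking workaround."""
    N = feats.shape[0]
    if N <= ROCM_SAFE_CHUNK:
        # Small enough to run in one call.
        return F.linear(feats, weight, bias)
    # Preallocate the output, then fill it one row chunk at a time.
    out = torch.empty(N, weight.shape[0], device=feats.device, dtype=feats.dtype)
    for s in range(0, N, ROCM_SAFE_CHUNK):
        e = min(s + ROCM_SAFE_CHUNK, N)
        out[s:e] = F.linear(feats[s:e], weight, bias)
    return out

# Replacement forward() for the linear layer class in linear.py.
def forward(self, input):
    # Accept either a sparse tensor exposing .feats or a plain tensor.
    feats = input.feats if hasattr(input, 'feats') else input
    out = rocm_safe_linear(feats, self.weight, self.bias)
    if hasattr(input, 'replace'):
        return input.replace(out)
    return out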
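Chunking along N does not change the result, because a linear layer transforms each row independently of the others. The following is a minimal, dependency-free sketch of that property, not the developer's code: linear and chunked_linear are hypothetical pure-Python stand-ins for F.linear and rocm_safe_linear, and the tiny chunk size is chosen only so the loop runs several times.

```python
from typing import List, Optional

Matrix = List[List[float]]


def linear(rows: Matrix, weight: Matrix, bias: Optional[List[float]] = None) -> Matrix:
    """Apply y = W x + b to every row independently (stand-in for F.linear)."""
    out = []
    for x in rows:
        # One output value per weight row: dot(w_row, x).
        y = [sum(w * xi for w, xi in zip(w_row, x)) for w_row in weight]
        if bias is not None:
            y = [yi + bi for yi, bi in zip(y, bias)]
        out.append(y)
    return out


def chunked_linear(rows: Matrix, weight: Matrix,
                   bias: Optional[List[float]] = None, chunk: int = 4) -> Matrix:
    """Same computation, performed on slices of at most `chunk` rows."""
    out = []
    for s in range(0, len(rows), chunk):
        e = min(s + chunk, len(rows))
        out.extend(linear(rows[s:e], weight, bias))
    return out


if __name__ == "__main__":
    rows = [[float(i), float(i + 1)] for i in range(10)]  # N = 10, in_features = 2
    weight = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]]        # out_features = 3
    bias = [0.1, 0.2, 0.3]
    assert chunked_linear(rows, weight, bias, chunk=4) == linear(rows, weight, bias)
    print("chunked and unchunked outputs match")
```

On a GPU the chunked and unchunked paths may differ by small floating-point amounts, so the analogous check there would be a tolerance-based comparison such as torch.allclose rather than exact equality.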
2. Broken hipMemcpy2D in CuMesh
The cudaMemcpy2D call in CuMesh, which runs as hipMemcpy2D on ROCm, was causing vertices and faces to be dropped or corrupted. The original CuMesh initialization used:
void CuMesh::init(const torch::Tensor& vertices, const torch::Tensor& faces) {
    size_t num_vertices = vertices.size(0);
    size_t num_faces = faces.size(0);
    this->vertices.resize(num_vertices);
    this->faces.resize(num_faces);
    CUDA_CHECK(cudaMemcpy2D(
        this->vertices.ptr,
        sizeof(float3),
        vertices.data_ptr(),
        sizeof(float) * 3,
        sizeof(float) * 3,
        num_vertices,
        cudaMemcpyDeviceToDevice
    ));
    ...
}

The fix replaces the 2D copy with a 1D version:
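CUDA_CHECK(cudaMemcpy(
    this->vertices.ptr,
    vertices.data_ptr(),
    num_vertices * sizeof(float3),
    cudaMemcpyDeviceToDevice
));

The flat copy moves the same bytes because both buffers are tightly packed: sizeof(float3) equals sizeof(float) * 3, so the destination pitch, source pitch, and row width in the original call are all 12 bytes.

Results and Performance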
With these fixes, the developer successfully got the image-to-3D pipeline working, including preview rendering (without normals) and final GLB export. On a test image with 21,204 tokens, the process took approximately 280 seconds from start to preview generation. The run used 1024 resolution with all samplers set to 20 steps.
📖 Read the full source: r/LocalLLaMA