Modified vLLM 0.17.0 runs on Tesla P40 for real-time transcription with Qwen3 ASR 1.7B

A developer has successfully modified vLLM 0.17.0 to run on Tesla P40 GPUs, enabling real-time lecture transcription with the Qwen3 ASR 1.7B model. The P40 is built on NVIDIA's Pascal architecture (compute capability 6.1), which newer inference engines such as vLLM generally do not support.
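The following is a minimal sketch of why Pascal is a problem. Stock vLLM documents a minimum GPU compute capability of 7.0 (Volta and newer), and the P40 reports 6.1. The sketch assumes that 7.0 threshold. The table `KNOWN_GPUS` and the helper names are hypothetical and for illustration only. The optional PyTorch call `torch.cuda.get_device_capability` is a real API that returns a `(major, minor)` tuple.

```python
# Minimal sketch: compare a GPU's compute capability with a minimum threshold.
# Assumption: 7.0 is stock vLLM's documented minimum; the P40 (Pascal) is 6.1.
# KNOWN_GPUS, meets_min_capability and local_gpu_capability are hypothetical names.

KNOWN_GPUS = {
    "Tesla P40": (6, 1),   # Pascal
    "Tesla V100": (7, 0),  # Volta
    "Tesla T4": (7, 5),    # Turing
    "A100": (8, 0),        # Ampere
}

STOCK_VLLM_MIN_CAPABILITY = (7, 0)  # assumed threshold from vLLM's install docs


def meets_min_capability(capability, minimum=STOCK_VLLM_MIN_CAPABILITY):
    """Return True if a (major, minor) compute capability is at least `minimum`."""
    # Tuples compare lexicographically: (6, 1) < (7, 0) < (7, 5) < (8, 0).
    return tuple(capability) >= tuple(minimum)


def local_gpu_capability():
    """Return the first CUDA device's (major, minor) capability, or None.

    Returns None when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    for name, cap in KNOWN_GPUS.items():
        status = "meets" if meets_min_capability(cap) else "is below"
        print(f"{name}: sm_{cap[0]}{cap[1]} {status} the assumed stock vLLM minimum")
    print("Local GPU capability:", local_gpu_capability())
```

Only the threshold check is shown here. What a Pascal-compatible fork actually has to change (kernels, attention backends, data types) is not described in the source and is not modeled.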
Key Details
The developer was working on a personal project for real-time lecture transcription. They initially planned to use the Qwen3 ASR 1.7B model but found that true real-time (streaming) transcription with it is only supported through vLLM. Rather than falling back to transcribing the audio in chunks, they tried an experimental modification of vLLM itself.
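The sketch below shows the chunking approach the developer chose not to use. It splits the recording into fixed-length windows that are transcribed one at a time, often with a small overlap so that words at the boundaries are not cut off. The sketch assumes 16 kHz mono audio, 5-second chunks and a 0.5-second overlap. None of these values come from the source, and `chunk_bounds` is a hypothetical helper.

```python
def chunk_bounds(num_samples, sample_rate=16_000, chunk_s=5.0, overlap_s=0.5):
    """Yield (start, end) sample indices for overlapping fixed-length chunks.

    Assumed, illustrative parameters: 16 kHz mono audio, 5-second chunks,
    and 0.5 seconds of overlap between consecutive chunks.
    """
    chunk = int(chunk_s * sample_rate)          # samples per chunk
    step = chunk - int(overlap_s * sample_rate)  # advance between chunk starts
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk length")
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        yield start, end
        if end == num_samples:  # last (possibly shorter) chunk reached
            break
        start += step


if __name__ == "__main__":
    # 12 seconds of audio at the assumed 16 kHz sample rate.
    for start, end in chunk_bounds(12 * 16_000):
        print(f"transcribe samples {start}..{end}")
```

With this scheme, each window can only be transcribed after it has been fully recorded. The output therefore lags by at least one chunk length, which is why native streaming support matters for truly real-time transcription.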
Using Codex, they modified vLLM to run on the Pascal architecture. This allowed them to run the Qwen3 ASR 1.7B model on their Tesla P40 server GPU. The result was near-complete hardware acceleration and fully real-time transcription.
The modified vLLM fork is available at: https://github.com/uaysk/vllm-pascal
Next Steps and Challenges
The developer's next goal is to try running Qwen3.5 models on this setup. However, they report several technical issues so far: vision functionality appears to be unavailable, and even text-only use presents challenges. They are currently unsure whether running Qwen3.5 on the P40 will be possible.
📖 Read the full source: r/LocalLLaMA