Leveraging Agent Skills for Writing CUDA Kernels with Upskill

Hugging Face has introduced a method for improving smaller AI models' performance on hard tasks, such as writing CUDA kernels, by using agent skills. The method relies on the new upskill tool, which lets you generate and evaluate agent skills with large models and then apply those skills to smaller or cheaper models.
Agent skills are packaged knowledge that can be shared between models and tools. A skill is defined as a set of files containing markdown instructions and scripts. Skills are most useful in niche or hard problem domains where models do not perform well unaided.
Steps to Upskill Using Claude and the Upskill Tool
1. Building a Kernel with Claude Opus 4.5: Start by using Claude Code to build a kernel interactively, then export the trace. Draft skills produced along the way can be tried on smaller models and refined iteratively.
2. Creating an Agent Skill from the Trace: Once the kernel is built, instruct Claude to generate a skill file for the completed task. Anthropic's 'skill creator' can also do this, producing a skill from the agent's activity trace. The upskill tool goes a step further by also generating test cases for assessing the skill's performance.
3. Applying the Skill across Models: Transfer the new skill to the target models. By convention, a skill is stored as a directory, e.g., {agent}/skills/{skill_name}/SKILL.md. Use upskill eval commands to compare model performance with the skill, showing differences in accuracy and token usage across tools such as codex or cursor.
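The directory convention in step 3 can be sketched in a few lines of Python. This is only an illustration, not how upskill or any agent actually installs skills: the function name install_skill, its parameters, and the example values are hypothetical, and the front matter with name and description fields is an assumption about the SKILL.md layout.

```python
from pathlib import Path


def install_skill(agent_dir: str, skill_name: str, description: str, body: str) -> Path:
    """Write a skill to {agent_dir}/skills/{skill_name}/SKILL.md and return its path.

    Hypothetical helper for illustration; not part of upskill or any agent.
    """
    skill_dir = Path(agent_dir) / "skills" / skill_name
    skill_dir.mkdir(parents=True, exist_ok=True)

    # Assumed layout: front matter with name and description,
    # followed by the markdown instructions for the agent.
    content = (
        "---\n"
        f"name: {skill_name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"{body.strip()}\n"
    )

    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(content, encoding="utf-8")
    return skill_file
```

Calling install_skill(".claude", "cuda-kernels", "...", "...") would create .claude/skills/cuda-kernels/SKILL.md; the same call with a different agent directory copies the skill to another tool, which is the transfer the step describes.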
Ultimately, skills can reduce token consumption while maintaining accuracy, which matters for recurring tasks run on different models. However, effectiveness varies, so a skill may need iterative refinement.
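One way to make that trade-off concrete is a small check over evaluation results. The sketch below is an assumption about how such a comparison might be coded, not the output format or logic of upskill eval: EvalResult, skill_helps, token_change_pct, and the max_accuracy_drop tolerance are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    accuracy: float  # fraction of test cases passed, 0.0 to 1.0
    tokens: int      # total tokens consumed by the run


def token_change_pct(baseline: EvalResult, with_skill: EvalResult) -> float:
    """Percentage change in tokens relative to the baseline (negative means savings)."""
    return 100.0 * (with_skill.tokens - baseline.tokens) / baseline.tokens


def skill_helps(baseline: EvalResult, with_skill: EvalResult,
                max_accuracy_drop: float = 0.0) -> bool:
    """True if the skill uses fewer tokens without losing more accuracy than allowed."""
    accuracy_ok = with_skill.accuracy >= baseline.accuracy - max_accuracy_drop
    fewer_tokens = with_skill.tokens < baseline.tokens
    return accuracy_ok and fewer_tokens
```

A skill that fails this check for a given model is a candidate for another round of refinement rather than proof that skills do not work for that model.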
📖 Read the full source: Hugging Face Blog