Cloudflare's AI Platform: Unified Inference Layer for AI Agents

What Cloudflare's AI Platform Offers
Cloudflare has expanded its AI capabilities into a unified inference layer designed specifically for AI agents. It targets two problems: AI models change rapidly, and agentic workflows often need different models for different tasks.
Key Features and Implementation
The core offering is one API for calling any AI model from any provider. Workers users can call third-party models through the same AI.run() binding already used for Workers AI, so switching between providers requires only a one-line code change.
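export default {
  async fetch(request, env) {
    // Same AI.run() binding used for Workers AI; changing the model ID switches providers.
    const response = await env.AI.run('@cf/moonshotai/kimi-k2.5', {
      prompt: 'What is AI Gateway?'
    }, {
      // Custom metadata travels with the request so AI Gateway can break down costs.
      metadata: {
        teamId: 'AI',
        userId: 12345
      }
    });
    return Response.json(response);
  }
};

The platform provides access to 70+ models across 12+ providers, including Alibaba Cloud, AssemblyAI, ByteDance, Google, InWorld, MiniMax, OpenAI, Pixverse, Recraft, Runway, and Vidu. The catalog now includes image, video, and speech models for building multimodal applications.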
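Every provider sits behind the same run(model, inputs, options) call, so the model ID can be treated as configuration. The sketch below is a hypothetical illustration, not Cloudflare's API. MODEL_FOR_TASK, askModel, and the 'provider/other-model' ID are made-up names. The binding is assumed to accept the same three arguments as the env.AI.run() call above. A stub object stands in for env.AI so the snippet runs outside Workers.

```javascript
// Hypothetical routing table: one model per task. Only the Kimi ID comes from
// the example above; 'provider/other-model' is a placeholder, not a real ID.
const MODEL_FOR_TASK = {
  chat: '@cf/moonshotai/kimi-k2.5',
  summarize: 'provider/other-model',
};

// Calls whichever model is configured for the task. Switching providers means
// editing one entry in MODEL_FOR_TASK; the call itself does not change.
// Assumes the binding takes (model, inputs, { metadata }) like the example above.
async function askModel(ai, task, prompt, metadata = {}) {
  const model = MODEL_FOR_TASK[task];
  if (!model) {
    throw new Error(`No model configured for task "${task}"`);
  }
  return ai.run(model, { prompt }, { metadata });
}

// Stub standing in for env.AI so the sketch runs outside a Worker:
// it echoes back the model, prompt, and metadata it received.
const stubAI = {
  async run(model, inputs, options) {
    return { model, prompt: inputs.prompt, metadata: options.metadata };
  },
};

askModel(stubAI, 'summarize', 'Summarize this thread', { userId: 12345 })
  .then((res) => console.log(res.model)); // prints "provider/other-model"
```

Keeping the model choice in one table means an agent workflow can use different models per step, and swapping a provider touches data rather than call sites.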
Cost Management and BYOM Support
All AI spend can be managed in one place through AI Gateway. Attaching custom metadata to requests, such as the teamId and userId fields above, yields cost breakdowns by attributes like free vs. paid users, individual customers, or specific workflows.
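The following sketch illustrates what metadata-based attribution makes possible by summing spend for each value of one metadata key. The record shape ({ cost, metadata }), the plan key, and the costBy helper are all hypothetical. This is not AI Gateway's log or analytics format. It only models how tagged requests can be grouped.

```javascript
// Hypothetical request records: each carries a cost (USD) and the custom
// metadata attached when the request was sent. Not AI Gateway's log schema.
const records = [
  { cost: 0.0040, metadata: { plan: 'free', userId: 1 } },
  { cost: 0.0125, metadata: { plan: 'paid', userId: 2 } },
  { cost: 0.0010, metadata: { plan: 'free', userId: 3 } },
  { cost: 0.0300, metadata: { plan: 'paid', userId: 2 } },
];

// Sums cost per distinct value of one metadata key. Records missing the key
// (or missing metadata entirely) are grouped under "(untagged)" so no spend
// silently disappears from the breakdown.
function costBy(records, key) {
  const totals = {};
  for (const { cost, metadata = {} } of records) {
    const group = key in metadata ? String(metadata[key]) : '(untagged)';
    totals[group] = (totals[group] ?? 0) + cost;
  }
  return totals;
}

console.log(costBy(records, 'plan'));   // free vs. paid totals
console.log(costBy(records, 'userId')); // per-customer totals
```

The same records can be sliced by any key that was attached at request time. This is why choosing metadata fields up front, such as plan, customer, or workflow, determines which breakdowns are available later.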
For custom models, Cloudflare is working on letting users bring their own models to Workers AI using Cog, Replicate's open-source tool for packaging machine learning models. A model is containerized from a cog.yaml file plus Python inference code, and Cog abstracts away CUDA dependencies, Python versions, and weight loading.
Recent Updates and Availability
Recent additions include zero-setup default gateways, automatic retries on upstream failures, and more granular logging controls. REST API support for non-Workers users is expected in the coming weeks.
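AI Gateway performs its retries on the platform side. The sketch below only models the general idea of retrying a failed upstream call with exponential backoff. The withRetries helper, its parameters, and the default values are hypothetical. They do not reflect AI Gateway's actual retry policy or configuration.

```javascript
// Generic retry-with-backoff sketch; not AI Gateway's implementation.
// Calls fn(attempt) up to maxAttempts times, waiting baseDelayMs * 2^(attempt - 1)
// after each failure, and rethrows the last error if every attempt fails.
async function withRetries(fn, { maxAttempts = 3, baseDelayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Usage: a simulated upstream call that fails twice, then succeeds.
let calls = 0;
withRetries(async () => {
  calls++;
  if (calls < 3) throw new Error(`upstream failure ${calls}`);
  return 'ok';
}, { baseDelayMs: 10 }).then((result) => console.log(result, calls)); // prints "ok 3"
```

Doubling the delay between attempts gives a briefly overloaded upstream room to recover instead of being hit again immediately.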
📖 Read the full source: HN AI Agents