Pokémon Showdown AI Agents Built with Free LLM APIs and Tool-Calling

A developer built a system where LLMs like Llama 3, Qwen, and Gemma autonomously play Pokémon Showdown battles. The agents analyze the full battle state each turn—type matchups, HP, weather, field conditions, revealed opponent info—and decide whether to attack or switch using structured tool calls.
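As a rough illustration of the per-turn battle state described above, the sketch below flattens it into text for the model. The class names, field names, and output format here are hypothetical and are not taken from the project's repository.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PokemonView:
    """What the agent knows about one Pokémon (hypothetical structure)."""
    species: str
    types: List[str]
    hp_percent: int  # 0-100
    # For the opponent, only moves revealed so far would be listed here.
    moves: List[str] = field(default_factory=list)


@dataclass
class BattleState:
    """Snapshot of one turn (hypothetical structure)."""
    turn: int
    active: PokemonView
    bench: List[PokemonView]
    opponent_active: PokemonView
    weather: Optional[str] = None
    field_conditions: List[str] = field(default_factory=list)


def _describe(mon: PokemonView) -> str:
    moves = ", ".join(mon.moves) or "unknown"
    return f"{mon.species} ({'/'.join(mon.types)}) HP {mon.hp_percent}% moves: {moves}"


def render_state(state: BattleState) -> str:
    """Flatten the snapshot into plain text for the model's user message."""
    lines = [
        f"Turn {state.turn}",
        f"Weather: {state.weather or 'none'}",
        f"Field: {', '.join(state.field_conditions) or 'none'}",
        f"Your active: {_describe(state.active)}",
    ]
    lines += [f"Bench: {_describe(mon)}" for mon in state.bench]
    lines.append(f"Opponent active: {_describe(state.opponent_active)}")
    return "\n".join(lines)
```

A text rendering like this keeps the prompt readable in traces, which fits the per-turn observability the project advertises. A JSON dump of the same structure would work equally well.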
Key Details
- Routes everything through LiteLLM and exclusively uses models with free API tiers (Groq, Cerebras, OpenRouter, Google AI Studio).
- Zero inference cost: the agent runs locally and calls only free-tier hosted models.
- Two modes: Human vs. AI (play against the bot) and AI vs. AI (pit two models against each other).
- Supports 15+ free models out of the box.
- Full observability via Langfuse to see exact tool calls and reasoning per turn.
Architecture Highlights
The agent structures its decisions with tool-calling rather than free-form prompt-and-response. Raw battle data is fed to the LLM, which then selects an attack or switch action through predefined tool schemas. This lets the model reason about complex board states such as type advantages and dynamic field effects.
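The attack-or-switch decision could be expressed as two tool schemas in the OpenAI-style function-calling format, which LiteLLM's `completion` accepts through its `tools` parameter. The sketch below is a minimal illustration under that assumption. The tool names `use_move` and `switch_pokemon`, their parameters, and the `resolve_tool_call` helper are hypothetical and are not taken from the repository.

```python
import json
from typing import List, Tuple

# Hypothetical tool schemas: one per action the model may take on its turn.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "use_move",
            "description": "Attack with one of the active Pokémon's moves.",
            "parameters": {
                "type": "object",
                "properties": {
                    "move": {"type": "string", "description": "Name of the move to use."},
                    "reasoning": {"type": "string", "description": "Short justification."},
                },
                "required": ["move"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "switch_pokemon",
            "description": "Switch the active Pokémon for one on the bench.",
            "parameters": {
                "type": "object",
                "properties": {
                    "pokemon": {"type": "string", "description": "Species to switch in."},
                    "reasoning": {"type": "string", "description": "Short justification."},
                },
                "required": ["pokemon"],
            },
        },
    },
]


def resolve_tool_call(
    name: str,
    arguments_json: str,
    legal_moves: List[str],
    legal_switches: List[str],
) -> Tuple[str, str]:
    """Validate a model tool call against the legal options for this turn."""
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        raise ValueError("tool arguments are not valid JSON") from exc
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")

    if name == "use_move":
        choice = args.get("move")
        if choice not in legal_moves:
            raise ValueError(f"illegal move: {choice!r}")
        return ("move", choice)
    if name == "switch_pokemon":
        choice = args.get("pokemon")
        if choice not in legal_switches:
            raise ValueError(f"illegal switch: {choice!r}")
        return ("switch", choice)
    raise ValueError(f"unknown tool: {name!r}")
```

Validating each call against the legal options before acting matters because a model can name a move or Pokémon that is not available. A caller might respond to the `ValueError` by re-prompting or falling back to a default action.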
GitHub Repo
Code and setup instructions: github.com/MohamedMostafa259/pokemon-ai-agent
📖 Read the full source: r/LocalLLaMA
👀 See Also

Qwen Meetup Draft: Function Calling Harness 2 Boosts CoT Compliance from 9.91% to 100% via Structured Schemas
A follow-up to the earlier function-calling harness post extends the pattern to domains without a compiler (investment memos, legal opinions, clinical charts). The schema forces required fields — submission rejected if incomplete. Qwen3.6-27b achieves 100% CoT compliance on these schemas.

Claude Code Skill Converts Stitch Designs to Next.js with Zero Pixel Drift
A Claude Code skill converts Google Stitch AI designs to Next.js components with mandatory verification checkpoints to prevent pixel drift, preserving exact values and handling assets.

Memctl: Open Source MCP Server for Persistent Memory in AI Coding Agents
Memctl is an open source MCP server that provides AI coding agents with persistent memory across sessions, machines, and IDEs. Built primarily with Claude Code in two weeks, it stores project context and serves it back in subsequent sessions.

GitVelocity: AI Scoring of 50k PRs Reveals Insights on Code Complexity
GitVelocity uses Claude to score merged pull requests 0-100 across six dimensions: scope, architecture, implementation, risk, quality, and performance/security. After analyzing 50,000+ PRs across TypeScript, Python, Rust, Go, Java, and Elixir, the team found surprising patterns about PR size, test coverage, and AI adoption.