Building a Discord Cat Monitoring Bot with ESP32-S3, MiniClaw, and Multimodal AI

Edge Agent Setup for Cat Monitoring
A developer created a Discord bot that monitors their cat using an ESP32-S3 Sense as an edge agent. The system captures photos or records audio when triggered via Discord mentions, then sends the media to a multimodal LLM for analysis.
Hardware and Software Stack
The implementation uses specific components:
- Hardware: XIAO ESP32-S3 Sense (Vision version) - small enough to hide in a cat tree
- Communication: Web UI + WebSocket setup for low-latency debugging
- AI Model: Zhipu AI's VLM-4V multimodal model
- Platform: Discord for bot interaction
How It Works
The workflow is straightforward: when someone @mentions the bot on Discord, the ESP32-S3 either snaps a photo or records audio. This media gets sent to the VLM (Vision-Language Model), which analyzes it and returns natural language descriptions of what's happening. Instead of getting "Motion Detected" spam, users receive specific descriptions like "Your cat is sleeping on the couch" or "Cat is playing with a toy."
Current Limitations and Future Plans
The developer identified several areas for improvement:
- Image Quality: Current captures are "pretty blurry" and "mediocre" but functional
- Fixed Position: The device has a fixed POV - considering adding mobility via servo brackets or rover mechanics
- Audio Intelligence: Planning to add vocalization classification to distinguish between hungry meows, zoomies, or general yelling
The developer notes the implementation was "surprisingly straightforward" and works better than expected, with the VLM analysis being "surprisingly spot-on" despite the blurry image quality.
📖 Read the full source: r/openclaw
👀 See Also

Building an AI Cortex with Claude Code: Architecture and Context Library Insights
A developer built a platform where Claude writes, reviews, and auto-merges code, with the key insight being a structured context library that compounds over time. After six weeks, the AI reportedly knows the company better than a new hire after a year.

Non-coder builds full prospecting stack with Claude Code and APIs
A Reddit user with zero coding experience built a complete outbound prospecting system in a weekend using Claude Code, Crustdata for company/people search, FullEnrich for contact enrichment, and Instantly for sending.

Using Claude as a Creative Director in a Sticker Generation Pipeline
A developer built a sticker app where Claude analyzes user-uploaded photos, generates nine sticker concepts, and writes detailed prompts for image models, resulting in personalized stickers rather than generic ones.

Automating a Daily AI News Podcast with Claude Code and Three AI Agents
A developer built a fully automated podcast pipeline using Claude Code to orchestrate three specialized AI agents that curate AI news, write narration scripts, fact-check content, and generate audio with voice cloning. The system publishes daily episodes with minimal manual intervention.