Engram Memory SDK: Graph-Based Memory for AI Agents with Local Models

Graph Memory SDK for Local AI Models
Engram Memory SDK is an open-source graph memory system designed for AI agents that works with local models through LiteLLM integration. The core architecture separates ingestion from recall: you only need the LLM once during ingestion to extract entities and relationships, while recall operates through pure vector search, graph traversal, and scoring without requiring additional LLM calls.
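To make the recall side concrete, here is a minimal sketch of LLM-free recall: a vector search picks seed nodes, a one-hop graph traversal pulls in their neighbors, and a simple score ranks the result. This is not the Engram SDK's API. The in-memory dictionaries, the `recall` function, and the `hop_weight` parameter are all hypothetical stand-ins for what the Neo4j-backed system would do, and the query vector is assumed to come from an embedding model.

```python
import math

# Hypothetical in-memory stand-ins for a graph store (not Engram's data model).
EMBEDDINGS = {
    "alice": [1.0, 0.0],
    "acme": [0.6, 0.8],
    "paris": [0.0, 1.0],
}
EDGES = {
    "alice": ["acme"],
    "acme": ["alice", "paris"],
    "paris": ["acme"],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(query_vec, top_k=1, hop_weight=0.5):
    """Rank memories using vector search plus one-hop traversal, with no LLM call."""
    # Step 1: vector search selects the top_k most similar nodes as seeds.
    sims = {name: cosine(query_vec, vec) for name, vec in EMBEDDINGS.items()}
    seeds = sorted(sims, key=sims.get, reverse=True)[:top_k]

    # Step 2: graph traversal adds each seed's neighbors at a discounted score.
    scores = {name: sims[name] for name in seeds}
    for seed in seeds:
        for neighbor in EDGES.get(seed, []):
            candidate = hop_weight * sims[seed]
            scores[neighbor] = max(scores.get(neighbor, 0.0), candidate)

    # Step 3: scoring is a plain sort here; a real system would combine more signals.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


results = recall([1.0, 0.0])
```

With the query `[1.0, 0.0]`, the seed is `alice` and its neighbor `acme` is returned at a discounted score, even though `acme` was not the closest vector match on its own.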
Technical Details
The SDK is built with async Python and uses Neo4j as its backend database. According to the source, it averages ~735 tokens per ingestion operation and achieves 95ms recall latency. The system includes self-restructuring memory features with decay and clustering running in the background.
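The source does not say how decay is computed. One common approach, shown here purely as an assumption, is exponential decay with a half-life, followed by pruning of memories that fall below a threshold. The function names, the one-day half-life, and the threshold are all hypothetical.

```python
def decayed_score(base_score, age_seconds, half_life_seconds=86400.0):
    """Halve the score every half_life_seconds (assumed default: one day)."""
    return base_score * 0.5 ** (age_seconds / half_life_seconds)


def prune(memories, now, threshold=0.1):
    """Keep only memories whose decayed score is still at or above threshold."""
    kept = {}
    for name, (score, created_at) in memories.items():
        current = decayed_score(score, now - created_at)
        if current >= threshold:
            kept[name] = current
    return kept


# Two memories with the same base score: one is a day old, one is ten days old.
memories = {"fresh": (1.0, 86400.0 * 9), "stale": (1.0, 0.0)}
survivors = prune(memories, now=86400.0 * 10)
```

In a background task, a pass like `prune` could run on a timer so that recall never has to compute decay at query time.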
Setup and Installation
Installation is straightforward:
pip install engram-memory-sdk
Configuration requires a .env file with these variables:
LLM_MODEL=ollama/llama3 # or any LiteLLM-supported local model
NEO4J_URI=bolt://localhost:7687
The system supports any model via LiteLLM, including local deployments through Ollama, vLLM, and text-generation-webui. The key advantage is cost efficiency: a small local model handles extraction once at ingestion, and recall adds no LLM cost afterward because it makes no LLM calls.
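As a quick sanity check on the .env file, the two variables named above can be parsed and validated with the standard library alone. This is a sketch, not how the SDK loads its configuration (the source does not say). `load_env_text` is a hypothetical helper, and treating everything after `#` as a comment is a simplification that would break values containing `#`.

```python
def load_env_text(text):
    """Parse KEY=VALUE lines, dropping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplification: '#' always starts a comment
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


# Sample contents matching the two variables listed in the article.
SAMPLE = """LLM_MODEL=ollama/llama3 # or any LiteLLM-supported local model
NEO4J_URI=bolt://localhost:7687
"""

config = load_env_text(SAMPLE)
missing = [key for key in ("LLM_MODEL", "NEO4J_URI") if key not in config]
```

An empty `missing` list means both variables are present. The `ollama/` prefix in `LLM_MODEL` is the part LiteLLM uses to route the call to a local Ollama server.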
📖 Read the full source: r/LocalLLaMA