Gemini Embedding 2: Google's First Natively Multimodal Embedding Model Released

Google DeepMind has released Gemini Embedding 2 in public preview, its first fully multimodal embedding model built on the Gemini architecture. Unlike previous text-only models, it maps text, images, videos, audio, and documents into a single, unified embedding space and captures semantic intent across more than 100 languages.
Key Technical Details
The model is available through the Gemini API and Vertex AI, and supports these specific capabilities:
- Text: Supports up to 8,192 input tokens of context
- Images: Processes up to 6 images per request (PNG and JPEG formats)
- Videos: Supports up to 120 seconds of video input (MP4 and MOV formats)
- Audio: Natively ingests and embeds audio without needing text transcriptions
- Documents: Directly embeds PDFs up to 6 pages long
Beyond processing single modalities, the model natively understands interleaved input, allowing you to pass multiple modalities (e.g., image + text) in a single request to capture nuanced relationships between different media types.
Flexible Output Dimensions
Gemini Embedding 2 incorporates Matryoshka Representation Learning (MRL), which allows the output dimension to be scaled down from the default of 3072. This lets developers trade off retrieval quality against storage cost. Google recommends 3072, 1536, or 768 dimensions for the highest quality.
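The idea behind MRL is that the leading components of the vector carry most of the meaning, so a shorter embedding can be taken as a prefix of the full one. The sketch below illustrates that client-side: it keeps the first `dim` values of a 3072-dimensional vector and re-normalizes to unit length so cosine similarity stays well-behaved. It assumes you already hold a full-length vector (the `full` list here is synthetic, not real model output) and float32 storage at 4 bytes per value; the helper names `truncate_embedding` and `storage_bytes` are hypothetical. If you request a smaller output dimension from the API instead, check the official docs on whether the returned vector is already normalized.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of an MRL-style embedding and rescale to unit length."""
    if dim <= 0 or dim > len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector has zero norm")
    return [x / norm for x in head]

def storage_bytes(dim, bytes_per_value=4):
    """Approximate raw storage for one vector, assuming float32 values."""
    return dim * bytes_per_value

# Synthetic stand-in for a real 3072-dimensional embedding.
full = [math.sin(i + 1) for i in range(3072)]

for d in (3072, 1536, 768):
    v = truncate_embedding(full, d)
    print(d, "dims ->", storage_bytes(d), "bytes per vector")
```

Dropping from 3072 to 768 dimensions cuts per-vector storage to a quarter; how much quality that costs depends on the model and task, which is why the recommended sizes are worth benchmarking on your own data.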
Integration and Use Cases
The model is designed for multimodal downstream tasks including Retrieval-Augmented Generation (RAG), semantic search, sentiment analysis, and data clustering. It's available through multiple platforms:
- Gemini API
- Vertex AI
- LangChain, LlamaIndex, Haystack
- Vector databases: Weaviate, Qdrant, ChromaDB, and Vector Search
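Because every modality lands in one embedding space, semantic search or the retrieval step of RAG reduces to finding the stored vectors closest to a query vector, regardless of whether they came from an image, an audio clip, or a PDF. The brute-force cosine-similarity ranking below is a minimal sketch of that step. The 4-dimensional vectors and file names are invented toy data, not Gemini Embedding 2 output, and `cosine`/`top_k` are hypothetical helpers; the vector databases listed above do the same job at scale, typically with approximate nearest-neighbor indexes rather than a linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, corpus, k=2):
    """Rank (item_id, vector) pairs by cosine similarity to the query, highest first."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for embeddings of different media types.
corpus = [
    ("photo_of_dog.jpg", [0.9, 0.1, 0.0, 0.1]),
    ("podcast_clip.mp3", [0.1, 0.8, 0.3, 0.0]),
    ("manual.pdf", [0.0, 0.2, 0.9, 0.1]),
]
# Pretend this is the embedding of a text query such as "a dog playing".
query = [0.85, 0.15, 0.05, 0.1]

for item_id, score in top_k(query, corpus, k=2):
    print(item_id, round(score, 3))
```

A text query retrieving an image this way is exactly the cross-modal behavior a unified space is meant to enable; in a RAG pipeline the top-ranked items would then be passed to a generator model as context.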
Google provides interactive Colab notebooks for getting started with the Gemini API and Vertex AI implementations.
📖 Read the full source: HN AI Agents