XMem

The Memory Layer for AI That Never Forgets

Give every AI agent and LLM interface persistent, cross-platform memory — out of the box.

Quickstart · Benchmarks · Architecture · SDKs · API Docs

The Problem

LLMs have goldfish memory. Every conversation starts from zero. Switch from ChatGPT to Claude? Context gone. Move from your IDE to a browser? Context gone. Ask about something you discussed last week? Context gone.

This isn't just annoying — it's a fundamental bottleneck for anyone building AI agents, personal assistants, or any application that needs to know its user over time.

Companies like Mem0, Zep, and others have raised tens of millions trying to solve this. XMem takes a different approach.

What XMem Does Differently

XMem is a unified memory system that sits behind every AI interface you use. It silently captures, classifies, and stores your interactions — and then surfaces the right memories at the right time, across any platform.

What makes it different:

Multi-domain memory, not a flat key-value store. XMem doesn't just dump everything into one vector database. It has specialized agents that understand the type of information — personal facts, time-based events, code context, conversation summaries, images — and routes each to purpose-built storage.
Judge-before-write architecture. Every piece of memory passes through a Judge agent that checks it against existing data and decides: add, update, delete, or skip. No duplicates. No stale data. Memory stays clean.
Works everywhere. Chrome extension for ChatGPT/Claude/Gemini/DeepSeek/Perplexity. Python/TypeScript/Go SDKs for your own agents. One memory layer, every interface.

Watch the Demo

Type “X” on any supported AI platform and choose one of the four modes XMem offers to store and search your memories, or ask questions about your repository with the XIDE feature.

demo.mp4

cursor_mcp.1.mp4

Benchmarks

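We evaluated XMem against several widely used memory systems on two established academic benchmarks. XMem leads every category on LoCoMo and is competitive with the strongest GPT-4o-based system on LongMemEval-S while running on amazon.nova-lite. We also report gains over full-context baselines that receive the entire conversation history.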

LongMemEval-S

The industry-standard benchmark for long-term conversational memory. Tests whether a system can recall facts, track preference changes, reason about time, and maintain context across sessions.

Category	XMem (amazon.nova-lite)	Backboard.io (GPT-4o)	Mastra (GPT-4o)	Supermemory (GPT-4o)
Single-Session Assistant	96.43	98.2	82.1	96.43
Single-Session User	97.1	97.1	98.6	97.14
Knowledge Update	91.2	93.6	85.9	88.46
Multi-Session	93.6	91.7	79.7	71.43
Temporal Reasoning	94.5	91.7	85.7	76.69
Single-Session Preference	87.0	90.0	73.3	70.0

XMem is on par with Backboard.io: it leads on Multi-Session (93.6 vs. 91.7) and Temporal Reasoning (94.5 vs. 91.7), ties on Single-Session User, and trails by at most 3 points on the remaining three categories. Overall, XMem outperforms Mastra by 9.2 points and Supermemory by 11.8 points.

LoCoMo

Tests compositional reasoning over memory — can the system connect facts across conversations, reason about temporal relationships, and answer open-ended questions?

Method	Single-Hop (%)	Multi-Hop (%)	Open Domain (%)	Temporal (%)	Overall (%)
XMem (Ours)	90.6	92.3	91.2	91.9	91.5
Zep	74.11	66.04	67.71	79.79	75.14
Memobase (v0.0.37)	70.92	46.88	77.17	85.05	75.78
Mem0g (YC 24)	65.71	47.19	75.71	58.13	68.44
Mem0 (YC 24)	67.13	51.15	72.93	55.51	66.88
LangMem	62.23	47.92	71.12	23.43	58.10
OpenAI	63.79	42.92	62.29	21.71	52.90

On multi-hop reasoning (connecting facts from different conversations), XMem beats the next best system, Zep, by 26.3 points. On temporal reasoning, XMem leads all competitors at 91.9%, ahead of the next closest (Memobase v0.0.37 at 85.05%) by about 6.9 points. Overall, XMem's score of 91.5 leads the next best system, Memobase at 75.78, by 15.7 points.

How We Benchmark

Evaluation: LLM-as-Judge using Gemini with structured rubrics
Fairness: All systems tested with identical conversation histories and queries

Core Features

Chrome Extension — Memory Where You Already Work

Stop copy-pasting context between AI tools. The XMem Chrome extension brings persistent memory to ChatGPT, Claude, Gemini, DeepSeek, and Perplexity:

Live Search & Inject: As you type a prompt, XMem searches your memory in real-time and shows a floating chip. One click injects relevant context directly into your input — zero friction, no workflow change.
Background Auto-Save (Xingest): When you hit "Send", XMem asynchronously captures the conversation turn. A background queue extracts facts and summaries without touching your UI.

Intelligent Multi-Domain Classification

Not all memory is the same, and treating it that way is why other solutions underperform. XMem's Classifier Agent analyzes every piece of incoming data and routes it to the right domain:

Domain	What It Stores	Example	Storage
Profile	Permanent user facts — identity, preferences, traits	"I prefer Go over Python for backends"	Pinecone
Temporal	Time-anchored events with date resolution	"I got promoted to Staff Engineer yesterday"	Neo4j
Summary	Compressed conversation takeaways	"Discussed migration from REST to gRPC"	Pinecone
Code	Annotations, bugs, explanations tied to symbols	"This retry logic has a race condition"	Neo4j + Pinecone
Snippet	Personal code patterns and utilities	"Here's my standard error handler in Go"	Pinecone
Image	Visual observations and descriptions	Screenshot of architecture diagram	Pinecone
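
The routing in the table above can be pictured as a plain mapping from domain to storage backends. This is a minimal sketch, not XMem's actual code: `Domain`, `STORAGE_ROUTES`, and `backends_for` are hypothetical names, and the real Classifier Agent decides domains with an LLM rather than receiving them as input.

```python
from enum import Enum


class Domain(str, Enum):
    PROFILE = "profile"
    TEMPORAL = "temporal"
    SUMMARY = "summary"
    CODE = "code"
    SNIPPET = "snippet"
    IMAGE = "image"


# Domain -> storage backends, copied from the table above.
# The names and lowercase backend labels are illustrative assumptions.
STORAGE_ROUTES: dict[Domain, tuple[str, ...]] = {
    Domain.PROFILE: ("pinecone",),
    Domain.TEMPORAL: ("neo4j",),
    Domain.SUMMARY: ("pinecone",),
    Domain.CODE: ("neo4j", "pinecone"),
    Domain.SNIPPET: ("pinecone",),
    Domain.IMAGE: ("pinecone",),
}


def backends_for(domains: list[Domain]) -> set[str]:
    """Return every backend that must be written for the classified domains."""
    return {backend for d in domains for backend in STORAGE_ROUTES[d]}
```

A turn classified as both Profile and Temporal would therefore touch Pinecone and Neo4j, which is why a single input can fan out to several stores.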

Agentic Ingestion Pipeline

Every conversation turn flows through a multi-stage LangGraph pipeline:

Input → Classify → Extract (parallel) → Judge → Weave → Store

Classifier routes input to the relevant domains
Domain Agents (Profiler, Temporal, Summarizer, Code, Snippet, Image) extract structured data in parallel
Judge Agent compares each extraction against existing memory and decides: ADD, UPDATE, DELETE, or NOOP
Weaver deterministically executes the Judge's decisions across all storage backends

This means XMem doesn't just append — it maintains memory. Tell it you switched from Python to Go? The Judge updates your profile. Mention a meeting got rescheduled? The temporal record is corrected, not duplicated.
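
The judge-then-weave step can be sketched with a simple key-value store. This is an illustration under stated assumptions: `Op`, `Extraction`, `judge`, and `weave` are hypothetical names, and the rule-based comparison stands in for the LLM-driven Judge Agent described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Op(str, Enum):
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"


@dataclass(frozen=True)
class Extraction:
    key: str              # e.g. "preferred_backend_language"
    value: Optional[str]  # None means the user retracted the fact


def judge(existing: dict[str, str], item: Extraction) -> Op:
    """Rule-based stand-in for the LLM judge: compare against stored memory."""
    current = existing.get(item.key)
    if item.value is None:
        return Op.DELETE if current is not None else Op.NOOP
    if current is None:
        return Op.ADD
    return Op.NOOP if current == item.value else Op.UPDATE


def weave(existing: dict[str, str], item: Extraction, op: Op) -> None:
    """Deterministically apply the judge's decision to the store."""
    if op in (Op.ADD, Op.UPDATE) and item.value is not None:
        existing[item.key] = item.value
    elif op is Op.DELETE:
        existing.pop(item.key, None)


memory = {"preferred_backend_language": "Python"}
update = Extraction("preferred_backend_language", "Go")
decision = judge(memory, update)
weave(memory, update, decision)
print(decision.value, memory)  # UPDATE {'preferred_backend_language': 'Go'}
```

Keeping the decision (judge) separate from the write (weave) means the write path needs no model call, so the same decision always produces the same storage change.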

Two-Step Agentic Retrieval

When you query XMem, retrieval is not a simple vector search. The LLM itself decides what to look up:

Tool Selection: The retrieval LLM analyzes your query and calls the appropriate search tools — SearchProfile, SearchTemporal, SearchSummary, SearchSnippet — potentially multiple in parallel
Synthesis: Results from all search tools are aggregated and the LLM generates a cited answer with source references

This means asking "What's my preferred tech stack and when did I last refactor the auth module?" triggers both a profile lookup and a temporal search — automatically.
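
A rough sketch of the two steps follows. The tool names `SearchProfile` and `SearchTemporal` come from the list above; everything else (the stub search functions, the keyword-based `select_tools`, and `retrieve`) is a hypothetical stand-in, since the real system lets the LLM choose tools and then synthesizes a cited answer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


# Stub search tools; real ones would query Pinecone or Neo4j.
def search_profile(query: str) -> list[str]:
    return ["profile: prefers Go for backends"]


def search_temporal(query: str) -> list[str]:
    return ["temporal: auth module refactor event"]


TOOLS: dict[str, Callable[[str], list[str]]] = {
    "SearchProfile": search_profile,
    "SearchTemporal": search_temporal,
}


def select_tools(query: str) -> list[str]:
    """Step 1: keyword stand-in for the LLM's tool-selection decision."""
    q = query.lower()
    selected = []
    if any(word in q for word in ("prefer", "favorite", "stack")):
        selected.append("SearchProfile")
    if any(word in q for word in ("when", "last", "yesterday")):
        selected.append("SearchTemporal")
    return selected or list(TOOLS)  # nothing matched: search everything


def retrieve(query: str) -> list[str]:
    """Step 2 (partial): run the selected tools in parallel and gather hits."""
    names = select_tools(query)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        results = list(pool.map(lambda name: TOOLS[name](query), names))
    return [hit for hits in results for hit in hits]


hits = retrieve("What's my preferred tech stack and when did I last refactor the auth module?")
```

In this sketch the gathered `hits` would then be handed to the synthesis LLM together with their source labels so the final answer can cite them.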

Code Scanner (XIDE)

XMem can index entire Git repositories and build a queryable knowledge graph of your codebase:

AST Parsing: Deterministic parsing (no LLM needed) for Python, TypeScript, and JavaScript. Extracts functions, classes, methods, imports, and call graphs.
Incremental Scanning: Uses git diff to only re-process changed files
Knowledge Graph: Builds a Neo4j graph with IMPORTS, CALLS, and ANNOTATES relationships between symbols
Chat With Your Code: Stream-based chat interface that retrieves relevant code context from your indexed repos
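
To give a feel for deterministic AST parsing, here is a minimal sketch using Python's standard `ast` module. It is not XMem's scanner: `extract_symbols` and the sample `SOURCE` are hypothetical, and the real indexer also handles TypeScript and JavaScript and writes its results to Neo4j.

```python
import ast

SOURCE = '''
import os
from pathlib import Path

def load(path):
    return Path(path).read_text()

class Scanner:
    def run(self, root):
        return [load(p) for p in os.listdir(root)]
'''


def extract_symbols(source: str) -> dict[str, list[str]]:
    """Collect definitions, imports, and call targets from Python source."""
    found: dict[str, list[str]] = {
        "functions": [], "classes": [], "imports": [], "calls": [],
    }
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            found["imports"].append(node.module or "")
        elif isinstance(node, ast.Call):
            # ast.unparse (Python 3.9+) renders the callee, e.g. "os.listdir"
            found["calls"].append(ast.unparse(node.func))
    return found


symbols = extract_symbols(SOURCE)
```

The `imports` and `calls` lists are the raw material for IMPORTS and CALLS edges; resolving each call target to a concrete symbol node is the harder part and is left out of this sketch.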

Multi-LLM Orchestration with Fallback

XMem isn't locked to one provider. It orchestrates across Gemini, Claude, OpenAI, OpenRouter, and Amazon Bedrock with automatic failover:

gemini → claude → openai → bedrock

If your primary LLM API rate-limits or goes down, XMem silently falls back to the next provider. Your memory pipeline never breaks. Each agent can even be pinned to a specific model — use Gemini for classification but Claude for retrieval synthesis.
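
The failover behaviour can be sketched as a loop over providers in priority order. This is a simplified illustration: `ProviderError`, `call_with_fallback`, and the stub providers are hypothetical, and a real registry would also need timeouts, retries, and provider-specific error handling.

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error raised by a provider on rate limits or outages."""


def call_with_fallback(
    prompt: str,
    providers: dict[str, Callable[[str], str]],
    order: list[str],
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors: list[str] = []
    for name in order:
        try:
            return name, providers[name](prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_gemini(prompt: str) -> str:
    raise ProviderError("429 rate limited")


def stub_claude(prompt: str) -> str:
    return f"claude says: {prompt}"


used, reply = call_with_fallback(
    "classify this turn",
    {"gemini": flaky_gemini, "claude": stub_claude},
    order=["gemini", "claude"],
)
print(used)  # claude
```

Collecting the per-provider errors before raising keeps the failure diagnosable when every provider in the chain is down.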

Multi-Storage Backend

Each memory domain maps to the storage engine best suited for it:

Engine	Purpose	Used For
Pinecone	High-speed vector similarity search	Profiles, summaries, snippets, code symbols
Neo4j	Graph traversal + temporal reasoning	Events, code knowledge graph, annotations
MongoDB	Raw document storage	Scanned code, file metadata, scan state

Quickstart

1. Start the XMem Server

git clone https://github.com/XortexLabs/xmem.git
cd xmem

# Install (requires Python 3.11+)
pip install -e .

# Configure environment
cp .env.example .env  # Add your API keys

# Start
uvicorn src.api.app:create_app --factory --host 0.0.0.0 --port 8000

Minimum .env configuration:

# =============================================================================
# Amazon Bedrock LLM Configuration
# =============================================================================
AWS_ACCESS_KEY_ID=your_aws_access_key_id_here
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key_here
BEDROCK_REGION=us-east-1
BEDROCK_MODEL=amazon.nova-2-lite-v1:0

# =============================================================================
# Core Settings
# =============================================================================
TEMPERATURE=0.3
FALLBACK_ORDER='["bedrock"]'

# =============================================================================
# Vector Store Configuration (Pinecone)
# =============================================================================
PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_INDEX_NAME=your_pinecone_index_name_here
PINECONE_NAMESPACE=default
PINECONE_DIMENSION=384
PINECONE_METRIC=cosine
PINECONE_CLOUD=aws
PINECONE_REGION=us-east-1
EMBEDDING_MODEL=amazon.nova-2-multimodal-embeddings-v1:0

# =============================================================================
# Database Configuration
# =============================================================================
# MongoDB (for code files)
MONGODB_URI=your_mongodb_uri_here
MONGODB_DATABASE=xmem

# Neo4j (for graph-based temporal/relational memory)
NEO4J_URI=your_neo4j_uri_here
NEO4J_USERNAME=your_neo4j_username_here
NEO4J_PASSWORD=your_neo4j_password_here

# =============================================================================
# API Configuration
# =============================================================================
API_HOST=0.0.0.0
API_PORT=8000
CORS_ORIGINS='["http://localhost:3000", "http://localhost:5173"]'
RATE_LIMIT=60

# =============================================================================
# Logging Configuration
# =============================================================================
LOG_LEVEL=INFO
LOG_FORMAT=json
LOG_FILE_PATH=logs/xmem.log

# =============================================================================
# Observability (Opik)
# =============================================================================
OPIK_API_KEY=your_opik_api_key_here
OPIK_WORKSPACE=your_opik_workspace_here
OPIK_PROJECT=your_opik_project_here

2. Install the Chrome Extension

git clone https://github.com/XortexAI/xmem-extension.git
cd xmem-extension
npm install && npm run build

Load dist/ in Chrome via chrome://extensions → "Load unpacked". Point it to your server URL.

extension-settings-1773462173011.mp4

3. Use the SDKs

Every SDK exposes the same three core operations: ingest, retrieve, and search.

Python SDK (`client/xmem`)

from xmem import XMemClient

client = XMemClient(api_url="http://localhost:8000")

# Ingest a conversation turn
client.ingest(
    user_query="I switched from Python to Go for all new backend services.",
    agent_response="That's a solid choice for performance-critical services.",
    user_id="dev_42"
)

# Retrieve with LLM-generated answer
result = client.retrieve(
    query="What language do I prefer for backends?",
    user_id="dev_42"
)
print(result.answer)  # "You prefer Go for backend services..."

# Raw search (no LLM generation)
hits = client.search(
    query="backend architecture decisions",
    domains=["profile", "summary"],
    user_id="dev_42"
)

TypeScript SDK (`@xmem/sdk`)

import { XMemClient } from "@xmem/sdk";

const client = new XMemClient("http://localhost:8000");

const hits = await client.search({
  query: "python backend architecture",
  domains: ["code", "summary"],
  user_id: "dev_42"
});

Go SDK (`github.com/xmem/sdk-go`)

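// import xmem "github.com/xmem/sdk-go" (plus "fmt" and "log")
client := xmem.NewClient("http://localhost:8000", "")

answer, err := client.Retrieve(xmem.RetrieveParams{
    Query:  "Did I ever mention my dog?",
    UserID: "dev_42",
})
if err != nil {
    log.Fatal(err) // surface connection or server errors instead of discarding them
}
fmt.Println(answer)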

Architecture

XMem is built as a pipeline of specialized AI agents coordinated by LangGraph, backed by three purpose-built storage engines.

Ingestion Flow

User Input (SDK / Chrome Extension / API)
         │
         ▼
   ┌─────────────┐
   │  Classifier  │ ── Analyzes text, routes to domains
   └──────┬──────┘
          │
    ┌─────┼─────┬──────┬─────────┐
    ▼     ▼     ▼      ▼         ▼
 Profile Temporal Summary Code  Snippet   ◄── Domain agents extract
 Agent   Agent   Agent  Agent   Agent        structured data in parallel
    │     │      │      │        │
    ▼     ▼      ▼      ▼        ▼
   ┌─────────────────────────────────┐
   │          Judge Agent            │ ── Compares against existing memory
   │   (ADD / UPDATE / DELETE / NOOP)│    Prevents duplicates & staleness
   └──────────────┬──────────────────┘
                  │
                  ▼
   ┌─────────────────────────────────┐
   │            Weaver               │ ── Deterministic executor
   │  Pinecone │ Neo4j │ MongoDB    │    Writes to the right backends
   └─────────────────────────────────┘

High-effort mode automatically splits long inputs into overlapping chunks (~200 tokens) and processes them in parallel, then merges results — ensuring nothing is lost in lengthy conversations.
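
Overlapping chunking can be sketched as a sliding window. The function below is illustrative only: `chunk_with_overlap` is a hypothetical name, the 20-token overlap is an assumed value (the text above only gives the ~200-token chunk size), and whitespace-separated words stand in for real tokenizer output.

```python
def chunk_with_overlap(
    tokens: list[str], size: int = 200, overlap: int = 20
) -> list[list[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


words = "the quick brown fox jumps over the lazy dog again".split()
chunks = chunk_with_overlap(words, size=4, overlap=1)
print(len(chunks))  # 3
```

The shared tokens at each boundary mean a fact that straddles two chunks still appears whole in at least one of them, at the cost of some duplicate extractions that the Judge step then has to deduplicate.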

Retrieval Flow

User Query
    │
    ▼
┌──────────────────────────────────┐
│       Retrieval LLM              │
│  Decides which tools to call:    │
│  SearchProfile, SearchTemporal,  │
│  SearchSummary, SearchSnippet    │
└──────────────┬───────────────────┘
               │
    ┌──────────┼──────────┐
    ▼          ▼          ▼
 Pinecone    Neo4j    Pinecone     ◄── Parallel search execution
 (profiles)  (events)  (summaries)
    │          │          │
    └──────────┼──────────┘
               ▼
┌──────────────────────────────────┐
│   Answer Synthesis + Citations   │ ── LLM generates answer with sources
└──────────────────────────────────┘

Resilience

Every LLM call in the pipeline passes through the Model Registry. If a provider fails or rate-limits, the request is automatically rerouted to the next provider in the fallback chain. No data loss. No downtime.

Configuration

XMem is highly configurable. Override any agent's model, tune the fallback chain, or adjust quality/speed tradeoffs.

Setting	Default	Description
`DEFAULT_MODEL_MODE`	`gemini-2.5-flash-lite`	Default LLM for all agents
`FALLBACK_ORDER`	`openrouter,gemini,claude,openai`	Provider failover sequence
`CLASSIFIER_MODEL`	—	Override model for classifier agent
`JUDGE_MODEL`	—	Override model for judge agent
`RETRIEVAL_MODEL`	—	Override model for retrieval synthesis
`PINECONE_DIMENSION`	`768`	Embedding vector dimension
`EMBEDDING_MODEL`	`gemini-embedding-001`	Text embedding model
`RATE_LIMIT`	`60`	API requests per minute
`TEMPERATURE`	`0.4`	LLM generation temperature

See docs/configuration.md for the full reference.

Docker

docker build -t xmem .
docker run -p 8000:8000 --env-file .env xmem

Or with Docker Compose for the full stack (XMem + Neo4j + MongoDB):

cd docker && docker-compose up

Project Structure

xmem/
├── src/
│   ├── agents/        # Classifier, Profiler, Temporal, Summarizer,
│   │                  # Judge, Weaver, Code, Snippet, Image agents
│   ├── pipelines/     # LangGraph ingestion & retrieval workflows
│   ├── api/           # FastAPI routes, middleware, rate limiting
│   ├── storage/       # Pinecone vector store client
│   ├── graph/         # Neo4j graph client + schema definitions
│   ├── scanner/       # Git ops, AST parser, incremental indexer
│   ├── models/        # Multi-LLM registry + provider builders
│   ├── schemas/       # Pydantic models for all memory domains
│   ├── config/        # Settings, effort levels, constants
│   └── prompts/       # System prompts for each agent
├── tests/             # Unit, integration, and E2E tests
├── benchmarks/        # LongMemEval + LoCoMo evaluation suite
├── frontend/          # Ingestion/retrieval visualization UI
├── docker/            # Docker Compose for full stack
└── pyproject.toml

Contributing

star.mp4

We welcome contributions:

# Setup dev environment
pip install -e ".[dev]"

# Run tests
GEMINI_API_KEY=dummy pytest

# Lint
ruff check src/

PRs for new IDE extensions (VSCode, JetBrains), additional language support in the AST parser, and new storage backends are especially welcome.

Forget forgetting. Build with XMem.

Get Started · Report Bug · Request Feature
