Memory is a TypeScript/Bun service that ingests, stores, and retrieves "memories" (text snippets and extracted facts) with semantic search and retrieval-augmented generation (RAG). It uses Express for the API, Prisma/PostgreSQL for structured storage, Qdrant for vector search, and OpenRouter for embeddings, fact extraction, reranking, and answer generation.
- REST API for creating, updating, deleting, and querying memories
- Fact-first storage: extracts atomic facts before embedding for richer recall
- Semantic search via Qdrant vector DB with scoped filters (userId, agentId, runId)
- RAG Q&A: retrieves memories, optionally reranks, then generates answers with LLMs
- Procedural Memory: tracks agent execution history with step-by-step action logging and LLM-generated summaries
- Deduplication using normalized content hashing
- Validation with Zod schemas on all inputs
- Configurable models for embedding, rerank, fact extraction, and answer generation
| Type | Description | Use Case |
|---|---|---|
| Semantic | Factual knowledge extracted from content | "User prefers TypeScript", "Project uses JWT auth" |
| Procedural | Agent execution history and action sequences | "Step 1: Searched for auth files → Step 2: Found issue in token.ts" |
- API Layer: Express routes under `/api`, with controllers handling validation and responses.
- Services:
  - Memory Service: core orchestration (create, batch ingest, search, ask/answer, dedupe, procedural tracking).
  - Fact Extraction Service: LLM chat completion to produce concise facts.
  - Embedding Service: generates embeddings (OpenRouter), stores/searches vectors in Qdrant.
  - Rerank & Answer: optional reranking plus answer generation via LLM chat.
- Data Stores:
  - PostgreSQL (Prisma): memory metadata (source, tags, categories, attributes, summary, contentHash).
  - Qdrant: embeddings with payload metadata for filtered search.
- Utilities: hashing for deduplication, prompt templates for RAG flows.
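
The hashing utility can be pictured with a minimal sketch. The normalization rules here (trim, lowercase, collapse whitespace) and the names `normalizeContent`, `contentHash`, and `rememberIfNew` are assumptions for illustration; the project's actual rules live in `src/utils/hash.ts` and may differ.

```typescript
import { createHash } from "node:crypto";

// Assumed normalization: trim, lowercase, collapse whitespace runs.
// The project's actual rules live in src/utils/hash.ts and may differ.
function normalizeContent(content: string): string {
  return content.trim().toLowerCase().replace(/\s+/g, " ");
}

// SHA-256 hex digest of the normalized content.
function contentHash(content: string): string {
  return createHash("sha256").update(normalizeContent(content)).digest("hex");
}

// Hypothetical in-memory stand-in for a (userId, contentHash) lookup in Postgres.
const seen = new Set<string>();

// Returns true the first time a user stores this content, false for repeats.
function rememberIfNew(userId: number, content: string): boolean {
  const key = `${userId}:${contentHash(content)}`;
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
}
```

Hashing the normalized form means that cosmetic differences such as casing or extra spaces do not produce a second row for the same user.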
Memory fields include: userId, agentId, runId, role, source, sourceId, timestamp, contentUrl, title, origin, tags[], category[], attribute (JSON), summary, type, importance, confidence, embeddingRef, contentHash (unique), createdAt, updatedAt. Indexed on contentHash, userId + contentHash, and userId + agentId + runId.
Base path: /api
- `POST /memory`: Create a memory (fact extraction + embeddings).
- `PUT /memory`: Update memory metadata/content (does not currently re-embed).
- `DELETE /memory`: Delete a memory (controller does not delete vector; use Embedding Service if needed).
- `GET /memory`: Get memory by id.
- `GET /memory/user`: List memories for a user.
- `POST /memories`: Batch ingest messages; defaults to fact extraction unless `infer=false`.
- `POST /memories/search`: Semantic search with filters (userId/agentId/runId, limit, scoreThreshold).
- `POST /memories/answer`: Ask with optional query override; returns answer + source memories.
- `POST /memories/ask`: RAG endpoint with optional procedural memory tracking.

Request/response schemas are enforced via Zod in src/types/memory.types.ts.
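
As a rough illustration of calling the search endpoint, the sketch below builds the request for `POST /api/memories/search` without sending it. The `query` field name, the `SearchParams` type, and the `buildSearchRequest` helper are assumptions; the authoritative request shape is the Zod schema in `src/types/memory.types.ts`.

```typescript
// Filters named in the endpoint list; the `query` field name is an assumption.
interface SearchParams {
  query: string;
  userId: number;
  agentId?: string;
  runId?: string;
  limit?: number;
  scoreThreshold?: number;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles the URL and fetch options for a search call.
function buildSearchRequest(baseUrl: string, params: SearchParams): BuiltRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/memories/search`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(params),
    },
  };
}

// Usage (needs a running server):
//   const { url, init } = buildSearchRequest("http://localhost:8000", { query: "auth", userId: 1 });
//   const res = await fetch(url, init);
```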
Procedural memory enables tracking of agent execution history within a task/run. It stores each interaction as a step with context, allowing agents to resume interrupted tasks or review past actions.
Include the procedural object in your /memories/ask request:

    {
      "query": "What authentication method does this project use?",
      "userId": 1,
      "agentId": "code-review-agent",
      "runId": "task-abc123",
      "procedural": {
        "store": true,
        "summarize": false,
        "includeHistory": false,
        "taskObjective": "Security audit of the codebase",
        "stepNumber": 1,
        "action": "Checking authentication implementation",
        "context": "Starting security review"
      }
    }

| Option | Type | Description |
|---|---|---|
| `store` | boolean | Store this Q&A interaction as a procedural step |
| `summarize` | boolean | Generate and return a summary of all steps using LLM |
| `includeHistory` | boolean | Return all previous steps for this run |
| `taskObjective` | string | Overall goal of the task (stored with first step) |
| `stepNumber` | number | Step sequence number (auto-generated if not provided) |
| `action` | string | Description of what action triggered this query |
| `context` | string | Current execution context |

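
The options above could be typed and assembled on the client roughly as follows. This is a sketch only: treating every option as optional is an assumption, and `createRunTracker` is a hypothetical helper that numbers steps locally and sends `taskObjective` only with the first step.

```typescript
// Mirrors the option table; treating every field as optional is an assumption.
interface ProceduralOptions {
  store?: boolean;
  summarize?: boolean;
  includeHistory?: boolean;
  taskObjective?: string;
  stepNumber?: number;
  action?: string;
  context?: string;
}

interface AskRequest {
  query: string;
  userId: number;
  agentId: string;
  runId: string;
  procedural?: ProceduralOptions;
}

interface RunInfo {
  userId: number;
  agentId: string;
  runId: string;
  taskObjective: string;
}

// Hypothetical client-side helper: builds one /memories/ask body per step of a run.
function createRunTracker(run: RunInfo) {
  let step = 0;
  return {
    next(query: string, action: string, context?: string): AskRequest {
      step += 1;
      const procedural: ProceduralOptions = { store: true, stepNumber: step, action };
      if (context !== undefined) procedural.context = context;
      // The objective is stored with the first step, so later steps omit it.
      if (step === 1) procedural.taskObjective = run.taskObjective;
      return { query, userId: run.userId, agentId: run.agentId, runId: run.runId, procedural };
    },
  };
}
```

Leaving `stepNumber` out entirely is also valid according to the table, in which case the service generates it.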
When procedural options are enabled, the response includes:

    {
      "answer": "The project uses JWT tokens with...",
      "memories": [...],
      "procedural": {
        "stored": {
          "memoryId": 123,
          "isDuplicate": false,
          "stepNumber": 1
        },
        "summary": "## Summary of agent's execution history...",
        "history": [...]
      }
    }

- Create: controller validates → Memory Service dedupes by hash → save row → extract facts → embed each fact → store vectors in Qdrant → update memory summary/embeddingRef → respond.
- Batch: iterate messages; `infer=true` follows the Create flow per message; `infer=false` stores a single full-content embedding.
- Search: generate query embedding → Qdrant search with filters → return scored payloads.
- Ask/Answer (RAG): search → optional rerank via LLM scores → format memories → answer via LLM → optionally store as procedural step.
- Procedural Summary: fetch all steps for runId → format as execution history → LLM generates structured summary.
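
The Ask/Answer flow above can be sketched with the external calls injected as functions, so the ordering of steps is visible without a live LLM or vector store. All names here (`AskDeps`, `applyRerank`, `formatMemories`, `ask`) are hypothetical, and the numbered context format is an assumption rather than the project's actual prompt layout.

```typescript
interface ScoredMemory {
  id: number;
  text: string;
  score: number;
}

// Hypothetical dependencies standing in for Qdrant search and the LLM calls.
interface AskDeps {
  search: (query: string, limit: number) => Promise<ScoredMemory[]>;
  rerank: (query: string, memories: ScoredMemory[]) => Promise<number[]>;
  answer: (query: string, context: string) => Promise<string>;
}

// Replace each memory's score with its rerank score, sort descending, keep topK.
function applyRerank(memories: ScoredMemory[], scores: number[], topK: number): ScoredMemory[] {
  return memories
    .map((m, i) => ({ ...m, score: scores[i] ?? 0 }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Assumed context format: one numbered line per memory.
function formatMemories(memories: ScoredMemory[]): string {
  return memories.map((m, i) => `[${i + 1}] ${m.text}`).join("\n");
}

// search -> optional rerank -> format memories -> answer
async function ask(
  query: string,
  deps: AskDeps,
  opts: { limit: number; rerank: boolean; topK: number },
): Promise<{ answer: string; memories: ScoredMemory[] }> {
  let memories = await deps.search(query, opts.limit);
  if (opts.rerank && memories.length > 0) {
    const scores = await deps.rerank(query, memories);
    memories = applyRerank(memories, scores, opts.topK);
  }
  const answer = await deps.answer(query, formatMemories(memories));
  return { answer, memories };
}
```

Storing the interaction as a procedural step would follow the `answer` call; it is left out here to keep the sketch short.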
- `PORT`: API port (default 8000)
- `OPENROUTER_API_KEY`: OpenRouter API key
- `EMBEDDING_MODEL`: Embedding model name (e.g., `openai/text-embedding-3-small`)
- `EMBEDDING_DIMENSION`: Embedding vector dimension (must match Qdrant collection)
- `QDRANT_URL`: Qdrant endpoint
- `QDRANT_API_KEY`: Qdrant API key (if required)
- `QDRANT_COLLECTION_NAME` or `COLLECTION_NAME`: Target collection
- `ANSWER_MODEL`: Model for answer generation (default `gpt-4o-mini`)
- `RERANK_MODEL`: Model for reranking (default `gpt-4o-mini`)
- `RERANK_ENABLED`: `"true"` to enable rerank by default
- `RERANK_TOP_K`: Max docs after rerank
- `FACT_MODEL`: Model for fact extraction (default `gpt-4o-mini`)
- `PROCEDURAL_MODEL`: Model for procedural summary generation (default `gpt-4o-mini`)
- `NODE_ENV`: Controls Prisma logging

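
One way to read these variables into a typed object is sketched below. The documented defaults (`8000`, `gpt-4o-mini`) come from the list above; treating `OPENROUTER_API_KEY`, `EMBEDDING_MODEL`, `EMBEDDING_DIMENSION`, `QDRANT_URL`, and the collection name as required is an assumption, and `loadConfig` is a hypothetical name, not the project's actual loader.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: variables without a documented default are treated as required.
function required(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// Hypothetical loader: applies the documented defaults and validates the rest.
function loadConfig(env: Env) {
  const embeddingDimension = Number(required(env, "EMBEDDING_DIMENSION"));
  if (!Number.isInteger(embeddingDimension) || embeddingDimension <= 0) {
    throw new Error("EMBEDDING_DIMENSION must be a positive integer");
  }
  // Either variable name may carry the target collection.
  const collectionName = env["QDRANT_COLLECTION_NAME"] ?? env["COLLECTION_NAME"];
  if (!collectionName) {
    throw new Error("Set QDRANT_COLLECTION_NAME or COLLECTION_NAME");
  }
  const topK = env["RERANK_TOP_K"];
  return {
    port: Number(env["PORT"] ?? 8000),
    openRouterApiKey: required(env, "OPENROUTER_API_KEY"),
    embeddingModel: required(env, "EMBEDDING_MODEL"),
    embeddingDimension,
    qdrantUrl: required(env, "QDRANT_URL"),
    qdrantApiKey: env["QDRANT_API_KEY"],
    collectionName,
    answerModel: env["ANSWER_MODEL"] ?? "gpt-4o-mini",
    rerankModel: env["RERANK_MODEL"] ?? "gpt-4o-mini",
    rerankEnabled: env["RERANK_ENABLED"] === "true",
    rerankTopK: topK ? Number(topK) : undefined,
    factModel: env["FACT_MODEL"] ?? "gpt-4o-mini",
    proceduralModel: env["PROCEDURAL_MODEL"] ?? "gpt-4o-mini",
  };
}

// Usage: const config = loadConfig(process.env);
```

Failing at startup on a missing or non-numeric `EMBEDDING_DIMENSION` surfaces a mismatch with the Qdrant collection before any vectors are written.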
- Install Bun: `curl -fsSL https://bun.sh/install | bash`
- Install deps: `bun install`
- Configure environment variables (e.g., `.env`).
- Prepare the Postgres database and run Prisma generate (ensure `prisma/generator` output matches `src/generated/prisma`).
- Ensure Qdrant is reachable and the collection matches the embedding dimension.
- Dev/serve: `bun run index.ts`
- The server listens on `PORT` and exposes `/api/...` routes.

- Deduplication is per `userId` (and agent/run attributes) using `contentHash`.
- Fact extraction is mandatory in the create flow; if no facts are extracted, the memory is stored without embeddings.
- Embedding updates on memory updates are not automatic in current controllers.
- Qdrant collection is auto-created on first use with cosine distance and configured dimension.
- Rerank is optional and can be toggled per request or via env defaults.
- Procedural memories are stored with `type: "procedural"` and can be filtered/searched like regular memories.

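
Scoped search, including restricting results to procedural memories, can be expressed as a Qdrant payload filter. The sketch below builds the filter object in Qdrant's `must` / `match` form; the payload key names (`userId`, `agentId`, `runId`, `type`) are assumed to mirror the memory fields, and `buildScopeFilter` is a hypothetical helper.

```typescript
interface Scope {
  userId: number;
  agentId?: string;
  runId?: string;
  type?: string; // e.g. "procedural"
}

interface MatchCondition {
  key: string;
  match: { value: string | number };
}

// Builds a filter where every provided scope field must match exactly.
// Payload key names are assumptions; check what the Embedding Service writes.
function buildScopeFilter(scope: Scope): { must: MatchCondition[] } {
  const must: MatchCondition[] = [{ key: "userId", match: { value: scope.userId } }];
  if (scope.agentId !== undefined) must.push({ key: "agentId", match: { value: scope.agentId } });
  if (scope.runId !== undefined) must.push({ key: "runId", match: { value: scope.runId } });
  if (scope.type !== undefined) must.push({ key: "type", match: { value: scope.type } });
  return { must };
}
```

The resulting object is meant to be passed as the `filter` of a Qdrant search request alongside the query vector.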
- `src/index.ts`: Express bootstrap
- `src/routes/memory.routes.ts`: Route definitions
- `src/controllers/memory.controller.ts`: HTTP handlers + validation
- `src/services/memory/memory.service.ts`: Core domain logic
- `src/services/extraction/factExtraction.service.ts`: Fact extraction (LLM)
- `src/services/embedding/embedding.service.ts`: Embeddings + Qdrant I/O
- `src/services/vector/qdrant.ts`: Qdrant client wrapper
- `src/services/embedding/openai.ts`: OpenRouter client wrapper
- `src/config/prompts.ts`: System prompts
- `src/types/memory.types.ts`: Zod schemas and TS types
- `src/utils/hash.ts`: Normalization and content hashing
- `src/prisma/schema.prisma`: Data model