OpenClaw Memory

Your memory,
on your machine.

OpenClaw's default memory search sends your queries to OpenAI for embedding. QMD does the same job locally, using two small models on your own hardware. Nothing gets sent anywhere.


Tell your Claw to set it up

Send this to your OpenClaw agent and it'll handle the rest. The first embed takes a while (downloading models + processing), but after that it's automatic.

Set up QMD as my local memory search backend. Here's what to do:

1. Install QMD: `bun install -g https://github.com/tobi/qmd` (requires bun and sqlite — `brew install oven-sh/bun/bun sqlite` if needed)

2. Find the qmd binary path (likely `~/.bun/bin/qmd`) and update my openclaw.json:
```json
"memory": {
  "backend": "qmd",
  "qmd": {
    "command": "/path/to/qmd",
    "sessions": { "enabled": true },
    "update": { "embedInterval": "30m" }
  }
}
```

3. Restart the gateway: `openclaw gateway restart`

The gateway will auto-create collections for MEMORY.md, memory/*.md, and session transcripts. First run downloads ~2GB of local models and embeds everything — this can take 30-60 min depending on how many sessions you have. After that, keyword index refreshes every 5 min and embeddings every 30 min.

If anything fails, OpenClaw falls back to built-in search automatically. Confirm it's working by running `memory_search` on something from a past conversation.

Two ways to search the same files

Both read from MEMORY.md and memory/*.md. The difference is where the search happens.

Default · Built-in

Original Memory

  • Sends each query to OpenAI's API for embedding (text-embedding-3-small by default)
  • Matches against pre-embedded chunks using cosine similarity
  • Optional BM25 hybrid search for keyword matching
  • Needs an API key (OpenAI, Gemini, or Voyage)
  • Your memory content goes to external servers for embedding
  • Fast, cheap, works everywhere. But cloud-dependent.
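In code, the cosine-similarity matching step looks roughly like this. This is an illustrative sketch, not OpenClaw's actual implementation; the embeddings here are stand-ins for vectors the real backend gets back from the API:

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank pre-embedded chunks by similarity to the query embedding.
function topK(query: number[], chunks: { id: string; vec: number[] }[], k = 3) {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

That's the whole trick: the expensive part is producing the vectors, not comparing them, which is why only the embedding call needs the cloud.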
Experimental · Local-first

QMD Memory

  • A local 1.7B LLM rewrites your query into multiple search variants before searching
  • Searches both keywords (BM25) and vectors at the same time
  • A second local model (0.6B Qwen3) re-scores results for relevance
  • No API key needed. No cloud calls at all.
  • Your data never leaves the machine
  • ~2GB of disk for the two models, downloaded once


What happens when you search

Stage 01

Query expansion

A local 1.7B LLM takes your query and rewrites it several ways: keyword variants, semantic rephrases, and a hypothetical document that would answer your question. One search becomes five.
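The shape of that fan-out looks something like this. The type and function names are hypothetical (the real variants come out of the local 1.7B model); the sketch just shows how one query turns into several search strings:

```typescript
// Hypothetical shape of the expansion step's output.
interface Expansion {
  original: string;        // the query as typed
  keywords: string[];      // keyword variants for the BM25 index
  rephrases: string[];     // semantic rephrases for the vector index
  hypotheticalDoc: string; // a made-up document that would answer the query
}

// Flatten the expansion into the list of search strings actually run.
function allVariants(e: Expansion): string[] {
  return [e.original, ...e.keywords, ...e.rephrases, e.hypotheticalDoc];
}
```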

Stage 02

Hybrid retrieval

Each variant runs against both a keyword index (BM25) and a vector index at the same time. Results are pooled and deduplicated.
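The pool-and-dedup step can be sketched as below. QMD's exact merge policy isn't documented here; this assumes the simple strategy of keeping each chunk once with its best score across all result sets:

```typescript
interface Hit { id: string; score: number; }

// Merge results from the BM25 and vector searches, keeping the best
// score seen for each chunk id, then sort best-first.
function poolAndDedup(...resultSets: Hit[][]): Hit[] {
  const best = new Map<string, number>();
  for (const set of resultSets) {
    for (const h of set) {
      const prev = best.get(h.id);
      if (prev === undefined || h.score > prev) best.set(h.id, h.score);
    }
  }
  return [...best.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```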

Stage 03

Reranking

A 0.6B Qwen3 model reads every candidate and scores it for actual relevance. The top results come back with confidence scores.
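Wired together, the three stages form a simple pipeline. Everything here is a sketch: `expand`, `searchBM25`, `searchVectors`, and `rerank` are hypothetical stand-ins for the local models and indexes, passed in so the flow itself is visible:

```typescript
interface Scored { id: string; text: string; score: number; }

async function search(
  query: string,
  expand: (q: string) => Promise<string[]>,                      // stage 1: 1.7B LLM
  searchBM25: (q: string) => Promise<Scored[]>,                  // stage 2: keyword index
  searchVectors: (q: string) => Promise<Scored[]>,               // stage 2: vector index
  rerank: (q: string, candidates: Scored[]) => Promise<Scored[]>, // stage 3: 0.6B Qwen3
): Promise<Scored[]> {
  // Stage 1: one query becomes several variants.
  const variants = await expand(query);

  // Stage 2: run every variant against both indexes, pool, deduplicate.
  const seen = new Map<string, Scored>();
  for (const v of variants) {
    for (const hit of [...(await searchBM25(v)), ...(await searchVectors(v))]) {
      const prev = seen.get(hit.id);
      if (!prev || hit.score > prev.score) seen.set(hit.id, hit);
    }
  }

  // Stage 3: rerank the pooled candidates against the original query.
  return rerank(query, [...seen.values()]);
}
```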


What it costs

  • ~1.9GB: disk space for both models (downloaded once)
  • $0.00: per-search cost. It's all local.
  • 0: bytes sent to the cloud
  • 1.7B: parameters in the query-expansion model
  • 0.6B: parameters in the reranker (Qwen3)
  • 5 min: how often the keyword index refreshes

Where your data actually goes

Original

Your query → OpenAI API → embedding returned → cosine match locally → results

QMD

Your query → local LLM (expand) → local index (search) → local LLM (rerank) → results

Three commands, one config line

```sh
# Install dependencies
brew install oven-sh/bun/bun sqlite
bun install -g https://github.com/tobi/qmd
```

openclaw.json:
```jsonc
{
  "memory": {
    "backend": "qmd",
    "qmd": {
      "command": "/Users/you/.bun/bin/qmd",  // if not on system PATH
      "sessions": {
        "enabled": true  // optional: index chat history too
      }
    }
  }
}
```

The first search is slow because it downloads two GGUF models (~1.9GB). After that, everything runs from cache. If QMD fails for any reason, OpenClaw falls back to the built-in vector search automatically.

Same Markdown. Better search.
Your hardware.

Memory files don't change. QMD just replaces the cloud embedding call with two small models running on your own machine.

memory.backend = "qmd"