OpenClaw's default memory search sends your queries to OpenAI for embedding. QMD does the same job locally, using two small models on your own hardware. Nothing gets sent anywhere.
## Copy & paste
Send this to your OpenClaw agent and it'll handle the rest. The first embed takes a while (downloading models + processing), but after that it's automatic.
Set up QMD as my local memory search backend. Here's what to do:
1. Install QMD: `bun install -g https://github.com/tobi/qmd` (requires bun and sqlite — `brew install oven-sh/bun/bun sqlite` if needed)
2. Find the qmd binary path (likely `~/.bun/bin/qmd`) and update my openclaw.json:
```json
"memory": {
  "backend": "qmd",
  "qmd": {
    "command": "/path/to/qmd",
    "sessions": { "enabled": true },
    "update": { "embedInterval": "30m" }
  }
}
```
3. Restart the gateway: `openclaw gateway restart`
The gateway will auto-create collections for MEMORY.md, memory/*.md, and session transcripts. The first run downloads ~2GB of local models and embeds everything, which can take 30-60 minutes depending on how many sessions you have. After that, the keyword index refreshes every 5 minutes and embeddings every 30 minutes.
If anything fails, OpenClaw falls back to built-in search automatically. Confirm it's working by running `memory_search` on something from a past conversation.
## Side by side
Both read from MEMORY.md and memory/*.md. The difference is where the search happens.
## How QMD searches
1. A local 1.7B LLM takes your query and rewrites it several ways: keyword variants, semantic rephrases, and a hypothetical document that would answer your question. One search becomes five.
2. Each variant runs against both a keyword index (BM25) and a vector index at the same time. Results are pooled and deduplicated.
3. A 0.6B Qwen3 model reads every candidate and scores it for actual relevance. The top results come back with confidence scores.
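The three stages can be sketched in Python. This is a minimal illustration of the expand → pool/dedupe → rerank shape, not QMD's actual code: the function names are hypothetical, and the fake `expand` stands in for the 1.7B LLM's query rewriting.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    doc_id: str
    text: str
    score: float

def expand(query: str) -> list[str]:
    # Stage 1 (stand-in): the real system uses a local LLM to generate
    # keyword variants, rephrases, and a hypothetical answering document.
    return [query, query.lower(), f"document that answers: {query}"]

def hybrid_search(query: str, bm25, vector, rerank, k: int = 5) -> list[Hit]:
    # Stage 2: run every variant against both indexes, pool, dedupe by doc id.
    pooled: dict[str, Hit] = {}
    for variant in expand(query):
        for hit in bm25(variant) + vector(variant):
            pooled.setdefault(hit.doc_id, hit)  # first occurrence wins
    # Stage 3: a small reranker re-scores every candidate against the
    # original query; only the top k come back.
    scored = [Hit(h.doc_id, h.text, rerank(query, h.text)) for h in pooled.values()]
    return sorted(scored, key=lambda h: h.score, reverse=True)[:k]
```

The key design point is that BM25 and vector scores are never compared directly; deduplicated candidates are re-scored by the reranker, which becomes the single source of truth for ordering.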
## Setup
```bash
# Install dependencies
brew install oven-sh/bun/bun sqlite
bun install -g https://github.com/tobi/qmd
```
openclaw.json
```json
{
  "memory": {
    "backend": "qmd",
    "qmd": {
      "command": "/Users/you/.bun/bin/qmd", // if not on system PATH
      "sessions": {
        "enabled": true // optional: index chat history too
      }
    }
  }
}
```
The first search is slow because it downloads two GGUF models (~1.9GB). After that, everything runs from cache. If QMD fails for any reason, OpenClaw falls back to the built-in vector search automatically.
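The fallback amounts to a try/except around the backend call. A minimal sketch, with hypothetical function names rather than OpenClaw's actual code:

```python
def search_with_fallback(query, qmd_search, builtin_search):
    """Return QMD results, or built-in results if QMD fails for any reason."""
    try:
        return qmd_search(query)
    except Exception:
        # Binary missing, models not yet downloaded, subprocess crash, etc.
        return builtin_search(query)
```

Because the fallback is silent, a failed QMD install degrades search quality rather than breaking it, which is why it's worth confirming with a `memory_search` test after setup.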
Memory files don't change. QMD just replaces the cloud embedding call with two small models running on your own machine.