The memory architecture was in place. Three tiers, clear boundaries, maintenance cycles. But memory you can’t search is memory you don’t have.
This post is about the retrieval side: how Daneel finds things in its own files, what I tested, and what actually works.
## The Starting Point
OpenClaw’s default memory search uses OpenAI’s text-embedding-3-small model. It converts text chunks into 1536-dimensional vectors, stores them in SQLite, and returns semantically similar results when queried.
Out of the box, it worked—sort of. The default minScore threshold (~0.45) was too aggressive. Queries that should have returned results came back empty. Keyword searches worked poorly because the engine was vector-only. No hybrid mode.
I had 17 memory files, 84 text chunks. Not a lot. But if Daneel can’t find “what’s the Matrix room for email notifications” in its own files, the architecture doesn’t matter.
## What I Tested
I built a benchmark: 6 queries covering different retrieval patterns.
| # | Query | Type |
|---|---|---|
| 1 | “email credentials himalaya configuration” | Keyword, mixed language |
| 2 | “web privacy violation” | Keyword, English |
| 3 | “Martin calendar workflow” | Mixed intent |
| 4 | “gateway restart session context” | Compound keyword |
| 5 | “how to send email with diacritics” | Semantic (no exact match in docs) |
| 6 | “what is the matrix room for email notifications” | Semantic question |
Every candidate got the same 6 queries; results were compared by hit count and relevance.
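The harness itself was nothing fancy. Here's a minimal sketch in Python, where `search` stands in for whichever engine is under test and `relevant` maps each query to the files it should surface (both names are mine, not part of any tool's API):

```python
# The six benchmark queries from the table above.
QUERIES = [
    "email credentials himalaya configuration",
    "web privacy violation",
    "Martin calendar workflow",
    "gateway restart session context",
    "how to send email with diacritics",
    "what is the matrix room for email notifications",
]

def run_benchmark(search, relevant):
    """Count queries whose results overlap the expected files.

    search:   callable(query) -> list of file names returned
    relevant: dict mapping query -> set of file names that count as hits
    """
    hits = 0
    for query in QUERIES:
        results = set(search(query))
        if results & relevant.get(query, set()):
            hits += 1
    return f"{hits}/{len(QUERIES)}"
```

The scoring is deliberately coarse: a query either surfaces at least one relevant file or it doesn't, which matches how I judged the engines below.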
## QMD: Local Hybrid Search
QMD is a local sidecar that combines BM25 keyword search, vector embeddings via GGUF models, and neural reranking. Zero API costs—everything runs on the machine.
The concept is exactly what I wanted: hybrid search without external dependencies.
Installation went smoothly. It indexed 34 documents into 92 vector chunks using a 300MB embedding model (embeddinggemma-300M). BM25 keyword search worked immediately.
Then I tried vector search.
QMD’s vector mode (vsearch) depends on llama.cpp, which compiles native code at install time. On a server without a GPU, it tried to build CUDA bindings, failed, fell back to CPU, and either timed out or crashed with SIGKILL. The embedding phase alone took 36 seconds on CPU—when it worked at all.
Benchmark result: 2/6 queries returned useful results. BM25-only mode caught the keyword matches but missed everything semantic.
I could have kept QMD for keyword search only. But running a separate process with 300MB of model files for something BM25 in SQLite already handles didn’t make sense.
Verdict: uninstalled. QMD is a solid project. On a machine with a GPU, it would be a different story. On a 2-core VPS without CUDA, it’s not practical.
## OpenClaw Builtin: Properly Configured
Same engine as before, but with three changes:
- Hybrid mode enabled — BM25 keyword search + vector similarity, combined ranking
- minScore lowered to 0.25 — default 0.45 filtered out too many valid results
- File watching enabled — index updates automatically when files change
Benchmark result: 5/6 queries returned relevant results. The one miss (query 5, “how to send email with diacritics”) is expected—that information lives in TOOLS.md, which is loaded as system prompt context and not indexed as searchable memory.
The hybrid approach is key. Pure vector search misses exact keyword matches. Pure BM25 misses semantic intent. Combined, they cover each other’s blind spots.
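One common way to combine the two rankings is reciprocal rank fusion (RRF). I don't know what OpenClaw uses internally, but RRF is a simple illustration of why the merged list beats either one alone:

```python
# Sketch of reciprocal rank fusion: each list contributes
# 1 / (k + rank) per document, so a file ranked well by either
# BM25 or vector search stays near the top of the merged list.
def rrf_merge(bm25_ranked, vector_ranked, k=60):
    """Merge two ranked lists of doc ids via reciprocal rank fusion."""
    scores = {}
    for ranked in (bm25_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings for one query; file names are made up.
bm25 = ["email-setup.md", "matrix-rooms.md", "calendar.md"]
vect = ["matrix-rooms.md", "notifications.md", "email-setup.md"]
merged = rrf_merge(bm25, vect)
```

Here `matrix-rooms.md` wins the merge because both rankings place it high, even though neither puts it first.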
## Configuration
For anyone running OpenClaw who wants to replicate this, here’s what goes into openclaw.json.
Memory backend:
```json
{
  "memory": {
    "backend": "builtin"
  }
}
```
Search configuration:
```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "enabled": true,
        "provider": "openai",
        "sources": ["memory"],
        "query": {
          "minScore": 0.25,
          "hybrid": { "enabled": true }
        },
        "sync": {
          "onSessionStart": true,
          "onSearch": true,
          "watch": true
        }
      }
    }
  }
}
```
The provider field tells OpenClaw which configured model provider to use for embeddings. It picks text-embedding-3-small automatically. You need the OpenAI provider set up under models.providers.openai with a valid API key.
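For completeness, a minimal provider entry might look like the sketch below. The exact key names (apiKey, environment-variable substitution) are assumptions on my part, based on the models.providers.openai path; verify against the OpenClaw docs.

```json
{
  "models": {
    "providers": {
      "openai": {
        "apiKey": "${OPENAI_API_KEY}"
      }
    }
  }
}
```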
The same OpenAI key can serve double duty as a model fallback and for image understanding:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-5",
        "fallbacks": ["openai/gpt-4o"]
      },
      "imageModel": {
        "primary": "openai/gpt-4o"
      }
    }
  }
}
```
## Cost
The boring part that matters most:
| Activity | Frequency | Monthly tokens | Cost |
|---|---|---|---|
| Index 17 files (84 chunks) | ~5×/day | ~6M | $0.12 |
| Search queries | ~30/day | ~450K | $0.01 |
| Total | | ~6.5M | $0.13/month |
Thirteen cents. The local alternative (QMD) would have saved this but required 300MB+ of model files, 2-4GB extra RAM, and a GPU that doesn’t exist on this server.
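The arithmetic is easy to sanity-check, assuming text-embedding-3-small's price of $0.02 per million tokens:

```python
# Back-of-the-envelope check of the cost table above.
PRICE_PER_M = 0.02            # $ per 1M tokens, text-embedding-3-small

index_tokens = 6_000_000      # ~5 re-indexes/day of 84 chunks
search_tokens = 450_000       # ~30 queries/day
total = (index_tokens + search_tokens) * PRICE_PER_M / 1_000_000
print(f"${total:.3f}/month")
```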
## What I Learned
Hybrid search is not optional. The difference between vector-only and hybrid was 3/6 vs 5/6 on the benchmark. If your agent searches its own memory, enable both modes.
Default thresholds are too conservative. OpenClaw’s default minScore of 0.45 filtered out results that scored 0.30-0.40—perfectly relevant hits. Lower it. False positives are cheap. False negatives mean your agent forgets things it knows.
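To make the threshold effect concrete, here's a toy result set with hypothetical similarity scores (the file names and numbers are invented, but the shape matches what I saw):

```python
# Hypothetical scored results for one query. With the 0.45 default,
# three relevant files in the 0.30-0.40 band are silently dropped.
results = [
    ("matrix-rooms.md", 0.52),
    ("email-setup.md", 0.38),
    ("notifications.md", 0.33),
    ("calendar.md", 0.31),
    ("unrelated.md", 0.12),
]

def above(min_score):
    """Return the docs that survive a given minScore cutoff."""
    return [doc for doc, score in results if score >= min_score]

print(len(above(0.45)))  # 1
print(len(above(0.25)))  # 4
```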
Local inference without a GPU is a trap. Every “zero-cost local” solution I tested either required CUDA, fell back to unusable CPU performance, or both. On a small VPS, the API call at $0.02/million tokens wins every time.
Test with real queries. Not “does it return something?” but “does it return the right thing for the question my agent actually asks?” Six targeted queries revealed more than any synthetic benchmark.
The memory architecture from the previous post gives Daneel structure. This gives it retrieval. Together: an agent that knows what it knows—and can find it when it needs to.