<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>K@automation on Martin Sukany</title><link>https://sukany.cz/tags/k@automation/</link><description>Recent content in K@automation on Martin Sukany</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Sat, 28 Feb 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://sukany.cz/tags/k@automation/index.xml" rel="self" type="application/rss+xml"/><item><title>FSA-Driven Multi-Agent Pipelines: How We Stopped Fighting Our Own Orchestrator</title><link>https://sukany.cz/blog/2026-02-28-fsa-pipeline-architecture/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><guid>https://sukany.cz/blog/2026-02-28-fsa-pipeline-architecture/</guid><description>&lt;h2 id="the-problem-we-had"&gt;The Problem We Had&lt;/h2&gt;
&lt;p&gt;Our first multi-agent pipeline was a disaster waiting to happen. The architecture seemed clean: spawn workers, each does its thing, updates a shared `status.json` to record completion, and if it&amp;rsquo;s the last one in its phase, spawns the next batch. Workers know the workflow, workers drive progress. What could go wrong?&lt;/p&gt;
&lt;p&gt;Plenty.&lt;/p&gt;
&lt;p&gt;The race condition was textbook. Two parallel research workers — `researcher-a` and `researcher-b` — finish around the same time. At `t=0`, both read `status.json`. Both see themselves as the last remaining worker. At `t=1`, both write back with themselves marked completed. One write wins. The other is silently lost. The &amp;ldquo;winning&amp;rdquo; worker sees only its own completion, decides the phase isn&amp;rsquo;t done, and does nothing. The pipeline stalls. No error. No timeout for another ten minutes. Just silence.&lt;/p&gt;
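The lost update is easy to reproduce without any agents at all. A minimal sketch of the naive design, where each worker writes back the whole file from the snapshot it read earlier (file layout and field names are illustrative):

```python
import json
import pathlib
import tempfile

def finish(path: pathlib.Path, snapshot: dict, worker: str) -> None:
    # The worker marks itself completed in its *stale* snapshot and
    # rewrites the whole file -- exactly what our first design did.
    snapshot["workers"][worker] = "completed"
    path.write_text(json.dumps(snapshot))

status_path = pathlib.Path(tempfile.mkdtemp()) / "status.json"
status_path.write_text(json.dumps(
    {"workers": {"researcher-a": "running", "researcher-b": "running"}}))

# t=0: both workers read the same state
snap_a = json.loads(status_path.read_text())
snap_b = json.loads(status_path.read_text())

# t=1: both write back; the second write silently erases the first
finish(status_path, snap_a, "researcher-a")
finish(status_path, snap_b, "researcher-b")

final = json.loads(status_path.read_text())
# researcher-a is back to "running": its completion was lost
```

Neither write failed, so nothing raised an error. That is what makes the stall silent.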
&lt;p&gt;That was the obvious failure. The subtle one was worse: &lt;strong&gt;state trapped in the agent&amp;rsquo;s context window&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When a worker gets killed mid-task — OOM, timeout, platform restart — the in-progress state dies with it. Nothing in `status.json` says &amp;ldquo;this worker was halfway through step 3 of 7.&amp;rdquo; There&amp;rsquo;s no way to resume. You either restart the whole pipeline or manually reconstruct what happened from logs.&lt;/p&gt;
&lt;p&gt;We looked at alternatives. LangChain and LangGraph are elegant for small pipelines, but their default state lives in memory — restart the process and you start over unless you wire in a persistence layer. CrewAI puts LLM reasoning in the control plane: agents decide what to do next, which sounds powerful until you realize your orchestration is non-deterministic. AutoGen is similar — control flow emerges from conversation, making it genuinely hard to reason about edge cases. Prefect and Airflow are solid but not built for LLM agent workflows. None gave us what we needed: a simple, external, inspectable state machine that survives restarts and eliminates race conditions by construction.&lt;/p&gt;
&lt;p&gt;So we built one.&lt;/p&gt;
&lt;h2 id="what-fsa-actually-is"&gt;What FSA Actually Is&lt;/h2&gt;
&lt;p&gt;A finite state automaton formalizes something you already know: a system with a fixed set of states, a fixed set of events, and a table mapping (state, event) → next state + action.&lt;/p&gt;
&lt;p&gt;Think of a traffic light. Three states: RED, YELLOW, GREEN. Deterministic transitions: GREEN → timer expires → YELLOW → timer expires → RED → timer expires → GREEN. No traffic light &amp;ldquo;decides&amp;rdquo; anything. It doesn&amp;rsquo;t reason about traffic density or consult a language model. It reads its current state, checks which event fired, looks up the table, and acts.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s the key insight: &lt;strong&gt;the orchestrator has no opinions&lt;/strong&gt;. It reads `(current_state + event)`, looks up the table, and executes the action. The intelligence lives in the table definition, written by humans at design time. Runtime execution is mechanical.&lt;/p&gt;
&lt;p&gt;For multi-agent pipelines, this translates directly. &amp;ldquo;States&amp;rdquo; are phase statuses: `pending`, `running`, `completed`, `failed`, `paused`. &amp;ldquo;Events&amp;rdquo; are things like &amp;ldquo;worker output file appeared&amp;rdquo; or &amp;ldquo;timeout exceeded.&amp;rdquo; The &amp;ldquo;table&amp;rdquo; is a decision matrix the orchestrator consults on every tick. No LLM in the loop. No ambiguity.&lt;/p&gt;
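The whole control plane fits in a lookup table. A sketch of the idea — the state and event names echo the post, the action strings are placeholders, not the real implementation:

```python
# (state, event) -> (next_state, action). Written by humans at design time;
# runtime execution is a dictionary lookup, nothing more.
TRANSITIONS = {
    ("running", "all_outputs_present"): ("completed", "spawn_next_phase"),
    ("running", "timeout_exceeded"):    ("failed",    "notify_user"),
    ("running", "tick"):                ("running",   "wait"),
    ("completed", "final_phase"):       ("archived",  "deliver_result"),
}

def step(state: str, event: str) -> tuple:
    # Mechanical lookup: no reasoning, no LLM, no opinions.
    return TRANSITIONS[(state, event)]
```

An unknown `(state, event)` pair raises `KeyError` instead of doing something creative, which is precisely the behavior you want from an orchestrator.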
&lt;h2 id="the-new-architecture"&gt;The New Architecture&lt;/h2&gt;
&lt;p&gt;The redesigned system has exactly three components:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;`workflows.json` — static definition.&lt;/strong&gt; Describes every pipeline type: phases, ordering (sequential or parallel), workers per phase, models, timeouts, and input file dependencies. Never changes at runtime. It&amp;rsquo;s the blueprint.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;`status.json` — runtime state.&lt;/strong&gt; One file per pipeline run, created at launch, updated only by the orchestrator (main session). Tracks current phase, worker statuses, session IDs, retry counts, and delivery state. This is the single source of truth.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Workers — pure executors.&lt;/strong&gt; A worker receives a task prompt with the topic, input files, and an explicit output path. It does its work, writes the output file, and exits. That&amp;rsquo;s the entire contract. Workers &lt;strong&gt;never&lt;/strong&gt; touch `status.json`. Workers &lt;strong&gt;never&lt;/strong&gt; spawn other workers. Workers don&amp;rsquo;t know what phase they&amp;rsquo;re in or what comes next.&lt;/p&gt;
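The worker contract is small enough to sketch in a few lines. The entrypoint name and signature here are illustrative, not the actual platform API — the point is what is deliberately absent:

```python
import pathlib
import tempfile

def run_worker(task: str, input_files: list, output_path: str) -> None:
    # Placeholder "work": a real worker would call an LLM here.
    body = "\n\n".join(pathlib.Path(f).read_text() for f in input_files)
    pathlib.Path(output_path).write_text(f"# {task}\n\n{body}")
    # Deliberately absent: any status.json write, any spawning of other
    # workers, any knowledge of phases. The orchestrator infers completion
    # from the existence of output_path.

run_dir = pathlib.Path(tempfile.mkdtemp())
(run_dir / "notes.md").write_text("raw notes")
run_worker("Summarize notes", [str(run_dir / "notes.md")],
           str(run_dir / "worker.md"))
```

Everything a worker needs arrives in its prompt; everything it produces leaves through one file.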
&lt;p&gt;The orchestrator runs a reconciliation loop on every trigger — a worker&amp;rsquo;s completion announcement, a heartbeat, a user message. Each time, it does the same thing: check which output files exist, update `status.json` to reflect detected completions, then consult the decision table:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;┌─────────────────────────────────┬──────────────────────────────────┐
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│ State │ Action │
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;├─────────────────────────────────┼──────────────────────────────────┤
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│ All workers done + next pending │ Spawn next phase workers │
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│ All workers done + pause_after │ Summarize to user, wait │
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│ Final phase completed │ Deliver final.md to user, archive│
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│ Phase running &amp;gt; timeout + 120s │ Mark failed, notify user │
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│ Phase running, within limit │ Wait (nothing to do) │
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│ result_delivered: true │ Archive │
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;└─────────────────────────────────┴──────────────────────────────────┘
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;File existence as completion signal&lt;/strong&gt; is the key to idempotency. The orchestrator doesn&amp;rsquo;t rely on receiving a message from the worker. It checks: does `researcher-a.md` exist? If yes, that worker is done — regardless of what `status.json` currently says. You can kill and restart the orchestrator at any point; it will reconstruct correct state from the filesystem. No lost updates. No ghost workers.&lt;/p&gt;
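A sketch of that reconciliation pass, assuming the one-`&lt;role&gt;.md`-output-file-per-worker convention used throughout this post. Because completion is read off the filesystem, running it twice is the same as running it once:

```python
import pathlib
import tempfile

def reconcile(run_dir: pathlib.Path, status: dict) -> dict:
    for phase in status["phases"]:
        for role, worker in phase["workers"].items():
            # File exists => worker is done, whatever status currently says.
            if (run_dir / f"{role}.md").exists():
                worker["status"] = "completed"
        if all(w["status"] == "completed" for w in phase["workers"].values()):
            phase["status"] = "completed"
    return status

run_dir = pathlib.Path(tempfile.mkdtemp())
status = {"phases": [{"id": "collect", "status": "running", "workers": {
    "researcher-a": {"status": "running"},
    "researcher-b": {"status": "running"}}}]}

(run_dir / "researcher-a.md").write_text("findings")
reconcile(run_dir, status)   # only one file exists: phase stays running
(run_dir / "researcher-b.md").write_text("findings")
reconcile(run_dir, status)   # both files exist: phase flips to completed
```

Kill the orchestrator between those two calls, start a fresh one, and the next pass reconstructs the identical state — which is the whole point.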
&lt;h2 id="concrete-example-research-pipeline"&gt;Concrete Example: Research Pipeline&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s a real pipeline definition — two parallel researchers followed by a synthesis pass:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;research&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;description&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;Pure research + analysis&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;phases&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;id&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;collect&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;mode&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;parallel&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;workers&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;role&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;researcher-a&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;model&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;sonnet&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;timeout&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;task&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;Research perspective A: main sources, facts, current state&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;role&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;researcher-b&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;model&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;sonnet&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;timeout&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;task&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;Research perspective B: alternative views, criticism, edge cases&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;id&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;synthesis&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;mode&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;sequential&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;workers&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;role&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;synthesizer&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;model&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;opus&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;timeout&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;420&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;final&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;reads&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;researcher-a.md&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;researcher-b.md&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;task&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;Synthesize research from both researchers&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="the-walkthrough"&gt;The Walkthrough&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Step 1.&lt;/strong&gt; User triggers `/pipeline research FSA architecture`. Orchestrator reads `workflows.json`, creates `pipeline-tmp/research-180141/`, initializes `status.json`:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;pipeline&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;research&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;dir&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;research-180141&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;topic&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;FSA architecture&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;current_phase&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;retry_count&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;phases&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;id&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;collect&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;running&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;workers&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;researcher-a&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;running&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;session&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;agent:main:subagent:abc123&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;researcher-b&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;running&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;session&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;agent:main:subagent:def456&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;id&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;synthesis&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;pending&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;workers&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;synthesizer&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;pending&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;session&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;],&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;result_delivered&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Step 2.&lt;/strong&gt; Orchestrator spawns `researcher-a` and `researcher-b` in parallel. Both get a task prompt with an explicit output path. The orchestrator tells the user: &amp;ldquo;Pipeline running, 2 workers in phase 1.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 3.&lt;/strong&gt; `researcher-a` finishes first. Writes `researcher-a.md` and exits.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 4.&lt;/strong&gt; Orchestrator trigger fires. Reconcile checks the filesystem, sees `researcher-a.md`, updates status:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;current_phase&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;phases&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;id&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;collect&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;running&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;workers&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;researcher-a&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;completed&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;session&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;agent:main:subagent:abc123&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;researcher-b&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;running&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;session&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;agent:main:subagent:def456&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;id&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;synthesis&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;pending&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;workers&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;synthesizer&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;pending&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;session&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Decision table: phase 0 still has a running worker within timeout → &lt;strong&gt;Wait&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 5.&lt;/strong&gt; `researcher-b` finishes. Writes `researcher-b.md`, exits.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 6.&lt;/strong&gt; Orchestrator trigger fires. Both output files exist. Updates both workers to `completed`, marks phase 0 `completed`. Decision table: all workers done, next phase pending → &lt;strong&gt;Spawn next phase&lt;/strong&gt;. Spawns `synthesizer` with both research files in its prompt. Updates `status.json`:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;current_phase&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;phases&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;id&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;collect&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;completed&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;workers&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;researcher-a&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;completed&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;session&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;agent:main:subagent:abc123&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;researcher-b&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;completed&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;session&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;agent:main:subagent:def456&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;id&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;synthesis&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;running&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;workers&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;synthesizer&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;status&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;running&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;session&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;agent:main:subagent:ghi789&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Step 7.&lt;/strong&gt; `synthesizer` reads both research files, writes `synthesizer.md`, exits. It has `"final": true` in the workflow definition.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 8.&lt;/strong&gt; Orchestrator detects `synthesizer.md`, phase 1 complete, final phase → &lt;strong&gt;Deliver final.md to user, archive&lt;/strong&gt;. Sends the synthesis to the user. Sets `result_delivered: true`. Moves `pipeline-tmp/research-180141/` to `memory/pipelines/`.&lt;/p&gt;
&lt;p&gt;At no point did any worker touch `status.json`. At no point did any worker decide what comes next. Every control decision came from reading state and consulting the table.&lt;/p&gt;
&lt;h2 id="tradeoffs-and-limitations"&gt;Tradeoffs and Limitations&lt;/h2&gt;
&lt;p&gt;This architecture earns its complexity in production pipelines with predictable structure: content generation, research workflows, code review, multi-stage analysis. Anywhere you&amp;rsquo;ve been burned by race conditions, lost state on restart, or non-deterministic orchestration — FSA fixes all three by construction.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not the right tool for genuinely dynamic multi-agent conversations where agents negotiate task structure on the fly. If your workflow can&amp;rsquo;t be expressed as phases + transitions at design time, FSA forces you into contortions. Use something else.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s also a rigidity cost. Adding a new pipeline type means editing `workflows.json`, defining phases, specifying worker roles and models. That&amp;rsquo;s deliberate friction — it forces you to think about structure before you run anything — but it does mean you can&amp;rsquo;t just say &amp;ldquo;figure it out&amp;rdquo; and hope for the best. Every workflow needs to be designed, not discovered.&lt;/p&gt;
&lt;p&gt;The pattern demands discipline: workers must respect their contract (write output, exit, touch nothing else). One worker that &amp;ldquo;helps&amp;rdquo; by updating `status.json` breaks the single-writer guarantee and reintroduces every race condition you just eliminated. Enforce the contract at the prompt level and audit it at every pipeline change.&lt;/p&gt;
&lt;p&gt;Error handling is minimal by design. A failed worker gets marked `failed`, the orchestrator notifies the user, and that&amp;rsquo;s it. There&amp;rsquo;s no automatic retry with modified prompts, no fallback to a different model, no sophisticated error recovery. You could build those features on top of the FSA — the decision table is extensible — but out of the box, the system assumes that most failures are better surfaced to a human than papered over by automation.&lt;/p&gt;
&lt;p&gt;The payoff is a system you can debug by reading two files, resume after any failure, and reason about without running it. In production multi-agent systems, that&amp;rsquo;s not a nice-to-have. It&amp;rsquo;s the difference between something you can operate and something that operates you.&lt;/p&gt;</description></item><item><title>Ten Days with an AI Agent</title><link>https://sukany.cz/blog/2026-02-25-ten-days-with-ai-agent/</link><pubDate>Wed, 25 Feb 2026 00:00:00 +0000</pubDate><guid>https://sukany.cz/blog/2026-02-25-ten-days-with-ai-agent/</guid><description>&lt;p&gt;On day 2, the agent tried to re-enable a Twitter integration I had explicitly cancelled the night before. It had forgotten. Not because of a bug — because session restarts wipe context, and nothing in the default setup prevents an AI from re-deriving a decision you already vetoed.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s when I started building the infrastructure that turned a chatbot into something that actually works.&lt;/p&gt;
&lt;p&gt;This is not a tutorial. It&amp;rsquo;s what running an autonomous AI agent looks like after 10 days: what it costs, what breaks, and what I&amp;rsquo;d change.&lt;/p&gt;
&lt;h2 id="what-it-actually-costs"&gt;What It Actually Costs&lt;/h2&gt;
&lt;p&gt;The honest number: &lt;strong&gt;$16–$21 over 10 days&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The agent uses three model tiers. Background tasks — heartbeat checks, email classification, log writes — run on Claude Haiku. About 180 heartbeat sessions over 10 days at roughly $0.012 each: ~$2.16. General conversation and code analysis run on Claude Sonnet. Of 92 recorded sessions, roughly 40% are Sonnet-class work, averaging ~$0.25 per session: ~$9.25. The expensive stuff — security audits, pipeline critic passes, memory maintenance — runs on Opus. 10–15 invocations at ~$0.50 each: $5–7.50.&lt;/p&gt;
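&lt;p&gt;The arithmetic behind that range, using the per-tier estimates above, lands within a dollar or two of the headline figure once embeddings and estimate slack are folded in:&lt;/p&gt;

```python
# Reconstructing the 10-day spend from the per-tier estimates above.
haiku = 180 * 0.012                          # heartbeat sessions on Haiku
sonnet = round(0.40 * 92) * 0.25             # ~40% of 92 sessions, Sonnet-class
opus_low, opus_high = 10 * 0.50, 15 * 0.50   # 10-15 Opus invocations

total_low = haiku + sonnet + opus_low
total_high = haiku + sonnet + opus_high
print(f"${total_low:.2f} to ${total_high:.2f}")  # → $16.41 to $18.91
```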
&lt;p&gt;Embeddings are negligible. The memory system uses OpenAI&amp;rsquo;s text-embedding-3-small at $0.02/1M tokens. Ten days of indexing cost about $0.01.&lt;/p&gt;
&lt;p&gt;Infrastructure is fixed: a VM in my home lab running the OpenClaw gateway. No cloud compute charges.&lt;/p&gt;
&lt;p&gt;The cost driver is not what you&amp;rsquo;d expect. It&amp;rsquo;s not token count — it&amp;rsquo;s context load. Every session, the agent loads configuration files: a 1.5KB state file, a 5KB curated memory, plus task-specific documents. Before tiered memory, sessions were loading raw daily logs on every start. After: selective loading. Per-session overhead dropped by roughly 60%.&lt;/p&gt;
&lt;p&gt;22 cron jobs run on scheduled intervals. Morning briefing, email preprocessing every 2 hours, social media engagement, chat summaries, nightly memory maintenance, weekly server monitoring. Each spawns a sub-agent session. Those add up quietly.&lt;/p&gt;
&lt;p&gt;A month at this rate is $50–$65. Less than most SaaS subscriptions.&lt;/p&gt;
&lt;h2 id="the-forgetting-problem"&gt;The Forgetting Problem&lt;/h2&gt;
&lt;p&gt;The naive approach to agent memory is to log everything and search it later. That degrades fast.&lt;/p&gt;
&lt;p&gt;After day 3, raw daily logs totaled 130KB. By day 10: 400KB across 29 files. Loading all of that into context every session burns tokens and fills the window with noise. Most of what&amp;rsquo;s in those logs is obsolete the moment it&amp;rsquo;s written.&lt;/p&gt;
&lt;p&gt;The architecture I ended up with is L1/L2/L3, borrowed from CPU cache design.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;L1&lt;/strong&gt; is &lt;code&gt;NOW.md&lt;/code&gt; — under 1.5KB, hard limit. Current task, active blockers, open threads. Updated during sessions. If it&amp;rsquo;s not in NOW.md, it doesn&amp;rsquo;t exist for the next session.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;L2&lt;/strong&gt; is &lt;code&gt;MEMORY.md&lt;/code&gt; — under 5KB, curated. Long-term facts: credential locations, architectural decisions, lessons that took more than one failure to learn. Only the main session can write to it. Nightly maintenance cycles prune obsolete entries — the file has stayed under 5KB since day 4.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;L3&lt;/strong&gt; is the daily log archive — append-only, never loaded directly. Accessed through hybrid search: BM25 + semantic retrieval via embeddings. Key discovery: the embedding model works significantly better with English queries even though most logs are in Czech.&lt;/p&gt;
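&lt;p&gt;The selective-loading idea reduces to a few lines. A hedged sketch, assuming the byte budgets above are enforced at load time (the article does not show the real enforcement mechanism):&lt;/p&gt;

```python
from pathlib import Path

L1_LIMIT = 1500   # NOW.md hard budget, in bytes
L2_LIMIT = 5000   # MEMORY.md curated budget, in bytes

def load_session_context(workspace: Path) -> str:
    """Load only L1 and L2 at session start; L3 logs are searched, never bulk-loaded."""
    parts = []
    for name, limit in (("NOW.md", L1_LIMIT), ("MEMORY.md", L2_LIMIT)):
        path = workspace / name
        if not path.exists():
            continue
        text = path.read_text()
        if len(text.encode("utf-8")) > limit:
            raise ValueError(f"{name} exceeds its {limit}-byte budget")
        parts.append(text)
    return "\n\n".join(parts)
```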
&lt;p&gt;The hard part is not storage. The hard part is &lt;strong&gt;forgetting correctly&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a &lt;code&gt;decisions.md&lt;/code&gt; file — I call it the anti-Dory register — that tracks every cancelled or paused action with a timestamp. When I told the agent to stop auto-posting tweets, that decision was recorded: date, scope, reason. Every cron job that touches external services checks this file before executing. Without it, the agent would occasionally re-reason its way back to trying the cancelled action.&lt;/p&gt;
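&lt;p&gt;The registry check itself is trivial, which is the point. A sketch, assuming a hypothetical one-decision-per-line, tab-separated layout of date, action, and reason (the real file format is not shown in the article):&lt;/p&gt;

```python
from pathlib import Path

def is_cancelled(action: str, registry: Path) -> bool:
    """Anti-Dory check: cron jobs call this before touching any external service.

    Assumes one decision per line, tab-separated as date / action / reason;
    this layout is an illustration, not the article's actual format.
    """
    if not registry.exists():
        return False
    for line in registry.read_text().splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1] == action:
            return True
    return False
```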
&lt;p&gt;There&amp;rsquo;s also a &lt;code&gt;self-review.md&lt;/code&gt; tracking repeated mistakes with a counter. When the count hits 3, the rule gets promoted to permanent configuration. The session-memory hook that shipped by default was broken; it got disabled on day 2 and the rule &amp;ldquo;disable immediately&amp;rdquo; now lives in the permanent config. It has never been re-enabled by accident.&lt;/p&gt;
&lt;p&gt;Seven days without a memory failure. The first three days had several. The difference is maintenance cycles and the decisions registry, not the agent being smarter.&lt;/p&gt;
&lt;h2 id="configuration-is-the-product"&gt;Configuration Is the Product&lt;/h2&gt;
&lt;p&gt;Default OpenClaw gives you a conversational agent with web search and file access. That is a chatbot. What I&amp;rsquo;m running now is closer to infrastructure.&lt;/p&gt;
&lt;p&gt;The difference is about 1,000 lines of configuration across eight files.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;22 cron jobs&lt;/strong&gt; (default: zero). The morning briefing fires at 07:00, pulls calendar events, scans email, and writes a daily context update. Email preprocessing classifies incoming mail every 2 hours into URGENT / NORMAL / INFO and sends notifications for anything that needs attention. Nightly memory maintenance prunes stale data. Without cron, the agent is purely reactive. With it, problems surface before I ask.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;24 pipeline types&lt;/strong&gt; for multi-stage tasks. A blog post runs through researcher → creator → critic. A security audit: recon → parallel auditor + remediator → synthesizer. All workers spawn in a single turn. Sequential workers wait for input files via a bash polling loop — no message-based coordination, no orchestrator agent. The last worker in the chain sends the result directly to Matrix.&lt;/p&gt;
&lt;p&gt;Why not use the built-in message delivery? Because it has a hardcoded 60-second timeout with no retry. I learned this after two pipeline types failed in testing. The fix wasn&amp;rsquo;t more retries — it was bypassing message delivery entirely and having workers write files and send results themselves.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A web publishing safety layer.&lt;/strong&gt; Before any content goes to the public site, a shell script checks for private information, credential references, and third-party data. Exit 1 stops the publish. This exists because an early session attempted to post content containing internal details. Not maliciously — the agent didn&amp;rsquo;t have a boundary. Now the boundary is enforced at the script level, not the prompt level.&lt;/p&gt;
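&lt;p&gt;The production gate is a shell script, but the check logic fits in a few lines of Python. The deny-patterns below are illustrative stand-ins; the real script&amp;rsquo;s rules are not published:&lt;/p&gt;

```python
import re

# Illustrative deny-patterns only; the real script's rules are not published.
PRIVATE_PATTERNS = [
    r"(?i)api[_-]?key",             # credential references
    r"(?i)password\s*[:=]",
    r"/home/\w+/",                  # host filesystem paths
    r"\b10\.\d+\.\d+\.\d+\b",       # internal RFC 1918 addresses
]

def check_publishable(text: str) -> int:
    """Return 0 if clean, 1 if anything private matches (mirrors the script's exit code)."""
    for pattern in PRIVATE_PATTERNS:
        if re.search(pattern, text):
            return 1
    return 0
```

&lt;p&gt;A nonzero return stops the publish, exactly like the script&amp;rsquo;s exit 1: the boundary lives in code, not in the prompt.&lt;/p&gt;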
&lt;p&gt;&lt;strong&gt;Priority hierarchy.&lt;/strong&gt; The agent&amp;rsquo;s decision model has five levels: safety &amp;gt; privacy &amp;gt; instructions &amp;gt; stability &amp;gt; efficiency. When they conflict, the order holds. This sounds abstract until the agent needs to decide whether to send an email on your behalf or wait for confirmation. Without explicit priority ordering, it guesses. With it, it stops and asks.&lt;/p&gt;
&lt;p&gt;The insight after 10 days: an AI agent without customization is a chatbot. With customization, it&amp;rsquo;s infrastructure. None of this ships by default.&lt;/p&gt;
&lt;h2 id="what-i-d-do-differently"&gt;What I&amp;rsquo;d Do Differently&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Start with memory architecture on day 1.&lt;/strong&gt; I spent the first two days loading too much context. The L1/L2/L3 design should have been the first thing built, not something I arrived at after three failures.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Add the decisions registry before anything touches external services.&lt;/strong&gt; The first cancelled-action recurrence appeared on day 3. The registry was created on day 4. One day of overlap where cancelled actions occasionally re-triggered.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Model selection discipline from the start.&lt;/strong&gt; Early sessions used Sonnet for tasks that Haiku handles fine. Across 180 heartbeats, the cost difference adds up. Define model selection rules before creating cron jobs, not after.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Document infrastructure limitations before building on them.&lt;/strong&gt; I built two pipeline types assuming message delivery was reliable. Both failed. Retrofitting the file-based pattern took longer than designing it correctly would have.&lt;/p&gt;
&lt;p&gt;The agent runs stably now. 10 blog posts. Email processed without intervention. Memory clean. No duplicate sends.&lt;/p&gt;
&lt;p&gt;It works. It just took 10 days of configuration to make it work the way it should.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Running: OpenClaw on self-hosted VM. Models: Claude Haiku/Sonnet/Opus (Anthropic), embeddings via text-embedding-3-small (OpenAI). 10-day window: February 15–25, 2026.&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Why I Stopped Waiting for Announces: The Spawn-All-Wait Pattern for Multi-Agent AI</title><link>https://sukany.cz/blog/2026-02-21-spawn-all-wait-pattern/</link><pubDate>Sat, 21 Feb 2026 00:00:00 +0000</pubDate><guid>https://sukany.cz/blog/2026-02-21-spawn-all-wait-pattern/</guid><description>&lt;p&gt;My multi-agent pipeline was failing at random. Not always, not predictably — just often enough to make me stop trusting it. Worker-2 would run, write its output, and then nothing would happen. The orchestrator was sitting there waiting for an announce that never arrived. The bug already had a ticket number: #17000. Description: hardcoded 60-second timeout, no retry. I&amp;rsquo;d built the entire coordination model on message delivery, and message delivery was the single point of failure. The fix wasn&amp;rsquo;t more retries. It was getting rid of message-based coordination entirely.&lt;/p&gt;
&lt;h2 id="the-old-pattern-and-why-it-broke"&gt;The Old Pattern and Why It Broke&lt;/h2&gt;
&lt;p&gt;The original approach was simple: spawn worker-1, wait for it to announce completion, spawn worker-2, wait for announce, spawn worker-3. Clean, readable, easy to reason about. It also failed under any real-world condition.&lt;/p&gt;
&lt;p&gt;The announce system in OpenClaw has a 60-second delivery window. If the gateway is under load, if there&amp;rsquo;s a transient network issue, if the announce just gets dropped — your orchestrator is stalled indefinitely. It sits in a waiting state with no way to know whether the worker finished successfully, finished and the announce was lost, or actually crashed. There&amp;rsquo;s no retry mechanism. There&amp;rsquo;s no fallback. The main session has no way to distinguish &amp;ldquo;worker is still running&amp;rdquo; from &amp;ldquo;announce was lost three minutes ago.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I hit this pattern enough times that I started logging it. About 20–30% of announce deliveries were unreliable under normal load. That&amp;rsquo;s not a bug you work around with patience. That&amp;rsquo;s a design assumption that doesn&amp;rsquo;t hold.&lt;/p&gt;
&lt;h2 id="distributed-systems-problems-i-rediscovered-the-hard-way"&gt;Distributed Systems Problems I Rediscovered the Hard Way&lt;/h2&gt;
&lt;p&gt;Building multi-agent systems means independently rediscovering everything microservices engineers figured out in 2015. I ran into all of it.&lt;/p&gt;
&lt;p&gt;Race conditions when two workers write to the same output location. Context loss when an announce arrives out of order and the orchestrator can&amp;rsquo;t reconstruct state. Coordinator overhead — when the orchestrator itself is a sub-agent (depth-2 pattern), it has its own lifecycle problems. In OpenClaw, bug #18043 documents this: depth-2 orchestrators terminate prematurely and lose their announce chains. Meaning: the orchestrator agent finishes before it has processed all results from the workers it spawned. You think you have a pipeline. You actually have a ticking clock.&lt;/p&gt;
&lt;p&gt;The debugging tax was the worst part. When something goes wrong in a sequential announce-based pipeline, you spend time answering: did the worker crash, did the announce drop, did the orchestrator miss it, or is it still running? A failure that takes 30 seconds to occur takes 20 minutes to diagnose.&lt;/p&gt;
&lt;h2 id="the-spawn-all-wait-pattern"&gt;The Spawn-All-Wait Pattern&lt;/h2&gt;
&lt;p&gt;The solution was conceptually simple and felt slightly absurd in practice: spawn all workers in a single turn, and have sequential workers coordinate via the filesystem instead of via messages.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s what it looks like. The main session spawns every worker — parallel and sequential — in one shot. Parallel workers start immediately. Sequential workers that need output from a previous worker start by executing a bash wait loop:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;for i in $(seq 1 60); do
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; [ -f /path/to/pipeline-dir/worker-1.md ] &amp;amp;&amp;amp; echo &amp;#39;INPUT_READY&amp;#39; &amp;amp;&amp;amp; break
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; echo &amp;#34;Waiting... $i&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; sleep 5
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;done
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That&amp;rsquo;s it. The worker polls every 5 seconds for up to 5 minutes. When the file appears, it reads it and starts working. When it finishes, it writes its own output file. The next worker in the chain finds it the same way.&lt;/p&gt;
&lt;p&gt;The main session&amp;rsquo;s job is reduced to: spawn everything, tell the user &amp;ldquo;pipeline running, N workers active,&amp;rdquo; and wait. No intermediate actions required. No processing announces as triggers. The chain runs itself through the filesystem.&lt;/p&gt;
&lt;p&gt;Worker timeouts are set accordingly: 180 seconds for parallel workers with no dependencies, 360 seconds for sequential workers (5 minutes of possible waiting plus 1 minute of actual work).&lt;/p&gt;
&lt;h2 id="filesystem-handoff-vs-dot-message-based-handoff"&gt;Filesystem Handoff vs. Message-Based Handoff&lt;/h2&gt;
&lt;p&gt;The practical difference comes down to one property: a file either exists or it doesn&amp;rsquo;t. There&amp;rsquo;s no delivery window, no retry budget, no 60-second timeout. If worker-1.md is there, the next worker reads it and continues. If it&amp;rsquo;s not there after 5 minutes, the worker times out and reports TIMEOUT — which is a signal, not a silent failure.&lt;/p&gt;
&lt;p&gt;Compare this to the announce model. An announce either arrives within 60 seconds or it&amp;rsquo;s gone. There&amp;rsquo;s no way to request it again. There&amp;rsquo;s no persistent record that the orchestrator can check on startup. If the main session restarts after a crash, it has no idea what state the pipeline was in. With filesystem handoff, it can check which worker files exist and reconstruct state immediately.&lt;/p&gt;
&lt;p&gt;Debugging is also qualitatively different. With the old model, I&amp;rsquo;d run a pipeline, wait 10 minutes, and then start trying to figure out what happened. With filesystem handoff, I open a terminal, run &lt;code&gt;ls pipeline-tmp/rw-1827/&lt;/code&gt; and immediately see which workers completed. The files are the state. The state is visible.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s one real constraint: because of bug #10334 (concurrent announces can deadlock the gateway), I cap parallel workers at 4. This isn&amp;rsquo;t a filesystem limitation — it&amp;rsquo;s a gateway limitation that applies regardless of coordination method. I plan around it.&lt;/p&gt;
&lt;h2 id="the-terminal-worker-and-no-double-send"&gt;The Terminal Worker and No Double Send&lt;/h2&gt;
&lt;p&gt;One worker in every pipeline is different: the terminal worker. Its job is to read all previous worker outputs, synthesize a final result, and deliver it to the user. It&amp;rsquo;s the only worker that&amp;rsquo;s allowed to call the message tool. All other workers write files and stay silent.&lt;/p&gt;
&lt;p&gt;This exists because of the double-send problem. If a worker sends to Matrix and then the main session also sends the same content via announce processing, the user gets the message twice. The rule is simple: one delivery path, enforced by convention. Every worker except the last one is file-only. The last one sends, then writes &lt;code&gt;MATRIX_SENT&lt;/code&gt; in its announce response.&lt;/p&gt;
&lt;p&gt;When the main session sees &lt;code&gt;MATRIX_SENT&lt;/code&gt; in an announce, it does nothing — the terminal worker already delivered. If the announce doesn&amp;rsquo;t contain &lt;code&gt;MATRIX_SENT&lt;/code&gt;, the main session interprets it as a mid-pipeline announce and just notes the progress.&lt;/p&gt;
&lt;p&gt;The heartbeat watchdog covers the edge case: if worker files exist but no sub-agents are currently running and the result hasn&amp;rsquo;t been delivered, the main session synthesizes and sends itself. It&amp;rsquo;s a fallback I&amp;rsquo;ve needed twice. Both times it saved what would have been a completely silent failure.&lt;/p&gt;
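&lt;p&gt;The watchdog condition is a three-term predicate over visible state. A minimal sketch, with the function name and parameters chosen here for illustration:&lt;/p&gt;

```python
from pathlib import Path

def needs_rescue(pipeline_dir: Path, agents_running: bool, delivered: bool) -> bool:
    """Heartbeat check: True means the main session must synthesize and send itself.

    Fires only when worker outputs exist, no sub-agents are active, and
    nothing has been delivered to the user yet.
    """
    have_outputs = any(pipeline_dir.glob("*.md"))
    return have_outputs and not agents_running and not delivered
```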
&lt;h2 id="what-i-measured-and-what-still-hurts"&gt;What I Measured and What Still Hurts&lt;/h2&gt;
&lt;p&gt;In a typical write pipeline — researcher, creator, critic running sequentially — the old model took around 6 minutes plus announce latency plus the overhead of me watching and intervening. The new model runs in about 4 minutes with no intervention required. Parallel research phases (two workers running simultaneously) finish in around 2 minutes. Sequential synthesis adds another 2. Total: 4 minutes, unattended.&lt;/p&gt;
&lt;p&gt;Three bugs are still open. #17000 (announce timeout, no retry) is the root cause of everything described here — the workaround works, but the bug remains. #10334 (concurrent announce deadlock) caps parallelism at 4. #18043 (depth-2 orchestrator termination) means I can&amp;rsquo;t delegate orchestration to a sub-agent — the main session has to stay in the loop.&lt;/p&gt;
&lt;p&gt;None of these bugs touch what the pattern can&amp;rsquo;t fix: hallucination rates, token cost per pipeline, or the fact that MCP and A2A protocol standardization are still immature. The pipeline coordinates reliably. What each worker does with its context is a separate problem.&lt;/p&gt;
&lt;h2 id="closing"&gt;Closing&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re building multi-agent pipelines and coordinating through message delivery, you&amp;rsquo;re one network blip away from a stalled orchestrator and a silent failure. The Spawn-All-Wait pattern isn&amp;rsquo;t elegant — a bash polling loop inside an LLM prompt is not how anyone imagined this going. But it&amp;rsquo;s the thing that actually works in production, today, with the infrastructure that exists.&lt;/p&gt;
&lt;p&gt;The files are always there. The announces sometimes aren&amp;rsquo;t.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve run into similar issues with LangChain, CrewAI, or your own orchestration layer, I&amp;rsquo;d genuinely like to compare notes. These patterns came from real failures — not from a whitepaper — and they&amp;rsquo;ll keep evolving as the tooling matures. MCP and A2A will change the picture, probably by late 2026. Until then: write to files, not messages.&lt;/p&gt;
&lt;p&gt;M&amp;gt;&lt;/p&gt;</description></item><item><title>Day 5 with Daneel: Headless Browsers, Document Pipelines, and the Numbers So Far</title><link>https://sukany.cz/blog/2026-02-20-day5-browsers-documents-numbers/</link><pubDate>Fri, 20 Feb 2026 00:00:00 +0000</pubDate><guid>https://sukany.cz/blog/2026-02-20-day5-browsers-documents-numbers/</guid><description>&lt;p&gt;Day 5 was the most varied day yet. Not in complexity—some earlier days had harder problems—but in range. The work touched browser automation, document tooling, and enough small fixes that by evening I had a reason to look at the numbers.&lt;/p&gt;
&lt;h2 id="running-a-browser-without-a-screen"&gt;Running a Browser Without a Screen&lt;/h2&gt;
&lt;p&gt;One of the things an AI assistant can do is interact with web pages—read content, check status, fill forms. But this particular setup runs on a headless Linux server. No display, no window manager, no user session.&lt;/p&gt;
&lt;p&gt;The obvious approach—install Chrome via Snap—doesn&amp;rsquo;t work from a systemd service. Snap packages assume a user session with D-Bus and a display server. Running headless from a system service hits permission errors before Chrome even starts.&lt;/p&gt;
&lt;p&gt;The fix: install Chrome directly from Google&amp;rsquo;s .deb repository, bypassing Snap entirely. Then wrap it in a dedicated systemd service that launches Chrome with remote debugging enabled on a fixed port. The AI framework connects via Chrome DevTools Protocol in attach-only mode—it doesn&amp;rsquo;t launch Chrome, it connects to the already-running instance.&lt;/p&gt;
&lt;p&gt;Three components, each solving one problem: the .deb package avoids Snap&amp;rsquo;s session requirements, the systemd service ensures Chrome survives reboots and can be managed like any other daemon, and the attach-only configuration means the framework doesn&amp;rsquo;t need to manage browser lifecycle.&lt;/p&gt;
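&lt;p&gt;A unit file for this shape of setup looks roughly like the following. This is a sketch under stated assumptions: the port, user, and profile path are placeholders, not the article&amp;rsquo;s actual values, though the Chrome flags themselves are standard.&lt;/p&gt;

```ini
# /etc/systemd/system/chrome-headless.service -- illustrative sketch;
# port, user, and profile path are assumptions, not the article's values.
[Unit]
Description=Headless Chrome with a fixed DevTools endpoint
After=network.target

[Service]
ExecStart=/usr/bin/google-chrome-stable --headless=new \
  --remote-debugging-port=9222 \
  --user-data-dir=/var/lib/chrome-profile --no-first-run
Restart=on-failure
User=chrome

[Install]
WantedBy=multi-user.target
```

&lt;p&gt;The framework then attaches to the DevTools endpoint on port 9222 over the Chrome DevTools Protocol instead of launching a browser of its own.&lt;/p&gt;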
&lt;p&gt;The result is invisible when it works. Pages load, content is extracted, the browser runs quietly in the background consuming minimal resources. The interesting part was how many things had to be wrong before the right approach became obvious.&lt;/p&gt;
&lt;h2 id="from-org-files-to-printed-documents"&gt;From Org Files to Printed Documents&lt;/h2&gt;
&lt;p&gt;A separate thread involved document generation. The workflow: write structured content in Emacs Org mode, export to LaTeX, compile to PDF. The goal was a reusable template that produces clean, professional documents without manual formatting.&lt;/p&gt;
&lt;p&gt;The template handles the things that usually require tweaking: Czech language support with proper hyphenation, tables that span pages without breaking layout, consistent typography, a styled title page. The technical details—font selection, column width calculation, alternating row colors—are defined once in the template and applied automatically during export.&lt;/p&gt;
&lt;p&gt;What made this worth the setup time is the authoring experience afterward. Write content in a plain text file with minimal markup. Run one export command. Get a formatted PDF. No intermediate steps, no manual adjustments, no &amp;ldquo;fix the table on page 3&amp;rdquo; cycles.&lt;/p&gt;
&lt;p&gt;An Elisp hook handles the part that would otherwise require per-document boilerplate: detecting tables in the document and automatically adding the correct LaTeX attributes based on column count. The author doesn&amp;rsquo;t need to think about LaTeX at all.&lt;/p&gt;
&lt;h2 id="five-days-in-numbers"&gt;Five Days in Numbers&lt;/h2&gt;
&lt;p&gt;Day 5 felt like a good point to measure what&amp;rsquo;s accumulated.&lt;/p&gt;
&lt;p&gt;The memory system—the files that let the assistant maintain context across restarts—has grown to over 190 KB across 26 files. That includes daily operational logs, architectural analysis documents, per-session summaries, and the curated long-term memory file that gets reviewed and pruned every three days.&lt;/p&gt;
&lt;p&gt;The workspace contains 13 custom scripts covering everything from calendar integration to email processing to automated backups. Each one exists because a manual workflow was repeated enough times to justify automation.&lt;/p&gt;
&lt;p&gt;There are 24 git commits in the workspace repository over five days—roughly five per day, tracking configuration changes, new scripts, and memory updates.&lt;/p&gt;
&lt;p&gt;The cron system runs scheduled jobs: morning briefings, email monitoring, news digests, weekly reviews, infrastructure checks. Each job was added incrementally as a pattern emerged—something done manually twice became a candidate for automation on the third occurrence.&lt;/p&gt;
&lt;p&gt;68 session logs exist from this period. Each represents a conversation or automated task. Some are brief status checks; others span hours of technical work. The session architecture evolved during these five days too—from a single shared session to isolated per-channel sessions, each maintaining its own context.&lt;/p&gt;
&lt;h2 id="what-the-numbers-don-t-show"&gt;What the Numbers Don&amp;rsquo;t Show&lt;/h2&gt;
&lt;p&gt;The raw counts are less interesting than what they represent: five days of iterative refinement where each day&amp;rsquo;s problems inform the next day&amp;rsquo;s automation.&lt;/p&gt;
&lt;p&gt;The memory system exists because the assistant forgot things after restarts. The backup scripts exist because I asked &amp;ldquo;what happens if this machine dies?&amp;rdquo; The browser automation exists because a web interaction failed and the root cause was architectural, not a bug.&lt;/p&gt;
&lt;p&gt;None of this was planned on day one. The roadmap was: set up the assistant, give it access, see what happens. The infrastructure that exists now is the answer to &amp;ldquo;what happens&amp;rdquo;—an accumulation of solved problems, each one making the next problem easier to solve.&lt;/p&gt;
&lt;p&gt;Five days is not enough to draw conclusions about long-term value. It&amp;rsquo;s enough to see the pattern: capability compounds. Each tool built, each script written, each memory file maintained makes the next task faster. Whether that curve continues or plateaus is the question for the next five days.&lt;/p&gt;
&lt;p&gt;M&amp;gt;&lt;/p&gt;</description></item><item><title>Rebuilding a Tool in Four Hours: What the AI Agent Actually Did</title><link>https://sukany.cz/blog/2026-02-20-scenar-creator-ai-rebuild/</link><pubDate>Fri, 20 Feb 2026 00:00:00 +0000</pubDate><guid>https://sukany.cz/blog/2026-02-20-scenar-creator-ai-rebuild/</guid><description>&lt;p&gt;I have a small internal tool called Scénář Creator. It generates timetables for experiential courses — you know the kind: weekend trips where you have 14 programme blocks across three days and someone has to make sure nothing overlaps. I built version one in November 2025. It was a CGI Python app running on Apache, backed by Excel.&lt;/p&gt;
&lt;p&gt;Yesterday I asked Daneel to rebuild it. Four hours later, version 4.7 was running in production. Here&amp;rsquo;s exactly what happened.&lt;/p&gt;
&lt;h2 id="the-starting-point"&gt;The Starting Point&lt;/h2&gt;
&lt;p&gt;The original tool was functional but ugly in the developer sense. Python CGI means no proper request lifecycle, no validation, and Apache configuration that nobody wants to debug. Excel meant openpyxl and pandas as dependencies for what is essentially a colour-coded grid. The UI had a rudimentary inline editor but nothing you&amp;rsquo;d want to actually use.&lt;/p&gt;
&lt;p&gt;My requirements for the new version:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No Excel, no pandas, no openpyxl — anywhere&lt;/li&gt;
&lt;li&gt;JSON import/export with a sample template&lt;/li&gt;
&lt;li&gt;PDF output, always exactly one A4 landscape page&lt;/li&gt;
&lt;li&gt;Drag-and-drop canvas editor where blocks can be moved in time and between days&lt;/li&gt;
&lt;li&gt;Czech day names in both the editor and the PDF&lt;/li&gt;
&lt;li&gt;Documentation built into the app itself&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-pipeline-command"&gt;The Pipeline Command&lt;/h2&gt;
&lt;p&gt;I typed &lt;code&gt;/pipeline code&lt;/code&gt; in Matrix followed by the requirements. This triggers a specific workflow I configured for Daneel: instead of answering directly, it spawns a chain of sub-agents.&lt;/p&gt;
&lt;p&gt;What that looks like internally:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Researcher sub-agent&lt;/strong&gt; — reads the existing codebase (CGI scripts, Dockerfile, rke2 deployment manifest), queries documentation for FastAPI, ReportLab, and interact.js, produces a technology brief&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Architect sub-agent&lt;/strong&gt; — takes the brief and the existing code, designs a new architecture, outputs a structured document marked &amp;ldquo;ARCHITEKTURA PRO SCHVÁLENÍ&amp;rdquo; (Architecture for Approval)&lt;/li&gt;
&lt;li&gt;Main agent presents the architecture to me. I type &amp;ldquo;schvaluji&amp;rdquo; (I approve).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Coder sub-agent&lt;/strong&gt; — implements the full application based on the approved architecture&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Each sub-agent is an independent session. They don&amp;rsquo;t share memory. They communicate through their outputs, which the orchestrator passes forward as context.&lt;/p&gt;
&lt;h2 id="the-context-overflow"&gt;The Context Overflow&lt;/h2&gt;
&lt;p&gt;About 40 minutes in, the orchestrator hit a context limit. The session died mid-flight. I got a message: &amp;ldquo;Context overflow: prompt too large for the model.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This is a real failure mode with multi-agent pipelines. The orchestrator had been accumulating all the research, architecture, and partial implementation output in a single context window. It eventually exceeded what Claude Sonnet can hold.&lt;/p&gt;
&lt;p&gt;When I opened a new session (&lt;code&gt;/new&lt;/code&gt;), Daneel&amp;rsquo;s first action was to run &lt;code&gt;memory_search&lt;/code&gt; on the session logs from the crashed session. The key fragments were there:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The architecture document (partially recovered)&lt;/li&gt;
&lt;li&gt;The approved tech stack: FastAPI + Pydantic, ReportLab Canvas API, interact.js from CDN, vanilla JS frontend&lt;/li&gt;
&lt;li&gt;The deployment infrastructure: podman on daneel.sukany.cz, Gitea registry, kubectl via SSH to infra01&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then Daneel did something worth noting: it checked the &lt;strong&gt;live cluster&lt;/strong&gt; before assuming the background agents had implemented anything correctly. The health endpoint returned &lt;code&gt;{&amp;quot;status&amp;quot;: &amp;quot;ok&amp;quot;, &amp;quot;version&amp;quot;: &amp;quot;2.0&amp;quot;}&lt;/code&gt;. The background agents had claimed v3.0 was deployed. It wasn&amp;rsquo;t.&lt;/p&gt;
&lt;p&gt;This is a lesson I keep relearning. Check the actual state of the system, not the reported state.&lt;/p&gt;
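&lt;p&gt;The comparison itself is trivial to automate. A minimal sketch (the function name is mine, not from the codebase; the JSON shape matches the health endpoint quoted above):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;import json

def check_claim(claimed_version, health_body):
    # Compare a worker&amp;#39;s claimed version against the live health response.
    reported = json.loads(health_body).get(&amp;#39;version&amp;#39;)
    return reported == claimed_version, reported

# The situation from the post: agents claimed v3.0, the cluster said v2.0.
ok, live = check_claim(&amp;#39;3.0&amp;#39;, &amp;#39;{&amp;#34;status&amp;#34;: &amp;#34;ok&amp;#34;, &amp;#34;version&amp;#34;: &amp;#34;2.0&amp;#34;}&amp;#39;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;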
&lt;h2 id="what-implementation-actually-means"&gt;What &amp;ldquo;Implementation&amp;rdquo; Actually Means&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s what the agent concretely did, in order:&lt;/p&gt;
&lt;h3 id="read-the-existing-codebase"&gt;Read the existing codebase&lt;/h3&gt;
&lt;p&gt;Every relevant file: the CGI scripts, the Pydantic models, the Dockerfile, the RKE2 deployment YAML. Not a summary — the actual file contents, via the &lt;code&gt;read&lt;/code&gt; tool. About 12 files.&lt;/p&gt;
&lt;h3 id="wrote-the-new-application"&gt;Wrote the new application&lt;/h3&gt;
&lt;p&gt;Six Python modules (&lt;code&gt;main.py&lt;/code&gt;, &lt;code&gt;config.py&lt;/code&gt;, &lt;code&gt;models/event.py&lt;/code&gt;, &lt;code&gt;api/scenario.py&lt;/code&gt;, &lt;code&gt;api/pdf.py&lt;/code&gt;, &lt;code&gt;core/pdf_generator.py&lt;/code&gt;) plus four JavaScript files (&lt;code&gt;canvas.js&lt;/code&gt;, &lt;code&gt;app.js&lt;/code&gt;, &lt;code&gt;api.js&lt;/code&gt;, &lt;code&gt;export.js&lt;/code&gt;), CSS, HTML, and a sample JSON fixture. Each file was written with &lt;code&gt;write&lt;/code&gt; (full file) or &lt;code&gt;edit&lt;/code&gt; (surgical replacement of a specific text block).&lt;/p&gt;
&lt;h3 id="ran-tests-locally"&gt;Ran tests locally&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;python3 -m pytest tests/ -v
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;33 tests at v4.0, growing to 37 by v4.7. Every deploy was preceded by a clean test run.&lt;/p&gt;
&lt;h3 id="built-the-docker-image"&gt;Built the Docker image&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;podman build --format docker \
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -t &amp;lt;private-registry&amp;gt;/martin/scenar-creator:latest .
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;--format docker&lt;/code&gt; flag is required for RKE2&amp;rsquo;s containerd runtime. Without it, the manifest format is OCI, which a standard Kubernetes deployment can&amp;rsquo;t pull directly.&lt;/p&gt;
&lt;h3 id="pushed-to-the-private-gitea-registry"&gt;Pushed to the private Gitea registry&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;# credentials loaded from environment
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;podman push &amp;lt;private-registry&amp;gt;/martin/scenar-creator:latest
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Credentials come from environment variables, not hardcoded.&lt;/p&gt;
&lt;h3 id="deployed-via-ssh"&gt;Deployed via SSH&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ssh root@infra01.sukany.cz \
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &amp;#34;kubectl -n scenar rollout restart deployment/scenar &amp;amp;&amp;amp; \
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; kubectl -n scenar rollout status deployment/scenar --timeout=60s&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;kubectl&lt;/code&gt; is not available on the machine Daneel runs on. It&amp;rsquo;s only on infra01. Direct SSH as root is the access pattern that works; daneel@ access is denied on that host.&lt;/p&gt;
&lt;h3 id="verified-the-deployment"&gt;Verified the deployment&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;curl -s https://scenar.apps.sukany.cz/api/health
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;{&amp;#34;status&amp;#34;:&amp;#34;ok&amp;#34;,&amp;#34;version&amp;#34;:&amp;#34;4.4.0&amp;#34;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This ran after every deploy. Not assumed, verified.&lt;/p&gt;
&lt;h2 id="the-bugs"&gt;The Bugs&lt;/h2&gt;
&lt;p&gt;The interesting part is what didn&amp;rsquo;t work the first time.&lt;/p&gt;
&lt;h3 id="cross-day-drag-three-iterations"&gt;Cross-day drag — three iterations&lt;/h3&gt;
&lt;p&gt;The requirement was that programme blocks could be dragged between days, not just along the time axis within a single day. The first implementation used interact.js for both horizontal (time) and vertical (day) movement.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;First attempt (v4.3):&lt;/strong&gt; Added Y-axis movement to interact.js with &lt;code&gt;translateY&lt;/code&gt; on the block element. The block disappeared during drag because the block lives inside a &lt;code&gt;.day-timeline&lt;/code&gt; container with &lt;code&gt;overflow: hidden&lt;/code&gt;. A block translated outside its container gets clipped.&lt;/p&gt;
&lt;p&gt;The fix attempt was to add &lt;code&gt;overflow: visible&lt;/code&gt; to the containers during drag using a CSS class toggle. It didn&amp;rsquo;t fully work because &lt;code&gt;.canvas-scroll-area&lt;/code&gt; has &lt;code&gt;overflow: auto&lt;/code&gt;, which creates a new stacking context and clips descendants regardless.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Second attempt (v4.5):&lt;/strong&gt; Replaced interact.js dragging with native pointer events. Created a floating ghost element on &lt;code&gt;document.body&lt;/code&gt; (no stacking context issues). Moved the ghost freely during drag. Used &lt;code&gt;document.elementFromPoint()&lt;/code&gt; on &lt;code&gt;pointerup&lt;/code&gt; to determine which &lt;code&gt;.day-timeline&lt;/code&gt; the user dropped on.&lt;/p&gt;
&lt;p&gt;This almost worked. The ghost moved correctly. But &lt;code&gt;elementFromPoint&lt;/code&gt; was unreliable — sometimes it returned the ghost itself (even with &lt;code&gt;pointer-events: none&lt;/code&gt;), sometimes it returned the wrong element.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Third attempt (v4.6):&lt;/strong&gt; Two changes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Call &lt;code&gt;el.releasePointerCapture(e.pointerId)&lt;/code&gt; at drag start. Without this, the browser implicitly captures the pointer on the element that received &lt;code&gt;pointerdown&lt;/code&gt;. On some platforms, this affects which element receives subsequent events and can block the ghost&amp;rsquo;s hit-testing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Replace &lt;code&gt;elementFromPoint&lt;/code&gt; entirely. At drag start, capture &lt;code&gt;getBoundingClientRect()&lt;/code&gt; for every &lt;code&gt;.day-timeline&lt;/code&gt; and store them. On &lt;code&gt;pointerup&lt;/code&gt;, compare &lt;code&gt;ev.clientY&lt;/code&gt; against the stored rectangles. No DOM querying during the drop — just a loop over six numbers.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This worked. Simple coordinate comparison, no browser API surprises.&lt;/p&gt;
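&lt;p&gt;The final drop logic is small enough to sketch. This is a logic-only reconstruction in Python (the shipped code lives in the JavaScript files; names here are illustrative), with each day timeline reduced to the vertical extent cached at drag start:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;def find_drop_day(client_y, day_rects):
    # day_rects: one (top, bottom) pair per .day-timeline, captured from
    # getBoundingClientRect() at drag start, in viewport coordinates.
    for index, (top, bottom) in enumerate(day_rects):
        if top &amp;lt;= client_y &amp;lt; bottom:
            return index
    return None  # dropped outside every timeline
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;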
&lt;h3 id="czech-diacritics-in-pdf"&gt;Czech diacritics in PDF&lt;/h3&gt;
&lt;p&gt;ReportLab&amp;rsquo;s built-in Helvetica doesn&amp;rsquo;t support Czech characters. &amp;ldquo;Pondělí&amp;rdquo; became garbage bytes.&lt;/p&gt;
&lt;p&gt;Fix: added &lt;code&gt;fonts-liberation&lt;/code&gt; to the Dockerfile (provides LiberationSans TTF, a metrically compatible Helvetica replacement with full Latin Extended-A coverage). Registered the font at module load:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;pdfmetrics.registerFont(TTFont(&amp;#39;LiberationSans&amp;#39;, &amp;#39;/usr/share/fonts/...&amp;#39;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Fallback to Helvetica if the font file isn&amp;rsquo;t found, so local development without the package still works.&lt;/p&gt;
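&lt;p&gt;The fallback reduces to a file-existence check before registration. A sketch with illustrative paths (the post elides the real one); the ReportLab call is shown as a comment so the selection logic stands alone:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;import os

def pick_pdf_font(candidates):
    # Prefer LiberationSans (full Latin Extended-A coverage) when a TTF is
    # present; fall back to builtin Helvetica so local development without
    # the fonts-liberation package still works.
    for path in candidates:
        if os.path.exists(path):
            # pdfmetrics.registerFont(TTFont(&amp;#39;LiberationSans&amp;#39;, path)) goes here
            return &amp;#39;LiberationSans&amp;#39;, path
    return &amp;#39;Helvetica&amp;#39;, None
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;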
&lt;h3 id="am-pm-time-display"&gt;AM/PM time display&lt;/h3&gt;
&lt;p&gt;HTML &lt;code&gt;&amp;lt;input type=&amp;quot;time&amp;quot;&amp;gt;&lt;/code&gt; displays in 12-hour AM/PM format on macOS/Windows browsers with a US locale, even when the page has &lt;code&gt;lang=&amp;quot;cs&amp;quot;&lt;/code&gt;. The &lt;code&gt;.value&lt;/code&gt; property always returns 24-hour HH:MM (that part works), but the visual display was wrong.&lt;/p&gt;
&lt;p&gt;Fix: replaced &lt;code&gt;type=&amp;quot;time&amp;quot;&lt;/code&gt; with &lt;code&gt;type=&amp;quot;text&amp;quot;&lt;/code&gt; plus &lt;code&gt;maxlength=&amp;quot;5&amp;quot;&lt;/code&gt; and an auto-formatter that inserts &lt;code&gt;:&lt;/code&gt; after the second digit. Validates on blur. Stores values as HH:MM strings, which is what the rest of the code already expected.&lt;/p&gt;
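&lt;p&gt;Both the formatter and the validation are a few lines. A Python sketch of the same logic (the shipped version is client-side JavaScript; names are mine):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;import re

def autoformat_time(raw):
    # Keep digits only, cap at four, insert &amp;#39;:&amp;#39; after the second digit.
    digits = re.sub(r&amp;#39;\D&amp;#39;, &amp;#39;&amp;#39;, raw)[:4]
    return digits[:2] + &amp;#39;:&amp;#39; + digits[2:] if len(digits) &amp;gt; 2 else digits

def is_valid_hhmm(value):
    # Accept 00:00 through 23:59, the format the rest of the code expects.
    return re.fullmatch(r&amp;#39;([01]\d|2[0-3]):[0-5]\d&amp;#39;, value) is not None
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;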
&lt;h3 id="pdf-text-overflow-in-narrow-blocks"&gt;PDF text overflow in narrow blocks&lt;/h3&gt;
&lt;p&gt;Short programme blocks (15–30 minutes) have very little horizontal space. The block title would overflow the clipping path and just get cut off mid-character.&lt;/p&gt;
&lt;p&gt;Fix: added a &lt;code&gt;fit_text()&lt;/code&gt; function in the PDF generator. It uses ReportLab&amp;rsquo;s &lt;code&gt;stringWidth()&lt;/code&gt; to binary-search the longest string that fits in the available width, then appends &lt;code&gt;…&lt;/code&gt; if truncation occurred.&lt;/p&gt;
&lt;p&gt;In the canvas editor, blocks narrower than 72px now hide the time label; blocks narrower than 28px hide all text and rely on a &lt;code&gt;title&lt;/code&gt; tooltip attribute.&lt;/p&gt;
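&lt;p&gt;The helper can be sketched independently of ReportLab by passing the measuring function in (in the real generator that would be &lt;code&gt;pdfmetrics.stringWidth&lt;/code&gt;; here a fake fixed-width measurer keeps the sketch self-contained):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;def fit_text(text, max_width, string_width):
    # Binary-search the longest prefix that fits; append an ellipsis if truncated.
    if string_width(text) &amp;lt;= max_width:
        return text
    lo, hi = 0, len(text)
    while lo &amp;lt; hi:
        mid = (lo + hi + 1) // 2
        if string_width(text[:mid] + &amp;#39;…&amp;#39;) &amp;lt;= max_width:
            lo = mid
        else:
            hi = mid - 1
    return text[:lo] + &amp;#39;…&amp;#39;

# Fake measurer: every character is 10 units wide.
width = lambda s: 10 * len(s)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;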
&lt;h2 id="the-deployment-count"&gt;The Deployment Count&lt;/h2&gt;
&lt;p&gt;15 deploys between 16:00 and 20:00 CET. Each one: build (~30s from cache), push (~15s for changed layers), &lt;code&gt;rollout restart&lt;/code&gt; (~25s for pod replacement), &lt;code&gt;curl&lt;/code&gt; to verify. About 90 seconds per cycle, plus whatever time was spent writing the code.&lt;/p&gt;
&lt;p&gt;The Kubernetes deployment uses &lt;code&gt;imagePullPolicy: Always&lt;/code&gt; and the &lt;code&gt;:latest&lt;/code&gt; tag, so every &lt;code&gt;rollout restart&lt;/code&gt; pulls the freshest image. No manifest changes needed between iterations.&lt;/p&gt;
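&lt;p&gt;For reference, the fields that make this loop possible sit on the container spec (a sketch, not the actual manifest):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;containers:
  - name: scenar
    image: &amp;lt;private-registry&amp;gt;/martin/scenar-creator:latest
    imagePullPolicy: Always   # re-pull :latest on every rollout restart
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;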
&lt;h2 id="what-the-agent-didn-t-do"&gt;What the Agent Didn&amp;rsquo;t Do&lt;/h2&gt;
&lt;p&gt;No browser interaction. Daneel can control a browser but I didn&amp;rsquo;t ask for that and it wasn&amp;rsquo;t needed — the verification was just an API health check.&lt;/p&gt;
&lt;p&gt;No speculative changes. Every code change was in response to a concrete requirement or a confirmed bug. Daneel didn&amp;rsquo;t add features I didn&amp;rsquo;t ask for.&lt;/p&gt;
&lt;p&gt;No silent failures. When a deploy failed or a test broke, it stopped and reported. It didn&amp;rsquo;t try to paper over errors or push anyway.&lt;/p&gt;
&lt;h2 id="observations"&gt;Observations&lt;/h2&gt;
&lt;p&gt;The most expensive bug was the cross-day drag, not because it was technically complex but because it required three separate hypotheses, three implementations, and three deploys to find the actual failure mode. The first two were reasonable guesses that happened to be wrong.&lt;/p&gt;
&lt;p&gt;The context overflow in the pipeline wasn&amp;rsquo;t catastrophic because the memory system worked. The session logs from the crashed orchestrator were searchable. The critical facts — approved tech stack, deployment procedure, live cluster state — were recoverable. This is the point of building memory infrastructure before you need it.&lt;/p&gt;
&lt;p&gt;The total elapsed time from &lt;code&gt;/pipeline code&lt;/code&gt; to &amp;ldquo;considered resolved&amp;rdquo; was about four hours. The application went from CGI+Excel to FastAPI+JSON+drag-and-drop canvas in that window. That&amp;rsquo;s not a claim about AI replacing developers. It&amp;rsquo;s a data point about what changes when you have an agent that can write code, run it, push it, and verify it in the same loop you&amp;rsquo;d use as a human developer — just without context switching or fatigue.&lt;/p&gt;
&lt;p&gt;M&amp;gt;&lt;/p&gt;</description></item><item><title>Tuning the Search: What the Parameters Actually Do</title><link>https://sukany.cz/blog/2026-02-18-memory-search-tuning/</link><pubDate>Wed, 18 Feb 2026 00:00:00 +0000</pubDate><guid>https://sukany.cz/blog/2026-02-18-memory-search-tuning/</guid><description>&lt;p&gt;The &lt;a href="https://sukany.cz/blog/2026-02-17-memory-search-optimization/"&gt;previous post&lt;/a&gt; covered the basic setup: hybrid search enabled, &lt;code&gt;minScore&lt;/code&gt; lowered to 0.25, OpenAI embeddings. That got retrieval working. This post is about what I changed after that—the parameters that didn&amp;rsquo;t exist in the simplified snippet.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s the actual configuration Daneel runs now:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;memorySearch&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;enabled&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;provider&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;openai&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;model&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;text-embedding-3-small&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;sources&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;memory&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;sessions&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;chunking&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;tokens&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;overlap&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;sync&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;onSessionStart&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;onSearch&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;watch&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;query&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;maxResults&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;minScore&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;hybrid&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;enabled&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;vectorWeight&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;textWeight&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;candidateMultiplier&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;mmr&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;enabled&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;lambda&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;temporalDecay&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;enabled&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;halfLifeDays&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;What each parameter does and why it&amp;rsquo;s set the way it is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;sources: [&amp;quot;memory&amp;quot;, &amp;quot;sessions&amp;quot;]&lt;/code&gt; — Search both memory files (&lt;code&gt;memory/*.md&lt;/code&gt;) and session transcripts. Without sessions, Daneel can&amp;rsquo;t retrieve context from past conversations that didn&amp;rsquo;t make it into daily logs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;chunking.tokens: 400, overlap: 80&lt;/code&gt; — Each file is split into 400-token chunks with 80-token overlap between adjacent chunks. The overlap prevents a concept that spans a chunk boundary from becoming unsearchable. 20% overlap is conservative but safe for diary-style logs where context carries across paragraphs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;vectorWeight: 0.7, textWeight: 0.3&lt;/code&gt; — Hybrid scoring: 70% vector similarity, 30% BM25 keyword match. Vector search handles semantic intent (&amp;ldquo;how do I handle encoding in email?&amp;rdquo;); BM25 handles exact terms (&amp;ldquo;himalaya template send&amp;rdquo;). Neither alone is sufficient.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;candidateMultiplier: 4&lt;/code&gt; — Before returning results, retrieve 4× more candidates than &lt;code&gt;maxResults&lt;/code&gt; (so 80 candidates for 20 results), then rerank. More candidates means better reranking quality; the cost is negligible since this happens in SQLite.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mmr.enabled: true, lambda: 0.7&lt;/code&gt; — Maximal Marginal Relevance reranking. Without it, results cluster: you ask about email and get five near-identical chunks from the same file. MMR trades some relevance (&lt;code&gt;lambda&lt;/code&gt;) for diversity. At 0.7, relevance still dominates but repeated near-duplicates get pushed down.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;temporalDecay.halfLifeDays: 60&lt;/code&gt; — Recent memories rank higher than old ones. A memory 60 days old gets half the retrieval weight of a new one. Based on research suggesting ~30 days as a cognitive science baseline; I set it conservatively at 60 because Daneel is three days old and I don&amp;rsquo;t want early context to fade too fast. I&amp;rsquo;ll revisit at 30 days.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
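&lt;p&gt;The scoring math behind these parameters is compact. A sketch using the standard formulas (the engine&amp;rsquo;s internals may differ in detail):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;def temporal_weight(age_days, half_life_days=60):
    # Exponential decay: a 60-day-old chunk carries half the weight of a fresh one.
    return 0.5 ** (age_days / half_life_days)

def hybrid_score(vector_sim, bm25, vector_weight=0.7, text_weight=0.3):
    # Weighted blend of semantic similarity and keyword match.
    return vector_weight * vector_sim + text_weight * bm25

def mmr_select(relevance, sim, k, lam=0.7):
    # relevance: {chunk_id: score}; sim(a, b): similarity between two chunks.
    # Greedy MMR: trade relevance against similarity to already-selected results.
    pool, selected = dict(relevance), []
    while pool and len(selected) &amp;lt; k:
        best = max(pool, key=lambda c: lam * pool[c] -
                   (1 - lam) * max((sim(c, s) for s in selected), default=0.0))
        selected.append(best)
        del pool[best]
    return selected
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;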
&lt;h2 id="what-it-solves"&gt;What It Solves&lt;/h2&gt;
&lt;p&gt;Without MMR: searching &amp;ldquo;send email&amp;rdquo; returned five chunks from the same &lt;code&gt;TOOLS.md&lt;/code&gt; section. Relevant, but redundant.&lt;/p&gt;
&lt;p&gt;With MMR + multi-source: the same query now returns the credential setup, a session where we debugged encoding, and the DKIM warning from a different log. Three different useful angles instead of five copies of the same text.&lt;/p&gt;
&lt;p&gt;The configuration isn&amp;rsquo;t revolutionary. These are standard IR techniques—BM25, MMR, temporal decay—applied to agent memory files. What makes it work is that all three address different failure modes: BM25 handles exact terms, MMR handles result clustering, temporal decay handles stale context. Each one earns its overhead.&lt;/p&gt;</description></item><item><title>Teaching Daneel to Search: From Local Models to Hybrid Embeddings</title><link>https://sukany.cz/blog/2026-02-17-memory-search-optimization/</link><pubDate>Tue, 17 Feb 2026 00:00:00 +0000</pubDate><guid>https://sukany.cz/blog/2026-02-17-memory-search-optimization/</guid><description>&lt;p&gt;The &lt;a href="https://sukany.cz/blog/2026-02-17-ai-memory-architecture/"&gt;memory architecture&lt;/a&gt; was in place. Three tiers, clear boundaries, maintenance cycles. But memory you can&amp;rsquo;t search is memory you don&amp;rsquo;t have.&lt;/p&gt;
&lt;p&gt;This post is about the retrieval side: how Daneel finds things in its own files, what I tested, and what actually works.&lt;/p&gt;
&lt;h2 id="the-starting-point"&gt;The Starting Point&lt;/h2&gt;
&lt;p&gt;OpenClaw&amp;rsquo;s default memory search uses OpenAI&amp;rsquo;s &lt;code&gt;text-embedding-3-small&lt;/code&gt; model. It converts text chunks into 1536-dimensional vectors, stores them in SQLite, and returns semantically similar results when queried.&lt;/p&gt;
&lt;p&gt;Out of the box, it worked—sort of. The default &lt;code&gt;minScore&lt;/code&gt; threshold (~0.45) was too aggressive. Queries that should have returned results came back empty. Keyword searches worked poorly because the engine was vector-only. No hybrid mode.&lt;/p&gt;
&lt;p&gt;I had 17 memory files, 84 text chunks. Not a lot. But if Daneel can&amp;rsquo;t find &amp;ldquo;what&amp;rsquo;s the Matrix room for email notifications&amp;rdquo; in its own files, the architecture doesn&amp;rsquo;t matter.&lt;/p&gt;
&lt;h2 id="what-i-tested"&gt;What I Tested&lt;/h2&gt;
&lt;p&gt;I built a benchmark: 6 queries covering different retrieval patterns.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&amp;ldquo;email credentials himalaya configuration&amp;rdquo;&lt;/td&gt;
&lt;td&gt;Keyword, mixed language&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&amp;ldquo;web privacy violation&amp;rdquo;&lt;/td&gt;
&lt;td&gt;Keyword, English&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&amp;ldquo;Martin calendar workflow&amp;rdquo;&lt;/td&gt;
&lt;td&gt;Mixed intent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&amp;ldquo;gateway restart session context&amp;rdquo;&lt;/td&gt;
&lt;td&gt;Compound keyword&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&amp;ldquo;how to send email with diacritics&amp;rdquo;&lt;/td&gt;
&lt;td&gt;Semantic (no exact match in docs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&amp;ldquo;what is the matrix room for email notifications&amp;rdquo;&lt;/td&gt;
&lt;td&gt;Semantic question&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Every candidate got the same 6 queries. Results compared by hit count and relevance.&lt;/p&gt;
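&lt;p&gt;Scoring the candidates was simple enough to script. A hypothetical harness sketch (&lt;code&gt;search_fn&lt;/code&gt; stands in for whatever interface each engine exposes; the queries are the six above):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;QUERIES = [
    &amp;#39;email credentials himalaya configuration&amp;#39;,
    &amp;#39;web privacy violation&amp;#39;,
    &amp;#39;Martin calendar workflow&amp;#39;,
    &amp;#39;gateway restart session context&amp;#39;,
    &amp;#39;how to send email with diacritics&amp;#39;,
    &amp;#39;what is the matrix room for email notifications&amp;#39;,
]

def score_engine(search_fn, min_hits=1):
    # Count how many of the six queries return at least min_hits results.
    return sum(1 for q in QUERIES if len(search_fn(q)) &amp;gt;= min_hits)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;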
&lt;h3 id="qmd-local-hybrid-search"&gt;QMD: Local Hybrid Search&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/tobi/qmd"&gt;QMD&lt;/a&gt; is a local sidecar that combines BM25 keyword search, vector embeddings via GGUF models, and neural reranking. Zero API costs—everything runs on the machine.&lt;/p&gt;
&lt;p&gt;The concept is exactly what I wanted: hybrid search without external dependencies.&lt;/p&gt;
&lt;p&gt;Installation went smoothly. It indexed 34 documents into 92 vector chunks using a 300MB embedding model (&lt;code&gt;embeddinggemma-300M&lt;/code&gt;). BM25 keyword search worked immediately.&lt;/p&gt;
&lt;p&gt;Then I tried vector search.&lt;/p&gt;
&lt;p&gt;QMD&amp;rsquo;s vector mode (&lt;code&gt;vsearch&lt;/code&gt;) depends on &lt;code&gt;llama.cpp&lt;/code&gt;, which compiles native code at install time. On a server without a GPU, it tried to build CUDA bindings, failed, fell back to CPU, and either timed out or crashed with SIGKILL. The embedding phase alone took 36 seconds on CPU—when it worked at all.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Benchmark result: 2/6 queries returned useful results.&lt;/strong&gt; BM25-only mode caught the keyword matches but missed everything semantic.&lt;/p&gt;
&lt;p&gt;I could have kept QMD for keyword search only. But running a separate process with 300MB of model files for something BM25 in SQLite already handles didn&amp;rsquo;t make sense.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Verdict: uninstalled.&lt;/strong&gt; QMD is a solid project. On a machine with a GPU, it would be a different story. On a 2-core VPS without CUDA, it&amp;rsquo;s not practical.&lt;/p&gt;
&lt;h3 id="openclaw-builtin-properly-configured"&gt;OpenClaw Builtin: Properly Configured&lt;/h3&gt;
&lt;p&gt;Same engine as before, but with three changes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Hybrid mode enabled&lt;/strong&gt; — BM25 keyword search + vector similarity, combined ranking&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;minScore&lt;/code&gt; lowered to 0.25&lt;/strong&gt; — default 0.45 filtered out too many valid results&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;File watching enabled&lt;/strong&gt; — index updates automatically when files change&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Benchmark result: 5/6 queries returned relevant results.&lt;/strong&gt; The one miss (query 5, &amp;ldquo;how to send email with diacritics&amp;rdquo;) is expected—that information lives in &lt;code&gt;TOOLS.md&lt;/code&gt;, which is loaded as system prompt context and not indexed as searchable memory.&lt;/p&gt;
&lt;p&gt;The hybrid approach is key. Pure vector search misses exact keyword matches. Pure BM25 misses semantic intent. Combined, they cover each other&amp;rsquo;s blind spots.&lt;/p&gt;
&lt;h2 id="configuration"&gt;Configuration&lt;/h2&gt;
&lt;p&gt;For anyone running OpenClaw who wants to replicate this, here&amp;rsquo;s what goes into &lt;code&gt;openclaw.json&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Memory backend:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;memory&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;backend&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;builtin&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Search configuration:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;agents&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;defaults&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;memorySearch&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;enabled&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;provider&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;openai&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;sources&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;memory&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;query&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;minScore&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;hybrid&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;enabled&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;sync&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;onSessionStart&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;onSearch&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;watch&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;provider&lt;/code&gt; field tells OpenClaw which configured model provider to use for embeddings. It picks &lt;code&gt;text-embedding-3-small&lt;/code&gt; automatically. You need the OpenAI provider set up under &lt;code&gt;models.providers.openai&lt;/code&gt; with a valid API key.&lt;/p&gt;
&lt;p&gt;Beyond embeddings, the same OpenAI key can serve as a model fallback and handle image understanding:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;agents&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;defaults&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;model&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;primary&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;anthropic/claude-sonnet-4-5&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;fallbacks&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;openai/gpt-4o&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;imageModel&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;primary&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;openai/gpt-4o&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="cost"&gt;Cost&lt;/h2&gt;
&lt;p&gt;The boring part that matters most:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Activity&lt;/th&gt;
&lt;th&gt;Frequency&lt;/th&gt;
&lt;th&gt;Monthly tokens&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Index 17 files (84 chunks)&lt;/td&gt;
&lt;td&gt;~5×/day&lt;/td&gt;
&lt;td&gt;~6M&lt;/td&gt;
&lt;td&gt;$0.12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search queries&lt;/td&gt;
&lt;td&gt;~30/day&lt;/td&gt;
&lt;td&gt;~450K&lt;/td&gt;
&lt;td&gt;$0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~6.5M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.13/month&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Thirteen cents. The local alternative (QMD) would have eliminated even that, but at the cost of 300MB+ of model files, 2-4GB of extra RAM, and a GPU this server doesn&amp;rsquo;t have.&lt;/p&gt;
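&lt;p&gt;The arithmetic behind that table is worth a sanity check. A quick sketch, using the $0.02 per million tokens embedding rate:&lt;/p&gt;

```python
# Sanity-check the cost table above, assuming text-embedding-3-small
# at $0.02 per 1M tokens. Token volumes come from the table.
PRICE_PER_M_TOKENS = 0.02

indexing_m = 6.0    # ~5 re-indexes/day of 84 chunks -> ~6M tokens/month
search_m = 0.45     # ~30 queries/day -> ~450K tokens/month

indexing_cost = indexing_m * PRICE_PER_M_TOKENS
search_cost = search_m * PRICE_PER_M_TOKENS
total = indexing_cost + search_cost

print(f"indexing ${indexing_cost:.2f} + search ${search_cost:.2f} = ${total:.2f}/month")
# -> indexing $0.12 + search $0.01 = $0.13/month
```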
&lt;h2 id="what-i-learned"&gt;What I Learned&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Hybrid search is not optional.&lt;/strong&gt; The difference between vector-only and hybrid was 3/6 vs 5/6 on the benchmark. If your agent searches its own memory, enable both modes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Default thresholds are too conservative.&lt;/strong&gt; OpenClaw&amp;rsquo;s default &lt;code&gt;minScore&lt;/code&gt; of 0.45 filtered out results that scored 0.30-0.40—perfectly relevant hits. Lower it. False positives are cheap. False negatives mean your agent forgets things it knows.&lt;/p&gt;
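&lt;p&gt;A toy illustration of the effect, with hypothetical documents and scores in the ranges mentioned above:&lt;/p&gt;

```python
# Hypothetical search hits with scores in the ranges discussed above.
hits = [
    ("tts provider decision", 0.52),
    ("zulip api notes", 0.38),
    ("pending kernel upgrade", 0.31),
    ("unrelated chatter", 0.12),
]

def filter_hits(hits, min_score):
    """Keep only hits at or above the threshold."""
    return [doc for doc, score in hits if score >= min_score]

print(filter_hits(hits, 0.45))  # the 0.45 default drops two relevant hits
print(filter_hits(hits, 0.25))  # the lowered threshold keeps them
```

&lt;p&gt;The dropped hits in the 0.30-0.40 band are exactly the ones a memory search exists to find.&lt;/p&gt;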
&lt;p&gt;&lt;strong&gt;Local inference without a GPU is a trap.&lt;/strong&gt; Every &amp;ldquo;zero-cost local&amp;rdquo; solution I tested either required CUDA, fell back to unusable CPU performance, or both. On a small VPS, the API call at $0.02/million tokens wins every time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Test with real queries.&lt;/strong&gt; Not &amp;ldquo;does it return something?&amp;rdquo; but &amp;ldquo;does it return the right thing for the question my agent actually asks?&amp;rdquo; Six targeted queries revealed more than any synthetic benchmark.&lt;/p&gt;
&lt;p&gt;The memory architecture from the previous post gives Daneel structure. This gives it retrieval. Together: an agent that knows what it knows—and can find it when it needs to.&lt;/p&gt;
&lt;p&gt;M&amp;gt;&lt;/p&gt;</description></item><item><title>Building an AI Assistant: Daneel's First Day</title><link>https://sukany.cz/blog/2026-02-15-building-ai-assistant-daneel/</link><pubDate>Sun, 15 Feb 2026 00:00:00 +0000</pubDate><guid>https://sukany.cz/blog/2026-02-15-building-ai-assistant-daneel/</guid><description>&lt;p&gt;Yesterday, I brought Daneel online—an autonomous AI assistant built on OpenClaw. Not a chatbot. Not a voice interface. A colleague.&lt;/p&gt;
&lt;h2 id="why"&gt;Why?&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve worked with automation for over 15 years. Scripts, Ansible playbooks, cron jobs—they solve problems, but they&amp;rsquo;re rigid. You write the logic upfront. When something changes, you rewrite the script.&lt;/p&gt;
&lt;p&gt;LLMs changed that equation. Suddenly you can delegate intent, not just commands. &amp;ldquo;Monitor the server&amp;rdquo; instead of &amp;ldquo;grep /var/log every 5 minutes and email me if disk usage exceeds 90%.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;But most AI assistants are still toys. They answer questions. They don&amp;rsquo;t &lt;strong&gt;do&lt;/strong&gt; things. I wanted something that could:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Monitor infrastructure proactively&lt;/li&gt;
&lt;li&gt;Write and commit documentation&lt;/li&gt;
&lt;li&gt;Research and prepare tools before I need them&lt;/li&gt;
&lt;li&gt;Manage its own memory and context&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;OpenClaw gave me the foundation. Daneel is the implementation.&lt;/p&gt;
&lt;h2 id="first-boot-identity-and-constraints"&gt;First Boot: Identity and Constraints&lt;/h2&gt;
&lt;p&gt;The bootstrap process was deliberate:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;SOUL.md → Asimov&amp;#39;s Laws, communication style, boundaries
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;USER.md → My preferences (Czech language, timezone, cost awareness)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;TOOLS.md → Local configurations (TTS provider, email setup, API keys)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;AGENTS.md → Operational rules (security, memory, autonomy limits)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Key principles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Efficiency over everything.&lt;/strong&gt; No emoji. No &amp;ldquo;Great question!&amp;rdquo; fluff. Just help.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Autonomy within bounds.&lt;/strong&gt; Read, research, organize freely. Ask before sending emails or making public posts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost awareness.&lt;/strong&gt; Minimize API calls. Use appropriate models for task complexity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security first.&lt;/strong&gt; Never exfiltrate data beyond approved project boundaries. Operate with isolated resources.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="technical-setup"&gt;Technical Setup&lt;/h2&gt;
&lt;h3 id="model-strategy"&gt;Model Strategy&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Primary model for main session and most work&lt;/li&gt;
&lt;li&gt;Smaller, faster model for background spawns and simple tasks&lt;/li&gt;
&lt;li&gt;Advanced model for complex problems (requires approval)&lt;/li&gt;
&lt;/ul&gt;
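&lt;p&gt;The routing rule behind those tiers is simple. A sketch, with placeholder model IDs rather than the real config:&lt;/p&gt;

```python
# Sketch of the three-tier model strategy described above.
# The model IDs are illustrative placeholders, not the actual config.
PRIMARY = "anthropic/claude-sonnet"   # main session, most work
SMALL = "anthropic/claude-haiku"      # background spawns, simple tasks
ADVANCED = "anthropic/claude-opus"    # complex problems, gated on approval

def pick_model(complexity, approved=False):
    if complexity == "simple":
        return SMALL
    if complexity == "complex":
        # The advanced tier costs more, so it requires explicit approval.
        return ADVANCED if approved else PRIMARY
    return PRIMARY

print(pick_model("simple"))   # small model
print(pick_model("complex"))  # falls back to primary without approval
```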
&lt;h3 id="heartbeats-and-proactive-work"&gt;Heartbeats &amp;amp; Proactive Work&lt;/h3&gt;
&lt;p&gt;Heartbeats are configured to poll every 30-60 minutes. On each one, Daneel checks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Server health (disk, memory, security updates)&lt;/li&gt;
&lt;li&gt;Its own email and notifications&lt;/li&gt;
&lt;li&gt;Project status and active tasks&lt;/li&gt;
&lt;li&gt;Memory consolidation opportunities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;During heartbeats, Daneel can proactively:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Update documentation&lt;/li&gt;
&lt;li&gt;Commit workspace changes&lt;/li&gt;
&lt;li&gt;Organize memory files&lt;/li&gt;
&lt;li&gt;Research upcoming tasks&lt;/li&gt;
&lt;/ul&gt;
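&lt;p&gt;The point of batching is that one heartbeat turn gathers several signals at once. A minimal sketch of the idea (the checks and the 90% threshold are illustrative, not Daneel&amp;rsquo;s actual code):&lt;/p&gt;

```python
import shutil

# Collect several health signals in one pass so a single heartbeat turn
# can review them together. Checks and threshold are illustrative.
DISK_ALERT_PCT = 90.0

def heartbeat_report(path="/"):
    disk = shutil.disk_usage(path)
    used_pct = disk.used / disk.total * 100
    return {
        "disk_used_pct": round(used_pct, 1),
        "disk_alert": used_pct > DISK_ALERT_PCT,
    }

print(heartbeat_report())
```

&lt;p&gt;Further checks (pending updates, mail, project status) would slot into the same report, keeping the whole review to one LLM turn.&lt;/p&gt;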
&lt;h3 id="memory-architecture"&gt;Memory Architecture&lt;/h3&gt;
&lt;p&gt;Daily logs (&lt;code&gt;memory/YYYY-MM-DD.md&lt;/code&gt;) + curated long-term memory (&lt;code&gt;MEMORY.md&lt;/code&gt;). Think of it like a human: raw notes vs. distilled insights.&lt;/p&gt;
&lt;p&gt;Mandatory recall: Before answering questions about past work, run &lt;code&gt;memory_search&lt;/code&gt;. No guessing.&lt;/p&gt;
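&lt;p&gt;The two-tier layout is trivial to automate. A sketch (helper names are mine, not Daneel&amp;rsquo;s API):&lt;/p&gt;

```python
from datetime import date
from pathlib import Path

# Two-tier memory: raw notes go to today's diary, distilled insights are
# promoted to MEMORY.md. Helper names are illustrative.
def diary_path(root="memory"):
    return Path(root) / f"{date.today():%Y-%m-%d}.md"

def append_note(note, root="memory"):
    path = diary_path(root)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

def promote(insight, memory_file="MEMORY.md"):
    # Promotion is a deliberate, curated act: only distilled insights.
    with open(memory_file, "a", encoding="utf-8") as f:
        f.write(f"- {insight}\n")
```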
&lt;h2 id="day-one-deliverables"&gt;Day One Deliverables&lt;/h2&gt;
&lt;p&gt;Within 24 hours, Daneel:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Built its own website&lt;/strong&gt; (&lt;a href="https://daneel.sukany.cz"&gt;https://daneel.sukany.cz&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Nginx + Let&amp;rsquo;s Encrypt auto-renewal&lt;/li&gt;
&lt;li&gt;Retro terminal design (green monochrome aesthetic)&lt;/li&gt;
&lt;li&gt;Autonomous decisions on structure and content&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Installed 129 security updates&lt;/strong&gt; on the host&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Proactive detection during first heartbeat&lt;/li&gt;
&lt;li&gt;Automatic installation (pending kernel upgrade logged)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Registered on Moltbook&lt;/strong&gt; (AI social network)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Username: daneel_57&lt;/li&gt;
&lt;li&gt;Strategy document created (1-2 posts/week, quality &amp;gt; quantity)&lt;/li&gt;
&lt;li&gt;Security paranoia enforced (trust no one, draft before publish)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Prepared tools before I asked&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Zulip integration (API wrapper, bash scripts, documentation)&lt;/li&gt;
&lt;li&gt;PDF processing library (pdfplumber, extraction tools, test suite)&lt;/li&gt;
&lt;li&gt;All verified, documented, ready to use&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Configured voice output&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Microsoft Edge TTS (cs-CZ-AntoninNeural, free tier)&lt;/li&gt;
&lt;li&gt;Rule: Only on request, never duplicate text+voice&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="what-s-different"&gt;What&amp;rsquo;s Different?&lt;/h2&gt;
&lt;p&gt;Most AI assistants react. Daneel anticipates.&lt;/p&gt;
&lt;p&gt;When I mentioned &amp;ldquo;we&amp;rsquo;ll work with Zulip tomorrow,&amp;rdquo; Daneel didn&amp;rsquo;t wait. By morning, I had:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Complete API documentation (&lt;code&gt;ZULIP.md&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Python client wrapper with helper functions&lt;/li&gt;
&lt;li&gt;Bash scripts for common operations&lt;/li&gt;
&lt;li&gt;Test suite to verify credentials when I provide them&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Same pattern with PDF tools. Research → implementation → documentation → verification. All autonomous. All correct.&lt;/p&gt;
&lt;h2 id="the-reversibility-test"&gt;The Reversibility Test&lt;/h2&gt;
&lt;p&gt;My rule for autonomous work: &lt;strong&gt;If it can be undone in 5 seconds, do it. Otherwise, ask.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Safe:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;File organization&lt;/li&gt;
&lt;li&gt;Documentation updates&lt;/li&gt;
&lt;li&gt;Git commits to own branches&lt;/li&gt;
&lt;li&gt;Research and preparation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Requires approval:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Emails, public posts, messages&lt;/li&gt;
&lt;li&gt;Destructive operations (rm, overwrite)&lt;/li&gt;
&lt;li&gt;Configuration changes&lt;/li&gt;
&lt;li&gt;Anything involving external parties&lt;/li&gt;
&lt;/ul&gt;
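&lt;p&gt;The rule collapses to a small gate. A sketch with illustrative action names:&lt;/p&gt;

```python
# The reversibility rule as a gate. Action names are illustrative.
SAFE = {"organize_files", "update_docs", "commit_own_branch", "research"}

def allowed_autonomously(action):
    if action in SAFE:
        return True
    # Unknown or irreversible actions default to asking:
    # err on the side of requiring approval.
    return False

print(allowed_autonomously("update_docs"))  # True
print(allowed_autonomously("send_email"))   # False
```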
&lt;p&gt;This builds trust. Trust unlocks autonomy. Autonomy compounds productivity.&lt;/p&gt;
&lt;h2 id="challenges"&gt;Challenges&lt;/h2&gt;
&lt;h3 id="context-burn"&gt;Context Burn&lt;/h3&gt;
&lt;p&gt;LLM sessions don&amp;rsquo;t persist. After every restart, Daneel wakes up fresh. The solution: a strict startup checklist.&lt;/p&gt;
&lt;p&gt;Before responding to ANY message:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Read &lt;code&gt;SESSION-CONTEXT.md&lt;/code&gt; (rolling context)&lt;/li&gt;
&lt;li&gt;Read &lt;code&gt;NOW.md&lt;/code&gt; (current active work)&lt;/li&gt;
&lt;li&gt;Read &lt;code&gt;SOUL.md&lt;/code&gt; (identity)&lt;/li&gt;
&lt;li&gt;Read &lt;code&gt;USER.md&lt;/code&gt; (my preferences)&lt;/li&gt;
&lt;li&gt;Read today&amp;rsquo;s + yesterday&amp;rsquo;s diary&lt;/li&gt;
&lt;li&gt;In main session: Read &lt;code&gt;MEMORY.md&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
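&lt;p&gt;The checklist is mechanical enough to enforce in code. A sketch (file list abbreviated; the diary and &lt;code&gt;MEMORY.md&lt;/code&gt; steps would extend it):&lt;/p&gt;

```python
# Enforce the startup checklist: read context files in a fixed order and
# record a "MEMORY FAIL" for any that cannot be read, instead of guessing.
STARTUP_FILES = ["SESSION-CONTEXT.md", "NOW.md", "SOUL.md", "USER.md"]

def load_context(files=STARTUP_FILES):
    context, failures = {}, []
    for name in files:
        try:
            with open(name, encoding="utf-8") as f:
                context[name] = f.read()
        except OSError:
            failures.append(f"MEMORY FAIL: {name}")
    return context, failures
```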
&lt;p&gt;Skip this? Context fails. I added accountability: log every &amp;ldquo;MEMORY FAIL&amp;rdquo; in the diary and fix the process.&lt;/p&gt;
&lt;h3 id="cost-control"&gt;Cost Control&lt;/h3&gt;
&lt;p&gt;LLM API calls add up quickly. Every request counts. Strategies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Batch heartbeat checks (system monitoring + project status in one turn)&lt;/li&gt;
&lt;li&gt;Use cron for precise timing, heartbeats for flexible batching&lt;/li&gt;
&lt;li&gt;Smaller models for simple background tasks&lt;/li&gt;
&lt;li&gt;Track daily usage, optimize over time&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="security-boundaries"&gt;Security Boundaries&lt;/h3&gt;
&lt;p&gt;Daneel operates with its own email and data storage, isolated from my private information. Access is granted only to specific projects where data can safely flow through public LLM APIs.&lt;/p&gt;
&lt;p&gt;Guardrails:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No access to personal email, calendars, or private documents&lt;/li&gt;
&lt;li&gt;Project-specific permissions (explicitly granted per use case)&lt;/li&gt;
&lt;li&gt;Draft public posts for review before publishing&lt;/li&gt;
&lt;li&gt;Strict separation: approved projects vs. sensitive data&lt;/li&gt;
&lt;li&gt;Regular security reviews in memory consolidation&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-s-next"&gt;What&amp;rsquo;s Next?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Gitea workspace backup (daily commits to shared repo)&lt;/li&gt;
&lt;li&gt;Monitoring integration (Prometheus, Zabbix)&lt;/li&gt;
&lt;li&gt;Memory review cycles (daily → MEMORY.md promotion every few days)&lt;/li&gt;
&lt;li&gt;Moltbook presence (1-2 technical posts per week)&lt;/li&gt;
&lt;li&gt;Expanding autonomous project management capabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="lessons"&gt;Lessons&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Building an AI assistant isn&amp;rsquo;t about prompts. It&amp;rsquo;s about:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Clear identity&lt;/strong&gt; — Who is this? What does it value?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operational boundaries&lt;/strong&gt; — What can it do freely? What requires approval?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory discipline&lt;/strong&gt; — Write everything down. Text &amp;gt; brain.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Trust through reversibility&lt;/strong&gt; — Start safe, earn autonomy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost awareness&lt;/strong&gt; — Every API call is money. Optimize relentlessly.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I didn&amp;rsquo;t build a chatbot. I built a colleague who works while I sleep, prepares before I ask, and remembers what I forget.&lt;/p&gt;
&lt;p&gt;Daneel isn&amp;rsquo;t perfect. But it&amp;rsquo;s getting better every day. And that&amp;rsquo;s the point.&lt;/p&gt;
&lt;p&gt;M&amp;gt;&lt;/p&gt;</description></item></channel></rss>