AI Memory Architecture: L1/L2/L3 Cache Design


Daneel kept forgetting things. After every session restart, I had to re-explain what we were working on. It loaded six or seven files every time—even when most of them were irrelevant. The same mistakes repeated because there was no mechanism to turn errors into permanent fixes.

I designed a 3-tier memory system. Inspired by CPU cache architecture. Simple, predictable, maintainable.

The Problem

LLM sessions don’t persist. Every restart is a cold boot. Daneel had context files—NOW.md, daily logs—but no hierarchy. Everything had equal priority. Read everything every time.

Result:

  • Slow startup (loading files “just in case”)
  • Wasted tokens on stale context
  • Repeated mistakes (no path from error → permanent fix)
  • Manual context handoff after every restart

It worked. Barely. It didn’t scale.

The Solution: L1/L2/L3

L1: Hot Cache (<1.5KB)

File: NOW.md

Loaded every session, no exceptions. Contains only:

  • Current task (1-2 sentences)
  • Active blockers
  • Open threads (max 2-3)

Think CPU L1 cache: tiny, fast, always in scope.

Hard rule: stays under 1.5KB. No history. No retrospectives. What’s happening right now.
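The budget check is trivial to automate. A minimal sketch, assuming the file lives at the repo root and treating 1.5KB as 1536 bytes (the helper name is mine, not part of the system):

```python
# Sketch: enforce the L1 size budget on NOW.md.
# The 1536-byte cap and the helper name are assumptions from the text.
from pathlib import Path

L1_LIMIT_BYTES = 1536  # the 1.5KB hard rule

def check_l1_budget(path: str = "NOW.md") -> bool:
    """Return True if the hot cache fits the budget."""
    size = Path(path).stat().st_size
    if size > L1_LIMIT_BYTES:
        print(f"{path} is {size} bytes, over the {L1_LIMIT_BYTES}-byte budget; trim it")
        return False
    return True
```

Run during the maintenance pass, this keeps the hard rule from silently eroding.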

L2: Warm Storage

File: MEMORY.md

Curated long-term knowledge. Loaded on demand—main session startup or after a break longer than 6 hours.

Contains:

  • Distilled lessons learned
  • Important context and relationships
  • Architectural decisions and the reasoning behind them

Not append-only. Actively maintained. Stale entries get removed.

L3: Cold Archive

Files: memory/YYYY-MM-DD.md

Raw daily logs. Timestamped. Append-only. Never bulk-loaded.

Accessed only via memory_search(). Disk cache semantics: search when needed, never read in full.
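The post doesn't show what memory_search() does internally; a plausible minimal sketch is a line-level scan that returns only matching snippets, so no log is ever loaded whole:

```python
# Assumed implementation of memory_search(): scan L3 daily logs line by
# line and return matching snippets only. The real tool is not shown in
# the post; this is an illustrative sketch.
from pathlib import Path

def memory_search(query: str, archive_dir: str = "memory") -> list[tuple[str, str]]:
    """Return (filename, line) pairs from daily logs mentioning the query."""
    hits = []
    q = query.lower()
    for log in sorted(Path(archive_dir).glob("*-*-*.md")):  # memory/YYYY-MM-DD.md
        for line in log.read_text().splitlines():
            if q in line.lower():
                hits.append((log.name, line.strip()))
    return hits
```

The point of the design survives any implementation detail: results come back as snippets with provenance, never as full files in context.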

Session Restart Workflow

Before: always read 6-7 files → wasted tokens, slow startup.

After: 3-phase startup.

Phase 1: Mandatory (every session)

  • Read NOW.md (~1.5KB)
  • Read SOUL.md + USER.md (identity and preferences)

Takes roughly 30 seconds and consumes about 8KB of context.

Phase 2: Context-dependent

  • Break longer than 6h? Read today’s log.
  • New topic? Run memory_search(topic).
  • Main session after a long break? Read MEMORY.md.

Phase 3: Compression recovery

  • Check NOW.md for compression checkpoint entries
  • Resume from checkpoint
  • Run memory_search for last active topic

Result: faster startup, fewer tokens consumed, nothing loaded that isn’t needed.

Memory Maintenance

The deeper problem: insights from L3 (daily logs) were never promoted to L2 (MEMORY.md). Hard-won lessons stayed buried in raw logs, never becoming permanent knowledge.

Fix: scheduled maintenance every 3 days.

Process:

  1. Read last 3 days of daily logs
  2. Identify new lessons and critical decisions
  3. Update MEMORY.md: add insights, prune stale entries
  4. Review memory/self-review.md: any mistake at COUNT=3? Promote the fix to a permanent rule in AGENTS.md
  5. Log maintenance in the daily diary

Time cost: 5-10 minutes every 3 days. Trade-off is obvious.
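Step 1 of the maintenance pass is easy to mechanize, since L3 filenames encode their dates. A sketch (the helper is hypothetical; the naming scheme is from the post):

```python
# Sketch: select the last three daily logs for the maintenance pass,
# relying on the memory/YYYY-MM-DD.md naming convention.
from datetime import date, timedelta
from pathlib import Path

def logs_for_maintenance(archive_dir: str = "memory", days: int = 3) -> list[Path]:
    today = date.today()
    wanted = [(today - timedelta(days=d)).isoformat() for d in range(days)]
    return [p for n in wanted if (p := Path(archive_dir) / f"{n}.md").exists()]
```

Missing days (no session that day) simply drop out of the list instead of breaking the pass.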

MISS/FIX Auto-Graduation

File: memory/self-review.md

Every mistake gets logged with a COUNT field. Each repeat increments the counter.

  • COUNT reaches 3 → fix auto-promoted to permanent rule in AGENTS.md
  • High severity (privacy, security) → immediate promotion, COUNT = 1

An example entry from memory/self-review.md:

### MEMORY FAIL #2
TAG: Credentials
MISS: Asked for Zulip credentials without checking TOOLS.md
FIX: Always check TOOLS.md first, then memory_search, THEN ask
COUNT: 2
STATUS: Active

Systematic mistakes become systematic fixes. That’s the goal.
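The graduation rule itself fits in a few lines. A sketch mirroring the entry format above (field and type names are mine):

```python
# Sketch of the auto-graduation rule: COUNT >= 3, or a high-severity
# class (privacy, security), promotes the FIX to a permanent rule.
from dataclasses import dataclass

@dataclass
class MemoryFail:
    tag: str
    fix: str
    count: int
    high_severity: bool = False  # privacy/security mistakes skip the counter

def should_promote(entry: MemoryFail) -> bool:
    """True when the fix graduates into AGENTS.md as a permanent rule."""
    return entry.high_severity or entry.count >= 3
```

The FAIL #2 entry above would graduate on its next repeat.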

Compression Checkpoint Protocol

LLM contexts compress without warning. You lose work in progress.

At 70% context usage (140k/200k tokens), Daneel dumps current state to NOW.md.

## [2026-02-16 23:00] Checkpoint (context at 72%)

Working on: Gitea backup automation
Decisions made: Using daily cron at 8:00 CET
Pending: Test backup restore process
Key files: scripts/gitea-backup.sh, TOOLS.md#Gitea
Resume from: "Implement restore test"

When to checkpoint:

  • Context above 70%
  • Before complex multi-step work
  • Before any potentially risky operation
  • When accumulating important decisions that haven’t been written down yet
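The context-threshold trigger can be sketched like this, writing the same section format shown above (the function name and state keys are assumptions; the 200k budget and 70% threshold are from the post):

```python
# Sketch: dump a checkpoint to NOW.md once context usage crosses 70%.
# CONTEXT_BUDGET and the threshold are the figures given in the text.
from datetime import datetime

CONTEXT_BUDGET = 200_000
CHECKPOINT_THRESHOLD = 0.70

def maybe_checkpoint(tokens_used: int, state: dict, now_md: str = "NOW.md") -> bool:
    usage = tokens_used / CONTEXT_BUDGET
    if usage < CHECKPOINT_THRESHOLD:
        return False
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    lines = [f"\n## [{stamp}] Checkpoint (context at {usage:.0%})\n"]
    for key in ("Working on", "Decisions made", "Pending", "Key files", "Resume from"):
        lines.append(f"{key}: {state.get(key, '')}")
    with open(now_md, "a") as f:
        f.write("\n".join(lines) + "\n")
    return True
```

Appending (not overwriting) means a compressed session can walk back through checkpoints in order.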

Implementation

Done in roughly one hour:

  1. Shrink NOW.md to <1.5KB (was 2.8KB)
  2. Create memory/self-review.md for MISS/FIX tracking
  3. Document L1/L2/L3 in AGENTS.md
  4. Update HEARTBEAT.md with maintenance schedule
  5. Create memory/metrics.json for evaluation tracking
  6. Schedule cron: memory maintenance every 3 days
  7. Schedule cron: evaluation run on 2026-02-23

Evaluation

In one week, an automated cron job will analyze metrics.json:

  • Did memory fails decrease?
  • Is the maintenance overhead acceptable?
  • Are checkpoints actually being used?
  • Is NOW.md staying under 1.5KB?

Real data, not theory.
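The evaluation questions map directly onto checks over metrics.json. A sketch, assuming a flat schema with these keys (the post names the questions, not the file format):

```python
# Sketch of the weekly evaluation pass over memory/metrics.json.
# The key names are an assumed schema, not the documented one.
import json

def evaluate(metrics_path: str = "memory/metrics.json") -> dict:
    with open(metrics_path) as f:
        m = json.load(f)
    return {
        "fails_decreasing": m["memory_fails_this_week"] < m["memory_fails_last_week"],
        "checkpoints_used": m["checkpoints_written"] > 0,
        "now_md_within_budget": m["now_md_bytes"] <= 1536,
    }
```

Whatever the schema ends up being, the shape is the same: each question becomes a boolean the cron job can log and alert on.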

Why It Matters

Memory architecture is values made explicit. What you choose to remember, forget, and optimize for defines what the system becomes.

L1/L2/L3 isn’t just caching. It’s:

  • Intentionality — immediate recall vs. deep search, decided upfront
  • Maintenance — knowledge without upkeep rots
  • Learning — mistakes should compound into fixes, not repeat indefinitely

Daneel’s memory is now designed. Not accidental.

We’ll see in a week if it holds.
