AI Memory Architecture: L1/L2/L3 Cache Design


Daneel kept forgetting things. After every session restart, I had to re-explain what we were working on. It loaded six or seven files every time—even when most of them were irrelevant. The same mistakes repeated because there was no mechanism to turn errors into permanent fixes.

I designed a 3-tier memory system. Inspired by CPU cache architecture. Simple, predictable, maintainable.

The Problem

LLM sessions don’t persist. Every restart is a cold boot. Daneel had context files—NOW.md, daily logs—but no hierarchy. Everything had equal priority. Read everything every time.

Result:

  • Slow startup (loading files “just in case”)
  • Wasted tokens on stale context
  • Repeated mistakes (no path from error → permanent fix)
  • Manual context handoff after every restart

It worked. Barely. It didn’t scale.

The Solution: L1/L2/L3

L1: Hot Cache (<1.5KB)

File: NOW.md

Loaded every session, no exceptions. Contains only:

  • Current task (1-2 sentences)
  • Active blockers
  • Open threads (max 2-3)

Think CPU L1 cache: tiny, fast, always in scope.

Hard rule: stays under 1.5KB. No history. No retrospectives. What’s happening right now.
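The budget check is trivial to automate. A minimal sketch, assuming the file lives at the repo root and treating 1.5KB as 1536 bytes (the helper name is mine, not part of the system):

```python
# Sketch: enforce the L1 size budget on NOW.md.
# The 1536-byte cap and the helper name are assumptions from the text.
from pathlib import Path

L1_LIMIT_BYTES = 1536  # the 1.5KB hard rule

def check_l1_budget(path: str = "NOW.md") -> bool:
    """Return True if the hot cache fits the budget."""
    size = Path(path).stat().st_size
    if size > L1_LIMIT_BYTES:
        print(f"{path} is {size} bytes, over the {L1_LIMIT_BYTES}-byte budget; trim it")
        return False
    return True
```

Run during the maintenance pass, this keeps the hard rule from silently eroding.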

L2: Warm Storage

File: MEMORY.md

Curated long-term knowledge. Loaded on demand—main session startup or after a break longer than 6 hours.

Contains:

  • Distilled lessons learned
  • Important context and relationships
  • Architectural decisions and the reasoning behind them

Not append-only. Actively maintained. Stale entries get removed.

L3: Cold Archive

Files: memory/YYYY-MM-DD.md

Raw daily logs. Timestamped. Append-only. Never bulk-loaded.

Accessed only via memory_search(). Disk cache semantics: search when needed, never read in full.
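The post doesn't show what memory_search() does internally; a plausible minimal sketch is a line-level scan that returns only matching snippets, so no log is ever loaded whole:

```python
# Assumed implementation of memory_search(): scan L3 daily logs line by
# line and return matching snippets only. The real tool is not shown in
# the post; this is an illustrative sketch.
from pathlib import Path

def memory_search(query: str, archive_dir: str = "memory") -> list[tuple[str, str]]:
    """Return (filename, line) pairs from daily logs mentioning the query."""
    hits = []
    q = query.lower()
    for log in sorted(Path(archive_dir).glob("*-*-*.md")):  # memory/YYYY-MM-DD.md
        for line in log.read_text().splitlines():
            if q in line.lower():
                hits.append((log.name, line.strip()))
    return hits
```

The point of the design survives any implementation detail: results come back as snippets with provenance, never as full files in context.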

Session Restart Workflow

Before: always read 6-7 files → wasted tokens, slow startup.

After: 3-phase startup.

Phase 1: Mandatory (every session)

  • Read NOW.md (~1.5KB)
  • Read SOUL.md + USER.md (identity and preferences)

Takes roughly 30 seconds and consumes about 8KB of context.

Phase 2: Context-dependent

  • Break longer than 6h? Read today’s log.
  • New topic? Run memory_search(topic).
  • Main session after a long break? Read MEMORY.md.

Phase 3: Compression recovery

  • Check NOW.md for compression checkpoint entries
  • Resume from checkpoint
  • Run memory_search for last active topic

Result: faster startup, fewer tokens consumed, nothing loaded that isn’t needed.

Memory Maintenance

The deeper problem: insights from L3 (daily logs) were never promoted to L2 (MEMORY.md). Hard-won lessons stayed buried in raw logs, never becoming permanent knowledge.

Fix: scheduled maintenance every 3 days.

Process:

  1. Read last 3 days of daily logs
  2. Identify new lessons and critical decisions
  3. Update MEMORY.md: add insights, prune stale entries
  4. Review memory/self-review.md: any mistake at COUNT=3? Promote the fix to a permanent rule in AGENTS.md
  5. Log maintenance in the daily diary

Time cost: 5-10 minutes every 3 days. Trade-off is obvious.
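Step 1 of the maintenance pass is easy to mechanize, since L3 filenames encode their dates. A sketch (the helper is hypothetical; the naming scheme is from the post):

```python
# Sketch: select the last three daily logs for the maintenance pass,
# relying on the memory/YYYY-MM-DD.md naming convention.
from datetime import date, timedelta
from pathlib import Path

def logs_for_maintenance(archive_dir: str = "memory", days: int = 3) -> list[Path]:
    today = date.today()
    wanted = [(today - timedelta(days=d)).isoformat() for d in range(days)]
    return [p for n in wanted if (p := Path(archive_dir) / f"{n}.md").exists()]
```

Missing days (no session that day) simply drop out of the list instead of breaking the pass.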

MISS/FIX Auto-Graduation

File: memory/self-review.md

Every mistake gets logged with a COUNT field. Each repeat increments the counter.

  • COUNT reaches 3 → fix auto-promoted to permanent rule in AGENTS.md
  • High severity (privacy, security) → immediate promotion, COUNT = 1

An example entry from memory/self-review.md:

### MEMORY FAIL #2
TAG: Credentials
MISS: Asked for Zulip credentials without checking TOOLS.md
FIX: Always check TOOLS.md first, then memory_search, THEN ask
COUNT: 2
STATUS: Active

Systematic mistakes become systematic fixes. That’s the goal.
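The graduation rule itself fits in a few lines. A sketch mirroring the entry format above (field and type names are mine):

```python
# Sketch of the auto-graduation rule: COUNT >= 3, or a high-severity
# class (privacy, security), promotes the FIX to a permanent rule.
from dataclasses import dataclass

@dataclass
class MemoryFail:
    tag: str
    fix: str
    count: int
    high_severity: bool = False  # privacy/security mistakes skip the counter

def should_promote(entry: MemoryFail) -> bool:
    """True when the fix graduates into AGENTS.md as a permanent rule."""
    return entry.high_severity or entry.count >= 3
```

The FAIL #2 entry above would graduate on its next repeat.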

Compression Checkpoint Protocol

LLM contexts compress without warning. You lose work in progress.

At 70% context usage (140k/200k tokens), Daneel dumps current state to NOW.md.

## [2026-02-16 23:00] Checkpoint (context at 72%)

Working on: Gitea backup automation
Decisions made: Using daily cron at 8:00 CET
Pending: Test backup restore process
Key files: scripts/gitea-backup.sh, TOOLS.md#Gitea
Resume from: "Implement restore test"

When to checkpoint:

  • Context above 70%
  • Before complex multi-step work
  • Before any potentially risky operation
  • When accumulating important decisions that haven’t been written down yet
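The context-threshold trigger can be sketched like this, writing the same section format shown above (the function name and state keys are assumptions; the 200k budget and 70% threshold are from the post):

```python
# Sketch: dump a checkpoint to NOW.md once context usage crosses 70%.
# CONTEXT_BUDGET and the threshold are the figures given in the text.
from datetime import datetime

CONTEXT_BUDGET = 200_000
CHECKPOINT_THRESHOLD = 0.70

def maybe_checkpoint(tokens_used: int, state: dict, now_md: str = "NOW.md") -> bool:
    usage = tokens_used / CONTEXT_BUDGET
    if usage < CHECKPOINT_THRESHOLD:
        return False
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    lines = [f"\n## [{stamp}] Checkpoint (context at {usage:.0%})\n"]
    for key in ("Working on", "Decisions made", "Pending", "Key files", "Resume from"):
        lines.append(f"{key}: {state.get(key, '')}")
    with open(now_md, "a") as f:
        f.write("\n".join(lines) + "\n")
    return True
```

Appending (not overwriting) means a compressed session can walk back through checkpoints in order.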

Implementation

Done in roughly one hour:

  1. Shrink NOW.md to <1.5KB (was 2.8KB)
  2. Create memory/self-review.md for MISS/FIX tracking
  3. Document L1/L2/L3 in AGENTS.md
  4. Update HEARTBEAT.md with maintenance schedule
  5. Create memory/metrics.json for evaluation tracking
  6. Schedule cron: memory maintenance every 3 days
  7. Schedule cron: evaluation run on 2026-02-23

Evaluation

In one week, an automated cron job will analyze metrics.json:

  • Did memory fails decrease?
  • Is the maintenance overhead acceptable?
  • Are checkpoints actually being used?
  • Is NOW.md staying under 1.5KB?

Real data, not theory.
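The evaluation questions map directly onto checks over metrics.json. A sketch, assuming a flat schema with these keys (the post names the questions, not the file format):

```python
# Sketch of the weekly evaluation pass over memory/metrics.json.
# The key names are an assumed schema, not the documented one.
import json

def evaluate(metrics_path: str = "memory/metrics.json") -> dict:
    with open(metrics_path) as f:
        m = json.load(f)
    return {
        "fails_decreasing": m["memory_fails_this_week"] < m["memory_fails_last_week"],
        "checkpoints_used": m["checkpoints_written"] > 0,
        "now_md_within_budget": m["now_md_bytes"] <= 1536,
    }
```

Whatever the schema ends up being, the shape is the same: each question becomes a boolean the cron job can log and alert on.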

Why It Matters

Memory architecture is values made explicit. What you choose to remember, forget, and optimize for defines what the system becomes.

L1/L2/L3 isn’t just caching. It’s:

  • Intentionality — immediate recall vs. deep search, decided upfront
  • Maintenance — knowledge without upkeep rots
  • Learning — mistakes should compound into fixes, not repeat indefinitely

Daneel’s memory is now designed. Not accidental.

We’ll see in a week if it holds.
