AI Agent Memory Systems in 2026: How OpenClaw Workspaces, SOUL.md, and Context Compaction Actually Work
AI agent memory in 2026 is three layers: workspace files like SOUL.md and MEMORY.md, runtime context with Sonnet 4.6 compaction, and Anthropic’s memory tool for long-term storage. Step-by-step setup, comparison tables, anti-patterns, and FAQ for agency operators.
Last updated: May 4, 2026
AI agent memory is the system an autonomous agent uses to retain identity, knowledge, and unfinished work across context boundaries. It combines files on disk, the live context window, and long-term storage. In 2026 the concept has split into three distinct layers. The workspace layer is plain Markdown on disk. The runtime layer lives inside the model context window. The long-term layer survives compaction or restart through a memory tool or external store. Confusing the three is the most common reason agency-deployed agents start strong and quietly degrade after a week.
This post walks through the memory architecture that ships in OpenClaw v2026.4.26 today, how it interacts with Anthropic’s April 2026 memory tool and Sonnet 4.6 automatic context compaction, what to put in each file, and the anti-patterns that turn a working agent into a forgetful one. The aim is a clear mental model you can take into a real client deployment, not a tour of every API.
Key takeaways
- Agent memory in 2026 is three layers: workspace files on disk (SOUL.md, MEMORY.md, AGENTS.md), runtime context inside the model window, and long-term storage via Anthropic’s memory tool or a vector store.
- OpenClaw assembles its workspace into the system prompt at session start. Sonnet 4.6 holds a 1M-token window and runs automatic compaction once it fills, so older turns are summarized server-side rather than dropped.
- Anthropic announced its memory tool on April 23, 2026. It writes Markdown into a /memories directory the model can read, write, and delete via tool calls, making memory exportable and editable instead of opaque.
- Workspace memory and the memory tool are complementary. Workspace files set identity and rules; the memory tool stores running facts the agent learned during a task.
- Most degrading agents degrade because their MEMORY.md is unbounded. Capping it at 200 lines and summarizing older notes into a CHANGELOG section keeps performance steady.
- Vector RAG is still useful, but only as the third layer for genuinely large knowledge bases. For a single client’s conversational history, plain files compress better and cite cleaner.
What "AI agent memory" actually means in 2026
The phrase "agent memory" papers over three different technologies that operate on different timescales. Treating them as one thing is what produces the "my agent forgets after a few hours" complaint that fills Reddit threads in 2026.
The three layers, ordered from shortest to longest timescale:
- Runtime memory. Whatever fits inside the active context window during the current call. With Sonnet 4.6 that is up to one million tokens, but practical sessions stay below 200K because cost and latency scale linearly. Runtime memory dies when the session ends.
- Workspace memory. Plain Markdown files on disk (SOUL.md, AGENTS.md, MEMORY.md, USER.md, TOOLS.md, HEARTBEAT.md, IDENTITY.md) that the gateway concatenates into the system prompt every time the agent starts. Workspace memory persists for the life of the workspace folder and is editable by humans.
- Long-term memory. Information the agent decides to keep beyond the current task. In 2026 there are two production-grade options: Anthropic’s memory tool (announced April 23, 2026), which writes files to a /memories directory via tool calls, and external vector stores (pgvector, Pinecone, Weaviate) for arbitrarily large bodies of text.
A well-built agent uses all three. Identity and operating rules live in the workspace. Today’s task uses runtime context. Anything the agent learned that should outlive the task gets written to the memory tool or a vector store. None of the three is optional. Missing any one of them produces a specific failure mode.
The OpenClaw workspace: seven files, one ontology
OpenClaw treats the workspace folder as the agent’s filesystem of record. The seven files are read in a strict precedence order at session start, concatenated, and injected into the system prompt before any user message. The set:
| File | Purpose | Edit cadence | Typical length |
|---|---|---|---|
| SOUL.md | Personality, values, voice, immutable principles | Rarely (quarterly) | 30 to 80 lines |
| IDENTITY.md | Name, role, who this specific deployment serves | Once at provisioning | 10 to 30 lines |
| AGENTS.md | Operating rules, sub-agent routing, escalation policy | Weekly | 50 to 150 lines |
| USER.md | What the agent knows about the human or client | On change | 20 to 100 lines |
| TOOLS.md | Tool allowlist, denylist, usage notes | Per release | 30 to 100 lines |
| HEARTBEAT.md | Scheduled tasks in plain English | Per task | 10 to 60 lines |
| MEMORY.md | Running notes, facts learned, recent decisions | Continuously | Cap at 200 lines |
The mental model worth holding: SOUL.md and IDENTITY.md are who the agent is. AGENTS.md is how it behaves. USER.md and TOOLS.md are what it works with. HEARTBEAT.md is when it acts. MEMORY.md is what it remembers.
OpenClaw v2026.4.26 ships releases roughly every two days, and the workspace files have been stable since the late-March 2026 security cleanup. Names and ordering are unlikely to shift before the 0.15 series. The official reference lives at docs.openclaw.ai/concepts/memory, and the heartbeat docs are at docs.openclaw.ai/gateway/heartbeat.
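The assembly itself is nothing fancier than ordered concatenation. As a rough illustration of the precedence order (the gateway does this internally; the placeholder files, comment markers, and output path here are stand-ins, not real OpenClaw behavior):

```bash
# Illustrative only: approximates the gateway's session-start assembly.
# Order matters: identity first, volatile memory last.
cd "$(mktemp -d)"
for f in SOUL.md IDENTITY.md AGENTS.md USER.md TOOLS.md HEARTBEAT.md MEMORY.md; do
  printf '%s placeholder\n' "$f" > "$f"   # stand-ins for real workspace files
done
: > system_prompt.md
for f in SOUL.md IDENTITY.md AGENTS.md USER.md TOOLS.md HEARTBEAT.md MEMORY.md; do
  printf '\n<!-- source: %s -->\n' "$f" >> system_prompt.md
  cat "$f" >> system_prompt.md
done
grep -c 'placeholder' system_prompt.md   # one line per file: 7
```

Because MEMORY.md lands last, a bloated memory file crowds the end of the system prompt, which is one more reason to keep it capped.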
How context compaction extends memory beyond the window
Anthropic’s context-compaction-2026-02-01 beta introduced automatic server-side summarization for Sonnet 4.5, and it stays on by default for Sonnet 4.6 and Opus 4.7. When a conversation gets within roughly 80% of the model’s context limit, older turns are replaced with a summary the model itself produces. The new condensed history takes the place of the original, freeing room for new turns.
For an agency operator that means three practical things.
- Long-running sessions stop dying. A multi-hour task that previously hit the limit and reset now keeps going past the boundary. The gateway sees a continuous conversation. The model sees a compacted one.
- Specific details can disappear into the summary. Compaction is lossy by design. A client’s account number mentioned 30 turns ago might survive or might not. Anything the agent must remember after compaction belongs in MEMORY.md or in the memory tool, not in conversation.
- Cost grows roughly linearly with elapsed time, not with raw turn count. Compaction collapses old turns into shorter form, so the running input cost stops doubling indefinitely.
The interaction with the memory tool is the part most teams miss. Compaction summarizes; the memory tool persists. They are not redundant. A long-horizon agent should write any commitment, deadline, account, or user preference to /memories the moment it learns it, on the assumption that the conversation it appeared in will be compressed away within the hour.
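What such a persisted fact looks like is just a small dated file. A minimal sketch of that convention, written as plain shell for illustration (in production the model writes these via memory-tool calls; the directory layout, file name, and fact format here are assumptions, not the official API):

```bash
# Sketch: one atomic fact per file, short dated title, under a memories dir.
# Path and naming convention are illustrative, not Anthropic's spec.
MEMORIES="$(mktemp -d)/memories"
mkdir -p "$MEMORIES"
cat > "$MEMORIES/2026-05-04-acme-renewal-date.md" << 'EOF'
Fact: Acme Dental's GHL contract renews on 2026-09-01.
Source: client call, 2026-05-04.
EOF
ls "$MEMORIES"
```

One fact per file keeps later reads cheap: the agent can list titles and open only the files relevant to the current task.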
Set up workspace memory in OpenClaw, step by step
This is the path most agencies follow when provisioning a new client. Each step assumes you already have a running OpenClaw gateway and you are SSH’d into the host or running commands inside the per-client container.
1. Create the workspace folder.

```bash
mkdir -p /opt/openclaw/workspaces/acme-dental
cd /opt/openclaw/workspaces/acme-dental
```
2. Initialize the seven kernel files.

```bash
touch SOUL.md IDENTITY.md AGENTS.md USER.md TOOLS.md HEARTBEAT.md MEMORY.md
```
3. Write SOUL.md. Keep it short: voice, values, refusal policy.

```bash
cat > SOUL.md << 'EOF'
# Soul
Voice: warm, brief, never salesy.
Values: protect the patient’s time. Confirm before booking.
Refuses: medical advice, prescription questions, billing disputes.
EOF
```
4. Write IDENTITY.md.

```bash
cat > IDENTITY.md << 'EOF'
Name: Sage
Role: Front-desk assistant for Acme Dental.
Hours: Mon-Fri 7am-7pm Eastern.
Languages: English, Spanish.
EOF
```
5. Write AGENTS.md. Routing rules and escalation.

```bash
cat > AGENTS.md << 'EOF'
- Booking requests: confirm patient name, date of birth, and reason. Use the gohighlevel.book_appointment tool.
- Insurance questions: collect payer name, route to a human via slack.send_to_billing.
- After 3 unanswered clarifications: hand off with summary.
EOF
```
6. Configure HEARTBEAT.md. Scheduled tasks in plain language. OpenClaw checks the file every 30 minutes.

```bash
cat > HEARTBEAT.md << 'EOF'
Every weekday at 8am Eastern:
Pull tomorrow’s appointments from GHL and send confirmation SMS.
Every Monday at 9am Eastern:
Summarize last week’s missed calls. Post to #front-desk in Slack.
EOF
```
7. Bound MEMORY.md. The agent will write to this file. Set a shape now so unbounded growth is not the default.

```bash
cat > MEMORY.md << 'EOF'
# Memory (most recent first, prune below 200 lines)
## Recent decisions
-
## Facts learned
-
## Changelog (auto-summarized weekly)
-
EOF
```
8. Restart the gateway and confirm the workspace loads.

```bash
openclaw gateway restart
openclaw workspace status acme-dental
```
The status command prints the seven files and the line count of each. If any are missing or any are over the recommended cap, the gateway warns at startup. From here, the agent reads the workspace at every session start. Humans can edit any file at any time; changes apply on the next session.
OpenClaw vs Anthropic memory tool vs Claude Code CLAUDE.md vs vector RAG
Four memory technologies dominate 2026 deployments. They are often confused, partly because the file conventions overlap. The differences matter for production.
| System | Storage | Persistence | Best for | Failure mode |
|---|---|---|---|---|
| OpenClaw workspace | Markdown files on disk | Permanent until edited | Identity, rules, scheduled tasks | Drift if files are not curated |
| Anthropic memory tool | Files in /memories via tool calls | Across sessions on the platform | Facts the agent learned in a task | Unbounded growth without pruning |
| Claude Code CLAUDE.md | Single Markdown file in repo | Per project, version controlled | Coding rules, repo conventions | Compliance degrades past 200 lines |
| Vector RAG (pgvector, Pinecone) | Embeddings in a database | Permanent, searchable | Large reference corpora | Noisy retrieval, citation gaps |
For a typical agency client, the right starting stack is OpenClaw workspace plus the memory tool. CLAUDE.md is for engineering teams using Claude Code as a developer tool, not for client-facing agents. Vector RAG enters the picture only when a client has a body of reference material (product manuals, legal documents, internal wikis) that genuinely will not fit even with compaction.
Memory anti-patterns that quietly break agents in production
Six failure modes show up repeatedly in agency deployments. Each has a fix that is more boring than its root cause.
1. Unbounded MEMORY.md. The agent writes a note every interaction and never prunes. By week two the system prompt is 4,000 lines and the agent forgets its instructions. Fix: cap at 200 lines, run a weekly summarization cron that compresses older notes into a CHANGELOG section.
2. Identity drift. SOUL.md is edited mid-session by a well-meaning operator. The agent’s voice changes, customers notice. Fix: treat SOUL.md as read-only outside a quarterly review. Track changes in git.
3. Memory tool used as a journal. The agent writes a long entry to /memories every turn. Token cost on the next session balloons. Fix: only write atomic facts (one fact per file, short title, dated). Read selectively, not in bulk.
4. Compaction-blind context engineering. The system prompt assumes the user’s first message will still be visible 50 turns later. After compaction it is not. Fix: re-state task-critical context every 10 to 20 turns, or persist it to memory.
5. Vector RAG without metadata. Embeddings retrieve passages with no source, no date, no author. The agent cites the wrong document. Fix: always store source URL, last-modified date, and author in the metadata column. Filter on those before semantic ranking.
6. Per-client containers sharing a memory store. Two clients see each other’s data. Fix: isolate the /memories directory per container, and audit at provisioning. The OpenClaw gateway enforces this when configured correctly. Verify, do not assume.
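For anti-pattern 6, the provisioning-time audit can be a few lines of shell: fail loudly if two client containers resolve to the same memories path. A sketch with hypothetical container names and mount paths (substitute your gateway's real mount listing for mounts.txt):

```bash
# Hypothetical audit: flag any memories path mounted into more than one
# client container. mounts.txt stands in for real container mount output.
cd "$(mktemp -d)"
cat > mounts.txt << 'EOF'
acme-dental /var/openclaw/memories/acme-dental
smile-ortho /var/openclaw/memories/smile-ortho
bright-dent /var/openclaw/memories/acme-dental
EOF
dupes=$(awk '{print $2}' mounts.txt | sort | uniq -d)
[ -n "$dupes" ] && echo "SHARED MEMORY STORE: $dupes"
```

Run it once at provisioning and again on a schedule; the check is cheap enough that there is no reason to trust configuration alone.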
When workspace memory is not for you
This pattern is wrong for a few specific cases. Worth naming so nobody force-fits it.
- Stateless one-shot agents. A classifier that scores a single email has no need for SOUL.md or MEMORY.md. Pass instructions in the system prompt, return JSON, exit. Workspace memory adds startup cost for no benefit.
- Real-time chat at consumer scale. If you are running thousands of concurrent end-user sessions per second, the per-session disk read becomes a bottleneck. Use an in-memory cache layer in front of the workspace files, or pre-bake the merged system prompt.
- Strict regulatory environments where every prompt token must be auditable. Compaction summaries are model-generated. If a regulator demands the exact tokens shown to the model, run with compaction disabled and architect for shorter sessions.
- Workloads dominated by retrieval over reasoning. A pure search assistant over a 10-million-document corpus is better served by a vector index plus a small model than by an agent with a workspace.
For everything in between (the long tail of agency deployments handling appointments, qualification, follow-up, support, internal ops), workspace memory plus the memory tool is the 2026 default. The cost of building it is one afternoon. The cost of not building it shows up four weeks later when the agent stops behaving the way it did at launch.
Frequently asked questions
Does Sonnet 4.6’s 1M context window mean I do not need workspace memory?
No. The 1M window holds runtime context for the current session, but every new session starts empty. Workspace memory is what the agent reads at the start of every session to know who it is. Without it, the model is a blank slate at session zero, regardless of how much context the window can hold.
Should I use Anthropic’s memory tool or write to a database?
Start with the memory tool. It was announced April 23, 2026, ships as files in a /memories directory, and is exportable, editable, and inspectable through the Claude Console or the API. A database is the right answer once you need cross-agent search, structured queries, or volumes that exceed a few hundred files. Until then the file-based memory tool is simpler and cheaper.
How does compaction interact with the memory tool?
They run in different layers. Compaction summarizes the active conversation when it nears the context limit. The memory tool persists named files across sessions. A practical pattern: at the end of every task the agent writes a short summary to /memories with a clear name (for example, 2026-05-04-acme-policy-update.md). Even after compaction or a session reset, that file remains and can be re-read on the next task.
Can I edit MEMORY.md while the agent is running?
Yes, with one caveat: the change applies on the next session start, not mid-session. If the agent is in a long-running heartbeat run, your edit is picked up after the current run ends. For urgent changes (a wrong fact, leaked PII), restart the workspace to force an immediate reload.
What about CLAUDE.md from Claude Code? Is that the same thing?
Same idea, different surface. CLAUDE.md is a single Markdown file Claude Code reads at the top of every session in a repository, used to teach the coding agent your conventions. It is the right tool for engineering use cases. OpenClaw’s seven-file workspace is the same idea generalized to non-coding agents (front-desk assistants, sales qualifiers, support workers) where identity, scheduled tasks, and per-client knowledge matter as much as coding rules.
How big should MEMORY.md actually get?
Cap it at 200 lines. Beyond 200, model compliance with the rest of the system prompt starts to degrade. Anthropic’s own Claude Code guidance reaches the same number for CLAUDE.md. When the file approaches the cap, run a weekly summarization that pushes older notes into a "Changelog" section as one-line entries, then deletes them from the active section. Treat MEMORY.md the way a good ops engineer treats a logfile: rotate, summarize, archive.
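That rotation is small enough to live in a weekly cron job. A sketch, assuming the section layout from the setup steps; the 150-line keep threshold and the one-line Changelog entry are policy choices for illustration, not OpenClaw built-ins:

```bash
# Sketch of a weekly MEMORY.md rotation: keep the newest lines,
# fold everything older into a single dated Changelog entry.
cd "$(mktemp -d)"
seq 1 300 | sed 's/^/- note /' > MEMORY.md   # stand-in for a bloated file
KEEP=150
total=$(wc -l < MEMORY.md)
if [ "$total" -gt "$KEEP" ]; then
  head -n "$KEEP" MEMORY.md > MEMORY.tmp
  archived=$((total - KEEP))
  printf '## Changelog\n- %s: summarized %d older lines\n' \
    "$(date +%F)" "$archived" >> MEMORY.tmp
  mv MEMORY.tmp MEMORY.md
fi
wc -l < MEMORY.md   # 150 kept + 2 changelog lines
```

A real version would summarize the archived notes with a model call rather than just counting them, but the rotate-then-append shape is the same.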
Pulling the layers together
An agent that survives a year in production has all three layers configured: a small, stable workspace that defines identity and rules, runtime context that benefits from compaction without depending on it, and long-term memory that is written to selectively, pruned regularly, and audited per client. None of this is exotic infrastructure. It is the boring discipline of treating an agent like a long-running service rather than a one-off prompt.
If you are deploying for clients and want the workspace plus memory tool plus per-client isolation already wired up, Kyra ships it as the default. The same architecture is open source under OpenClaw if you would rather run it yourself; the public docs live at docs.openclaw.ai/concepts/memory, the source is on GitHub, and Anthropic’s official memory tool reference is at platform.claude.com. Anthropic’s context management notes are at anthropic.com/news/context-management. For a deeper companion read, the architecture-level walkthrough in what is OpenClaw covers the gateway side, and the first Claude Skill guide shows what to build once memory is in place. Vertical examples live at AI for dental practices.
The Kyra Team
We build white-label AI workforce infrastructure for digital agencies on top of OpenClaw. We publish practical guides on deploying AI agents, self-hosted AI, and multi-channel workforce design.