Waterson / AI & Engineering Blog

Claude Code Memory & Skill Architecture: A Three-Layer Framework for AI Agent Teams

How we organized memory, skills, and project instructions to reduce token waste by 35% and onboard new team members instantly.

By Waterson AI Team  ·  April 9, 2026  ·  8 min read

At a Glance

Problem: Memory files growing unbounded, context-window waste, no way to share AI setup with teammates
Solution: Three-layer architecture — CLAUDE.md + lean MEMORY.md + on-demand Skills
Key result: ~35% token reduction, zero-training team onboarding, 25+ reusable skills
Applies to: Claude Code, but the concepts work with any AI coding agent
Template: Download the .md framework file

The Problem with Unstructured AI Memory

After using Claude Code intensively for several months — managing 5 AIA CEU courses, 20+ blog articles, and a growing library of automation scripts — we hit a familiar wall.

Our memory files kept growing. Every time we wanted the AI to remember something, we added it. After a few months, a typical conversation was loading 5,000+ tokens of memory before anything useful was even said. Worse, most of that memory wasn't relevant to what we were currently doing.

The real cost of bloated memory: Every token loaded into the context window at conversation start is a token that can't be used for actual work. On a 200k-token window, 5,000 tokens of memory overhead doesn't sound like much — until you realize that 80% of it is being loaded in conversations where it's completely irrelevant.

We also had a team onboarding problem. Whenever a new team member started using Claude Code on our project, they had to manually teach the AI all our conventions. There was no way to share an AI "configuration" the way you share a .eslintrc or a tsconfig.json.

The third problem was discoverability. We had built some excellent workflows — multi-agent AIA course rewriting, security scanning before every push, automated article publishing. But to use them, you had to remember the exact slash command. New people didn't know what tools existed.

The Solution: Three-Layer Architecture

Layer 1: CLAUDE.md → AI reads this EVERY conversation (~800 tokens)
Layer 2: Memory (MEMORY.md) → Lean index, pointers only (~200 tokens)
Layer 3: Skills (SKILL.md) → Full manuals, loaded ONLY when triggered

The core insight: not all context needs to be loaded all the time. Separate the always-relevant from the occasionally-relevant, and let the AI load the latter on demand.

Layer 1: CLAUDE.md (Always Loaded)

This is the AI's "onboarding guide." Every time anyone opens Claude Code in your project directory, the AI reads this file automatically. It's the contract between your team and the AI.

What goes here: the rules the AI needs in every single conversation — behavior rules, one-line skill pointers, project conventions, and the intent detection table described below.

What does NOT go here: full procedures, reference data, or step-by-step workflows. Anything needed only occasionally belongs in a skill, loaded on demand.
The Intent Detection Table — The Key Innovation

Instead of requiring users to memorize slash commands, write a behavior-to-skill mapping table directly in CLAUDE.md:

## Behavior Rules — Proactive Skill Suggestion

Do NOT wait for slash commands. Detect user intent:
- Clear intent → auto-use the skill
- Ambiguous intent → suggest the skill

| When the user...                    | Auto-suggest or use...  |
|-------------------------------------|------------------------|
| Discusses writing an article        | /publish-article       |
| Mentions deploying or uploading     | /upload                |
| Says "done" or "finished editing"   | /upload                |
| Reviews content for accuracy        | /content-review        |
| Is about to git push                | /security-check        |

The AI reads this table and proactively suggests the right skill — no memorization needed. This single change eliminated about 90% of the "I didn't know that existed" onboarding friction.

Layer 2: Memory (Lean Index)

Target: under 4,000 tokens total across all memory files.

Memory should contain ONLY four types of information:

  1. Behavior rules — 3–5 lines each. How the AI should work with you.
  2. Skill pointers — One line each. "For X, use /skill-y."
  3. Design preferences — Brief. UI style, coding conventions.
  4. Project index — Links to detail files, not the details themselves.

# Memory Index

## Behavior Rules
- [feedback_workflow.md](feedback_workflow.md) — Plan before coding, never jump straight to implementation
- [feedback_delegate.md](feedback_delegate.md) — Use sub-agents for execution, don't code directly

## Skill Pointers
- Writing articles → /publish-article
- Deploying → /upload
- Security scan → /security-check (mandatory before every push)

## Preferences
- [feedback_ui_style.md](feedback_ui_style.md) — Large fonts (16-20px), spacious layout

## Active Projects
- [projects_active.md](projects_active.md) — Links to 3 currently active projects

The most common mistake: putting full API documentation, step-by-step procedures, or reference data directly in memory files. These should live in a skill's references/ directory and be loaded only when that skill is triggered.
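A quick way to catch this kind of bloat is to audit memory file sizes. The sketch below approximates tokens as characters divided by four — a common rule of thumb, not an exact tokenizer — and the 500-token per-file threshold is our assumption, not a hard rule:

```python
from pathlib import Path

TOKEN_BUDGET = 4000   # total memory target from this article
PER_FILE_FLAG = 500   # hypothetical threshold: above this, consider a skill

def estimate_tokens(text: str) -> int:
    # Rough rule of thumb: ~4 characters per token.
    return len(text) // 4

def audit_memory(memory_dir: str) -> list[tuple[str, int]]:
    """Return (filename, approx tokens) for each .md file, largest first."""
    sizes = [(f.name, estimate_tokens(f.read_text(encoding="utf-8")))
             for f in Path(memory_dir).glob("*.md")]
    return sorted(sizes, key=lambda pair: -pair[1])
```

Any file that lands above the flag threshold is a candidate for extraction into a skill; the sum tells you how far you are from the 4,000-token budget.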

Layer 3: Skills (On-Demand Loading)

Skills are the on-demand half of the system. They can be as detailed as needed — 500 lines, 50 files, full reference documentation — because they're loaded only when triggered.

.claude/skills/
├── CATALOG.md              ← Overview of all skills (human + AI readable)
├── publish-article/
│   ├── SKILL.md            ← Trigger conditions + operation manual
│   └── references/         ← Detailed docs, loaded when needed
├── security-check/
│   └── SKILL.md
└── aia-rewrite/
    ├── SKILL.md
    └── references/
        ├── writing-guide.md
        └── aia-standards.md

The frontmatter of each SKILL.md tells the AI when to load it:

---
name: publish-article
description: Generate blog articles for watersonusa.ai. Use when the user
  says "write article", "generate content", "新文章", or discusses
  creating content for the website.
---

The description field is what the AI reads to decide whether to trigger the skill. Write it comprehensively — include synonyms, alternative phrasings, and the languages your team uses.
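To make the frontmatter mechanism concrete, here is a toy reader for the block between the `---` fences. Claude Code parses this internally; the sketch below only illustrates the shape of the data, and its handling of indented continuation lines (like the multi-line description above) is deliberately minimal:

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Tiny frontmatter reader: key/value pairs between the first two '---'
    fences, with indented continuation lines folded into the prior value."""
    body = text.split("---")[1]  # content between the first pair of fences
    fields: dict[str, str] = {}
    key = None
    for line in body.splitlines():
        if not line.strip():
            continue
        if line.startswith((" ", "\t")) and key:
            fields[key] += " " + line.strip()   # continuation line
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields
```

The point of the exercise: everything the AI uses to decide whether to load the skill lives in that small description string, which is why it deserves careful writing.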

Setup Steps

  1. Audit your existing memory. List every memory file. For each one, ask: Is this a repeatable workflow? (→ move to Skill) Is this reference data? (→ move to Skill references/) Is this used in less than 20% of conversations? (→ delete or move to Skill) What remains is your new lean Memory.
  2. Build skills from extracted workflows. For each workflow you pulled out of Memory, create a skill directory. Write the SKILL.md with frontmatter trigger conditions, ordered steps, exact commands, and quality constraints.
  3. Write CLAUDE.md with intent detection. This is the highest-leverage step. Draft the intent detection table carefully — it determines what the AI does proactively versus what it waits to be told.
  4. Compress remaining memory. Go through each remaining memory file. Can this be said in 3 lines instead of 30? Compress it. Is the detail needed every conversation? No — move it to a skill reference. Target: MEMORY.md index under 30 lines, total memory under 4,000 tokens.
  5. Separate shared from private. Commit CLAUDE.md and the skills directory to git. Keep personal preferences and private project records in ~/.claude/ (never committed). Team members who clone the repo immediately get the full setup.

Shared vs private split: think of CLAUDE.md and skills as the team's .eslintrc — committed to git, the same for everyone. Personal memory is like your editor settings — it stays on your machine.
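The split in step 5 lends itself to a simple sanity check: assert the shared layer is present in the repo and no private material has leaked in. The private-path marker below is a hypothetical example; substitute whatever you keep under ~/.claude/:

```python
from pathlib import Path

SHARED = ["CLAUDE.md", ".claude/skills"]        # committed, team-wide
PRIVATE_MARKERS = [".claude/memory_private"]    # hypothetical private path

def check_repo(repo_root: str) -> list[str]:
    """Return a list of problems; an empty list means the split is clean."""
    root = Path(repo_root)
    problems = []
    for rel in SHARED:
        if not (root / rel).exists():
            problems.append(f"missing shared file: {rel}")
    for rel in PRIVATE_MARKERS:
        if (root / rel).exists():
            problems.append(f"private data committed: {rel}")
    return problems
```

Run something like this in CI or a pre-commit hook so a teammate's clone is guaranteed to contain the full shared setup.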

Results

| Metric | Before | After |
|---|---|---|
| Memory token consumption | 5,000+ per conversation | < 4,000 (−35%) |
| New member onboarding | Manual teaching session (hours) | Clone repo → open Claude Code → automatic |
| Skill triggering | Memorize slash commands | Intent detection (natural language) |
| Team config sharing | Manual file transfer | git clone |
| Context window waste | High — unused info loaded every time | Low — on-demand loading |

How This Compares to Industry Best Practices

Our three-layer architecture aligns with the latest AI agent memory management patterns from 2025–2026. Here's how what we built maps to the terminology researchers and practitioners are using:

What we already do

| Technique | Industry term | Our implementation |
|---|---|---|
| Three-layer separation | Memory Tiering | CLAUDE.md + MEMORY.md + Skills |
| Index pointers | Pointer Index System | MEMORY.md with one-line pointers |
| On-demand loading | Progressive Disclosure / Selective Re-injection | Skills loaded only when triggered |
| Sub-agent summarization | Sub-agent Distillation | 12-role agents report summaries to orchestrator |
| Team sharing via repo | Shared Project Config | CLAUDE.md + Skills in GitHub |
| Behavior-based triggering | Intent Detection | CLAUDE.md behavior-to-skill mapping table |

What you could add next

| Technique | What it does | How to implement |
|---|---|---|
| AutoCompact | Auto-compress conversation when context hits ~92% | Add to CLAUDE.md: "In long conversations, proactively summarize completed tasks" |
| AutoDream | Background agent consolidates memory (merge duplicates, prune stale) | Build a /memory-cleanup skill that runs periodically |
| Memory Decay | Old memories auto-expire | Add dates to memory files, flag anything > 90 days for review |
| MCP Integration | Pull context from GitHub Issues, Slack, Jira | Already using Supabase for storyboard sync — extend the pattern |
| Small Model Filter | Lightweight model pre-filters memory relevance | Use Haiku to score memory relevance before injecting into context |

Bottom line: if you've followed the three-layer setup, you're already implementing most of what the research community is writing about. The next frontier is automation — making the system self-maintaining rather than manually curated.

Advanced: Multi-Agent Skills

Once the three-layer structure is in place, skills can orchestrate complex multi-agent workflows without any added memory overhead. Our AIA course rewrite skill runs 11 agents in three waves — two parallel waves of five, followed by a single sequential integration agent:

Wave 1 (parallel, 5 agents):
  - ResearchAgent    → finds current standards and competitor products
  - DraftAgent       × 3 → drafts three content sections concurrently
  - FactCheckAgent   → validates all product claims

Wave 2 (parallel, 5 agents):
  - ADAReviewAgent   → checks accessibility compliance
  - SEOAgent         → meta tags, schema, keyword density
  - LegalAgent       → regulatory claims audit
  - StyleAgent       → tone and reading level
  - CitationAgent    → formats all source notes

Wave 3 (sequential, 1 agent):
  - IntegrationAgent → merges all outputs, resolves conflicts, deploys

This entire workflow lives in the skill SKILL.md file. It contributes zero tokens to memory. It's only loaded when someone says "rewrite the course" or triggers the /aia-rewrite command.
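The wave pattern itself is simple to express. In this hedged sketch the "agents" are plain functions standing in for sub-agent calls; within a wave they run concurrently, waves run in sequence, and only their summaries flow to the final integration step:

```python
from concurrent.futures import ThreadPoolExecutor

def run_waves(waves, integrate):
    """waves: list of lists of zero-arg callables (one list per wave).
    integrate: final sequential step, receiving all collected summaries."""
    summaries = []
    for wave in waves:
        # Agents within a wave run in parallel; waves run one after another.
        with ThreadPoolExecutor(max_workers=len(wave)) as pool:
            summaries.extend(pool.map(lambda agent: agent(), wave))
    return integrate(summaries)
```

This is also why memory stays small: the orchestrator only ever holds the summaries, never each agent's full working context.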

Advanced: External Tool Integration

Skills can define tool fallback chains, making them resilient to quota limits and API outages:

## Research Step
1. Try: gemini -m gemini-2.5-flash -p "{{query}}" --output-format text
2. Fallback: Claude Sonnet sub-agent with WebSearch tool
3. Fallback: Manual research prompt to user

Because this fallback logic lives in the skill file rather than in memory, it doesn't cost tokens during conversations where research isn't needed — which is most of them.
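The fallback chain reduces to a very small control structure. A sketch, where the provider functions are stand-ins for the real CLI calls listed above — try each in order, move on when one raises, and surface every failure if they all do:

```python
def with_fallback(query, providers):
    """providers: ordered list of (name, fn) pairs.
    Returns (name, result) of the first provider that succeeds."""
    errors = []
    for name, fn in providers:
        try:
            return name, fn(query)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Ordering the list cheapest-first is what encodes the "burn the free tokens first" principle discussed later in this article.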

Advanced: Multi-AI Collaboration

Don't let Claude do everything alone. Once your skill architecture is in place, you can distribute work across multiple AI providers — maximizing throughput while minimizing cost. Each AI has a different strength and a different price point.

The delegation hierarchy

| AI | Role | When to use | Cost |
|---|---|---|---|
| Claude Opus | Orchestrator | Complex decisions, quality control, final integration | Highest — reserve for core work |
| Claude Sonnet | Workers | Writing, reviewing, code generation, auditing | Medium — your main workforce |
| Gemini Flash/Pro | Researchers | Google Search grounding, fact verification, SEO analysis, proofreading | Free (1,000 req/day) — use aggressively |
| Codex (GPT) | Code reviewer | HTML/CSS/JS quality, accessibility audit, competitive analysis | Subscription — use until quota exhausted |
| Claude Haiku | Lightweight tasks | Memory filtering, simple formatting, quick lookups | Cheapest — use for high-volume, low-complexity work |

Configure the fallback chain in CLAUDE.md

## Multi-AI Collaboration

Gemini CLI: `echo "Y" | gemini -m gemini-2.5-flash -p "QUERY" --output-format text`
Codex CLI:  `codex exec --full-auto -C /path "TASK"`

Fallback chain:
1. Gemini Flash (free) → 2. Codex (subscription) → 3. Claude Sonnet (paid per token)

Always try free/cheaper options first for:
- Web research, fact checking
- Code review, linting
- SEO analysis, proofreading
- Bulk formatting, translation

Key principle: burn the free tokens first. Gemini gives you 1,000 requests/day for free. Codex subscriptions include a token budget. Use them aggressively before falling back to Claude. Your skill files are the right place to encode this fallback logic — it costs nothing to have it there, and it only activates when relevant.

Real example: 42 agents, one session

In a single working session, we dispatched 42 agents.

[Chart: what that session produced and how the agents were distributed — the output of one working day]

None of this required the orchestrator to hold the entire context in memory. Each agent received a focused brief, worked independently, and reported a summary back. The three-layer architecture made it possible to dispatch that many agents without losing coherence.

The Core Principle

The context window is finite. Every token loaded at conversation start that turns out to be irrelevant is waste. The three-layer architecture applies the same logic as lazy loading in software: load only what you need, when you need it. The difference is that here, the resource you're conserving is not load time but the AI's capacity to reason clearly about the task at hand.

FAQ

Does this only work with Claude Code?

The CLAUDE.md mechanism is specific to Claude Code. But the underlying principles — lean always-loaded context, on-demand detailed documentation, intent-based triggering — apply to any AI agent system. You can implement equivalent structures with Cursor rules files, GitHub Copilot workspace configuration, or custom system prompts.

How do I decide what goes in a skill versus what stays in memory?

Apply this filter: if you would include it in documentation handed to a new employee on their first day, it belongs in a skill. If you would say it verbally in a 10-second handoff, it belongs in memory. Procedures, reference data, multi-step workflows, and anything longer than five lines is almost always a skill.
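That filter can even be mechanized as a rough first pass over your memory notes. This toy function encodes only the length heuristic and a few illustrative procedural markers — the keyword list is our assumption, and human judgment still makes the final call:

```python
def belongs_in_skill(note: str) -> bool:
    """Rough triage: procedures or anything over five lines → skill."""
    lines = [ln for ln in note.splitlines() if ln.strip()]
    # Numbered steps or checklists suggest a procedure (markers illustrative).
    looks_procedural = any(ln.lstrip().startswith(("1.", "- [ ]", "Step"))
                           for ln in lines)
    return looks_procedural or len(lines) > 5
```

A one-line preference like "Large fonts, spacious layout" passes the filter and stays in memory; a numbered deployment checklist does not.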

What happens when a skill gets very large?

That's expected and fine. Skills can have a references/ subdirectory with arbitrarily many files. The skill's SKILL.md acts as a table of contents — it specifies which reference files to load for which sub-tasks. A skill for a complex publication workflow might have 10 reference files totaling thousands of lines, and none of that weight appears in your context unless you're actually publishing.

How do we handle skills that multiple projects share?

Claude Code supports two locations for skills: project-level (.claude/skills/ in the repo, shared via git) and user-level (~/.claude/skills/ on your machine, private). Generic utility skills like security scanning or git operations live at user level. Project-specific skills like course rewriting or article publishing live at project level.

Want the raw .md file to use in your own project?

The full framework template — including the audit checklist, MEMORY.md structure, and SKILL.md frontmatter format — is available as a standalone markdown file.

Download the template →

About this article: developed by the Waterson USA AI team. Based on real-world experience managing 5 AIA CEU courses, 20+ blog articles, and 25+ skills across multiple projects using Claude Code. The framework described here is actively used in production at watersonusa.ai.