Claude Code Memory & Skill Architecture: A Three-Layer Framework for AI Agent Teams
How we organized memory, skills, and project instructions to reduce token waste by 35% and onboard new team members instantly.
At a Glance
| Problem | Memory files growing unbounded, context window waste, no way to share AI setup with teammates |
|---|---|
| Solution | Three-layer architecture: CLAUDE.md + lean MEMORY.md + on-demand Skills |
| Key result | ~35% token reduction, zero-training team onboarding, 25+ reusable skills |
| Applies to | Claude Code, but concepts work with any AI coding agent |
| Template | Download the .md framework file |
The Problem with Unstructured AI Memory
After using Claude Code intensively for several months — managing 5 AIA CEU courses, 20+ blog articles, and a growing library of automation scripts — we hit a familiar wall.
Our memory files kept growing. Every time we wanted the AI to remember something, we added it. After a few months, a typical conversation was loading 5,000+ tokens of memory before anything useful was even said. Worse, most of that memory wasn't relevant to what we were currently doing.
We also had a team onboarding problem. Whenever a new team member started using Claude Code on our project, they had to manually teach the AI all our conventions. There was no way to share an AI "configuration" the way you share a .eslintrc or a tsconfig.json.
The third problem was discoverability. We had built some excellent workflows — multi-agent AIA course rewriting, security scanning before every push, automated article publishing. But to use them, you had to remember the exact slash command. New people didn't know what tools existed.
The Solution: Three-Layer Architecture
Layer 1: Project instructions (CLAUDE.md) → Always loaded, shared team rules and intent detection
Layer 2: Memory (MEMORY.md) → Lean index, pointers only (~200 tokens)
Layer 3: Skills (SKILL.md) → Full manuals, loaded ONLY when triggered
The core insight: not all context needs to be loaded all the time. Separate the always-relevant from the occasionally-relevant, and let the AI load the latter on demand.
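The split between always-loaded and on-demand context can be sketched in a few lines of Python. This is an illustration only, not how Claude Code is implemented: `Skill`, `ContextBuilder`, and the substring-based trigger check are hypothetical names and simplifications (a real agent lets the model decide when a skill applies).

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical stand-in for one skill directory."""
    name: str
    triggers: list[str]  # phrases that should activate the skill
    manual: str          # full SKILL.md body, potentially very long


@dataclass
class ContextBuilder:
    claude_md: str     # Layer 1: always loaded
    memory_index: str  # Layer 2: always loaded, kept lean
    skills: list[Skill] = field(default_factory=list)  # Layer 3: on demand

    def build(self, user_message: str) -> str:
        """Return the text that would be placed in context for this message."""
        parts = [self.claude_md, self.memory_index]
        lowered = user_message.lower()
        for skill in self.skills:
            # Naive substring match, used here only as a placeholder
            # for the model's own judgment about relevance.
            if any(t.lower() in lowered for t in skill.triggers):
                parts.append(skill.manual)
        return "\n\n".join(parts)


builder = ContextBuilder(
    claude_md="Team rules",
    memory_index="Deploying -> /upload",
    skills=[Skill("upload", ["deploy", "upload"], "UPLOAD MANUAL " * 100)],
)
small = builder.build("Can you rename this variable?")  # layers 1 and 2 only
large = builder.build("Please deploy the site")         # skill manual included
```

The unrelated request pays only for the two short layers; the long manual enters the context only when a deployment is mentioned.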
Layer 1: CLAUDE.md (Always Loaded)
This is the AI's "onboarding guide." Every time anyone opens Claude Code in your project directory, the AI reads this file automatically. It's the contract between your team and the AI.
What goes here:
- Language rules (e.g., "discuss in Traditional Chinese, code in English")
- The intent detection table — see below, this is the most important part
- Content architecture (what goes where in your project)
- Team conventions (commit format, deployment process)
What does NOT go here:
- Detailed workflows: put these in Skills
- Reference data: put in Skill references/ directories
- Personal preferences: keep in local ~/.claude/ Memory
The Intent Detection Table — The Key Innovation
Instead of requiring users to memorize slash commands, write a behavior-to-skill mapping table directly in CLAUDE.md:
## Behavior Rules — Proactive Skill Suggestion
Do NOT wait for slash commands. Detect user intent:
- Clear intent → auto-use the skill
- Ambiguous intent → suggest the skill
| When the user... | Auto-suggest or use... |
|-------------------------------------|------------------------|
| Discusses writing an article | /publish-article |
| Mentions deploying or uploading | /upload |
| Says "done" or "finished editing" | /upload |
| Reviews content for accuracy | /content-review |
| Is about to git push | /security-check |
The AI reads this table and proactively suggests the right skill — no memorization needed. This single change eliminated about 90% of the "I didn't know that existed" onboarding friction.
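Because the table is plain markdown, it is also easy to check mechanically, for example to confirm that every skill it references actually exists. The sketch below is a hypothetical helper (`parse_intent_table` is not part of Claude Code) that assumes a two-column table whose second column holds a slash command.

```python
def parse_intent_table(markdown: str) -> list[tuple[str, str]]:
    """Extract (trigger, skill) pairs from a two-column markdown table."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 2:
            continue
        trigger, skill = cells
        # The header row and the |---|---| separator have no slash command.
        if not skill.startswith("/"):
            continue
        rows.append((trigger, skill))
    return rows


TABLE = """
| When the user...                  | Auto-suggest or use... |
|-----------------------------------|------------------------|
| Discusses writing an article      | /publish-article       |
| Mentions deploying or uploading   | /upload                |
| Says "done" or "finished editing" | /upload                |
| Is about to git push              | /security-check        |
"""

rows = parse_intent_table(TABLE)
skills_referenced = sorted({skill for _, skill in rows})
```

Comparing `skills_referenced` against the directory names under .claude/skills/ would catch a table entry that points at a skill nobody has written yet.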
Layer 2: Memory (Lean Index)
Target: under 4,000 tokens total across all memory files.
Memory should contain ONLY four types of information:
- Behavior rules — 3–5 lines each. How the AI should work with you.
- Skill pointers — One line each. "For X, use /skill-y."
- Design preferences — Brief. UI style, coding conventions.
- Project index — Links to detail files, not the details themselves.
# Memory Index
## Behavior Rules
- [feedback_workflow.md](feedback_workflow.md) — Plan before coding, never jump straight to implementation
- [feedback_delegate.md](feedback_delegate.md) — Use sub-agents for execution, don't code directly
## Skill Pointers
- Writing articles → /publish-article
- Deploying → /upload
- Security scan → /security-check (mandatory before every push)
## Preferences
- [feedback_ui_style.md](feedback_ui_style.md) — Large fonts (16-20px), spacious layout
## Active Projects
- [projects_active.md](projects_active.md) — Links to 3 currently active projects
Anything more detailed should live in skill/references/ and be loaded only when that skill is triggered.
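A budget like this is easier to keep if something checks it. The following is a rough sketch under stated assumptions: the four-characters-per-token ratio is a crude heuristic rather than a real tokenizer, the 30-line index limit is taken from the setup steps later in this article, and `check_memory_budget` is a hypothetical helper name.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return (len(text) + 3) // 4


def check_memory_budget(
    files: dict[str, str],
    index_name: str = "MEMORY.md",
    max_index_lines: int = 30,
    max_total_tokens: int = 4000,
) -> list[str]:
    """Return human-readable problems; an empty list means within budget.

    `files` maps a memory file name to its text content.
    """
    problems = []
    index_lines = [
        line for line in files.get(index_name, "").splitlines() if line.strip()
    ]
    if len(index_lines) > max_index_lines:
        problems.append(
            f"{index_name} has {len(index_lines)} non-empty lines "
            f"(limit {max_index_lines})"
        )
    total = sum(estimate_tokens(text) for text in files.values())
    if total > max_total_tokens:
        problems.append(
            f"memory totals ~{total} tokens (limit {max_total_tokens})"
        )
    return problems
```

In practice the `files` dictionary would be built by reading the memory directory; it is passed in here so the check itself stays free of file-system assumptions.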
Layer 3: Skills (On-Demand Loading)
Skills are the on-demand half of the system. They can be as detailed as needed — 500 lines, 50 files, full reference documentation — because they're loaded only when triggered.
.claude/skills/
├── CATALOG.md ← Overview of all skills (human + AI readable)
├── publish-article/
│ ├── SKILL.md ← Trigger conditions + operation manual
│ └── references/ ← Detailed docs, loaded when needed
├── security-check/
│ └── SKILL.md
├── aia-rewrite/
│ ├── SKILL.md
│ └── references/
│ ├── writing-guide.md
│ └── aia-standards.md
The frontmatter of each SKILL.md tells the AI when to load it:
---
name: publish-article
description: Generate blog articles for watersonusa.ai. Use when the user
says "write article", "generate content", "新文章", or discusses
creating content for the website.
---
The description field is what the AI reads to decide whether to trigger the skill. Write it comprehensively — include synonyms, alternative phrasings, and the languages your team uses.
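To review how well a description covers your team's phrasing, it can help to pull the quoted trigger phrases out of each SKILL.md. The sketch below is a deliberately minimal frontmatter reader, not a YAML parser: it assumes simple `key: value` lines with plain continuation lines, and both `parse_frontmatter` and `quoted_triggers` are hypothetical helper names.

```python
import re

KEY_RE = re.compile(r"^([A-Za-z_][\w-]*):\s*(.*)$")


def parse_frontmatter(skill_md: str) -> dict[str, str]:
    """Read `key: value` pairs between the leading '---' markers.

    Lines that do not start a new key are treated as a continuation
    of the previous value. This is a simplification of real YAML.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    current = None
    for line in lines[1:]:
        if line.strip() == "---":
            break
        match = KEY_RE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current is not None and line.strip():
            fields[current] = f"{fields[current]} {line.strip()}".strip()
    return fields


def quoted_triggers(description: str) -> list[str]:
    """Return the phrases written in double quotes inside a description."""
    return re.findall(r'"([^"]+)"', description)


SKILL_MD = '''---
name: publish-article
description: Generate blog articles for watersonusa.ai. Use when the user
  says "write article", "generate content", "新文章", or discusses
  creating content for the website.
---
# Steps
'''

fields = parse_frontmatter(SKILL_MD)
triggers = quoted_triggers(fields["description"])
```

Listing `triggers` for every skill side by side makes gaps visible, such as a skill that only lists English phrases on a bilingual team.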
Setup Steps
- Audit your existing memory. List every memory file. For each one, ask: Is this a repeatable workflow? (→ move to Skill) Is this reference data? (→ move to Skill references/) Is this used in less than 20% of conversations? (→ delete or move to Skill) What remains is your new lean Memory.
- Build skills from extracted workflows. For each workflow you pulled out of Memory, create a skill directory. Write the SKILL.md with frontmatter trigger conditions, ordered steps, exact commands, and quality constraints.
- Write CLAUDE.md with intent detection. This is the highest-leverage step. Draft the intent detection table carefully — it determines what the AI does proactively versus what it waits to be told.
- Compress remaining memory. Go through each remaining memory file. Can this be said in 3 lines instead of 30? Compress it. Is the detail needed every conversation? No — move it to a skill reference. Target: MEMORY.md index under 30 lines, total memory under 4,000 tokens.
- Separate shared from private. Commit CLAUDE.md and the skills directory to git. Keep personal preferences and private project records in ~/.claude/ (never committed). Team members who clone the repo immediately get the full setup.
Think of the shared files like .eslintrc: committed to git, the same for everyone. Personal memory is like your editor settings: it stays on your machine.
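The audit questions from the first setup step amount to a small decision procedure, sketched below. The `MemoryFile` fields and the `triage` function are hypothetical; in particular, the usage rate is something you would have to estimate yourself, since nothing in this article measures it automatically.

```python
from dataclasses import dataclass


@dataclass
class MemoryFile:
    name: str
    is_workflow: bool   # a repeatable, multi-step procedure?
    is_reference: bool  # lookup data rather than a behavior rule?
    usage_rate: float   # estimated share of conversations that need it (0.0 to 1.0)


def triage(entry: MemoryFile, min_usage: float = 0.20) -> str:
    """Apply the audit questions in order and return a recommendation."""
    if entry.is_workflow:
        return "move to skill"
    if entry.is_reference:
        return "move to skill references/"
    if entry.usage_rate < min_usage:
        return "delete or move to skill"
    return "keep in memory"


audit = [
    MemoryFile("publishing_steps.md", True, False, 0.10),
    MemoryFile("aia_standards.md", False, True, 0.05),
    MemoryFile("old_project_notes.md", False, False, 0.02),
    MemoryFile("feedback_workflow.md", False, False, 0.90),
]
decisions = {entry.name: triage(entry) for entry in audit}
```

Whatever ends up with "keep in memory" is the starting point for the compression step.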
Results
| Metric | Before | After |
|---|---|---|
| Memory token consumption | 5,000+ per conversation | < 4,000 (−35%) |
| New member onboarding | Manual teaching session (hours) | Clone repo → open Claude Code → conventions load automatically |
| Skill triggering | Memorize slash commands | Intent detection (natural language) |
| Team config sharing | Manual file transfer | git clone |
| Context window waste | High — unused info loaded every time | Low — on-demand loading |
How This Compares to Industry Best Practices
Our three-layer architecture aligns with the latest AI agent memory management patterns from 2025–2026. Here's how what we built maps to the terminology researchers and practitioners are using:
What we already do
| Technique | Industry term | Our implementation |
|---|---|---|
| Three-layer separation | Memory Tiering | CLAUDE.md + MEMORY.md + Skills |
| Index pointers | Pointer Index System | MEMORY.md with one-line pointers |
| On-demand loading | Progressive Disclosure / Selective Re-injection | Skills loaded only when triggered |
| Sub-agent summarization | Sub-agent Distillation | 12-role agents report summaries to orchestrator |
| Team sharing via repo | Shared Project Config | CLAUDE.md + Skills in GitHub |
| Behavior-based triggering | Intent Detection | CLAUDE.md behavior-to-skill mapping table |
What you could add next
| Technique | What it does | How to implement |
|---|---|---|
| AutoCompact | Auto-compress conversation when context hits ~92% | Add to CLAUDE.md: "In long conversations, proactively summarize completed tasks" |
| AutoDream | Background agent consolidates memory (merge duplicates, prune stale) | Build a /memory-cleanup skill that runs periodically |
| Memory Decay | Old memories auto-expire | Add dates to memory files, flag anything > 90 days for review |
| MCP Integration | Pull context from GitHub Issues, Slack, Jira | Already using Supabase for storyboard sync — extend the pattern |
| Small Model Filter | Lightweight model pre-filters memory relevance | Use Haiku to score memory relevance before injecting into context |
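As one example from the table, the Memory Decay idea needs very little code once each memory file carries a date. The sketch below assumes you already have a mapping from file name to its last-updated date (how you record that date is up to you); `stale_memories` is a hypothetical helper, and it only flags files for review rather than deleting anything.

```python
from datetime import date, timedelta


def stale_memories(
    last_updated: dict[str, date],
    today: date,
    max_age_days: int = 90,
) -> list[str]:
    """Return names of memory files older than `max_age_days`, sorted."""
    cutoff = today - timedelta(days=max_age_days)
    # Strictly older than the cutoff: a file exactly 90 days old is kept.
    return sorted(
        name for name, updated in last_updated.items() if updated < cutoff
    )


today = date(2026, 1, 1)
memory_dates = {
    "feedback_workflow.md": today - timedelta(days=10),
    "projects_active.md": today - timedelta(days=90),
    "old_launch_plan.md": today - timedelta(days=91),
    "archived_notes.md": today - timedelta(days=200),
}
to_review = stale_memories(memory_dates, today)
```

A /memory-cleanup skill could run a check like this and present `to_review` to a human instead of pruning automatically.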
The Core Principle
The context window is finite. Every token loaded at conversation start that turns out to be irrelevant is waste. The three-layer architecture applies the same logic as lazy loading in software: load only what you need, when you need it. The difference is that here, the "import cost" is your ability to think clearly about the actual task at hand.
Continue Reading
This article focuses on the three-layer architecture for organizing AI memory. For advanced topics built on top of this foundation, see:
- Building AI Agent Teams with OGSM — How to organize multi-agent workflows using the Objective-Goal-Strategy-Measure framework
- Multi-AI Collaboration: Claude + Gemini + Codex — How to distribute work across multiple AI providers to maximize throughput and minimize cost
FAQ
Does this only work with Claude Code?
The CLAUDE.md mechanism is specific to Claude Code. But the underlying principles — lean always-loaded context, on-demand detailed documentation, intent-based triggering — apply to any AI agent system. You can implement equivalent structures with Cursor rules files, GitHub Copilot workspace configuration, or custom system prompts.
How do I decide what goes in a skill versus what stays in memory?
Apply this filter: if you would include it in documentation handed to a new employee on their first day, it belongs in a skill. If you would say it verbally in a 10-second handoff, it belongs in memory. Procedures, reference data, multi-step workflows, and anything longer than five lines almost always belong in a skill.
What happens when a skill gets very large?
That's expected and fine. Skills can have a references/ subdirectory with arbitrarily many files. The skill's SKILL.md acts as a table of contents — it specifies which reference files to load for which sub-tasks. A skill for a complex publication workflow might have 10 reference files totaling thousands of lines, and none of that weight appears in your context unless you're actually publishing.
How do we handle skills that multiple projects share?
Claude Code supports two locations for skills: project-level (.claude/skills/ in the repo, shared via git) and user-level (~/.claude/skills/ on your machine, private). Generic utility skills like security scanning or git operations live at user level. Project-specific skills like course rewriting or article publishing live at project level.
Want the raw .md file to use in your own project?
The full framework template — including the audit checklist, MEMORY.md structure, and SKILL.md frontmatter format — is available as a standalone markdown file.