Claude Code Memory & Skill Architecture: A Three-Layer Framework for AI Agent Teams
How we organized memory, skills, and project instructions to reduce token waste by 35% and onboard new team members instantly.
At a Glance
| Problem | Memory files growing unbounded, context window waste, no way to share AI setup with teammates |
|---|---|
| Solution | Three-layer architecture: CLAUDE.md + lean MEMORY.md + on-demand Skills |
| Key result | ~35% token reduction, zero-training team onboarding, 25+ reusable skills |
| Applies to | Claude Code, but concepts work with any AI coding agent |
| Template | Download the .md framework file |
The Problem with Unstructured AI Memory
After using Claude Code intensively for several months — managing 5 AIA CEU courses, 20+ blog articles, and a growing library of automation scripts — we hit a familiar wall.
Our memory files kept growing. Every time we wanted the AI to remember something, we added it. After a few months, a typical conversation was loading 5,000+ tokens of memory before anything useful was even said. Worse, most of that memory wasn't relevant to what we were currently doing.
We also had a team onboarding problem. Whenever a new team member started using Claude Code on our project, they had to manually teach the AI all our conventions. There was no way to share an AI "configuration" the way you share a .eslintrc or a tsconfig.json.
The third problem was discoverability. We had built some excellent workflows — multi-agent AIA course rewriting, security scanning before every push, automated article publishing. But to use them, you had to remember the exact slash command. New people didn't know what tools existed.
The Solution: Three-Layer Architecture
Layer 1: Instructions (CLAUDE.md) → Always loaded, the team's contract with the AI
Layer 2: Memory (MEMORY.md) → Lean index, pointers only (~200 tokens)
Layer 3: Skills (SKILL.md) → Full manuals, loaded ONLY when triggered
The core insight: not all context needs to be loaded all the time. Separate the always-relevant from the occasionally-relevant, and let the AI load the latter on demand.
Layer 1: CLAUDE.md (Always Loaded)
This is the AI's "onboarding guide." Every time anyone opens Claude Code in your project directory, the AI reads this file automatically. It's the contract between your team and the AI.
What goes here:
- Language rules (e.g., "discuss in Traditional Chinese, code in English")
- The intent detection table — see below, this is the most important part
- Content architecture (what goes where in your project)
- Team conventions (commit format, deployment process)
What does NOT go here:
- Detailed workflows — put these in Skills
- Reference data — put in Skill references/ directories
- Personal preferences — keep in local ~/.claude/ memory
The Intent Detection Table — The Key Innovation
Instead of requiring users to memorize slash commands, write a behavior-to-skill mapping table directly in CLAUDE.md:
```markdown
## Behavior Rules — Proactive Skill Suggestion

Do NOT wait for slash commands. Detect user intent:
- Clear intent → auto-use the skill
- Ambiguous intent → suggest the skill

| When the user...                  | Auto-suggest or use... |
|-----------------------------------|------------------------|
| Discusses writing an article      | /publish-article       |
| Mentions deploying or uploading   | /upload                |
| Says "done" or "finished editing" | /upload                |
| Reviews content for accuracy      | /content-review        |
| Is about to git push              | /security-check        |
```
The AI reads this table and proactively suggests the right skill — no memorization needed. This single change eliminated about 90% of the "I didn't know that existed" onboarding friction.
Layer 2: Memory (Lean Index)
Target: under 4,000 tokens total across all memory files.
Memory should contain ONLY four types of information:
- Behavior rules — 3–5 lines each. How the AI should work with you.
- Skill pointers — One line each. "For X, use /skill-y."
- Design preferences — Brief. UI style, coding conventions.
- Project index — Links to detail files, not the details themselves.
```markdown
# Memory Index

## Behavior Rules
- [feedback_workflow.md](feedback_workflow.md) — Plan before coding, never jump straight to implementation
- [feedback_delegate.md](feedback_delegate.md) — Use sub-agents for execution, don't code directly

## Skill Pointers
- Writing articles → /publish-article
- Deploying → /upload
- Security scan → /security-check (mandatory before every push)

## Preferences
- [feedback_ui_style.md](feedback_ui_style.md) — Large fonts (16–20px), spacious layout

## Active Projects
- [projects_active.md](projects_active.md) — Links to 3 currently active projects
```
Anything more detailed than a one-line pointer should move into a skill's references/ directory and be loaded only when that skill is triggered.
Layer 3: Skills (On-Demand Loading)
Skills are the on-demand half of the system. They can be as detailed as needed — 500 lines, 50 files, full reference documentation — because they're loaded only when triggered.
```
.claude/skills/
├── CATALOG.md           ← Overview of all skills (human + AI readable)
├── publish-article/
│   ├── SKILL.md         ← Trigger conditions + operation manual
│   └── references/      ← Detailed docs, loaded when needed
├── security-check/
│   └── SKILL.md
└── aia-rewrite/
    ├── SKILL.md
    └── references/
        ├── writing-guide.md
        └── aia-standards.md
```
The frontmatter of each SKILL.md tells the AI when to load it:
```yaml
---
name: publish-article
description: Generate blog articles for watersonusa.ai. Use when the user
  says "write article", "generate content", "新文章", or discusses
  creating content for the website.
---
```
The description field is what the AI reads to decide whether to trigger the skill. Write it comprehensively — include synonyms, alternative phrasings, and the languages your team uses.
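For intuition, here's a rough sketch of how a loader might pull that frontmatter out of a SKILL.md file. This is a hand-rolled parser for illustration only; Claude Code's actual loader is internal and not exposed:

```python
import re

def parse_frontmatter(skill_md: str) -> dict:
    """Extract key: value pairs from a YAML-style frontmatter block."""
    match = re.match(r"^---\n(.*?)\n---", skill_md, re.DOTALL)
    if not match:
        return {}
    fields: dict[str, str] = {}
    key = None
    for line in match.group(1).splitlines():
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        elif key:  # indented continuation of a multi-line value
            fields[key] += " " + line.strip()
    return fields
```

A catalog builder could run this over every skills/*/SKILL.md and hand the collected descriptions to the model as its trigger menu.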
Setup Steps
- Audit your existing memory. List every memory file. For each one, ask: Is this a repeatable workflow? (→ move to Skill) Is this reference data? (→ move to Skill references/) Is this used in less than 20% of conversations? (→ delete or move to Skill) What remains is your new lean Memory.
- Build skills from extracted workflows. For each workflow you pulled out of Memory, create a skill directory. Write the SKILL.md with frontmatter trigger conditions, ordered steps, exact commands, and quality constraints.
- Write CLAUDE.md with intent detection. This is the highest-leverage step. Draft the intent detection table carefully — it determines what the AI does proactively versus what it waits to be told.
- Compress remaining memory. Go through each remaining memory file. Can this be said in 3 lines instead of 30? Compress it. Is the detail needed every conversation? No — move it to a skill reference. Target: MEMORY.md index under 30 lines, total memory under 4,000 tokens.
- Separate shared from private. Commit CLAUDE.md and the skills directory to git. Keep personal preferences and private project records in ~/.claude/ (never committed). Team members who clone the repo immediately get the full setup. Shared config is like your .eslintrc — committed to git, the same for everyone. Personal memory is like your editor settings — it stays on your machine.
Results
| Metric | Before | After |
|---|---|---|
| Memory token consumption | 5,000+ per conversation | < 4,000 (−35%) |
| New member onboarding | Manual teaching session (hours) | Clone repo → open Claude Code → automatic |
| Skill triggering | Memorize slash commands | Intent detection (natural language) |
| Team config sharing | Manual file transfer | git clone |
| Context window waste | High — unused info loaded every time | Low — on-demand loading |
How This Compares to Industry Best Practices
Our three-layer architecture aligns with the latest AI agent memory management patterns from 2025–2026. Here's how what we built maps to the terminology researchers and practitioners are using:
What we already do
| Technique | Industry term | Our implementation |
|---|---|---|
| Three-layer separation | Memory Tiering | CLAUDE.md + MEMORY.md + Skills |
| Index pointers | Pointer Index System | MEMORY.md with one-line pointers |
| On-demand loading | Progressive Disclosure / Selective Re-injection | Skills loaded only when triggered |
| Sub-agent summarization | Sub-agent Distillation | 12-role agents report summaries to orchestrator |
| Team sharing via repo | Shared Project Config | CLAUDE.md + Skills in GitHub |
| Behavior-based triggering | Intent Detection | CLAUDE.md behavior-to-skill mapping table |
What you could add next
| Technique | What it does | How to implement |
|---|---|---|
| AutoCompact | Auto-compress conversation when context hits ~92% | Add to CLAUDE.md: "In long conversations, proactively summarize completed tasks" |
| AutoDream | Background agent consolidates memory (merge duplicates, prune stale) | Build a /memory-cleanup skill that runs periodically |
| Memory Decay | Old memories auto-expire | Add dates to memory files, flag anything > 90 days for review |
| MCP Integration | Pull context from GitHub Issues, Slack, Jira | Already using Supabase for storyboard sync — extend the pattern |
| Small Model Filter | Lightweight model pre-filters memory relevance | Use Haiku to score memory relevance before injecting into context |
Advanced: Multi-Agent Skills
Once the three-layer structure is in place, skills can orchestrate complex multi-agent workflows without any added memory overhead. Our AIA course rewrite skill runs 11 agents across three waves, two parallel and one sequential:
Wave 1 (parallel, 5 agents):
- ResearchAgent → finds current standards and competitor products
- DraftAgent × 3 → drafts three content sections concurrently
- FactCheckAgent → validates all product claims
Wave 2 (parallel, 5 agents):
- ADAReviewAgent → checks accessibility compliance
- SEOAgent → meta tags, schema, keyword density
- LegalAgent → regulatory claims audit
- StyleAgent → tone and reading level
- CitationAgent → formats all source notes
Wave 3 (sequential, 1 agent):
- IntegrationAgent → merges all outputs, resolves conflicts, deploys
This entire workflow lives in the skill SKILL.md file. It contributes zero tokens to memory. It's only loaded when someone says "rewrite the course" or triggers the /aia-rewrite command.
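The wave pattern itself is ordinary fan-out/fan-in. A minimal sketch of the dispatch logic, where the callables are stand-ins (a real skill launches Claude sub-agents, not Python functions):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def run_waves(waves: list[list[Callable[[], str]]]) -> list[list[str]]:
    """Run each wave's agents in parallel; waves themselves run sequentially."""
    results = []
    for wave in waves:
        with ThreadPoolExecutor(max_workers=max(1, len(wave))) as pool:
            futures = [pool.submit(agent) for agent in wave]
            # Collect summaries in submission order before the next wave starts.
            results.append([future.result() for future in futures])
    return results
```

The key property mirrors the skill: later waves only begin once every agent in the previous wave has reported its summary.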
Advanced: External Tool Integration
Skills can define tool fallback chains, making them resilient to quota limits and API outages:
```markdown
## Research Step
1. Try: gemini -m gemini-2.5-flash -p "{{query}}" --output-format text
2. Fallback: Claude Sonnet sub-agent with WebSearch tool
3. Fallback: Manual research prompt to user
```
Because this fallback logic lives in the skill file rather than in memory, it doesn't cost tokens during conversations where research isn't needed — which is most of them.
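The fallback chain is just ordered attempts with error handling. A sketch, assuming the same Gemini CLI invocation shown above (treat the exact flags as project-specific):

```python
import subprocess

def research(query: str) -> str:
    """Try each research backend in order; return the first usable result."""
    attempts = [
        ["gemini", "-m", "gemini-2.5-flash", "-p", query,
         "--output-format", "text"],
        # Further fallbacks (Claude sub-agent, etc.) would be added here.
    ]
    for cmd in attempts:
        try:
            out = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
            if out.returncode == 0 and out.stdout.strip():
                return out.stdout.strip()
        except (OSError, subprocess.TimeoutExpired):
            continue  # binary missing or hung; fall through to the next option
    return f"MANUAL RESEARCH NEEDED: {query}"
```

The final return is the "prompt the user" step: if every automated option fails, the skill degrades gracefully instead of erroring out.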
Advanced: Multi-AI Collaboration
Don't let Claude do everything alone. Once your skill architecture is in place, you can distribute work across multiple AI providers — maximizing throughput while minimizing cost. Each AI has a different strength and a different price point.
The delegation hierarchy
| AI | Role | When to use | Cost |
|---|---|---|---|
| Claude Opus | Orchestrator | Complex decisions, quality control, final integration | Highest — reserve for core work |
| Claude Sonnet | Workers | Writing, reviewing, code generation, auditing | Medium — your main workforce |
| Gemini Flash/Pro | Researchers | Google Search grounding, fact verification, SEO analysis, proofreading | Free (1,000 req/day) — use aggressively |
| Codex (GPT) | Code reviewer | HTML/CSS/JS quality, accessibility audit, competitive analysis | Subscription — use until quota exhausted |
| Claude Haiku | Lightweight tasks | Memory filtering, simple formatting, quick lookups | Cheapest — use for high-volume low-complexity |
Configure the fallback chain in CLAUDE.md
```markdown
## Multi-AI Collaboration
Gemini CLI: `echo "Y" | gemini -m gemini-2.5-flash -p "QUERY" --output-format text`
Codex CLI: `codex exec --full-auto -C /path "TASK"`

Fallback chain:
1. Gemini Flash (free) → 2. Codex (subscription) → 3. Claude Sonnet (paid per token)

Always try free/cheaper options first for:
- Web research, fact checking
- Code review, linting
- SEO analysis, proofreading
- Bulk formatting, translation
```
Real example: 42 agents, one session
Here's what that session produced and how the agents were distributed:
- 25 Claude Sonnet agents — writing, reviewing, fixing
- 10 Gemini Flash tasks — citation verification, SEO checks, proofreading
- 4 Codex tasks — code review, accessibility audit
- 2 Gemini Pro tasks — deep content analysis
- 1 Claude Opus orchestrator — planning, integration, quality control
Output from that single working day:
- 5 AIA CEU courses (284 slides total)
- 7 blog articles
- 12 content topics identified and briefed
- 1 collaborative storyboard editor (deployed to production)
None of this required the orchestrator to hold the entire context in memory. Each agent received a focused brief, worked independently, and reported a summary back. The three-layer architecture made it possible to dispatch that many agents without losing coherence.
The Core Principle
The context window is finite. Every token loaded at conversation start that turns out to be irrelevant is waste. The three-layer architecture applies the same logic as lazy loading in software: load only what you need, when you need it. The difference is that here, the "import cost" is your ability to think clearly about the actual task at hand.
FAQ
Does this only work with Claude Code?
The CLAUDE.md mechanism is specific to Claude Code. But the underlying principles — lean always-loaded context, on-demand detailed documentation, intent-based triggering — apply to any AI agent system. You can implement equivalent structures with Cursor rules files, GitHub Copilot workspace configuration, or custom system prompts.
How do I decide what goes in a skill versus what stays in memory?
Apply this filter: if you would include it in documentation handed to a new employee on their first day, it belongs in a skill. If you would say it verbally in a 10-second handoff, it belongs in memory. Procedures, reference data, multi-step workflows, and anything longer than five lines almost always belong in a skill.
What happens when a skill gets very large?
That's expected and fine. Skills can have a references/ subdirectory with arbitrarily many files. The skill's SKILL.md acts as a table of contents — it specifies which reference files to load for which sub-tasks. A skill for a complex publication workflow might have 10 reference files totaling thousands of lines, and none of that weight appears in your context unless you're actually publishing.
How do we handle skills that multiple projects share?
Claude Code supports two locations for skills: project-level (.claude/skills/ in the repo, shared via git) and user-level (~/.claude/skills/ on your machine, private). Generic utility skills like security scanning or git operations live at user level. Project-specific skills like course rewriting or article publishing live at project level.
Want the raw .md file to use in your own project?
The full framework template — including the audit checklist, MEMORY.md structure, and SKILL.md frontmatter format — is available as a standalone markdown file.
Download the template →