# Claude Code Memory & Skill Architecture Guide

> A practical framework for organizing AI agent memory, skills, and project instructions.
> Works with Claude Code, but the concepts apply to any AI coding agent.

## First Things First: Name Your AI Team Leader

When you set up this framework, the first thing your AI should ask is:

> "I'll be your AI project manager — I'll understand your needs, coordinate the team, 
> and deliver results. Would you like to give me a name?"

We call ours "A君" (Mr. A). You can call yours anything — the point is that you talk to ONE AI, 
and it handles everything else. You don't manage 12 agents directly. You manage one team leader.

## The Problem

After using Claude Code for a while, you end up with:
- Memory files that keep growing (eating tokens every conversation)
- Workflows buried in memory that should be reusable templates
- New team members who have to re-teach the AI everything from scratch
- No clear separation between "what to remember" and "how to do things"

## The Solution: Three-Layer Architecture

```
Layer 1: CLAUDE.md          → AI reads this EVERY conversation (~800 tokens)
Layer 2: Memory (MEMORY.md) → Lean index, pointers only (~200 tokens)
Layer 3: Skills (SKILL.md)  → Full manuals, loaded ONLY when triggered
```

### Layer 1: CLAUDE.md (Always Loaded)

This is the AI's "onboarding guide." Every time anyone opens Claude Code in your project directory, the AI reads this file automatically.

**What goes here:**
- Language rules (e.g., "discuss in Chinese, code in English")
- Intent detection table (the most important part — see below)
- Content architecture (what goes where in your project)
- Team conventions (commit format, deployment process)

**What does NOT go here:**
- Detailed workflows (→ put in Skills)
- Reference data (→ put in Skill references/)
- Personal preferences (→ put in local Memory)

**The Intent Detection Table — This is the key innovation:**

Instead of requiring users to memorize slash commands, write behavior-to-skill mappings:

```markdown
| When the user...                    | Auto-suggest or use...  |
|-------------------------------------|------------------------|
| Discusses writing an article        | /publish-article       |
| Mentions deploying or uploading     | /upload                |
| Says "done" or "finished editing"   | /upload                |
| Reviews content for accuracy        | /content-review        |
| Is about to git push                | /security-check        |
```

The AI reads this table and proactively suggests the right skill — no memorization needed.

### Layer 2: Memory (Lean Index)

**Target: < 4,000 tokens total.**

Memory should contain ONLY:
1. **Behavior rules** (3-5 lines) — How the AI should work with you
2. **Skill pointers** (one line each) — "For X, use /skill-y"
3. **Design preferences** (brief) — UI style, coding conventions
4. **Project index** — Links to detail files, not the details themselves

**Example MEMORY.md:**
```markdown
# Memory Index

## Behavior Rules
- [feedback_workflow.md](feedback_workflow.md) — Plan before coding
- [feedback_delegate.md](feedback_delegate.md) — Always use agents, don't code directly

## Skill Pointers
- Writing articles → /publish-article
- Deploying → /upload
- Security scan → /security-check

## Preferences
- [feedback_ui_style.md](feedback_ui_style.md) — Large fonts, spacious layout

## Active Projects
- [projects_active.md](projects_active.md) — Links to 3 active projects
```

**What does NOT belong in Memory:**
- Complete workflows (→ Skill)
- Reference documents (→ Skill references/)
- API documentation (→ Skill references/)
- Step-by-step procedures (→ Skill)
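To enforce the 4,000-token budget, a rough estimate is enough. The sketch below assumes ~4 characters per token for English prose (a common approximation, not an exact tokenizer) and that memory files are flat `.md` files in one directory:

```python
from pathlib import Path

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose."""
    return len(text) // 4

def audit_memory(memory_dir: str, budget: int = 4000) -> list[tuple[str, int]]:
    """Return (filename, estimated tokens) for every .md file, largest first."""
    files = [(p.name, estimate_tokens(p.read_text(encoding="utf-8")))
             for p in Path(memory_dir).glob("*.md")]
    files.sort(key=lambda item: item[1], reverse=True)
    total = sum(tokens for _, tokens in files)
    if total > budget:
        print(f"Over budget: ~{total} tokens (target {budget})")
    return files
```

Run it occasionally; the largest files at the top of the list are the first candidates to move into a Skill's `references/`.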

### Layer 3: Skills (On-Demand)

Skills are loaded ONLY when triggered. They can be as detailed as needed without wasting tokens.

**Structure:**
```
.claude/skills/
├── CATALOG.md              ← Overview of all skills (human + AI readable)
├── my-skill/
│   ├── SKILL.md            ← Trigger conditions + operation manual
│   └── references/         ← Detailed docs, loaded when needed
└── another-skill/
    └── SKILL.md
```

**SKILL.md frontmatter — Write clear trigger conditions:**
```yaml
---
name: publish-article
description: Generate blog articles. Use when the user says "write article",
  "generate content", "新文章", or discusses creating content for the website.
---
```

**Key principle:** The `description` field is what the AI reads to decide whether to trigger the skill. Be comprehensive about when to use it.

## Setup Steps

### Step 1: Audit Your Memory

List every memory file. For each one, ask:

| Question | If yes → |
|----------|----------|
| Is this a repeatable workflow? | Move to Skill |
| Is this reference data? | Move to Skill references/ |
| Is this a behavior preference? | Keep in Memory (compress to 3 lines) |
| Is this a project record? | Keep as 1-line index pointer |
| Is this used every conversation? | Keep in Memory |
| Is this used < 20% of the time? | Move to Skill or delete |

### Step 2: Build Skills from Extracted Workflows

For each workflow you extracted from Memory:

```bash
mkdir -p .claude/skills/my-workflow
```

Write `SKILL.md` with:
1. **Frontmatter** — name + description (trigger conditions)
2. **Steps** — What to do, in order
3. **Commands** — Exact commands or API calls
4. **Rules** — Quality requirements, constraints
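The four-part structure above can be scripted so every new skill starts consistent. This is a sketch; the template wording is illustrative, and the directory layout follows the structure shown in Layer 3:

```python
from pathlib import Path

# Template mirrors the four parts: frontmatter, steps, commands, rules.
SKILL_TEMPLATE = """\
---
name: {name}
description: {description}
---

## Steps
1. TODO: what to do, in order

## Commands
TODO: exact commands or API calls

## Rules
TODO: quality requirements, constraints
"""

def scaffold_skill(skills_root: str, name: str, description: str) -> Path:
    """Create <skills_root>/<name>/SKILL.md plus a references/ directory."""
    skill_dir = Path(skills_root) / name
    (skill_dir / "references").mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    path.write_text(
        SKILL_TEMPLATE.format(name=name, description=description),
        encoding="utf-8",
    )
    return path
```

Remember that the `description` you pass in becomes the trigger condition, so write it as "Use when the user says...", not as a one-word summary.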

### Step 3: Write CLAUDE.md with Intent Detection

This is the most impactful step. Your CLAUDE.md should make the AI proactive:

```markdown
## Behavior Rules — Proactive Skill Suggestion

Do NOT wait for slash commands. Detect user intent:
- Clear intent → auto-use the skill
- Ambiguous intent → suggest the skill
```

### Step 4: Compress Memory

Go through each remaining memory file:
- Can this be said in 3 lines instead of 30? → Compress
- Is the detail needed every conversation? → No → Move to Skill reference
- Is this duplicated in a Skill? → Delete from Memory

**Target: MEMORY.md index < 30 lines, total memory < 4,000 tokens.**

### Step 5: Separate Shared vs Private

```
GitHub repo (team shared)          Local memory (private)
├── CLAUDE.md                      ├── Personal preferences
└── .claude/skills/                ├── Private project records
    ├── CATALOG.md                 └── Account credentials
    ├── skill-a/
    └── skill-b/
```

Team members clone the repo → get CLAUDE.md + all skills automatically.
Personal stuff stays in `~/.claude/` (never committed).

## Results You Can Expect

| Metric | Before | After |
|--------|--------|-------|
| Memory token consumption | 5,000+ | < 4,000 (20%+ less) |
| New member onboarding | Manual teaching | Clone → open → auto |
| Skill triggering | /command memorization | Intent detection |
| Team config sharing | Manual file transfer | git clone |
| Context window waste | High (unused info loaded) | Low (on-demand loading) |

## How This Compares to Industry Best Practices

Our three-layer architecture aligns with the latest AI agent memory management patterns (2025-2026):

### What we already do

| Technique | Industry term | Our implementation |
|-----------|--------------|-------------------|
| Three-layer separation | Memory Tiering | CLAUDE.md + MEMORY.md + Skills |
| Index pointers | Pointer Index System | MEMORY.md with one-line pointers |
| On-demand loading | Progressive Disclosure / Selective Re-injection | Skills loaded only when triggered |
| Sub-agent summarization | Sub-agent Distillation | 12 role-based agents report summaries to the orchestrator |
| Team sharing via repo | Shared Project Config | CLAUDE.md + Skills in GitHub |
| Behavior-based triggering | Intent Detection | CLAUDE.md behavior-to-skill mapping table |

### What you could add next

| Technique | What it does | How to implement |
|-----------|-------------|-----------------|
| **AutoCompact** | Auto-compress conversation when context hits ~92% | Add to CLAUDE.md: "In long conversations, proactively summarize completed tasks" |
| **AutoDream** | Background agent consolidates memory (merge duplicates, prune stale) | Build a `/memory-cleanup` skill that runs periodically |
| **Memory Decay** | Old memories auto-expire | Add dates to memory files, flag anything > 90 days for review |
| **MCP Integration** | Pull context from GitHub Issues, Slack, Jira | Already using Supabase for storyboard sync — extend pattern |
| **Small Model Filter** | Lightweight model pre-filters memory relevance | Use Haiku to score memory relevance before injecting |

## Advanced Techniques

### Multi-AI Collaboration: Use Every Token You're Paying For

Don't let Claude do everything alone. Distribute work across multiple AI providers to maximize throughput and minimize cost:

**The delegation hierarchy:**

| AI | Role | When to use | Cost |
|----|------|------------|------|
| **Claude Opus** | Orchestrator / A君 | Complex decisions, quality control, final integration | Highest — reserve for core work |
| **Claude Sonnet** | Workers | Writing, reviewing, code generation, auditing | Medium — your main workforce |
| **Gemini Flash/Pro** | Researchers | Google Search grounding, fact verification, SEO analysis, proofreading | Free (1000 req/day) — use aggressively |
| **Codex (GPT)** | Code reviewer | HTML/CSS/JS quality, accessibility audit, competitive analysis | Subscription — use until quota exhausted |
| **Claude Haiku** | Lightweight tasks | Memory filtering, simple formatting, quick lookups | Cheapest — use for high-volume low-complexity |

**How to configure in CLAUDE.md:**

```markdown
## Multi-AI Collaboration

Gemini CLI: `echo "Y" | gemini -m gemini-2.5-flash -p "QUERY" --output-format text`
Codex CLI: `codex exec --full-auto -C /path "TASK"`

Fallback chain:
1. Gemini Flash (free) → 2. Codex (subscription) → 3. Claude Sonnet (paid per token)

Always try free/cheaper options first for:
- Web research, fact checking
- Code review, linting
- SEO analysis, proofreading
- Bulk formatting, translation
```
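The fallback chain above can be sketched as an ordered list of providers tried in cost order. The provider callables here are stand-ins: the real Gemini/Codex invocations are the CLI commands shown in the config block, and wiring them up (e.g. via `subprocess`) depends on your local setup:

```python
from typing import Callable

def with_fallback(providers: list[tuple[str, Callable[[str], str]]],
                  task: str) -> tuple[str, str]:
    """Try each (name, provider) in cost order; return (name, result)
    from the first provider that succeeds."""
    errors = []
    for name, provider in providers:
        try:
            return name, provider(task)
        except Exception as exc:   # unavailable, over quota, or failed
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Usage follows the chain in the config: `with_fallback([("gemini-flash", gemini), ("codex", codex), ("claude-sonnet", claude)], task)`, so the free tier is always burned first and Claude only pays per token when the cheaper options are exhausted.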

**Real example from our workflow:**
We dispatched 42 agents in a single session:
- 25 Claude Sonnet agents (writing, reviewing, fixing)
- 10 Gemini Flash tasks (citation verification, SEO, proofreading)
- 4 Codex tasks (code review, accessibility audit)
- 2 Gemini Pro tasks (deep analysis)
- 1 Claude Opus orchestrator

Result: 5 AIA courses (284 slides total), 7 blog articles, 12 content topics, and a collaborative storyboard editor — in one working day.

**Key principle: Burn the free tokens first.** Gemini gives you 1,000 requests/day for free. Codex subscriptions include a token budget. Use them aggressively before falling back to Claude.

### Multi-Agent Collaboration in Skills

Skills can define multi-agent workflows. Example from our AIA course rewrite:

```
Wave 1 (parallel): 5 agents — research + draft
Wave 2 (parallel): 5 agents — review + audit
Wave 3 (sequential): 1 agent — integrate + deploy
```

### External Tool Integration

Skills can call external tools (Gemini, Codex, etc.) with fallback chains:

```markdown
## Research Step
1. Try: gemini -m gemini-2.5-flash -p "query"
2. Fallback: Claude Sonnet agent
3. Fallback: WebSearch tool
```

### Supabase for Real-Time State

For collaborative editing (like our storyboard tool), use Supabase to sync state between the browser and Claude Code:

```
Browser edit → Supabase → Claude reads via REST API
```
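The read side of that flow can be sketched against Supabase's auto-generated REST API (PostgREST behind `/rest/v1/<table>`, authenticated with `apikey` and `Authorization` headers). The project URL, key, and `storyboard` table name below are placeholders, not our actual configuration:

```python
import json
import urllib.request

SUPABASE_URL = "https://YOUR-PROJECT.supabase.co"   # placeholder project URL
SUPABASE_KEY = "YOUR-ANON-KEY"                      # placeholder anon key

def build_request(table: str, select: str = "*") -> urllib.request.Request:
    """Build a GET request for Supabase's auto-generated REST endpoint."""
    url = f"{SUPABASE_URL}/rest/v1/{table}?select={select}"
    return urllib.request.Request(url, headers={
        "apikey": SUPABASE_KEY,
        "Authorization": f"Bearer {SUPABASE_KEY}",
    })

def fetch_rows(table: str) -> list[dict]:
    """Fetch current rows, e.g. the storyboard state edited in the browser."""
    with urllib.request.urlopen(build_request(table)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Claude Code can call `fetch_rows("storyboard")` at the start of a task to pick up whatever the browser wrote since the last conversation.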

## How to Use This Guide

**Option A: Reference it in your CLAUDE.md**
```markdown
For memory/skill organization methodology, see:
https://raw.githubusercontent.com/chihao919/door-site/main/admin/memory-skill-guide/memory-skill-architecture.md
```

**Option B: Copy the framework**
Download this file and adapt the three-layer structure to your project.

**Option C: Use as a checklist**
Follow Steps 1-5 above to reorganize your existing Claude Code setup.

---

*Developed by the Waterson USA AI team. Based on real-world experience managing 5 AIA CEU courses, 20+ blog articles, and 25+ skills across multiple projects using Claude Code.*
