Gmail AI Reply Extension — Architecture Guide v4.1

One Sentence Summary

OGSM defines what to achieve and how. Skills define step-by-step recipes. Python ensures it actually happens correctly: at scale, every morning, without anyone watching.

Traditional AI agents have a fundamental problem: you tell them what to do in a prompt, hope they follow instructions, and have no way to verify that they actually did. This architecture solves that by separating direction (OGSM), knowledge (Skills), and enforcement (Python).

Three-Layer Architecture

Agent is the boss. Python is the skeleton. AI judgment happens inside each skill runner.

Layer 1: Agent Boss

Strategic PDCA: reads OGSM, runs the pipeline, checks results, improves

Reads O+G+S+M before execution
Calls the Python pipeline as a tool
Checks results against G metrics
Updates policies and corrections-log

Layer 2: Python Skeleton

Deterministic pipeline: fixed flow, gate checks, API calls

Scan → Plan → Research → Suggest → Cache
Gate Check with M (resource utilization) verification
Retry on quality gate failure
Returns JSON results to Agent Boss

Layer 3: Agent Judgment

AI decisions: each skill reads SKILL.md + policies at runtime

Planner: strategic triage per OGSM
Research: customer intelligence via Gemini
Suggest: Peak Experience draft via Opus
Evaluator: PDCA batch analysis

Why Agent Boss

OGSM-driven: every action has a strategic purpose, not ad hoc
Self-improving daily: PDCA turns yesterday's mistakes into today's improvements

Why Python Skeleton

AI can't skip steps: fixed order, and one failed email doesn't crash the rest
Gate Check verifies with code, not trust: it does not accept "I did the analysis"

Why Skill Runners

Edit SKILL.md → behavior changes on the next run, no deploy needed
Right model for each task: Opus writes, Gemini researches, Flash verifies

How the System Works

Full morning flow, from wakeup to PDCA Act. The Agent Boss orchestrates everything.

Agent Boss wakes up (launchd triggers agent_morning.sh at 7:00 AM)
│
├─ Reads OGSM SKILL.md
│    O: "Customers feel genuinely understood"
│    G: "Quality score ≥8.5 for quotes, ≥8.0 for troubleshoot"
│    S: "Use Peak Experience + persona detection + multi-model"
│    M: "Verify 3-layer analysis exists, zero corrections repeat"
│
├─ Reads yesterday's PDCA log
│    "Yesterday: troubleshoot drafts were too formal → updated reply-rules.md"
│
├─ Now it knows WHAT to achieve and HOW
│    But it doesn't process 78 emails itself; it calls the Python pipeline
│
├─ Calls: python3 daily_morning.py
│    Python runs each step in order:
│    scan → planner → [research → suggest → gate_check → cache] × 78 → evaluator
│
│    Inside each step, the skill runner:
│    1. Reads its own SKILL.md (e.g., suggest-reply/SKILL.md)
│    2. Reads policy files (product-facts.md, reply-rules.md, ...)
│    3. Builds a prompt with SKILL.md + policies + task data
│    4. Calls Claude Opus / Gemini Pro (AI does the creative work)
│    5. Python gate_check verifies the output (deterministic)
│
├─ Pipeline returns results JSON
│    {drafts: 72, errors: 3, skipped: 3, quality_scores: [...] }
│
├─ CHECK: Agent Boss compares results against G
│    "Quote avg quality: 8.7 ✓  Troubleshoot avg: 7.9 ✗ (below 8.0)"
│
└─ ACT: Agent Boss improves
     Updates policies/reply-rules.md → tomorrow's pipeline reads the new rules
     Writes PDCA log → tomorrow's planner reads it → compounding improvement

The Agent Boss is the commander. It reads OGSM to understand the mission, delegates execution to Skills via Python, reviews results, and improves the system. It never writes a single email draft itself.
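The morning flow depends on the orchestrator loop keeping one bad email from stopping the rest. The real daily_morning.py is not reproduced in this guide, so the following is only a minimal sketch of that error-isolation pattern. The names run_pipeline, research, suggest, and the shape of the state dictionary are hypothetical, not the project's actual API.

```python
def run_pipeline(tasks, steps):
    """Run every step for every task; one failing task never stops the batch."""
    summary = {"drafts": 0, "errors": 0, "skipped": 0, "failures": []}
    for task in tasks:
        state = {"task": task}
        try:
            for step in steps:
                state = step(state)
                if state.get("skip"):
                    break  # a step asked to skip this task
        except Exception as exc:  # isolate the failure to this one task
            summary["errors"] += 1
            summary["failures"].append({"id": task.get("id"), "reason": str(exc)})
            continue
        summary["skipped" if state.get("skip") else "drafts"] += 1
    return summary


# Hypothetical steps, used only to exercise the loop.
def research(state):
    if state["task"]["id"] == 15:
        raise TimeoutError("research timed out")
    return {**state, "brief": "customer brief"}


def suggest(state):
    return {**state, "draft": f"Reply to email #{state['task']['id']}"}
```

Calling run_pipeline([{"id": 14}, {"id": 15}, {"id": 16}], [research, suggest]) records one error for #15 and still produces drafts for #14 and #16, which is the behavior the summary JSON in the flow relies on.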

PDCA Continuous Improvement

The system gets better every day because the Agent Boss closes the loop by updating the files that Skills read.

Day 1: Agent Boss reads OGSM → runs pipeline → Evaluator finds patterns
       → writes evaluator-pdca.json (e.g., "troubleshoot drafts too formal")
Day 2: Agent Boss reads OGSM + yesterday's PDCA
       → Planner reads PDCA → adjusts strategy for troubleshoot emails
       → Suggest Reply gets planner's adjusted instructions → better troubleshoot drafts
Day 2 Act: Agent Boss updates policies/reply-rules.md (troubleshoot tone guidance)
       → tomorrow's pipeline reads updated policy
Day 3: All troubleshoot drafts now follow the improved tone → compounding improvement

Three things compound: (1) The PDCA log carries lessons to the next day's Planner. (2) Policy files are updated by the Agent Boss after each cycle, and the pipeline automatically reads the new rules. (3) The corrections-log grows over time, so a known factual error is not repeated.
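The Check step of the cycle compares pipeline results against the G targets. As a rough illustration, the sketch below assumes each entry in quality_scores looks like {"type": ..., "score": ...}; that shape, the G_TARGETS constant, and the function name check_against_goals are assumptions made for this example, since the guide only shows the field as quality_scores: [...].

```python
from statistics import mean

# Targets taken from the G line of the OGSM example; the dict itself is hypothetical.
G_TARGETS = {"quote_request": 8.5, "troubleshooting": 8.0}


def check_against_goals(quality_scores, targets=G_TARGETS):
    """Average scores per email type and flag types that miss their target."""
    by_type = {}
    for item in quality_scores:
        by_type.setdefault(item["type"], []).append(item["score"])

    report = {}
    for email_type, target in targets.items():
        scores = by_type.get(email_type)
        if not scores:
            continue  # nothing of this type was processed today
        avg = round(mean(scores), 2)
        report[email_type] = {"avg": avg, "target": target, "ok": avg >= target}
    return report
```

A report in which troubleshooting has "ok": False is the signal the Agent Boss would act on by updating reply-rules.md.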

Developer Guide: Change Behavior vs Change Flow

Most changes don't require touching Python. Know which layer to edit.

Change behavior → edit SKILL.md or policies/*.md (no code change needed)

When you edit a SKILL.md file, the AI behavior changes on the next pipeline run. The Python runner reads SKILL.md at runtime and injects it into the AI prompt; it is just a bridge between the knowledge file and the AI engine. Change the file, change the behavior. Examples: adjust reply tone, add a new analysis step, update the Peak Experience methodology, add product facts.

Change flow → edit Python skeleton (daily_morning.py)

Python controls the pipeline order, error isolation, timeouts, and gate check logic. Edit Python when you need to add or remove a pipeline step, change the order of steps, adjust retry logic, or modify what the gate check verifies. These are structural changes to the system's enforcement layer.

When do you need Python at all? (vs. pure SKILL.md)

Claude Code can already run Skills on a schedule using /schedule. A scheduled agent reads SKILL.md, follows the steps, and runs without anyone watching. You don't always need Python; it depends on scale and reliability requirements.

	Pure SKILL.md (Claude scheduled agent)	SKILL.md + Python skeleton
Batch size	5~10 items: works great	78 items: each processed in its own session
Error handling	One failure may crash the whole session	try/except per item: #15 fails, #16 continues
Quality check	AI self-assessment: "yes I did the 3-layer analysis"	Python checks JSON fields: `hidden_need` exists and ≥20 chars?
Context window	All items share one context, which may overflow	Each AI call is independent: no shared context to overflow
Timeout control	No per-step timeout	research: 60s, suggest: 600s, fact-check: 30s
Output	Conversation log (hard to parse)	Structured JSON: `{drafts: 72, errors: 3}`
Best for	Daily summaries, small reports, simple workflows	High-volume batch pipelines with quality requirements
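The per-step timeouts in the table can be enforced with the standard library alone. This is a sketch under assumptions: it supposes each AI step is launched as a subprocess (the guide shows CLI invocations but not how they are wrapped), and run_step and STEP_TIMEOUTS are hypothetical names.

```python
import subprocess
import sys

# Seconds per step, copied from the comparison table.
STEP_TIMEOUTS = {"research": 60, "suggest": 600, "fact_check": 30}


def run_step(step, cmd, timeouts=STEP_TIMEOUTS):
    """Run one step as a subprocess and report a timeout instead of hanging."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeouts[step]
        )
    except subprocess.TimeoutExpired:
        return {"step": step, "ok": False, "error": "timeout"}
    if proc.returncode != 0:
        return {"step": step, "ok": False, "error": proc.stderr.strip()}
    return {"step": step, "ok": True, "output": proc.stdout.strip()}


if __name__ == "__main__":
    # Stand-in command; a real step would invoke the model CLI here.
    print(run_step("fact_check", [sys.executable, "-c", "print('ok')"]))
```

Returning a result dictionary rather than raising keeps the decision about retries and skips in the orchestrator, where the rest of the flow control lives.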

The restaurant analogy

Pure SKILL.md = You hire a chef, hand them a recipe, and they cook one dish. Great for dinner at home.
SKILL.md + Python = A restaurant kitchen processing 78 orders at 7 AM with no one watching. It needs an order queue (Python loop), a recipe for each dish (SKILL.md), a quality station checking every plate (gate_check), and a guarantee that one bad dish can't shut down the whole kitchen (error isolation).

Decision guide

Start with pure SKILL.md + Claude schedule. If you later find that items fail silently, context overflows, or you need deterministic quality gates, that's when you add Python. Don't add complexity before you need it.

Skill Runner: how behavior changes without code changes

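# skills/suggest_reply.py
from pathlib import Path

# Assumption: call_claude is the helper in lib/claude_cli.py (see File Architecture).
from lib.claude_cli import call_claude

# Assumption: SKILL_DIR is the OGSM root, .claude/skills/gmail-reply-extension/.
SKILL_DIR = Path(".claude/skills/gmail-reply-extension")


def _load_skill_instructions():
    """Read SKILL.md at runtime: change the file, change the behavior."""
    path = SKILL_DIR / "skills/composite/suggest-reply/SKILL.md"
    return path.read_text(encoding="utf-8") if path.exists() else ""


def run(task, brief, dry_run=False):
    # 1. Read SKILL.md (the "recipe")
    skill_md = _load_skill_instructions()

    # 2. Read policies (product-facts, corrections-log, reply-rules, ...)
    #    _load_policies is a helper defined elsewhere in this module.
    policies = _load_policies()

    # 3. Build prompt: SKILL.md + policies + task data
    prompt = f"""SKILL INSTRUCTIONS:
{skill_md}

{policies}

CUSTOMER EMAIL:
{task['subject']}..."""

    # 4. Call AI: Claude Opus for writing quality
    result = call_claude(prompt, model="opus")

    # 5. Return parsed JSON; Python gate_check will verify the fields.
    #    parse_json_from is a helper defined elsewhere in this module.
    return parse_json_from(result)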
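The runner ends by calling parse_json_from, whose body the guide does not include. One plausible way to write such a helper, assuming the model may wrap its JSON in prose, is to scan for the first position where a JSON object decodes cleanly. This is an illustrative sketch, not the project's implementation.

```python
import json


def parse_json_from(text: str) -> dict:
    """Return the first JSON object embedded in a model reply."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value and ignores trailing text.
            obj, _end = decoder.raw_decode(text[index:])
        except json.JSONDecodeError:
            continue  # this brace did not start valid JSON; keep scanning
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in model output")
```

Raising on a reply with no JSON, rather than returning an empty dict, lets the orchestrator count it as an error for that one email instead of passing an empty draft to the gate check.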

S → Skills: The Mapping

Every Strategy in OGSM maps directly to an executable Skill. Change the SKILL.md, change the behavior; no code changes needed.

Strategy (in OGSM)	Skill (SKILL.md)	Runner (.py)	Model
Peak Experience 3-layer analysis	`suggest-reply/SKILL.md`	`suggest_reply.py`	Claude Opus
Persona-tailored communication	`persona-detection/SKILL.md`	`scan_classify.py`	Python script
Customer intelligence mining	`research-customer/SKILL.md`	`research_customer.py`	Gemini Pro
Multi-model routing	(in orchestrator)	`daily_morning.py`	Opus / Gemini / Flash
Fact verification against wiki	`fact-check/SKILL.md`	(inside suggest_reply.py)	Gemini Flash
PDCA continuous improvement	`pdca-update/SKILL.md`	`evaluator.py`	Gemini Pro

Multi-Model Routing

Each step uses the model best suited for the task, rather than one model for everything.

Step	Model	Why This Model	Invocation
scan_classify	No AI	Deterministic classification, zero token cost	`Python script`
planner	Claude Opus	Understands OGSM strategy, global judgment	`claude -p --model opus`
research	Gemini 2.5 Pro	Web search, long context analysis	`gemini -m gemini-2.5-pro`
suggest_reply	Claude Opus	Writing quality, empathy, Peak Experience	`claude -p --model opus`
fact_check	Gemini Flash	Fast factual comparison, low cost	`gemini -m gemini-2.5-flash`
precache	No AI	Deterministic CRUD	`Python (Supabase SDK)`
evaluator	Claude Opus	Cross-customer insight, PDCA judgment	`claude -p --model opus`
sync_deals / sync_tasks	No AI	Deterministic sync	`Python (API calls)`
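One way to keep this routing in a single place is a lookup table in the orchestrator. The sketch below copies the base commands from the Invocation column; ROUTES and command_for are hypothetical names, and how the prompt is handed to each CLI is deliberately left out because the guide does not specify it.

```python
# Base commands from the routing table; None means "no AI, plain Python".
ROUTES = {
    "scan_classify": None,
    "planner": ["claude", "-p", "--model", "opus"],
    "research": ["gemini", "-m", "gemini-2.5-pro"],
    "suggest_reply": ["claude", "-p", "--model", "opus"],
    "fact_check": ["gemini", "-m", "gemini-2.5-flash"],
    "precache": None,
    "evaluator": ["claude", "-p", "--model", "opus"],
    "sync_deals": None,
    "sync_tasks": None,
}


def command_for(step):
    """Return a copy of the base CLI command for a step, or None for non-AI steps."""
    base = ROUTES[step]  # a KeyError on an unknown step is deliberate
    return list(base) if base is not None else None
```

Returning a copy means a caller can append arguments without mutating the shared table.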

Gate Check Rules

Deterministic Python checks: G metrics + M resource utilization. If any check fails, the draft is rejected and retried.

Quality Threshold (G2)

Draft must meet the type-specific quality score:

quote_request ≥ 8.5 · troubleshooting ≥ 8.0
follow_up ≥ 7.5 · cold_outreach ≥ 8.0

3-Layer Analysis (M-R6)

Draft JSON must contain all three layers:

analysis.surface_question
analysis.hidden_need
analysis.proactive_suggestion

Peak Stage & Persona (M-R6)

Draft must have:

peak_stage (Discovery/Evaluation/Decision/Onboarding/Advocacy)
persona (detected, not the default "End User")

Zero Corrections Repeat (M-R2)

Fact-check must have no CRITICAL issues. Repeating a previously corrected error is a blocking failure.

Draft Completeness (Basic)

Draft text must exist and be ≥50 characters. Empty or stub drafts are rejected.

On Failure

Gate fail → inject failure reason into prompt → retry suggest_reply once. Still fails → skip task, log for human review.
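The rules above translate fairly directly into code. The following is a minimal sketch, not the contents of the real gate_check.py: the analysis.*, peak_stage, and persona fields come from this section, while quality_score, persona_is_default, fact_check.issues, text, and the generate callback are assumed names chosen for the example.

```python
THRESHOLDS = {"quote_request": 8.5, "troubleshooting": 8.0,
              "follow_up": 7.5, "cold_outreach": 8.0}
PEAK_STAGES = {"Discovery", "Evaluation", "Decision", "Onboarding", "Advocacy"}
LAYERS = ("surface_question", "hidden_need", "proactive_suggestion")


def gate_check(draft, email_type):
    """Return a list of failure reasons; an empty list means the draft passes."""
    failures = []
    if draft.get("quality_score", 0) < THRESHOLDS[email_type]:
        failures.append(f"quality_score below {THRESHOLDS[email_type]}")
    analysis = draft.get("analysis") or {}
    for layer in LAYERS:
        if not str(analysis.get(layer) or "").strip():
            failures.append(f"analysis.{layer} is missing")
    if draft.get("peak_stage") not in PEAK_STAGES:
        failures.append("peak_stage is missing or invalid")
    if not draft.get("persona") or draft.get("persona_is_default"):
        failures.append("persona was not detected")
    issues = (draft.get("fact_check") or {}).get("issues", [])
    if any(issue.get("severity") == "CRITICAL" for issue in issues):
        failures.append("fact-check found a CRITICAL issue")
    if len((draft.get("text") or "").strip()) < 50:
        failures.append("draft text shorter than 50 characters")
    return failures


def suggest_with_retry(generate, task, email_type):
    """Generate a draft, retry once with the failure reasons, then give up."""
    draft = generate(task, feedback=None)
    failures = gate_check(draft, email_type)
    if failures:
        draft = generate(task, feedback=failures)  # reasons go into the prompt
        failures = gate_check(draft, email_type)
    if failures:
        return {"status": "needs_human_review", "failures": failures}
    return {"status": "ok", "draft": draft}
```

Because gate_check returns reasons rather than a boolean, the same list serves both the retry prompt and the human-review log.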

File Architecture

OGSM definition files (direction) and Python skeleton files (enforcement) live in separate directories by design.

# OGSM definition (direction)
.claude/skills/gmail-reply-extension/
├── SKILL.md              # O + G + S + M + Skill dependency graph
├── CLAUDE.md             # Blocking rules (GRE-01~15)
├── agents/               # Agent definitions (schedule + steps)
├── policies/             # Shared strategies (S extracted)
│   ├── reply-rules.md    # Peak Experience methodology
│   ├── research-rules.md
│   └── ...
└── skills/               # SKILL.md per skill (the "recipes")
    ├── composite/suggest-reply/SKILL.md
    ├── composite/research-customer/SKILL.md
    ├── core/fact-check/SKILL.md
    └── ...

# Python skeleton (enforcement)
tools/gmail-extension/server/
├── agent_morning.sh      # Layer 1: Agent Boss
├── daily_morning.py      # Layer 2: Orchestrator (~180 lines)
├── lib/                  # Shared helpers (deterministic)
│   ├── gate_check.py     # G + M verification
│   ├── claude_cli.py     # call_claude(model="opus")
│   ├── gemini_cli.py     # call_gemini() + fallback chain
│   └── ...
└── skills/               # Layer 3: Skill runners
    ├── suggest_reply.py  # Reads SKILL.md → Opus
    ├── research_customer.py
    └── ...

Responsibility Matrix

Who does what, and who can change it.

Component	Type	Responsibility	Can be changed by
`SKILL.md` (OGSM)	Markdown	Direction: O, G, S, M definitions	Human (strategic decisions)
`policies/*.md`	Markdown	Rules: how Skills should behave	Agent Boss (PDCA Act) or Human
`skills/*/SKILL.md`	Markdown	Recipes: step-by-step instructions for AI	Human (methodology changes)
`agent_morning.sh`	Shell + Claude	Boss: PDCA cycle, strategic judgment	Developer
`daily_morning.py`	Python	Skeleton: fixed flow + gate checks	Developer
`skills/*.py`	Python	Bridge: read SKILL.md → build prompt → call AI	Developer
`gate_check.py`	Python	Enforcer: deterministic M verification	Developer
`corrections-log.md`	Markdown	Memory: errors to never repeat	Agent Boss (PDCA Act) or Human
`pdca-log.md`	JSON/Markdown	Learning: what yesterday taught us	Agent Boss (automatic)

Skill Library


Core Skills: Reusable building blocks

persona-detection: Detect customer role (Architect / Contractor / Distributor / End User / Property Manager)

Purpose: Identify the customer's professional role from email metadata, body content, and HubSpot records. Called by scan-classify and suggest-reply for consistent persona classification.

Role Detection Patterns

Role	Strong Signals	Weak Signals
Architect	AIA member, firm domain, spec language, code refs (NFPA 80, ADA)	"project", "design"
Contractor	License # in signature, "install", "job site", "bid"	Company name ends in "Construction"
Distributor	"stock", "MOQ", "margin", "territory", dealer pricing	Multiple product inquiries
End User	Personal email domain, "my door", "my house"	Describes own building
Property Manager	"property", "units", "tenants", multiple doors	Property management firm

Confidence Scoring

High (>80%): 2+ strong signals agree, no contradictions
Medium (50-80%): 1 strong signal or 2+ weak signals
Low (<50%): Only weak signals or contradictory; flag for human review
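The three confidence bands can be expressed as a small decision function. The sketch below takes signal counts as inputs and leaves signal extraction aside. It reads "only weak signals" in the Low band as "fewer than two weak signals and no strong one", since two or more weak signals already qualify as Medium; that reading, and the name persona_confidence, are assumptions.

```python
def persona_confidence(strong, weak, contradictory=False):
    """Map signal counts to the High / Medium / Low bands."""
    if contradictory:
        return "low"  # conflicting signals always go to human review
    if strong >= 2:
        return "high"
    if strong == 1 or weak >= 2:
        return "medium"
    return "low"


def needs_human_review(strong, weak, contradictory=False):
    """Low-confidence classifications are flagged rather than trusted."""
    return persona_confidence(strong, weak, contradictory) == "low"
```

For example, an email with a license number in the signature and the word "bid" gives two strong Contractor signals and lands in the High band.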

Reads

policies/research-rules.md: role definition table
policies/reply-rules.md: role-specific tone adjustments
HubSpot contact properties + deal history

email-type-detection: Classify email type with type-specific quality thresholds

Purpose: Classify inbound email type so downstream skills apply the correct quality standard, tone, and structure.

Type	Detection Signals	Quality Threshold
`quote_request`	Price, quote, RFQ, "how much", quantity, specs	≥8.5
`troubleshooting`	Problem, sagging, won't close, broken, warranty	≥8.0
`follow_up`	Reply to thread, "checking in", "any update"	≥7.5
`cold_outreach`	No prior thread, first contact, prospecting	≥8.0
`product_inquiry`	"Which model", "what's the difference"	≥8.0
`return_complaint`	Return, refund, complaint, unhappy	≥8.5
`not_actionable`	Newsletter, auto-notification, spam	N/A (skip)

Ambiguity Resolution

When signals point to multiple types, classify as the type with the higher quality threshold (safer).
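The ambiguity rule amounts to taking the maximum over candidate types by threshold. A minimal sketch, with thresholds copied from the table; resolve_type is a hypothetical name, and breaking ties by keeping the first candidate is an assumption the guide does not state.

```python
# Thresholds from the email-type table above.
THRESHOLDS = {
    "quote_request": 8.5,
    "troubleshooting": 8.0,
    "follow_up": 7.5,
    "cold_outreach": 8.0,
    "product_inquiry": 8.0,
    "return_complaint": 8.5,
}


def resolve_type(candidates):
    """Pick the candidate type with the highest quality threshold."""
    scored = [t for t in candidates if t in THRESHOLDS]
    if not scored:
        raise ValueError("no actionable candidate type")
    # max() keeps the first of several equal maxima, so order breaks ties.
    return max(scored, key=THRESHOLDS.get)
```

An email that looks like both a follow-up and a quote request is therefore held to the stricter 8.5 standard.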

product-fit-check: Match requirements to Waterson product lines with door thickness guidance

Purpose: Given customer requirements (door size, material, environment), recommend the best-fit Waterson products. All specs are grounded in product-facts.md.

Requirements Extraction

Door weight & dimensions, material, thickness
Usage environment (interior, exterior, fire-rated, ADA, gate/fence)
Code compliance (NFPA 80, ADA, UL, UBC)
Aesthetic preferences (concealed vs surface-mounted)

Reads

docs/waterson-wiki/product-facts.md: ground truth
docs/waterson-wiki/corrections-log.md: known errors

fact-check: Verify draft claims against wiki + corrections-log (last gate before sales team)

Purpose: Validate every factual claim against product-facts.md and corrections-log.md. No draft should contain invented specs or previously corrected mistakes.

Checks Performed

Product model numbers: exact match required
Specs: must match wiki values (no rounding)
Certifications: must be currently valid (e.g., "UL Listed" not "UL Certified")
Corrections-log: zero tolerance for repeating known errors
Communication rules: "Waterson self-closing hinge" not "hydraulic hinge"

Severity

CRITICAL: Wrong spec, fabricated model, corrections-log repeat (MUST fix)
WARNING: Imprecise statement (should fix)
INFO: Style issue (nice to fix)

Composite Skills: Orchestrated workflows

scan-classify: Daily Gmail + HubSpot scan, classify by type & priority → daily task list

Purpose: Scan the Gmail inbox + HubSpot for new emails, classify each by type and priority, and produce a structured task list for the daily pipeline.

Pipeline

Scan Gmail (unread, past 24h, excluding promotions/social)
Enrich from HubSpot (contact, company, deals, lifecycle)
Call persona-detection for each email
Call product-fit-check if product questions detected
Classify email type + assign priority (P1-P4)
Determine research depth (Deep / Medium / Cache)
Write daily-tasks.json

Priority Rules

P1: Active deal, quote with deadline, safety troubleshooting
P2: New target persona, quote without deadline
P3: Follow-up, informational, routine
P4: Automated emails, newsletters
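Since scan-classify runs without AI, the priority rules can be a plain function. The sketch below is illustrative only: the dictionary keys (type, automated, active_deal, deadline, safety_related, new_target_persona) are hypothetical field names, and the order in which the rules are tested is an assumption.

```python
def assign_priority(email):
    """Apply the P1-P4 rules to one classified email (a plain dict)."""
    email_type = email.get("type")
    # P4: automated mail and newsletters are filtered out first.
    if email_type == "not_actionable" or email.get("automated"):
        return "P4"
    # P1: active deal, quote with a deadline, or safety-related troubleshooting.
    if (
        email.get("active_deal")
        or (email_type == "quote_request" and email.get("deadline"))
        or (email_type == "troubleshooting" and email.get("safety_related"))
    ):
        return "P1"
    # P2: new target persona, or a quote without a deadline.
    if email.get("new_target_persona") or email_type == "quote_request":
        return "P2"
    # P3: follow-ups, informational, and routine mail.
    return "P3"
```

A quote request with a deadline returns "P1", the same request without one returns "P2", and a plain follow-up falls through to "P3".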

research-customer: Intelligence mining with depth calibration (Deep / Medium / Cache)

Purpose: Build a customer intelligence brief by mining email history, HubSpot, call notes, and web sources. Research depth is calibrated to familiarity level. This is mining, not collecting: connecting dots across sources.

Depth Matrix

Familiarity	Depth	Time	Sources
First contact	Deep	3 min	Gmail + HubSpot + call notes + web
Known, new project	Medium	1 min	Gmail + HubSpot + call notes
Ongoing thread	Cache	<10s	Cache + current thread only
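The depth matrix maps naturally onto a lookup keyed by familiarity. In this sketch the time column is stored as a budget in seconds (with "<10s" taken as 10), and the two boolean inputs are an assumed simplification of how familiarity would be determined; DEPTH_MATRIX and research_plan are hypothetical names.

```python
DEPTH_MATRIX = {
    "first_contact": {
        "depth": "Deep",
        "budget_seconds": 180,
        "sources": ["gmail", "hubspot", "call_notes", "web"],
    },
    "known_new_project": {
        "depth": "Medium",
        "budget_seconds": 60,
        "sources": ["gmail", "hubspot", "call_notes"],
    },
    "ongoing_thread": {
        "depth": "Cache",
        "budget_seconds": 10,
        "sources": ["cache", "current_thread"],
    },
}


def research_plan(prior_contact, ongoing_thread):
    """Choose research depth from how familiar the customer already is."""
    if ongoing_thread:
        key = "ongoing_thread"
    elif prior_contact:
        key = "known_new_project"
    else:
        key = "first_contact"
    return DEPTH_MATRIX[key]
```

Only a first contact includes "web" in its sources, which keeps the expensive web search off the path for known customers.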

Over-research Guard

Before finalizing: is any finding irrelevant? Would mentioning it make the customer feel surveilled? If yes → mark as internal_only.

Model

Gemini 2.5 Pro (web research, long context)

suggest-reply: 3-layer analysis + peak experience draft generation (core skill)

Purpose: The core skill. Generate a reply that makes the customer feel "this company doesn't just answer my question; they help me think of what I haven't thought of yet."

3-Layer Analysis

Layer	Question	Example
Surface	What did they literally ask?	"What's the price for K51M?"
Hidden	What are they actually worried about?	"Will this pass fire inspection for my hospital project?"
Proactive	What should we help them think about?	"ADA closing force requirements: here are the compliance docs"

Peak Stages

Discovery → Evaluation → Decision → Onboarding → Advocacy

Draft Structure

Opening: Acknowledge their specific situation (NOT generic)
Surface response: Answer literal question
Hidden need: Address real concern
Proactive value: Offer what they didn't ask for
Next step: ONE specific call to action

Reads

reply-rules.md, product-facts.md, corrections-log.md, email-format-guide.md, safety-rules.md, customer brief

Model

Claude Opus (writing quality + empathy)

regenerate: Rewrite draft based on sales team feedback, re-run fact-check

Purpose: When the sales team provides feedback (e.g., "too formal", "wrong product"), regenerate the draft incorporating the feedback while maintaining quality.

Feedback Categories

Category	Examples	Action
Tone adjustment	"too formal", "more friendly"	Rewrite tone, keep content
Factual correction	"wrong product", "price is different"	Fix facts, re-run fact-check
Content addition	"mention the warranty"	Add content, maintain flow
Content removal	"don't mention competitor"	Remove, adjust transitions
Strategic redirect	"they're actually a contractor"	Re-run persona detection, rewrite

Infra Skills: System maintenance

precache: Write drafts to Supabase for <2s Chrome Extension load time

Purpose: After the pipeline produces drafts, write them to the Supabase email_suggestions table so the Chrome Extension loads in under 2 seconds.

Steps

Collect all drafts from workspace/drafts/
Upsert to Supabase (newer version wins)
Clean expired entries
Verify cache hit rate >80%

Model

No AI: pure Python (Supabase SDK)

pdca-update: Process feedback → update wiki + corrections-log → content signals

Purpose: Turn sales team feedback into system improvements via the PDCA cycle. Every piece of feedback flows through: Plan (analyze) → Do (update knowledge) → Check (verify) → Act (prevent recurrence).

Feedback Categories

Category	Action	Update Target
Factual error	Fix product fact	corrections-log + product-facts
Tone mismatch	Adjust persona tone	reply-rules.md
Missing knowledge	Add product info	product-facts.md (requires verification)
Content signal	3+ same question = article idea	content-plan.md
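The content-signal row ("3+ same question = article idea") is a counting rule. A minimal sketch follows; it treats two questions as the same only when they match after trimming and lowercasing, which is a crude stand-in for whatever similarity the real skill applies, and content_signals is a hypothetical name.

```python
from collections import Counter


def content_signals(questions, min_count=3):
    """Return questions asked at least min_count times, most frequent first."""
    counts = Counter(q.strip().lower() for q in questions if q.strip())
    return [q for q, n in counts.most_common() if n >= min_count]
```

The returned list is what would be appended to content-plan.md as candidate article topics.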

Guard

Product-facts updates require: verified customer interaction + cross-reference + human acceptance. No auto-accept.

audit: 3-model blind evaluation + corrections compliance + architecture health

Purpose: Independent quality verification. No team should grade its own work, so the audit uses 3 different AI model families (Claude Sonnet, Gemini Flash, Codex) to blindly evaluate drafts.

4-Part Audit

3-Model Blind Eval: Sample 5 drafts → send to 3 models without showing scores → analyze disagreements
Corrections Compliance: Scan ALL drafts for corrections-log repeats (zero tolerance)
Architecture Health: Redundancy check, orphan skills, policy drift
Customer Reactions: Reply rates, tone, conversions for sent drafts

Models

Claude Sonnet + Gemini Flash + Codex (3 different families, to avoid single-family bias)
