watersonusa.ai · HSW-002 OGSM v5.1 Final

OGSM v5.1: Polish Loop Complete

HSW-002 · commit f59e3fc · 2026-04-11 · v5.md grew from 1516 to 1728 lines

🎉 All 19 agents completed the 4-Robot Factory Polish

All 4 validators PASS · 0 rollbacks · 0 unfixed validator failures

Wave α (Spec Verifier) → Wave β (Dispatch Harness) → Wave γ (Iterator) → Wave δ (Quality Auditor) → Apply

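Diff legend: green = added content · yellow = modified content

What this polish loop produced

Version: v5.1 · previous version v5 → commit f59e3fc · 175 lines added / 37 lines deleted

All 19 agents were reviewed, corrected, and verified by 4 role-isolated robots. Along the way, Robot 4 flagged 3 patches as needing rework; all 3 were fixed through a recursive self-correction v2 cycle with **zero user intervention**.

19-agent change summary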

| Agent | Main changes |
| --- | --- |
| 👑 Commander | Line 146 G4 contradiction fixed: skill commands are now pointers and no longer contradict the wrapper requirement at line 169 · +5 M bullets covering 5 S→M orphans · +M12 canonical persona file validation (Q1 Option A: Commander only verifies the file exists, escalates to Writer A if it is missing, and does not load its content) · Data-layer fix: skill-invocation-map.md gains a ## Role: commander section |
| 🔍 Investigator A | P-015 WebSearch migration: the Research tool now uses WebSearch as primary, with /ai-fallback reserved for generative sub-tasks · URL authenticity post-process check · P-016 escalation chain promoted to the LLM-harness primary path (not a fallback) |
| 🔎 Investigator B | P-019 NEW-03 forbidden phrase list (propagated isomorphically from Investigator A; code + cost dual hard-count version) · Line 1039 raw gemini CLI → wrapper + WebSearch · 5-state AHJ-adoption delta verification + paywall silent-drop forbidden |
| ✍️ Writer A | Arc-level voice M proxy (Tier B): one sample per 4 slides, Gemini persona score ≥3.5 · Substrate-gap analogy fidelity note · Title-layer voice gate (blocks marketing-style titles) · Tier C hook_for_next_writer (producer side), 4 Markdown fields: hook_slide_id / callback_theme / door_scenario_details / voice_carryover (added in Q3a) |
| ✍️ Writer B | Tier C handoff consumer side: slide 13 MUST consume handoff.md; cold-start bypass is forbidden; escalate to Commander if the file is missing · Hard-error handling when Industry Researcher input is insufficient |
| 🎨 Engagement Designer (v2) | SCN-ED-003 subcategory taxonomy: switched to the Compliance Reviewer canonical functional categories (hanging/securing/closing/controlling/protective) instead of the incorrect product-type taxonomy · SCN-ED-004 self-derived slide roles: nothing is imported from Content Director; roles are derived from Writer A/B output, producing slide_role_derivation.md |
| 📋 Content Director | Per-phase pacing M (≤10% deviation from baseline) · Explicit narrative_through_line section requirement · NOT/INSTEAD recommendation format (element-level, no hand-waving) |
| ✅ Compliance Reviewer | Provider # placement audit M · HSW block-by-block substantive rubric (75% threshold) · Primary-frame classification rule · 20% promo measured with minutes as the base unit · Gate 4 pointer (Puppeteer render-based final check) |
| 📝 Copy Editor | Angle unit ° convention M · <<SME-REVIEW>> marker mandatory for technical content that cannot be resolved |
| 🔢 Fact Checker | P-015 WebSearch primary (replaces Gemini Pro with Google Search grounding) · P-019 NEW-03 forbidden phrase list (7 phrases) + first-party URL structure rule + under-delivery escape clause |
| 📎 Source Reviewer | Anti-pattern #5 opinion vs. empirical discrimination (first-person is necessary but not sufficient; any sample/metric/time/direction signal forces a flag) · <p class="source-note"> container class mandatory · Fact Checker 5% budget cross-reference + reconciliation table as an explicit M deliverable |
| 🎯 Project Architect Advisor | Canonical persona file reference (~/.claude/personas/project-architect-marcus.md) · Q2 Option B: Gemini 2.5 Pro stays persona primary (new clawcode API key); Claude Opus demoted to documented degraded-mode fallback only · Time-budget realism gate |
| 💼 Sales Rep Advisor | Line 679 persona owner: Gemini 2.5 Pro primary + Claude Sonnet fallback · Line 689 inline command deleted entirely and replaced by a pure pointer → line 1044 Model Invocation Map as the single source of truth, permanently eliminating the drift surface |
| 🔄 Fresh Eyes Reviewer | Tier C Option C structural contradiction fixed: clean_draft_asserted: true schema + exit path, 3 required fields (per_section_coverage + clean_draft_rationale) · factual_accuracy_check as a 6th override field (factual errors do not belong to blindspot classes 1/2/3) · raw_class_reassignment_count 50% actuation threshold |
| 💻 Engineer (HTML) | Cadence window inherited from the Engagement Designer handoff · Body ≥16px source labeled as a Waterson house rule (not WCAG) · Post-test Phase-2 fallback decision turned into explicit rules |
| 📊 Performance Supervisor (v2) | Hard systemic definition: bound to (same-agent, same-failure-class), lists 5 legal failure classes, forbids cross-class / cross-agent accumulation · skill-invocation-gap three-condition lock: (a) content is the sole legal option (b) IMMUTABLE (c) triggered by QA rejection |
| 🔍 Quality Auditor | Testable claim definition · Reverse-index check (lane-scoped denominator) · 2 anti-patterns, scope creep + threshold drift (including "reading an upstream audit ≠ re-running the verification pipeline") · QA Fresh Eyes enforcement teeth addendum: review every false_clean_assert case, BLOCKER ruling |
| 🎓 Learning Outcome Validator (v2) | Coverage matrix (LO × Q) + per-persona score schema (including prior_belief_seed) · halt_gate: <int> machine-readable flag (wave-agnostic) · Threshold drift anti-pattern, 4 modes (including persona recalibration) · 2 bright-line rules for P-015 WebSearch |
| 🗃️ Candidate Collector | 3 new CLI metadata flags: --origin-agent / --source-wave / --scan-wave (synced to Tier 1 + S + Map row) · Auto-fill contract for non-CC callers (the skill layer defaults origin_agent = source_agent, exempt from fail-closed) · Shell-escape idiom documented |

3 recursive self-correction wins

This is the first time the 4-robot architecture has actually run a full REJECT → v2 → re-audit → ACCEPT-V2 cycle, with zero user intervention throughout.

🎨 Engagement Designer: from REJECT to ACCEPT-V2

v1 Blocker 1: Robot 3 v1 used the wrong taxonomy (product types: cam lift / spring hinge / ...), whereas Compliance Reviewer line 483 defines the canonical taxonomy as functional categories (hanging / securing / closing / controlling / protective). Robot 4 catch: landing the v1 patch as-is would contradict the Robot 2 deliverable.

v1 Blocker 2: Robot 3 v1 required Content Director to produce a 7-role narrative map, but the Content Director M schema has no such field → the subagent would halt on first dispatch, waiting indefinitely for input that does not exist.

v2 fix: use the Compliance Reviewer canonical taxonomy, and have Engagement Designer derive slide roles itself (decision tree + slide_role_derivation.md). Content Director carries zero implicit obligations.

Result: 3/5 PASS → 5/5 stable PASS
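
The taxonomy part of the v2 fix amounts to checking each subcategory label against the five functional categories named above. A minimal sketch, assuming labels arrive as plain strings; the helper name `invalid_subcategories` and the sample labels are hypothetical and not taken from v5.md.

```python
# The five functional categories the report names as canonical
# (Compliance Reviewer taxonomy).
CANONICAL_CATEGORIES = frozenset(
    {"hanging", "securing", "closing", "controlling", "protective"}
)


def invalid_subcategories(labels):
    """Return the labels that are not canonical functional categories.

    Hypothetical helper: the real SCN-ED-003 check may be structured differently.
    Comparison ignores case and surrounding whitespace.
    """
    return [
        label for label in labels
        if label.strip().lower() not in CANONICAL_CATEGORIES
    ]


if __name__ == "__main__":
    # A product-type label such as "spring hinge" is flagged; functional ones pass.
    print(invalid_subcategories(["closing", "spring hinge", "Protective"]))
```

A check of this kind would have caught the v1 product-type labels before the patch reached Robot 4.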

📊 Performance Supervisor: from APPROVE+blockers to ACCEPT-V2

v1 Blocker 1: the systemic definition had 3 loopholes (not bound to same-agent, unclear whether ≥2 waves must be consecutive, OR branch too tight)

v1 Blocker 2: skill-invocation-gap → NOT format could be relabeled as architect-perspective under time pressure to bypass it

v2 fix: the hard-rule block binds (same-agent, same-failure-class), lists 5 failure classes, and adds a worked example. skill-invocation-gap now uses a three-condition lock (a)(b)(c).
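
The (same-agent, same-failure-class) binding can be pictured as grouping failures by that pair before counting. A minimal sketch under stated assumptions: the record shape, the failure-class labels, and the threshold of two distinct waves are all hypothetical (v1 mentions "≥2 waves", but this report does not state the final v2 threshold or whether waves must be consecutive).

```python
from collections import defaultdict


def systemic_pairs(failures, min_waves=2):
    """Return (agent, failure_class) pairs seen in at least `min_waves` distinct waves.

    `failures` is an iterable of (agent, failure_class, wave) tuples. Waves are
    collected per pair, so failures are never summed across agents or across
    failure classes. `min_waves=2` is an assumed threshold, not a confirmed one.
    """
    waves_by_pair = defaultdict(set)
    for agent, failure_class, wave in failures:
        waves_by_pair[(agent, failure_class)].add(wave)
    return sorted(
        pair for pair, waves in waves_by_pair.items() if len(waves) >= min_waves
    )


if __name__ == "__main__":
    # Hypothetical labels; the spec's five legal failure classes are not listed here.
    log = [
        ("writer_a", "class_x", 1),
        ("writer_a", "class_x", 2),  # same agent, same class, second wave
        ("writer_a", "class_y", 2),  # different class: not accumulated
        ("writer_b", "class_x", 2),  # different agent: not accumulated
    ]
    print(systemic_pairs(log))
```

Keying on the pair is what rules out cross-class and cross-agent accumulation: four failures appear in the sample log, yet only one pair qualifies.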

🎓 Learning Outcome Validator: from CONDITIONAL to ACCEPT-V2

v1 H4 HOLD: wave3_blocked was named inconsistently, and Gate 3 had no consumer

v1 H7 HOLD: the WebSearch boundary was vague, with a misleading Slide-07 NFPA 80 example

v1 H6 minor: persona recalibration was not covered by threshold drift

v2 fix: rename → halt_gate: <int> + inline downstream-coupling warning + residual #6 Commander follow-up. WebSearch now has two bright-line rules + a real IBC 2021 double-egress example. A 4th drift mode covers persona recalibration, and the H2 schema adds a verbatim prior_belief_seed field so that recalibration is detectable.
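
A machine-readable `halt_gate: <int>` flag is useful mainly because a downstream consumer can parse it without interpreting prose. A minimal sketch of such a consumer, assuming the flag sits on its own line in a text report; the function name `read_halt_gate` is hypothetical, and what the integer denotes is not specified in this report.

```python
import re

# Matches a line such as "halt_gate: 3". Non-integer values do not match.
_HALT_GATE = re.compile(r"^[ \t]*halt_gate:[ \t]*(-?\d+)[ \t]*$", re.MULTILINE)


def read_halt_gate(report_text):
    """Return the integer halt_gate value, or None if the flag is absent or malformed."""
    match = _HALT_GATE.search(report_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = "coverage: ok\nhalt_gate: 3\n"
    print(read_halt_gate(sample))
    # A wave-specific, non-integer value is treated as malformed.
    print(read_halt_gate("halt_gate: wave3\n"))
```

Returning None for a malformed value lets the caller decide whether a missing flag should fail closed.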

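Cross-agent pattern propagation fully landed

🛡️ P-019 NEW-03 forbidden phrase: 3 agents protected in sync

The fabricated-count pattern that Wave α Robot 1 found in Investigator A was automatically propagated in Wave γ to Investigator A (original), Investigator B (code + cost version, 7 phrases), and Fact Checker (verification version, 7 phrases + first-party URL structure rule).

This is what the G-022 scope-aware Iterator makes possible: under the earlier 1-subagent mode, scope-creep suppression would have caused each agent to miss the pattern on its own.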
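
A forbidden phrase list lends itself to a simple mechanical scan. A minimal sketch, assuming a case-insensitive substring match is sufficient; the function name and both sample phrases are placeholders, since the actual P-019 NEW-03 lists live in v5.md and are not reproduced in this report.

```python
def find_forbidden_phrases(text, forbidden_phrases):
    """Return the forbidden phrases found in `text`, case-insensitively, in list order."""
    lowered = text.lower()
    return [phrase for phrase in forbidden_phrases if phrase.lower() in lowered]


if __name__ == "__main__":
    # Placeholder phrases for illustration only; not the real P-019 NEW-03 list.
    placeholder_list = ["studies show that", "all jurisdictions require"]
    draft = "All jurisdictions require this hardware, according to the vendor."
    print(find_forbidden_phrases(draft, placeholder_list))
```

Because the list is passed in as a parameter, the same scan could serve each agent's own variant of the list.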

🌐 P-015 WebSearch-primary: 5 research agents migrated

🔧 Data-layer finding: the factory's own knowledge base has holes too

In Wave β, Commander Robot 2 ran get_skills_for_role.sh commander and got "role not found". The gap was not only in the spec layer, which lacked Commander's skill invocation; the data layer, skill-invocation-map.md, also lacked a commander row. The Wave γ Commander Iterator patch therefore includes a data-layer hunk that leaves v5.md untouched and instead edits ~/.claude/skills/ogsm-framework/references/skill-invocation-map.md to add a ## Role: commander section.
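
The data-layer gap is the kind of thing a small pre-flight check can surface before dispatch. A minimal sketch, assuming roles appear in the map as `## Role: <name>` headings (the format quoted above); the function name `has_role_section` is hypothetical, and how get_skills_for_role.sh actually performs its lookup is not shown in this report.

```python
import re


def has_role_section(map_markdown, role):
    """Return True if the Markdown text contains a '## Role: <role>' heading line."""
    pattern = rf"^##[ \t]+Role:[ \t]*{re.escape(role)}[ \t]*$"
    return re.search(pattern, map_markdown, flags=re.MULTILINE) is not None


if __name__ == "__main__":
    before = "# Skill invocation map\n\n## Role: writer-a\n- some skill\n"
    after = before + "\n## Role: commander\n- some skill\n"
    print(has_role_section(before, "commander"))
    print(has_role_section(after, "commander"))
```

Run over every agent name, a check like this would list each role missing from the map in one pass.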

Validators: all PASS

| Validator | Target | Result |
| --- | --- | --- |
| validate_ogsm_completeness.py | 19/19 agents | ✅ PASS |
| validate_s_to_m_coverage.py | 14 S-invoking agents, all M-matched | ✅ PASS |
| check_skill_architecture.py | 19/19 | ✅ PASS |
| check_ai_fallback_usage.py | 0 direct model calls | ✅ PASS |

User decisions applied

| Q | Decision | Implementation |
| --- | --- | --- |
| Q1 | Option A: Commander verifies the file exists; escalates to Writer A if it is missing | Commander M12 is validation-only and does not load persona content |
| Q2 | Option B: Gemini Pro stays primary (new API key resolves the quota issue) | PA Advisor Tier 1 Model commands: Gemini Pro primary + Opus degraded-mode fallback |
| Q3a | Yes: add a 4th field, voice_carryover | Writer A hook_for_next_writer 4-field schema |
| Q3b | Reasonableness check | Content Director check depth = reasonableness, not just presence |
| Q3c | Markdown sub-section | hook_for_next_writer rendering format = Markdown |
| Q4 | Option A: handle Fresh Eyes teeth in the same batch | QA addendum adds a false_clean_assert review procedure |
| +Principle | Keep Commander's load light | All new Commander M bullets are validation-only, with no content loading |
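
The Q3a and Q3c decisions imply that a handoff can be checked for the four hook_for_next_writer fields. A minimal sketch, assuming each field is rendered as a `- name: value` line inside the Markdown sub-section; that layout, the sample values, and the helper name are assumptions. It covers only presence and non-emptiness, not the reasonableness check decided in Q3b.

```python
REQUIRED_HOOK_FIELDS = (
    "hook_slide_id",
    "callback_theme",
    "door_scenario_details",
    "voice_carryover",
)


def missing_hook_fields(handoff_markdown):
    """Return required hook fields that have no non-empty 'name: value' line.

    Assumes lines shaped like '- hook_slide_id: 12'; the real handoff.md
    layout may differ.
    """
    present = set()
    for line in handoff_markdown.splitlines():
        # Drop list markers and indentation, then split on the first colon.
        name, sep, value = line.lstrip("-* \t").partition(":")
        if sep and value.strip():
            present.add(name.strip())
    return [field for field in REQUIRED_HOOK_FIELDS if field not in present]


if __name__ == "__main__":
    # Hypothetical handoff content with the 4th field left out.
    handoff = (
        "### hook_for_next_writer\n"
        "- hook_slide_id: 12\n"
        "- callback_theme: propped-open fire door\n"
        "- door_scenario_details: corridor pair, school wing\n"
    )
    print(missing_hook_fields(handoff))
```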

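Known follow-ups (filed, not fixed)

Next steps

v5.1 is the result of the polish loop's first complete cycle. The next polish loop (v5.2) will target:

  1. Tier B calibration cycle (Writer A voice threshold, Fresh Eyes 50% threshold)
  2. Filling in the Commander Gate 3 consumer
  3. Cross-agent follow-up tickets
  4. Follow-up tracking for Known Issues 1-8

The real next step is to take v5.1 into actual production, running Wave 1 of HSW-006 or a new course, and see whether the polished factory can reliably produce good courses. That is the final validation. The factory's value lies not in fixing itself but in producing good courses smoothly once it has been polished.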