watersonusa.ai · HSW-002 OGSM v5.1 Final

OGSM v5.1: Polish Loop Complete

HSW-002 · commit f59e3fc · 2026-04-11 · v5.md grew from 1516 → 1728 lines

Diff legend: green = added content, yellow = modified content

What this polish loop produced

19-agent change summary

Agent	Main changes
👑 Commander	Line 146 G4 contradiction fixed: skill commands are now a pointer and no longer contradict the line 169 wrapper requirement · +5 M bullets close 5 S→M orphans · +M12 canonical persona file validation (Q1 Option A: Commander only verifies the file exists, escalates to Writer A if it is missing, and does not load its contents) · Data-layer fix: added a `## Role: commander` section to `skill-invocation-map.md`
🔍 Investigator A	P-015 WebSearch migration: the Research tool is now WebSearch primary, and `/ai-fallback` is used only for generative sub-tasks · URL authenticity post-process check · P-016 escalation chain promoted to the LLM-harness primary path (not a fallback)
🔎 Investigator B	P-019 NEW-03 forbidden phrase list (propagated isomorphically from Inv A, as a variant with hard counts for both code and cost) · Line 1039 raw `gemini` CLI → wrapper + WebSearch · 5-state AHJ-adoption delta verification + silent dropping of paywalled sources forbidden
✍️ Writer A	Arc-level voice M proxy (Tier B): sampled every 4 slides, with a Gemini persona score ≥3.5 required · Substrate-gap analogy fidelity note · Title-layer voice gate (blocks marketing-style titles) · Tier C hook_for_next_writer (producer side), 4-column Markdown: hook_slide_id / callback_theme / door_scenario_details / voice_carryover (added per Q3a)
✍️ Writer B	Tier C handoff consumer side: slide 13 MUST consume `handoff.md`; cold-start bypass is forbidden, and a missing file escalates to Commander · Hard-error handling when Industry Researcher output is insufficient
🎨 Engagement Designer (v2)	SCN-ED-003 subcategory taxonomy: switched to Compliance Reviewer's canonical functional categories (hanging/securing/closing/controlling/protective) instead of the incorrect product-type categories · SCN-ED-004 self-derived slide roles: no longer imported from Content Director; derived from Writer A/B output and written to `slide_role_derivation.md`
📋 Content Director	Per-phase pacing M (≤10% deviation from baseline) · Explicit narrative_through_line section required · NOT/INSTEAD recommendation format (element-level; no hand-waving)
✅ Compliance Reviewer	Provider # placement audit M · HSW block-by-block substantive rubric (75% threshold) · Primary-frame classification rule · 20% promo limit with minutes as the base unit · Gate 4 pointer (Puppeteer render-based final check)
📝 Copy Editor	Angle-unit `°` convention M · `<<SME-REVIEW>>` marker mandatory for technical content that cannot be resolved
🔢 Fact Checker	P-015 WebSearch primary (replacing Gemini Pro with Google Search grounding) · P-019 NEW-03 forbidden phrase list (7 phrases) + first-party URL structure rules + under-delivery escape clause
📎 Source Reviewer	Anti-pattern #5: telling opinion from empirical claims (first person is necessary but not sufficient; any sample/metric/time/direction signal forces a flag) · `<p class="source-note">` container class mandatory · Fact Checker 5% budget cross-reference + reconciliation table as an explicit M deliverable
🎯 Project Architect Advisor	Canonical persona file reference (`~/.claude/personas/project-architect-marcus.md`) · Q2 Option B: Gemini 2.5 Pro remains the persona primary (new clawcode API key); Claude Opus demoted to a documented degraded-mode fallback only · Time-budget realism gate
💼 Sales Rep Advisor	Line 679 persona owner: Gemini 2.5 Pro primary + Claude Sonnet fallback · Line 689 inline command removed entirely and replaced with a pure pointer → line 1044 Model Invocation Map as the single source of truth, permanently eliminating this drift surface
🔄 Fresh Eyes Reviewer	Tier C Option C structural contradiction fixed: `clean_draft_asserted: true` schema + exit path with 3 required fields, including per_section_coverage and clean_draft_rationale · `factual_accuracy_check` as a 6th override field (factual errors do not belong to blindspot class 1/2/3) · `raw_class_reassignment_count` 50% actuation threshold
💻 Engineer (HTML)	Cadence window inherited from the Engagement Designer handoff · Body ≥16px attributed to a Waterson house rule (not WCAG) · Post-test Phase-2 fallback decision codified as explicit rules
📊 Performance Supervisor (v2)	Hard `systemic` definition: keyed to (same-agent, same-failure-class), lists 5 valid failure classes, and forbids cross-class / cross-agent accumulation · skill-invocation-gap three-condition lock: (a) content is the only legal category (b) IMMUTABLE (c) triggered by QA rejection
🔍 Quality Auditor	Testable-claim definition · Reverse-index check (lane-scoped denominator) · 2 anti-patterns, scope creep and threshold drift (including "reading an upstream audit ≠ rerunning the verification pipeline") · QA Fresh Eyes enforcement-teeth addendum: review every `false_clean_assert` case and issue a BLOCKER ruling
🎓 Learning Outcome Validator (v2)	Coverage matrix (LO × Q) + per-persona score schema (including `prior_belief_seed`) · `halt_gate: <int>` machine-readable flag (wave-agnostic) · Threshold-drift anti-pattern with 4 modes (including persona recalibration) · P-015 WebSearch: 2 bright-line rules
🗃️ Candidate Collector	3 new CLI metadata flags: `--origin-agent` / `--source-wave` / `--scan-wave` (synced to Tier 1 + S + the Map row) · Auto-fill contract for non-CC callers (the skill layer defaults `origin_agent = source_agent`, exempt from fail-closed) · Shell-escape idiom documented
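Three of the numeric gates in the table (Content Director's ≤10% pacing deviation, Compliance Reviewer's 20% promo limit in minutes, and the 75% HSW rubric threshold) can be sketched as plain predicates. This is a hypothetical illustration, not the spec's wording: it assumes deviation is measured relative to the baseline duration, that the 20% rule is a cap on promo minutes over total minutes, and that the rubric score is passed blocks over total blocks.

```python
"""Hypothetical sketch of three numeric gates from the change summary.

Assumptions (not from the spec): pacing deviation is relative to the
baseline; the 20% promo rule caps promo minutes over total minutes;
the HSW rubric passes when >= 75% of blocks pass.
"""


def pacing_ok(actual_min: float, baseline_min: float, tol: float = 0.10) -> bool:
    """Content Director: a phase passes if it deviates <= 10% from baseline."""
    if baseline_min <= 0:
        raise ValueError("baseline must be positive")
    return abs(actual_min - baseline_min) / baseline_min <= tol


def promo_ok(promo_min: float, total_min: float, cap: float = 0.20) -> bool:
    """Compliance Reviewer: promo share is computed in minutes."""
    if total_min <= 0:
        raise ValueError("total must be positive")
    return promo_min / total_min <= cap


def hsw_rubric_ok(block_passes: list, threshold: float = 0.75) -> bool:
    """Compliance Reviewer: block-by-block rubric with a 75% pass threshold."""
    if not block_passes:
        return False  # an empty rubric cannot pass
    return sum(block_passes) / len(block_passes) >= threshold


print(pacing_ok(10.5, 10.0))                       # True (5% deviation)
print(promo_ok(15, 60))                            # False (25% promo)
print(hsw_rubric_ok([True, True, True, False]))    # True (exactly 75%)
```

Keeping each gate as a separate pure function makes the thresholds easy to audit for threshold drift, since each one appears in exactly one default argument.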

3 recursive self-correction wins

This is the first time the 4-robot architecture actually ran a complete REJECT → v2 → re-audit → ACCEPT-V2 cycle, with zero user intervention throughout.

🎨 Engagement Designer: from REJECT to ACCEPT-V2

v1 Blocker 1: Robot 3 v1 used the wrong taxonomy (product types: cam lift / spring hinge / ...), but the canonical taxonomy at Compliance Reviewer line 483 uses functional categories (hanging / securing / closing / controlling / protective). Robot 4 caught that landing the v1 patch as-is would contradict the Robot 2 deliverable.

v1 Blocker 2: Robot 3 v1 required Content Director to produce a 7-role narrative map, but the Content Director M schema has no such field at all → the subagent would halt on its first dispatch, stuck in an infinite loop waiting for input that does not exist.

v2 fix: switch to the Compliance Reviewer canonical categories, and have Engagement Designer derive slide roles itself (decision tree + slide_role_derivation.md). Content Director takes on zero implicit obligations.

Result: 3/5 PASS → 5/5 stable PASS

📊 Performance Supervisor: from APPROVE+blockers to ACCEPT-V2

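v1 Blocker 1: the systemic definition had 3 gaps (it was not keyed to same-agent, it did not say whether the ≥2 waves had to be consecutive, and its OR branch was too tight)

v1 Blocker 2: under time pressure, a skill-invocation-gap could be relabeled as architect-perspective to bypass the NOT format

v2 fix: a hard-rule block keyed to (same-agent, same-failure-class) that lists the 5 failure classes and includes a worked example. skill-invocation-gap is now governed by the three-condition lock (a)(b)(c).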
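The keying rule above can be illustrated with a short sketch. It assumes failures are logged as (wave, agent, failure_class) tuples and that a pair becomes systemic once it recurs in at least 2 distinct waves; because this report does not say whether those waves must be consecutive, the sketch does not require it. The five class names are hypothetical placeholders, not the spec's actual list.

```python
"""Hypothetical sketch of the hard `systemic` rule.

Failures are keyed by (agent, failure_class); counts are never pooled
across agents or across classes. Class names below are placeholders.
"""
from collections import defaultdict

FAILURE_CLASSES = {"class_a", "class_b", "class_c", "class_d", "class_e"}  # placeholders


def systemic_pairs(failures, min_waves=2):
    """Return (agent, failure_class) pairs seen in >= min_waves distinct waves.

    `failures` is an iterable of (wave, agent, failure_class) tuples.
    """
    waves_by_key = defaultdict(set)
    for wave, agent, failure_class in failures:
        if failure_class not in FAILURE_CLASSES:
            raise ValueError(f"unknown failure class: {failure_class}")
        waves_by_key[(agent, failure_class)].add(wave)
    return {key for key, waves in waves_by_key.items() if len(waves) >= min_waves}


log = [
    (1, "writer_a", "class_a"),
    (2, "writer_a", "class_a"),
    (1, "writer_b", "class_a"),   # same class, different agent: not pooled
    (2, "writer_a", "class_b"),   # same agent, different class: not pooled
]
print(systemic_pairs(log))  # {('writer_a', 'class_a')}
```

Pooling by class alone would have counted three `class_a` events here; keying on the pair is what keeps writer_b's single failure from being labeled systemic.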

🎓 Learning Outcome Validator: from CONDITIONAL to ACCEPT-V2

v1 H4 HOLD: inconsistent wave3_blocked naming; Gate 3 has no consumer

v1 H7 HOLD: blurry WebSearch boundaries and a misleading Slide-07 NFPA 80 example

v1 H6 minor: persona recalibration is not covered by threshold drift

v2 fix: rename → halt_gate: <int>, plus an inline downstream-coupling warning, with residual #6 filed as a Commander follow-up. WebSearch is now bounded by two bright-line rules, illustrated with a real IBC 2021 double-egress example. A 4th drift mode covers persona recalibration, and the H2 schema gains a verbatim prior_belief_seed field so that recalibration becomes detectable.
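Both v2 mechanisms can be sketched briefly: reading `halt_gate: <int>` from front-matter (the field Commander Gate 3 is expected to consume) and flagging personas whose verbatim `prior_belief_seed` changed between runs. This is a hypothetical illustration, assuming the LO Validator output begins with a `---`-delimited front-matter block and that scores map persona names to seed strings; it uses minimal parsing rather than a YAML library and only handles this one key.

```python
import re
from typing import Optional

# Matches `halt_gate: <int>` on its own line inside the front-matter block.
HALT_GATE_RE = re.compile(r"^halt_gate:\s*(-?\d+)\s*$", re.MULTILINE)


def read_halt_gate(report: str) -> Optional[int]:
    """Return halt_gate from `---`-delimited front-matter, or None if absent."""
    lines = report.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no front-matter at all
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return None  # unterminated front-matter
    match = HALT_GATE_RE.search("\n".join(lines[1:end]))
    return int(match.group(1)) if match else None


def recalibrated_personas(previous: dict, current: dict) -> list:
    """Names of personas whose verbatim prior_belief_seed changed between runs.

    Both arguments map persona name -> prior_belief_seed string. Any change,
    even whitespace, counts, because the field is stored verbatim.
    """
    return sorted(
        name for name, seed in current.items()
        if name in previous and previous[name] != seed
    )


report = "---\nhalt_gate: 3\n---\n# LO Validator report\n"
print(read_halt_gate(report))  # 3
print(recalibrated_personas({"marcus": "seed v1"}, {"marcus": "seed v2"}))  # ['marcus']
```

Storing the seed verbatim is what makes the second check a simple string comparison: any silent rewording of a persona's starting belief shows up as a diff.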

Cross-agent pattern propagation fully landed

🛡️ P-019 NEW-03 forbidden phrases: 3 agents protected in sync

The fabricated-count pattern that Wave α Robot 1 found in Investigator A was automatically propagated in Wave γ to Investigator A (original), Investigator B (code + cost variant, 7 phrases), and Fact Checker (verification variant, 7 phrases + first-party URL structure rules).
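A shared scanner is one way each agent could apply its own variant of the forbidden phrase list. The sketch below is hypothetical: the actual 7-phrase lists are not reproduced in this report, so the entries are illustrative placeholders for fabricated-count wording, and the first-party URL rules are not modeled.

```python
"""Hypothetical forbidden-phrase scan in the spirit of P-019 NEW-03.

The phrase list is a placeholder; each agent would pass its own variant.
"""
import re

FORBIDDEN_PHRASES = [  # placeholders, not the spec's list
    "thousands of installations",
    "countless projects",
    "studies show",
]


def find_forbidden(text: str, phrases=FORBIDDEN_PHRASES):
    """Return (line_number, phrase) hits; case-insensitive, whole words only."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            if re.search(pattern, line, re.IGNORECASE):
                hits.append((lineno, phrase))
    return hits


draft = "Intro slide\nStudies show this hinge lasts longer."
print(find_forbidden(draft))  # [(2, 'studies show')]
```

Passing the list as a parameter keeps one scanning routine while letting Investigator B and Fact Checker carry different phrase sets.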

This is something only the G-022 scope-aware Iterator can do: in the old 1-subagent mode, scope-creep suppression would have caused each agent to miss the pattern independently.

🌐 P-015 WebSearch-primary: migration of 5 research agents

Investigator A: the Hard Constraint now specifies the "research search" tool as an explicit field
Investigator B: line 1039 raw gemini CLI replaced with the wrapper + WebSearch
Fact Checker: S line 552 replaces "Gemini Pro with Google Search grounding"
Learning Outcome Validator: WebSearch restricted to pre-simulation fact retrieval
Project Architect Advisor: persona routing implemented per Q2 Option B
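The routing policy behind this migration (WebSearch for fact lookups, `/ai-fallback` only for generative sub-tasks, and Learning Outcome Validator's WebSearch limited to pre-simulation retrieval) can be sketched as a small dispatcher. This is a hypothetical illustration: the `SubTask` type, its `kind` and `phase` tags, and the agent identifier strings are assumptions, and tools are returned as names rather than invoked.

```python
"""Hypothetical routing sketch for the P-015 WebSearch-primary policy.

SubTask, its tags, and the agent IDs are illustrative assumptions.
"""
from dataclasses import dataclass


@dataclass
class SubTask:
    agent: str
    kind: str                      # "fact" or "generative"
    phase: str = "pre-simulation"


def route(task: SubTask) -> str:
    """Return the name of the tool a sub-task should use."""
    if task.kind == "fact":
        # Learning Outcome Validator may only search before simulation starts.
        if task.agent == "learning_outcome_validator" and task.phase != "pre-simulation":
            raise PermissionError("WebSearch is limited to pre-simulation fact retrieval")
        return "WebSearch"
    if task.kind == "generative":
        return "/ai-fallback"
    raise ValueError(f"unknown sub-task kind: {task.kind}")


print(route(SubTask("investigator_a", "fact")))        # WebSearch
print(route(SubTask("investigator_a", "generative")))  # /ai-fallback
```

Failing closed on unknown kinds avoids a silent default to a generative model for what should have been a fact lookup.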

🔧 Data-layer finding: the factory's own knowledge base had a hole too

In Wave β, Commander Robot 2 ran get_skills_for_role.sh commander and got "role not found". Not only did the spec layer omit commander's skill invocations; the data-layer skill-invocation-map.md was also missing a commander row. The Wave γ Commander Iterator patch therefore includes a data-layer hunk: instead of changing v5.md, it adds a ## Role: commander section to ~/.claude/skills/ogsm-framework/references/skill-invocation-map.md.
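The failure mode above can be reproduced with a small Python stand-in for a role lookup. This is not get_skills_for_role.sh; it only assumes that skill-invocation-map.md groups entries under `## Role: <name>` headings with one `- ` bullet per skill, and the skill names in the example are hypothetical.

```python
"""Hypothetical stand-in for a role lookup over skill-invocation-map.md.

Assumes `## Role: <name>` headings with `- ` bullets beneath them.
"""


def skills_for_role(map_text: str, role: str) -> list:
    heading = f"## Role: {role}"
    skills, in_section, found = [], False, False
    for line in map_text.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == heading  # any new heading ends the section
            found = found or in_section
            continue
        if in_section and line.startswith("- "):
            skills.append(line[2:].strip())
    if not found:
        raise LookupError("role not found")
    return skills


before = "## Role: writer-a\n- skill-x\n"
after = before + "## Role: commander\n- skill-y\n- skill-z\n"
print(skills_for_role(after, "commander"))  # ['skill-y', 'skill-z']
```

Calling `skills_for_role(before, "commander")` raises `LookupError("role not found")`, mirroring the Wave β result; adding the section is enough to make the lookup succeed without touching v5.md.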

Validators: all PASS

Validator	Target	Result
`validate_ogsm_completeness.py`	19/19 agents	✅ PASS
`validate_s_to_m_coverage.py`	14 S-invoking agents, all M-matched	✅ PASS
`check_skill_architecture.py`	19/19	✅ PASS
`check_ai_fallback_usage.py`	0 direct model calls	✅ PASS
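The S→M coverage idea (every strategy an agent invokes must be matched by at least one measure, otherwise it is an orphan like the 5 Commander closed in this loop) can be sketched as follows. This is not validate_s_to_m_coverage.py; it assumes each agent spec has already been parsed into a list of S IDs and a list of M entries that name the S they measure, and all IDs are hypothetical.

```python
"""Hypothetical S-to-M orphan check (not validate_s_to_m_coverage.py).

Each agent spec is assumed to be {"S": [ids], "M": [{"id": ..., "measures": s_id}]}.
"""


def find_orphans(agents: dict) -> dict:
    """Map agent name -> S IDs that no M entry measures."""
    orphans = {}
    for agent, spec in agents.items():
        measured = {m["measures"] for m in spec.get("M", [])}
        missing = [s for s in spec.get("S", []) if s not in measured]
        if missing:
            orphans[agent] = missing
    return orphans


agents = {
    "commander": {"S": ["S1", "S2"], "M": [{"id": "M1", "measures": "S1"}]},
    "writer_a": {"S": ["S1"], "M": [{"id": "M1", "measures": "S1"}]},
}
print(find_orphans(agents))  # {'commander': ['S2']}
```

An empty result corresponds to the "all M-matched" outcome in the table.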

User decisions applied

Q	Decision	Implementation
Q1	Option A: Commander verifies the file exists and escalates to Writer A if it is missing	Commander M12 becomes validation-only and does not load persona content
Q2	Option B: Gemini Pro stays primary (a new API key resolves the quota issue)	PA Advisor Tier 1 Model commands: Gemini Pro primary + Opus degraded-mode fallback
Q3a	Yes: add a 4th column, `voice_carryover`	Writer A hook_for_next_writer 4-column schema
Q3b	Reasonableness check	Content Director check depth = reasonableness, not just presence
Q3c	Markdown sub-section	hook_for_next_writer render format = Markdown
Q4	Option A: handle Fresh Eyes teeth in the same batch	QA addendum adds a `false_clean_assert` review procedure
+Principle	Keep Commander's load light	All new Commander M bullets are validation-only; no content loading
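On the consumer side, the Q3a/Q3c decisions mean Writer B reads a 4-field Markdown sub-section and escalates instead of cold-starting. The sketch below is hypothetical: the field names come from the Writer A schema, but the `## hook_for_next_writer` heading, the `- key: value` layout, and the `EscalateToCommander` exception are assumptions. It checks presence and non-emptiness only; Content Director's reasonableness review (Q3b) is a judgment step not modeled here.

```python
"""Hypothetical consumer-side sketch of the Tier C handoff.

Layout and EscalateToCommander are illustrative; field names follow
the Writer A hook_for_next_writer schema.
"""
from pathlib import Path

REQUIRED = ("hook_slide_id", "callback_theme", "door_scenario_details", "voice_carryover")


class EscalateToCommander(Exception):
    """Raised instead of letting Writer B cold-start without a handoff."""


def parse_hook(markdown: str) -> dict:
    fields, in_section = {}, False
    for line in markdown.splitlines():
        if line.startswith("#"):
            in_section = line.lstrip("#").strip() == "hook_for_next_writer"
            continue
        if in_section and line.startswith("- ") and ":" in line:
            key, value = line[2:].split(":", 1)
            fields[key.strip()] = value.strip()
    missing = [k for k in REQUIRED if not fields.get(k)]
    if missing:
        raise EscalateToCommander(f"hook_for_next_writer missing: {missing}")
    return fields


def load_handoff(path: Path) -> dict:
    if not path.exists():  # no cold-start bypass
        raise EscalateToCommander(f"{path} not found")
    return parse_hook(path.read_text(encoding="utf-8"))
```

A usage example: `parse_hook("## hook_for_next_writer\n- hook_slide_id: 12\n- callback_theme: fire door\n- door_scenario_details: stairwell\n- voice_carryover: plain\n")` returns all four fields, while dropping any one of them raises `EscalateToCommander`.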

Known follow-ups (filed, not fixed)

Commander Gate 3 (L1286) needs the next polish cycle to grep for the new halt_gate front-matter field (currently documentation-only, per LO Validator v2 hunk 4)
Writer A's arc-level voice threshold of 3.5 needs a Tier B calibration cycle (run Robot 2's slide 7 body, expecting <2.5, and a known-good passage, expecting ≥4.0, then take the midpoint)
The Fresh Eyes 50% raw_class reassignment actuation threshold needs tuning against empirical data
Known Issue #8 tracks QA reverse-index mechanics at HSW-006 scale
Other cross-agent follow-up tickets (e.g., Investigator A's Gemini query-string neutrality and the Commander Dispatch Template's verbatim hard rule)
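The two numeric follow-ups in this list can be sketched as small functions: the midpoint calibration for Writer A's voice threshold, and the Fresh Eyes 50% actuation check. This is a hypothetical illustration; the report does not define the actuation denominator, so the sketch assumes the threshold is the ratio of reassigned findings to total findings.

```python
"""Hypothetical sketches of the two threshold follow-ups."""


def calibrated_threshold(bad_score: float, good_score: float) -> float:
    """Midpoint calibration for the Writer A voice threshold.

    bad_score: persona score for Robot 2's slide 7 body (expected < 2.5).
    good_score: persona score for a known-good passage (expected >= 4.0).
    """
    if bad_score >= 2.5 or good_score < 4.0:
        raise ValueError("calibration passages did not separate as expected")
    return (bad_score + good_score) / 2


def reassignment_actuated(raw_class_reassignment_count: int, total_findings: int,
                          threshold: float = 0.5) -> bool:
    """Assumed reading: actuate once reassigned findings reach 50% of the total."""
    if total_findings == 0:
        return False  # nothing to reassign, nothing to actuate
    return raw_class_reassignment_count / total_findings >= threshold


print(calibrated_threshold(2.0, 4.5))   # 3.25
print(reassignment_actuated(3, 6))      # True
```

Refusing to calibrate when the two passages do not separate as expected keeps a noisy persona run from quietly moving the 3.5 threshold.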

Next steps

v5.1 is the result of the first complete polish-loop cycle. The next polish loop (v5.2) will target:

Tier B calibration cycle (Writer A voice threshold, Fresh Eyes 50% threshold)
Commander Gate 3: add the missing consumer
Cross-agent follow-up tickets
Follow-up tracking for Known Issues 1-8

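The real next step is to run v5.1 on actual production, Wave 1 of HSW-006 or a new course, and see whether the polished factory can reliably produce good courses. That is the ultimate validation. The factory's value lies not in fixing itself but in producing good courses smoothly once it has been polished.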