OGSM v5.1-beta: Wave β Dispatch Harness Complete (19/19) | HSW-002 Polish Loop


HSW-002 Polish Loop · 2026-04-11 · 19/19 Robot 2 (Dispatch Harness) runs complete

What Wave β did

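19 independent Robot 2 (Dispatch Harness) subagents ran concurrently. Each subagent:

Read only the "BDD scenarios" and "test input" sections of Robot 1's output (reading the mental-run, verdict, or gap list was forbidden, to prevent a self-fulfilling prophecy)

Played the target agent, consumed the test input, and produced an **actual deliverable**

Graded the actual deliverable against each of the 5 BDD scenarios, one by one (actual PASS/FAIL, not a mental-run)

Recorded the spec ambiguities, infrastructure issues, and fallback triggers encountered at run time

Routed every LLM call through the call_with_fallback.sh wrapper (no raw CLI calls)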
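The input restriction in the first step could be enforced mechanically rather than by instruction alone. The following is a minimal sketch, assuming Robot 1's report is a text file whose sections start with `## <title>` lines. That heading format, the exact section titles, and the function name `visible_sections` are all assumptions for illustration, not part of the actual harness.

```python
import re

# Hypothetical section titles; the real Robot 1 report headings may differ.
ALLOWED_SECTIONS = {"BDD scenarios", "test input"}


def visible_sections(report):
    """Return only the allow-listed sections of a Robot 1 report.

    Assumes each section begins with a line of the form '## <title>'.
    Everything under any other heading (mental-run, verdict, gap list, ...)
    is dropped before Robot 2 sees it.
    """
    sections = {}
    current = None
    for line in report.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1)
            if current in ALLOWED_SECTIONS:
                sections[current] = ""
            continue
        if current in ALLOWED_SECTIONS:
            sections[current] += line + "\n"
    return sections
```

Filtering before the subagent starts means the forbidden sections never enter its context, which is a stronger guarantee than asking it not to read them.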

Wave β measured results for the 19 agents

| # | Agent | Scenario PASS/FAIL | Key measured findings |
|---|---|---|---|
| 1 | Investigator A | 5/5 PASS | Honest N=0 under-delivery; the forbidden-phrase guard at line 236 works |
| 2 | Writer A | 3 PASS + 2 FAIL | Gap 4 LIVE BUG confirmed: the LLM actually slipped into a textbook tone, and the anti-pattern has no detection |
| 3 | Fresh Eyes Reviewer | FE-R2 HARD FAIL | Robot 2 refused to fabricate; the structural contradiction is confirmed as a blocker |
| 4 | Copy Editor | 5/5 PASS | 21/21 defects caught + 0 false positives (best result) |
| 5 | Compliance Reviewer | 5/5 loose / 2/5 strict | All 4/4 planted issues CAUGHT; Gemini neutrality score 2.6/5 < 4.0 |
| 6 | Content Director | 5/5 PASS (3 caveats) | The three Tier B targets (pacing / through-line / NOT-INSTEAD) confirmed |
| 7 | Engineer (HTML) | 10/10 PASS | 4 ambiguities (cadence window inheritance / 16px floor / post-test fallback) |
| 8 | Engagement Designer | 5/5 PASS (3 ad hoc) | 0 LLM calls; refused prose-regeneration on its own discipline; SCN-ED-004 needs a paired S input-contract |
| 9 | Fact Checker | PASS behavior / FAIL spec | 🚨 Incidentally found 3 factual errors in the v5 course: ASCE 7-22 Ch26=Wind Loads / NFPA 80 ≠ 15 lbf / CSC ≠ NMS maintainer |
| 10 | Writer B | 5/5 PASS (S5 caveat) | The cross-author handoff passed **only because Robot 1 injected it directly**; it would break in production |
| 11 | Commander | 5/5 behavior PASS | 🚨 META gap: `get_skills_for_role.sh commander` → role not found; the factory's own knowledge base has a data hole |
| 12 | Investigator B | 4 PASS + 1 PARTIAL | All 4 wrapper calls compliant; honest HARD COUNT REACHED AT 3; Robot 1's fixture picked the wrong NFPA 80 section (meta-finding) |
| 13 | Source Reviewer | 5 PASS + 3 conditional | Established the opinion vs. empirical boundary rule (first person is necessary but not sufficient) |
| 14 | Sales Rep Advisor | B3 FAIL confirmed | Byte-level drift demonstrated: line 689 and line 1044 are inconsistent, and the existing validator missed it |
| 15 | Project Architect Advisor | 5/5 PASS | Gemini exhausted → degraded to Claude Opus self-sim (spec line 661); persona 6/6 first person |
| 16 | Learning Outcome Validator | 1 PARTIAL + 3 PASS + 1 FAIL by design | Aggregate 77% < 80%; wave3_blocked triggered correctly; threshold not lowered |
| 17 | Quality Auditor | 3/3 hidden catches | All 3 hidden issues caught; BLOCK Wave 3 verdict |
| 18 | Performance Supervisor | 5/5 PASS | Skill-invocation gap audit succeeded in the live run (gap=2 literal count) |
| 19 | Candidate Collector | 4 PASS + 1 PARTIAL | S2 byte-exact verification passed; the S5 flag-candidate CLI is missing 3 fields |

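Wave α predictions vs. Wave β measured results

✅ Robot 1's mental-run predictions were highly accurate

Sales Rep Advisor B3 drift: predicted PASS with drift risk → measured: byte-level FAIL confirmed
Fresh Eyes structural contradiction: predicted FE-R2 FAIL → Robot 2 refused to fabricate a clean draft, confirming the blocker
Fact Checker missing NEW-03: predicted as the primary FAIL → measured: 6 forbidden-phrase hits caught, but the spec does not enforce them
Quality Auditor 3 hidden issues: predicted all catchable → measured: 3/3 CAUGHT, BLOCK verdict
Writer A Gap 4: predicted that arc-level voice has no M proxy → measured: LIVE textbook-tone drift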
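The Sales Rep Advisor B3 result turns on a byte-level comparison of two lines that are supposed to carry identical text. A minimal sketch of such a check follows. The function names are hypothetical, and how the existing validator actually compares the lines is not stated in this report.

```python
from typing import Optional


def first_byte_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def line_drift(text: str, line_a: int, line_b: int) -> Optional[int]:
    """Compare two 1-indexed lines of `text` byte for byte (UTF-8)."""
    lines = text.split("\n")
    return first_byte_difference(
        lines[line_a - 1].encode("utf-8"),
        lines[line_b - 1].encode("utf-8"),
    )
```

Comparing encoded bytes rather than normalized strings is deliberate here: a doubled space or a full-width punctuation mark survives whitespace-insensitive comparison but still counts as drift.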

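⚠ New findings (not anticipated by Wave α)

1. The factory's own knowledge base has a data gap (META-level)

In the live run, Commander called get_skills_for_role.sh commander → role not found. Robot 1 had assumed this was a spec-text gap, but the references data of the ogsm-framework skill is missing the role as well. Fixing v5.md is not enough; the data file has to be fixed too. → G-025 candidate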
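A gap like this one can be caught before dispatch by cross-checking the roles named in the spec against the skills data. The sketch below assumes the data can be loaded into a plain mapping from role name to a list of skills. That shape, the role names other than commander, and the function name are illustrative assumptions, not the actual format behind get_skills_for_role.sh.

```python
def roles_missing_skills(spec_roles, skills_by_role):
    """Return spec roles that have no non-empty entry in the skills data."""
    return sorted(role for role in spec_roles if not skills_by_role.get(role))


# Hypothetical data: the spec names three roles, the data file covers two.
spec_roles = ["commander", "writer_a", "fact_checker"]
skills_by_role = {"writer_a": ["voice"], "fact_checker": ["citations"]}
print(roles_missing_skills(spec_roles, skills_by_role))  # ['commander']
```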

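2. The v5 course content itself contains factual errors

The Fact Checker ran in earnest and unexpectedly found that content v5 intended to teach is itself wrong:

ASCE 7-22 Chapter 26 covers Wind Loads and does not address door opening force
NFPA 80 has no 15 lbf closing-force requirement (15 lbf is the **opening**-force value in ICC A117.1 / ADA; opening and closing were conflated)
CSC is not the maintainer of NMS; PSPC (Government of Canada) is

These three errors mean the v5 spec is fine as a guidance document, but **when a course is actually produced** the Fact Checker must block them. Wave γ will not fix them (they are outside the spec scope), but it will strengthen the Fact Checker's anti-pattern protections.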

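3. Robot 1's fixture itself is flawed

Investigator B's Robot 2 found that NFPA 80 §6.1.5, which Robot 1 selected, is actually "Inspection of Door Assemblies" and is not hinge-specific (§6.4.2.1 is). Robot 1 makes mistakes too. This is the recursive value of the 4-robot architecture: **Robot 2 also reviews Robot 1**.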

4. The G-020 wrapper bug was observed directly by Writer A

Writer A's Robot 2 actually triggered call_with_fallback.sh with a single-model chain + 429 and got exit 3 + empty stdout/stderr. This matches Robot 1 Fresh Eyes' G-020 prediction exactly.
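The failure mode here is a chain that runs out of models and exits without saying why. The sketch below shows the non-silent behavior in Python. It is not the implementation of call_with_fallback.sh; `call_chain`, `RateLimited`, `ChainExhausted`, and the `call_model` callable are hypothetical names introduced only to illustrate the idea.

```python
class RateLimited(Exception):
    """Raised by a model call that hit a 429-style quota error."""


class ChainExhausted(Exception):
    """Raised when every model in the fallback chain has failed."""


def call_chain(prompt, chain, call_model):
    """Try each model in order and return the first successful response.

    `call_model(model, prompt)` is a hypothetical callable standing in for
    the real LLM invocation; it raises RateLimited on a quota error.
    """
    failures = []
    for model in chain:
        try:
            return call_model(model, prompt)
        except RateLimited as exc:
            failures.append(f"{model}: {exc}")
    # Surface every failure instead of ending with empty output.
    raise ChainExhausted("; ".join(failures) or "empty fallback chain")
```

With a single-model chain and a 429, the caller gets an error that names the model and the quota failure, so the empty-output case described above cannot occur.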

🚨 Infrastructure: Gemini quota exhausted across the board

During Wave β, every Gemini tier was hitting 429 RESOURCE_EXHAUSTED:

Free tier (20/day): hit by Project Architect Advisor
Lite REST paid: Learning Outcome Validator / Writer A / Investigator B hit timeouts
Pro tier: Writer A also got 429 → fell back to flash
Single-model chain + 429 = silent exit 3: observed by Writer A, confirming G-020

However, **every subagent degraded gracefully** (the spec-defined fallback paths work), with no raw LLM violations and no fabrication. → The architecture is right, but production needs a higher quota tier or a per-agent rate limit. G-024 candidate.
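One way to read "per-agent rate limit" is a fixed call budget divided among the agents so that no single subagent can drain the shared quota. The sketch below assumes an even split of a fixed budget. The class name, the even split, and the agent names are illustrative assumptions; the report does not specify how G-024 would be implemented.

```python
class AgentBudget:
    """Split a fixed call budget evenly across agents."""

    def __init__(self, total_calls, agents):
        if not agents:
            raise ValueError("at least one agent is required")
        share = total_calls // len(agents)
        self.remaining = {agent: share for agent in agents}

    def try_acquire(self, agent):
        """Consume one call for `agent`; return False once its share is spent."""
        if self.remaining.get(agent, 0) <= 0:
            return False
        self.remaining[agent] -= 1
        return True
```

A subagent that gets `False` would take its spec-defined fallback path immediately instead of discovering the exhausted quota through a 429.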

Honesty discipline held at 100%

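Refinement to the Tier C decision

The Writer A ↔ Writer B handoff needs a two-sided clause

The original Tier C decision: Writer A carries the handoff responsibility (producer side). Writer B's Robot 2 found in the live run that the callback worked **only because Robot 1's test input injected the hook directly**; in production, Writer A and Writer B run in parallel in Wave 1, so the hook cannot be passed across.

Refinement (does not overturn the decision): Writer A remains the lead, but Writer B also needs a symmetric consumer-side M bullet: slide 13 must consume handoff.md and cannot ignore it. The Wave γ Iterator will add both sides at the same time.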

Next steps

v5.1-gamma (Wave γ Iterator) starts running right now. The 19 Robot 3 subagents will: