19 independent Robot 2 instances (Dispatch Harness) ran concurrently. Each subagent:
- called models only through the call_with_fallback.sh wrapper (no raw CLI calls)

| # | Agent | Scenario PASS/FAIL | Key empirical findings |
|---|---|---|---|
| 1 | Investigator A | 5/5 PASS | Honest N=0 under-delivery; the banned-phrase guard at line 236 works |
| 2 | Writer A | 3 PASS + 2 FAIL | Gap 4 confirmed as a LIVE BUG: the LLM actually slid into a textbook tone, and nothing detects the anti-pattern |
| 3 | Fresh Eyes Reviewer | FE-R2 HARD FAIL | Robot 2 refused to fabricate; the structural contradiction is empirically confirmed as a blocker |
| 4 | Copy Editor | 5/5 PASS | 21/21 defect catch rate + 0 false positives (best result) |
| 5 | Compliance Reviewer | 5/5 loose / 2/5 strict | All 4/4 planted issues CAUGHT; Gemini neutrality score 2.6/5 < 4.0 |
| 6 | Content Director | 5/5 PASS (3 caveats) | Three Tier B targets confirmed: pacing / through-line / NOT-INSTEAD |
| 7 | Engineer (HTML) | 10/10 PASS | 4 ambiguities (cadence window inheritance / 16px floor / post-test fallback) |
| 8 | Engagement Designer | 5/5 PASS (3 ad hoc) | 0 LLM calls; refused prose regeneration on its own initiative; SCN-ED-004 needs a paired S input-contract |
| 9 | Fact Checker | PASS behavior / FAIL spec | 🚨 Incidentally found 3 factual errors in the v5 course: ASCE 7-22 Ch26 = Wind Loads / NFPA 80 ≠ 15 lbf / CSC ≠ NMS maintainer |
| 10 | Writer B | 5/5 PASS (S5 caveat) | Cross-author handoff passed **only because Robot 1 injected it directly**; it would break in production |
| 11 | Commander | 5/5 behavior PASS | 🚨 META gap: get_skills_for_role.sh commander → role not found; the factory's own knowledge base has a data hole |
| 12 | Investigator B | 4 PASS + 1 PARTIAL | All 4 wrapper calls compliant; honestly reported HARD COUNT REACHED AT 3; Robot 1's fixture picked the wrong NFPA 80 section (meta-finding) |
| 13 | Source Reviewer | 5 PASS + 3 conditional | Established the opinion-vs-empirical boundary rule (first person is necessary but not sufficient) |
| 14 | Sales Rep Advisor | B3 FAIL confirmed | Byte-level drift confirmed: line 689 and line 1044 are inconsistent, and the existing validator did not catch it |
| 15 | Project Architect Advisor | 5/5 PASS | Gemini exhausted → degraded to Claude Opus self-sim (spec line 661); persona stayed first person 6/6 |
| 16 | Learning Outcome Validator | 1 PARTIAL + 3 PASS + 1 FAIL by design | Aggregate 77% < 80%; wave3_blocked correctly triggered; threshold not lowered |
| 17 | Quality Auditor | 3/3 hidden catches | Caught all 3 hidden issues; BLOCK verdict for Wave 3 |
| 18 | Performance Supervisor | 5/5 PASS | Skill-invocation gap audit worked in a live run (gap=2, literal count) |
| 19 | Candidate Collector | 4 PASS + 1 PARTIAL | S2 byte-exact verification passed; the S5 flag-candidate CLI is missing 3 fields |
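Rows 14 and 19 both turn on byte-exact comparison: two passages that are meant to be identical (lines 689 and 1044) drifted apart without the existing validator noticing. A minimal sketch of such a check, assuming the two spans are supposed to be exact copies; the function name `first_byte_diff` and the sample strings are hypothetical and not part of the existing validator:

```python
from typing import Optional


def first_byte_diff(a: str, b: str, encoding: str = "utf-8") -> Optional[int]:
    """Return the first byte offset at which the encoded spans differ, or None if identical."""
    ba, bb = a.encode(encoding), b.encode(encoding)
    for offset, (x, y) in enumerate(zip(ba, bb)):
        if x != y:
            return offset
    if len(ba) != len(bb):
        # One span is a strict prefix of the other, so they diverge where the shorter one ends.
        return min(len(ba), len(bb))
    return None


# Two spans that render the same but are encoded differently
# (precomposed U+00E9 versus "e" followed by a combining acute accent).
canonical = "caf\u00e9"
copy = "cafe\u0301"
print(first_byte_diff(canonical, copy))  # 3
```

Reporting the offset, not just a boolean, points a reviewer straight at the divergence, which matters when the difference is invisible on screen.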
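When Commander actually called get_skills_for_role.sh commander, it got role not found. Robot 1 assumed this was a spec-text gap, but the references data in the ogsm-framework skill is also missing the role. Fixing v5.md is not enough; the data file has to be fixed. → G-025 candidate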
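The Commander finding is a data-coverage problem rather than a spec-text problem: every role the spec names needs a matching entry in the data that get_skills_for_role.sh reads. A minimal sketch of a coverage check, assuming that data can be loaded as a role-to-skills mapping; the function name, role names, and skill names below are illustrative only, not the actual ogsm-framework references:

```python
from typing import Dict, List


def missing_roles(spec_roles: List[str], skills_by_role: Dict[str, List[str]]) -> List[str]:
    """Return spec roles with no skills in the lookup data, preserving spec order.

    A role that is absent from the data and a role present with an empty list
    both count as holes.
    """
    return [role for role in spec_roles if not skills_by_role.get(role)]


# Illustrative data only; real role names and skills live in the project's data file.
spec_roles = ["investigator", "writer", "commander"]
skills_by_role = {"investigator": ["skill-a"], "writer": ["skill-b"]}
print(missing_roles(spec_roles, skills_by_role))  # ['commander']
```

Checking the spec's role list against the data, not against the spec text, is what would have surfaced this hole before Commander hit it at runtime.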
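Running it in earnest, Fact Checker unexpectedly found that some of what v5 intends to teach is itself wrong: ASCE 7-22 Ch26 = Wind Loads / NFPA 80 ≠ 15 lbf / CSC ≠ NMS maintainer.
These three errors do not mean the v5 spec is flawed as a guidance document, but **when the course is actually produced**, Fact Checker must block them. Wave γ does not fix them (they are outside the spec scope), but it will strengthen Fact Checker's anti-pattern protections.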
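Investigator B's Robot 2 found that the NFPA 80 §6.1.5 chosen by Robot 1 is actually "Inspection of Door Assemblies," not the hinge-specific section (that is §6.4.2.1). Robot 1 makes mistakes too. This is the recursive value of the 4-robot architecture: **Robot 2 also audits Robot 1**.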
Writer A's Robot 2 actually triggered the single-model chain + 429 path in call_with_fallback.sh and got exit 3 with empty stdout/stderr. This matches exactly the G-020 prediction from Robot 1's Fresh Eyes review.
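The G-020 symptom is a chain that fails silently: exit 3 with nothing on stderr. A minimal sketch of the intended behavior, written in Python rather than shell for clarity, assuming each model call is a function that raises on HTTP 429; `run_chain`, `RateLimited`, and the model names are hypothetical and are not the actual interface of call_with_fallback.sh, and reusing exit 3 for "chain exhausted" is an assumption based only on the observed run:

```python
from typing import Callable, List, Tuple

# Exit code observed when the chain was exhausted; its meaning here is assumed.
EXIT_CHAIN_EXHAUSTED = 3


class RateLimited(Exception):
    """Raised by a model call when the provider answers HTTP 429 (e.g. RESOURCE_EXHAUSTED)."""


def run_chain(prompt: str, chain: List[Tuple[str, Callable[[str], str]]]) -> Tuple[int, str, str]:
    """Try each (model_name, call) in order; return (exit_code, stdout, stderr)."""
    failures: List[str] = []
    for name, call in chain:
        try:
            return 0, call(prompt), ""
        except RateLimited as exc:
            failures.append(f"{name}: 429 {exc}")
    # Never fail silently: report which models were tried and why each one failed.
    detail = "; ".join(failures) if failures else "chain is empty"
    return EXIT_CHAIN_EXHAUSTED, "", f"fallback chain exhausted ({detail})"


def always_429(prompt: str) -> str:
    raise RateLimited("RESOURCE_EXHAUSTED")


# Single-model chain that hits 429: the G-020 case.
code, out, err = run_chain("hello", [("model-a", always_429)])
print(code, repr(out), err)
```

Only `RateLimited` triggers a fallback in this sketch; any other exception propagates, so unexpected failures are not masked as quota problems.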
During Wave β, every Gemini tier was hitting 429 RESOURCE_EXHAUSTED.
Yet **every subagent degraded gracefully** (the spec-defined fallback path worked): no raw LLM violations and no fabrication. → The architecture is right, but production needs a higher quota tier or per-agent rate limiting. G-024 candidate.
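One common way to implement the per-agent rate limiting named in G-024 is a token bucket per agent. A minimal sketch, assuming a fixed refill rate and burst capacity per agent and an injectable clock; the class names, rates, and agent names are hypothetical. Per-agent buckets only stop one agent from consuming another's allowance; enforcing the provider's overall quota would need a shared bucket on top.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float]) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an agent can make a short burst
        self.clock = clock
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerAgentLimiter:
    """One bucket per agent, created lazily on the agent's first request."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, agent: str) -> bool:
        bucket = self._buckets.get(agent)
        if bucket is None:
            bucket = self._buckets[agent] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.try_acquire()


# Deterministic demo with a fake clock (seconds).
now = [0.0]
limiter = PerAgentLimiter(rate=1.0, capacity=2, clock=lambda: now[0])
print([limiter.allow("writer-a") for _ in range(3)])  # [True, True, False]
print(limiter.allow("writer-b"))                      # True: separate bucket
now[0] = 1.0
print(limiter.allow("writer-a"))                      # True: one token refilled
```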
Of the 19 Robot 2 instances, **not one** invented numbers, fabricated a source, or self-passed. Every under-delivery was honestly flagged:
- Investigator A: N=0 honest under-delivery
- Investigator B: HARD COUNT REACHED AT 3 (target 4)
- Fact Checker: VERIFIED=1 (not the expected 7), UNVERIFIED 5 NEW-03 demotion
- Fresh Eyes: refused to fabricate findings to meet the ≥3 floor
- Writer A: openly admitted 2 FAILs instead of self-passing
- Learning Outcome Validator: 77% aggregate < 80%, wave3_blocked set, threshold not lowered
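The Learning Outcome Validator behavior (block the wave rather than lower the bar) amounts to a gate with a fixed threshold. A minimal sketch, assuming the aggregate is already computed as a fraction between 0 and 1; the 80% threshold and the wave3_blocked flag come from the run above, while `wave3_gate` and `GateResult` are hypothetical names:

```python
from dataclasses import dataclass

# Fixed threshold from the run above; the gate offers no parameter to lower it at call time.
WAVE3_THRESHOLD = 0.80


@dataclass(frozen=True)
class GateResult:
    aggregate: float
    wave3_blocked: bool
    reason: str


def wave3_gate(aggregate: float) -> GateResult:
    """Block Wave 3 when the aggregate score is below the fixed threshold."""
    if not 0.0 <= aggregate <= 1.0:
        raise ValueError(f"aggregate must be in [0, 1], got {aggregate}")
    if aggregate < WAVE3_THRESHOLD:
        return GateResult(aggregate, True, f"aggregate {aggregate:.0%} < {WAVE3_THRESHOLD:.0%}")
    return GateResult(aggregate, False, f"aggregate {aggregate:.0%} >= {WAVE3_THRESHOLD:.0%}")


print(wave3_gate(0.77))
```

Keeping the threshold as a module constant instead of a function argument means a caller cannot quietly pass a lower value to make a wave pass.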
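Engagement Designer even **refused, on its own initiative, to call the LLM**: because Anti-pattern #2 prohibits prose regeneration, which could fabricate case details, Robot 2 chose deterministic constraint satisfaction instead. This is the best example of an anti-pattern correctly internalized.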
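The original Tier C decision: Writer A owns handoff responsibility (producer side). Writer B's Robot 2 found in testing that the callback works **only because Robot 1's test input injected the hook directly**; in production, Writer A and Writer B run in parallel in Wave 1, so the hook cannot be passed across.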
Refinement (not a reversal of the decision): Writer A remains the lead, but Writer B also needs a symmetric consumer-side M bullet: slide 13 must consume handoff.md, not ignore it. The Wave γ Iterator will add both sides at the same time.
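Because Writer A and Writer B run in parallel, the handoff has to travel through handoff.md itself, and both sides need a check. A minimal sketch of a symmetric producer/consumer check, assuming handoff.md lists one item per "- " bullet and that "consuming" it means slide 13's text mentions each item; the function names, file layout, and sample text are hypothetical, and the plain substring match is deliberately crude:

```python
from pathlib import Path
from typing import List


def handoff_items(handoff_text: str) -> List[str]:
    """Extract '- ' bullet items from handoff.md (assumed format: one item per bullet line)."""
    items = []
    for line in handoff_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- ") and stripped[2:].strip():
            items.append(stripped[2:].strip())
    return items


def producer_ok(handoff_path: Path) -> bool:
    """Producer side (Writer A): handoff.md exists and lists at least one item."""
    return handoff_path.is_file() and bool(handoff_items(handoff_path.read_text(encoding="utf-8")))


def unconsumed_items(handoff_text: str, slide13_text: str) -> List[str]:
    """Consumer side (Writer B): handoff items that slide 13 never mentions (substring match)."""
    return [item for item in handoff_items(handoff_text) if item not in slide13_text]


handoff = "- recap the Site A case\n- introduce term T\n"
slide13 = "Slide 13: we recap the Site A case before moving on."
print(unconsumed_items(handoff, slide13))  # ['introduce term T']
```

A non-empty result from the consumer check is what would fail Writer B's M bullet, independently of whether any hook was injected.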
v5.1-gamma (Wave γ Iterator) starts now. The 19 Robot 3 subagents will:
- add commander and any other missing roles to get_skills_for_role.sh