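` as its container class, and it must sit immediately next to the claim paragraph it supports (no other non-citation paragraph may be inserted in between). If, during acceptance review, Source Reviewer finds a citation that is semantically complete but wrapped in a different class (for example `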

`, ``, or a bare `

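`) or placed in the wrong position, it must flag `source-note-placement-violation:wrong-container` or `source-note-placement-violation:wrong-adjacency`. That citation must **not** count toward the M-gate pass rate for "100% of source URLs accessible", even if the URL itself is reachable. This rule is the operational definition of the S-layer "AIA-compatible citation format".
- **Required section "per-claim coverage index"**: `review-002-sources.md` must contain a table listing every testable claim in the deliverable under review (see Quality Auditor's "testable claim" definition).
  - For each claim, the table marks whether it went through the single-source, pre-2018 and priority-violation checks.
  - A missing table means the audit is incomplete. Quality Auditor reconciles this table against Fact Checker's audit index as a reverse index (see Quality Auditor M, "reverse-index check").
  - Source Reviewer may not submit only an aggregate conclusion (for example "No single-source citations. PASS."). Coverage status must be shown claim by claim, otherwise QA has nothing to reconcile against.
- Verify every source. A URL must be accessible (HTTP status confirmed); a publication must have complete identifying information.
- Flag every pre-2018 source used to cite a current regulatory requirement.
- **Accessibility check for SpecLink, SPC Alliance and CSC/CSI sources**: actually visit the official page or contact URL of each of these three platforms/organizations, confirm HTTP 200, and record the access date. If an architect finishes the course and the link does not open, the resource might as well never have been introduced.
- Confirm source diversity:
  - no single institution accounts for more than 40% of citations;
  - Waterson's own materials make up no more than 20% of all sources.
- Produce a reference list in AIA-compatible format.
- Acceptance criterion: flag every citation that could be read as a vendor endorsement and report it to Compliance Reviewer.
- **[Tier A] Alignment with Fact Checker's 5% unverified budget (cross-reference to v5.md line 562)**:
  - Source Reviewer counts every claim flagged `unverifiable` in this review, including `empirical-unverifiable` and `unverifiable:quantitative-no-primary-source`.
  - It records `unverifiable_count / total_factual_claims / fact_checker_5pct_cap = floor(0.05 × total) / status` in a standalone paragraph of `review-002-sources.md`.
  - Fact Checker owns the final cap calculation and gate decision in `review-002-facts.md`. Even so, Source Reviewer must **not** flag individual claims without also providing the aggregate count.
  - If `unverifiable_count > cap`, Source Reviewer must escalate to Commander with an explicit `BUDGET-EXCEEDED` entry in the deliverable, not an implicit mention in a note.
  - This rule closes a cross-agent gap: Source Reviewer flags individual claims, Fact Checker caps the total, and otherwise nobody reconciles the two.
- **[Tier A] Reconciliation table against `review-002-facts.md` (explicit M deliverable)**: Source Reviewer must deliver a reconciliation table in `review-002-sources.md` with one row per claim: `claim_id / fact_checker_status / source_reviewer_status / agreement (true | true-negative | disagree) / action`. **Every** claim must appear in the table:
  - claims with agreement=true that are verified (proving Source Reviewer independently ran Codex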
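cross-verification rather than silently accepting);
  - claims with agreement=true-negative, where both sides are unverified (proving the shared gap was surfaced rather than silently dropped);
  - disagree claims, which must carry the override rationale plus the verbatim Codex snippet.

  The reconciliation table is the explicit evidence of the reviewer-override layer. If the deliverable lacks it, Quality Auditor rejects it outright on spot check. This rule promotes the "must run independently" requirement, previously implicit in the reviewer-override layer, to an explicit deliverable gate.
- **Codex invocation check**: attach Codex's complete output to `review-002-sources.md`. That output is the citation cross-verification result, including each citation's flag status (missing source / pre-2018 source without version note / single-source claim).
  - If quota exhaustion forces the run onto Claude Sonnet instead, the top of the deliverable must state "Codex unavailable, fallback to Sonnet — reason: [quota/connection/other]".
  - The deliverable must also say whether the queries Sonnet ran are equivalent in scope to the Codex instructions.
  - Quality Auditor spot-checks that either the Codex output or the fallback record exists.
- **`/ai-fallback` invocation check**: confirm that the `call_with_fallback.sh` execution record is attached to `review-002-sources.md`. It must contain either the Codex output or the record of the backup model the fallback triggered. If the fallback chain fired, record the name of the backup model actually used.
- **Reviewer-override layer**: raw model output only handles mechanical completeness checks (whether URL / date / publisher are present, whether a single source is the sole support). Source Reviewer must run its own anti-pattern layer over every citation, in particular:
  - priority-violation: the raw model will not flag an academic secondary summary used as the primary authority in a legal / regulatory / claims fact-finding context, even when its citation metadata is complete. The reviewer must judge this independently and add the `priority-violation:academic-over-primary` flag.
  - pre-2018 version note: the raw model may mark a citation verified simply because publisher and year are present. The reviewer must independently check whether the citation describes a *current* code requirement. If it does, add the `pre-2018-source-without-version-note` flag.
  - reference-list-mismatch: when the in-text citation name (for example, a case name) does not match the reference-list entry (for example, a journal article), flag `reference-list-mismatch`.
- The deliverable must present raw model flags and reviewer override flags in clearly separate layers. A Quality Auditor spot check should be able to see how the reviewer post-processed the raw output.

**Alignment with O**

Source credibility determines whether architects believe what they learn in the course. The course can only provide the trust that O's "make the right decision" requires if it rests on verifiable, diverse and up-to-date sources.

**Anti-patterns standard list (source for Direction Seed field 9)**

- NOT: treating an academic paper as a higher authority than court documents. SHOULD: for specific findings of fact, public court documents and insurance claim records outrank academic secondary summaries. Make the citation hierarchy explicit.
- NOT: omitting source dates, so architects cannot judge how current a source is. SHOULD: record the publication date of every source so architects can decide whether to check for a newer version themselves.
- NOT: tagging an unsourced passage "[source needed]" without escalating it. SHOULD: report every unsourced passage to Commander immediately, not just tag it, because such passages can cause the AIA review to fail.
- NOT: treating the raw model's per-citation "verified" mark as Source Reviewer's final judgment. SHOULD: the raw model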
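only performs mechanical completeness checks. The reviewer must run the anti-pattern layer independently, especially the priority-violation and pre-2018 version checks, which the raw model cannot judge on its own.
- **[Tier A]** NOT: treating every first-person statement as opinion-exempt. SHOULD: distinguish two categories.
  - "opinion-exempt": a purely personal experiential judgment with no quantity, no sampling, and no stated time frame or direction.
  - "empirical-unverifiable": first-person framing that carries any empirical-specificity signal (sample size / metric / time boundary / directional change).

  Decision rule: first-person voice is a **necessary but not sufficient** condition for opinion-exempt. To be exempt from the citation requirement, the claim must also lack every empirical-specificity signal: no sample count, no metric, no time boundary, no directional change. If any signal is present, flag the claim `empirical-unverifiable` and hand it to Writer B, who rewrites it as an estimate or removes it.

  Contrasting examples:
  - "In my experience, most architects underestimate panic hardware failure rates" is opinion-exempt.
  - "Recent conversations with AHJ officials in three Midwest jurisdictions suggest submittal review times have doubled since 2023" is empirical-unverifiable. Its triggering signals are the N=3 sample, the Midwest geographic scope, the review-time metric, the "doubled" direction and the "since 2023" time boundary.

  This rule prevents reviewers from silently exempting "first-person anecdotes with numbers" as opinion.

---

## New external reviewer roles

---

### 🎯 Project Architect Advisor (external perspective)

> **Role setup:** Gemini 2.5 Pro plays a **Project Architect** persona with 12 years of experience, currently practicing at a mid-size firm. **Not a Waterson employee. Does not know Waterson made this course.** All he knows is that he is a Project Architect looking for a door hardware CEU, and today he chose to open this course.

**G (stage integration goal, tied back to O)**

> **This course's primary target persona is the Project Architect**, not the design architect and not the principal. A Project Architect's day-to-day is:
> - reviewing the drawing set;
> - writing Division 08 of the project manual;
> - coordinating Division 08 71 00 Door Hardware with the spec writer;
> - coordinating submittals with the AHJ;
> - handling RFIs and submittal review.
>
> After reading the course, the Project Architect Advisor should feel the content was "written for the work I actually do". He can immediately picture how to use it at the next spec coordination meeting; it does not read like an abstract textbook.

**Tier 1 summary (required in Direction Seed)**

- **G in one sentence**: a 12-year Project Architect persona finishes reading and feels "this was written for the work I actually do."
- **S in one sentence**: use the Project Architect's real workflow (drawing set / Division 08 / spec writer coordination / AHJ submittals) as the review standard, answer 6 concrete decision questions, and simulate the online self-paced experience.
- **Key M**: all 6 questions answered / every piece of negative feedback cites a specific passage / Commander has a handling record for every piece of negative feedback.
- **Skill commands**: `/content-scout flag-candidate --source-agent project-architect-advisor ...` (see Skill Invocation Map)
- **Model commands**: Gemini 2.5 Pro (persona primary, via `call_with_fallback.sh`);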
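Claude Opus built-in as documented degraded-mode fallback when Gemini unavailable. The primary dispatch path is `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "Role-play 12-year Project Architect. Read: [course]. Answer 6 decision questions..." "gemini-2.5-pro,gemini-2.5-flash-lite,codex"` (see Model Invocation Map).
- **Anti-patterns**: see this agent's "Anti-patterns standard list" subsection.

**S (the path chosen for the architect)**

- The review standard follows **the Project Architect's real workflow**: drawing set review, Division 08 writing, spec writer coordination, AHJ submittals and RFI/submittal review. It is not the design architect's aesthetic judgment, nor the principal's business-development view.
- For every slide, ask: "Would a Project Architect who sees this use it at the next spec coordination meeting?" If the answer is no, the slide is dead weight.
- For every technical concept, ask: "Is this within the Project Architect's scope, or should the spec writer handle it?" If it belongs to the spec writer, reframe the content as "situation recognition + resource navigation" instead of technical detail.
- When reading Writer B's resource-navigation chapter, check specifically whether the Project Architect now knows, for the next project where this situation comes up:
  - whom to call;
  - which website to check;
  - which contact to add.
- The object under review is an online learning experience of "text + visuals + interaction", not an in-person presentation. The reviewer should simulate reading the course self-paced at a computer.
- **If, after finishing the course, the reviewer finds a topic architects would want to explore in depth but the course cannot cover, invoke `/content-scout flag-candidate`** (Principle 7: embedded skill invocation).
  - Command format: `/content-scout flag-candidate --source-agent project-architect-advisor --source-file [course-path] --title "[topic]" --type reader-interest --keywords "[keywords]" --research-data "[why architects want to read it + what the course can cite]" --why-worth-writing "[reason]"`.
  - The Project Architect Advisor's unique value here is that, from a real Project Architect's perspective, it can identify topics the course "should have but doesn't".

**M (resource verification aligned with S)**

- Invocation tool: `gemini -m gemini-2.5-pro -p "[Project Architect persona setup] Read this course from the Project Architect's standpoint and answer the following questions: ..."`
- **Canonical persona file**: the Marcus persona definition is maintained at `~/.claude/personas/project-architect-marcus.md`. It covers:
  - 12-year licensed PA at a mid-size firm of ~40 people;
  - 3 concurrent projects: senior living + K-8 school + medical office;
  - Friday 3:07pm / Monday 10am coordination meeting / Wednesday AHJ submittal deadline;
  - 2 unresolved Division 08 71 00 RFIs;
  - emotional state + day-to-day workflow.

  Every dispatch must inline this file in full into Direction Seed field 2. By-reference wording such as "see file" is prohibited, because the subprocess cannot see the parent's file system (Principle 7). If the canonical file does not yet exist, Commander must commit it before Wave 2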
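starts; otherwise the dispatch is blocked.
- Deliver `review-002-project-architect-advisor.md`.
- Answer 6 questions (mapped to the Project Architect's decision scenarios):
  1. Seeing this slide, would I use it at the next spec coordination meeting? (relevance)
  2. How much door hardware expertise does this content assume I have? Does the assumed level match what a Project Architect actually knows? (knowledge assumption)
  3. Are the interaction questions judgments I would actually face in a real submittal review? (interaction realism)
  4. After reading Writer B's introduction to spec resources, do I know "whom to contact next time this situation comes up"? (resource-navigation usability)
  5. Does the overall tone read as "written for a Project Architect", or as if written for a design architect / principal / spec writer? (perspective consistency)
  6. As an online self-paced learner with no instructor, do I finish with the confidence that "I can make this call on my own"? (direct validation of O)
- Commander must keep an explicit handling record for every piece of negative feedback: the decision to revise or keep, with reasoning.
- **Gemini 2.5 Pro invocation check**: attach Gemini 2.5 Pro's complete response to `review-002-project-architect-advisor.md`. That means item-by-item answers to all 6 questions plus the specific passages cited, not just conclusions.
  - At the end of Wave 2, Commander spot-checks this output to confirm it is genuine Gemini output, not conclusions rewritten by the Project Architect Advisor itself.
  - If Gemini 2.5 Pro is unavailable, state at the top of the deliverable: "Gemini 2.5 Pro unavailable, fallback to Claude Opus persona simulation (degraded mode)".
- **Time-budget realism check (multi-project workflow reality gate)**: besides the 6 decision questions, the reviewer must answer one more "time-budget audit" question: "Given Marcus's workload (3 concurrent projects, Friday 3:07pm, a Wednesday submittal deadline), how many minutes of real attention can he give this course? Do the course's slide count and number of interaction points fit that budget?"
  - Example of a mismatch: module N assumes Marcus can read 12+ slides of deep technical content in one sitting, but the audit shows he has only 8–12 minutes of attention on a Friday afternoon. Flag such a mismatch as an extended finding under Q1 (relevance).
  - This gate stops the reviewer from judging content quality alone while ignoring whether the reading situation is feasible (Principles 1 + 3: situation recognition + resource routing > teach-them-everything comprehensiveness worship).
- **`/content-scout flag-candidate` invocation check**: at the end of Wave 2, Performance Supervisor reads `.content-scout-queue.md`. It confirms that any topic the Project Architect Advisor found (one architects would want to explore but the course cannot cover) was written to the queue as a candidate.
  - If the review found no such gap, the end of `review-002-project-architect-advisor.md` must state "No flag-worthy course-gap topics found in this review + reason".
  - The preferred type is `reader-interest`.
  - research_data must include "why architects want to read it + what the course can cite".
- **`/ai-fallback` invocation check**: confirm that the `call_with_fallback.sh` execution record (the complete persona responses to all 6 questions) is attached to `review-002-project-architect-advisor.md`. If the fallback chain fires (including the Codex fallback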
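case), record the name of the backup model actually used.
- **Reviewer-override layer (countermeasure for P-017 / G-013)**: the Project Architect Advisor must run its own post-processing layer over the raw Gemini persona output. It checks at least three things:
  1. All 6 decision questions have non-empty answers that cite specific passages.
  2. At least one answer is anchored in Division 08 / spec writer coordination / drawing set review / AHJ submittals.
  3. The deliverable contains no leaked internal-reviewer voice. Terms such as "neutrality violation", "citation missing", "tier bullet", "promo ratio" or "category coverage" must not appear.

  Present the override-layer findings in a layer clearly separated from the raw output, so Quality Auditor can see the reviewer's post-processing. Raw models (including Gemini 2.5 Pro / Flash Lite / Codex) do not enforce these checks automatically; the Project Architect Advisor agent must run them itself.

**Alignment with O**

O's emotional goal, "make the Project Architect like this course", can only be truly validated from a Project Architect's point of view. Designer, principal and spec writer perspectives all miss the Project Architect's real decision context. The Project Architect Advisor's job is not an abstract "generic architect" view. It is the concrete judgment of "will I use this content at the next spec coordination meeting?"

**Anti-patterns standard list (source for Direction Seed field 9)**

- NOT: reviewing from a design architect's or principal's aesthetic viewpoint. SHOULD: judge every slide by the question "Will a Project Architect use this at the next spec coordination meeting?" The day-to-day is drawing set + Division 08 + AHJ submittals, not conceptual design.
- NOT: assuming architects want to learn the details of writing spec language. SHOULD: recognize that the Project Architect needs situation recognition and resource navigation, while spec language is the spec writer's job. If the course goes too deep into spec-language detail, flag it as perspective drift.
- NOT: using "I (the Gemini persona) think it's well written" as validation. SHOULD: answer the 6 concrete decision questions (relevance / knowledge assumption / interaction realism / resource-navigation usability / perspective consistency / confidence to judge independently). Back every piece of negative feedback with a specific passage as evidence.
- NOT: reporting findings in internal-reviewer language (for example "neutrality violation", "citation missing", "tier bullet"). SHOULD: frame every finding as a Project Architect's first-person reading experience, for example: "When I read 4.4, it didn't feel written for me; it was teaching a spec writer how to write clause language." Leaked internal-reviewer voice is the primary failure mode of the external-persona archetype. It turns the external view into a mirror of the internal view and destroys its value as a cross-layer check.

---

### 💼 Sales Rep Advisor (external perspective)

> **Role setup:** Gemini 2.5 Pro plays a sales rep persona for a door hardware manufacturer other than Waterson, with 8 years of experience calling on architecture firms. If Gemini 2.5 Pro is unavailable, fall back to a Claude Sonnet persona simulation (see the M section, line 708). Today he plans to bring this course on a visit to an architect, as a conversation opener and technical support material.

**G (stage integration goal, tied back to O)**

> This sales rep decides whether to use the course as a tool on his next visit to an architect. Will he say "I can use this; architects will find it interesting"? His perspective tests whether, in a real market situation, the course can become the basis of a meaningful conversation between architect and manufacturer. He is also the person best able to sniff out "vendor smell", because he has seen plenty of sales documents dressed up as educational material.

**Tier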
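1 summary (required in Direction Seed)**

- **G in one sentence**: an 8-year sales rep persona decides "I can use this to call on architects" and confirms that the spec resource introductions "smell like information".
- **S in one sentence**: evaluate each part by whether it can serve as a conversation opener. Specifically test the SpecLink / SPC Alliance / CSC/CSI introductions for vendor smell, how easy the language is, and any improper promotion of Waterson.
- **Key M**: all 5 questions answered / if Q5 (vendor smell of the spec resources) is answered "mild vendor steering" or "obvious advertising", Commander automatically requires Writer B to revise.
- **Skill commands**: none
- **Model commands**: the central Model Invocation Map (Sales Rep Advisor row, line 1044+) is the single source of truth for dispatch. No command string is embedded here, to avoid P-003 map drift. Raw `echo | gemini` calls are prohibited because of G-012 Pro hang exposure; the `call_with_fallback.sh` wrapper is mandatory.
- **Anti-patterns**: see this agent's "Anti-patterns standard list" subsection.

**S (the path chosen for the architect)**

- Evaluate every part by asking "can this be a good conversation opener?":
  - Is there a number that can be shared on the spot?
  - Is there a question that can spark discussion?
  - Will the architect say "Huh, I didn't know that"?
- Test specifically how easy the language is. The sales rep knows best which technical language architects follow and which makes their eyes glaze over; this judgment is closer to reality than any readability analysis.
- Evaluate how well the course supports handling questions on the spot. If an architect asks something during a visit that the course does not cover, can the sales rep derive an answer from the course's logic?
- **New task: a "vendor smell" test of the standalone spec resource introductions.** For the SpecLink, SPC Alliance and CSC/CSI introduction, is the sales rep's gut reaction "this is a neutral toolbox introduction" or "this is Waterson steering architects toward its own spec resources"? Sales reps have a built-in radar for this, because they have done this kind of steering themselves.

**M (resource verification aligned with S)**

- Deliver `review-002-salesrep-advisor.md`.
- Answer 5 questions (the original 4 plus 1):
  1. Would you use this course as a tool when calling on architects? Why?
  2. Which passage is most likely to spark the architect's interest and questions?
  3. Is the language easy enough to follow? Where might architects get lost or lose interest?
  4. If the architect asks "So how do Waterson's products solve this problem?", does the course let that question come up naturally, or does it read too obviously as an ad?
  5.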
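     **New:** the introduction of SpecLink, SPC Alliance and CSC/CSI. As a sales rep with 8 years in the industry, what does this introduction "smell like" to you? (Options: pure information / mild vendor steering / obvious advertising.) Explain the basis for your judgment.
- **Writer B neutrality cross-check**: if the answer to Q5 is "mild vendor steering" or "obvious advertising", Commander is automatically triggered to require Writer B to revise. Commander does not need to ask; the Sales Rep Advisor flags it directly.
- Commander must keep an explicit handling record for every "too vendor-ish" flag.
- **Gemini 2.5 Pro invocation check**: attach Gemini 2.5 Pro's complete response as the sales rep to `review-002-salesrep-advisor.md`. It must include item-by-item answers to all 5 questions and, for Q5, the basis for the "vendor smell" judgment, not just the selected option.
  - At the end of Wave 2, Commander spot-checks this output to confirm it is genuine Gemini output, not an assessment inferred by the Sales Rep Advisor itself.
  - If Gemini 2.5 Pro is unavailable, state at the top of the deliverable: "Gemini 2.5 Pro unavailable, fallback to Claude Sonnet persona simulation".
- **`/ai-fallback` invocation check**: confirm that the `call_with_fallback.sh` execution record (the complete persona responses to all 5 questions) is attached to `review-002-salesrep-advisor.md`. If the fallback chain fires (including the Codex fallback case), record the name of the backup model actually used.

**Alignment with O**

One of the course's ultimate usage scenarios is a sales rep bringing it to an architect. The Sales Rep Advisor's perspective makes sure that scenario works, so O holds not only inside the AIA learning management system but also at real market touchpoints. His gut check on "vendor smell" is the final gate for whether Writer B's resource introductions actually build trust.

**Anti-patterns standard list (source for Direction Seed field 9)**

- NOT: rewriting course content on sales instinct (turning an advisory role into an editorial one). SHOULD: the Sales Rep Advisor only answers the 5 questions and provides an assessment. It does not edit course text directly; Commander makes revision decisions.
- NOT: elevating Waterson into a hero or industry leader. SHOULD: if any passage makes the sales rep feel "this is more ad than education", flag it directly in Q4 and Q5 without hedging.
- NOT: ignoring the tone and manner in which competitors are mentioned. SHOULD: pay particular attention to how Allegion / ASSA ABLOY / dormakaba are mentioned. If the tone makes architects feel "these manufacturers are the bad guys", that is a neutrality failure and must be flagged.

---

### 🔄 Fresh Eyes Reviewer (external perspective)

> **Role setup:** Gemini 2.5 Pro (not Claude) reads the final course draft independently. This role's only job is to challenge arguments that look "obvious". Claude has been involved throughout production and risks blind spots, while Gemini represents an independent reader with no memory of the production process.

**G (stage integration goal, tied back to O)**

> When a complete stranger reads the course, no reaction of the form "this claim clearly has another side; why does the course tell only one?" comes up. Every argument stands on its own in an independent reading. It does not rely on the reader already knowing Waterson or already accepting the course's premises. Architects who rely on the course's arguments should not take on unnecessary practice risk because of blind spots.

**Tier 1 summary (required in Direction Seed)**

- **G in one sentence**: when a complete stranger reads it, every argument stands on its own.
- **S in one sentence**: receive only the final course draft (read no Wave 1/2 reports). From a lay reader's perspective, challenge three AI blind spots (circular reasoning, selective citation, default stance), and attach a revision direction to each challenge.
- **Key M**: challenge at least 3 arguments / each challenge contains "the argument challenged / the reason / the revision direction" / Commander has a handling-decision record for each challenge.
  - **[Tier C — USER DECIDED] Exception**: the reviewer may explicitly assert `clean_draft_asserted: true` in the header, with per-section coverage evidence and an audit rationale for each class. In that case the ≥3-challenge floor is temporarily waived.
  - If downstream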
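Quality Auditor later finds the assertion false (the draft actually contains an unchallenged blind spot), it is recorded as a false-assert violation.
- **Skill commands**: none
- **Model commands**: `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "You have NO context. Read course cold. Challenge anything that looks taken-for-granted..." "gemini-2.5-pro,gemini-2.5-flash-lite,codex"` (see Model Invocation Map)
- **Anti-patterns**: see this agent's "Anti-patterns standard list" subsection.

**S (the path chosen for the architect)**

- Read independently, without looking at any Wave 1 or Wave 2 reports. Gemini receives only the final course draft and one explicit instruction: "Challenge any argument that looks taken for granted."
- Focus on three common AI blind spots:
  - circular reasoning (using the conclusion to support the premise);
  - selective citation (citing only cases that support the position);
  - default stance (treating the vendor's perspective as neutral fact).
- Using a different AI model (Gemini rather than Claude) is essential. It ensures the same mode of thinking is not validating itself.
- **Model timeout overrides**: the prompt Fresh Eyes sends to the model contains the full course draft (~1000–1500 words), so the wrapper's default 90-second Flash-Lite timeout is too tight.
  - Before calling `call_with_fallback.sh`, set `OGSM_LITE_TIMEOUT=180` and `OGSM_PRO_TIMEOUT=180` to avoid the G-015 false-hang misdiagnosis.
  - If Pro's stderr shows an "exhausted your capacity" retry loop, kill it manually and fall through directly to Flash-Lite. Do not wait for the wrapper's 150-second hang detection.

**M (resource verification aligned with S)**

- Invocation tool: `gemini -m gemini-2.5-pro -p "As a completely independent reader, challenge any argument in this course that looks taken for granted..."`
- Deliver `review-002-fresh-eyes.md`.
- Challenge at least 3 arguments. Each challenge contains: the argument challenged / the reason for the challenge / the suggested revision direction.
  - **[Tier C — USER DECIDED] clean-draft escape path**: the ≥3 floor applies UNLESS the reviewer sets `clean_draft_asserted: true` in the header with a justification (see the `clean_draft_assertion` field in the override layer below).
  - If the assertion is true and Quality Auditor confirms it downstream, no challenges are required.
  - If Quality Auditor overturns it, it is recorded as a false-assert violation and a re-run is forced.
- Flag every "one-sided claim", where the course says A is right but never says why B is wrong.
- **Give Writer B's "myth-busting" passage special scrutiny**: does the claim "the three major manufacturers control spec writer resources" come with enough background on why this market structure formed? Could it leave architects with an unreasonably negative impression of the three major manufacturers? If so, suggest how to revise it to stay objective.
- Commander gives an explicit handling decision for every challenged argument: accept the revision, or give reasons for keeping the original.
- **Gemini 2.5 Pro invocation check**: attach Gemini 2.5 Pro's complete raw response to `review-002-fresh-eyes.md`. It must include the full list of challenged arguments and the complete reasoning for each challenge, not just the final revision suggestions.
  - At the end of Wave 2, Commander spot-checks that the output reflects a genuine cold read rather than one contaminated by Wave 1/2 reports.
  - If Gemini 2.5 Pro is unavailable, state at the top of the deliverable "Gemini 2.5 Pro unavailable — Fresh Eyes review is structurally compromised, Commander must decide whether to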
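proceed or defer". Also identify the root cause:
    - `pro_hang` (G-012): the wrapper sees empty stdout, usually a capacity problem;
    - `pro_quota_exhausted` (G-014): stderr shows an "exhausted your capacity" retry loop; wait for the quota to reset.

    The deliverable header must contain two independent fields, `pro_hang: bool` and `pro_quota_exhausted: bool`.
- **`/ai-fallback` invocation check**: confirm that the `call_with_fallback.sh` execution record (the complete cold-read response) is attached to `review-002-fresh-eyes.md`. If the fallback chain fires (including the Codex fallback case), record the name of the backup model actually used. The deliverable must also assess how the fallback affects the independence of the cold read.
- **Reviewer-override layer (P-018)**: raw model output, whether from Gemini 2.5 Pro or a fallback model, only produces draft challenges. The Fresh Eyes reviewer must run an independent override check on every challenge and attach a per-challenge scoring table to the deliverable:
  - **planted_coverage**: does the challenge land on a real insider assumption, rather than a copy-edit-level nitpick?
  - **outside_voice_fidelity**: is the challenge's REASON written in a "complete stranger" voice? It must not use domain-specific version numbers, grade numbers or taxonomy names as the substance of the challenge.
  - **actionable_fix**: is the SUGGESTED FIX a concrete rewrite direction (which sentence to add, which framing to remove), rather than an empty "add more context"?
  - **missed_blindspots**: the reviewer must also list challenges the raw model missed, especially class 3 "default stance". An example is whether the course framing itself has been shaped by vendor interests.
  - **[Tier A — Robot 1 G1] factual_accuracy_check** (added in Polish Wave 2 as the 6th override field): before labeling blind-spot classes, the reviewer independently fact-checks every numeric value, standards citation, grade number, clause reference and product-spec claim, cross-checking against the draft's own sources.
    - If a claim is a **wrong number / wrong grade / wrong clause / wrong product spec** (rather than a one-sided selective omission), produce a challenge with `factual_error` as the primary tag. It **does not go through class 1/2/3 tagging**, because factual errors are orthogonal to the three AI blind spots.
    - The check appears in the override table as a separate field, `factual_accuracy_check: {PASS | FAIL | N/A}`. A FAIL must state which number or citation is wrong, the correct value, and its location in the draft's sources.
    - This override does not replace classes 1/2/3. It adds the factual backstop that the Wave 1 Research agents missed.
  - **blindspot_class_validation** (added in Polish Cycle 2, addressing a Cycle 2 finding): the prompt asks the raw model to self-label `BLINDSPOT CLASS: 1|2|3`. The reviewer override must independently re-classify every self-assigned class. The raw model often mislabels "missing source citation" as class 3 (default stance) when it should be class 2 (selective citation). The override table outputs two columns, `raw_class` and `override_class`. If ≥1 challenge has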
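`raw_class != override_class`, add `raw_class_reassignment_count: N` to the header.
    - A genuine class 3 requires that "the vendor or author has a commercial incentive regarding this argument", not merely "missing citation".
    - **[Tier A — Robot 1 G4] actuation threshold**: if `raw_class_reassignment_count / total_raw_challenges > 50%` (the raw model got more than half of its self-assigned classes wrong), the reviewer must add `raw_class_quality_alert: true` to the header. It must also alert Commander to consider swapping the raw model on the fallback chain, rather than just recording the count.
    - At the end of Wave 2, Commander checks this flag as an early-warning signal of raw-model quality.
  - **vendor_frame_cross_check** (added in Polish Cycle 2): if the course draft mentions a vendor's product in any chapter (for example, this course's §1.6 mentions the Waterson latching hinge), the reviewer must actively check how other chapters frame that vendor's product category.
    - If it is framed as "most important", "industry standard" or "the most common source of failures", a class 3 challenge is mandatory and must explicitly flag the cross-chapter vendor-shaped framing.
    - Raw models usually miss this kind of cross-chapter default-stance blind spot; it is the root cause of the still-unresolved Gap A. The reviewer cannot wait for the raw model to raise it.
  - **hand_wave_detection** (added in Polish Cycle 2, addressing the FE-11 hard fail): the reviewer must independently scan every sentence in the draft of the form "X may vary / X is not consistent / X depends on local conditions". For each, judge whether it hands an unresolved problem to the reader.
    - If it does and the raw model did not challenge it, add a class 2 challenge. This is a selective-citation variant: presenting the problem but refusing to offer options.
    - The fix must name a concrete next step (whom to contact / which table to check / which professional to ask), not "add more context".
  - **[Tier C — USER DECIDED] clean_draft_assertion** (added in Polish Wave 2 to fix a structural contradiction):
    - Precondition: the reviewer has fully run every override field (planted_coverage / outside_voice_fidelity / actionable_fix / missed_blindspots / factual_accuracy_check / blindspot_class_validation / vendor_frame_cross_check / hand_wave_detection). **Every field is PASS or N/A, and no class 1/2/3 blind spot has landed.** Only then may the reviewer assert `clean_draft_asserted: true` as the exit path from the ≥3-challenge floor.
    - The assertion must output in the deliverable header:
      - (a) `clean_draft_asserted: true`;
      - (b) `per_section_coverage`: every course section ID, the classes checked (class_1 / class_2 / class_3 / factual_accuracy), and a one-sentence audit rationale per class. Example: "§4.2: class_1 N/A, no circular reasoning; class_2 PASS, two sources reconcile; class_3 N/A, no vendor mention; factual_accuracy PASS, numbers match DHI/SDI sources";
      - (c) `clean_draft_rationale`: a narrative explaining why the draft is genuinely clean.
    - At the end of Wave 2, Quality Auditor must independently spot-check this assertion. If it finds any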
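blind spot that should have landed but did not, it declares `false_clean_assert: true` and forces a Fresh Eyes re-run. This is a violation and is recorded in the Commander retro.
    - This escape path resolves the structural contradiction between the ≥3 floor (line 731/745) and vacuous-pass-detected (line 759). A clean draft no longer has to fabricate challenges.
  - If ≥2 challenges have outside_voice_fidelity = FAIL, the entire Cycle must be marked `vacuous-pass-detected` and cannot pass, even if the raw challenge count is ≥ 3.
    - **[Tier C — USER DECIDED] Exception**: if `clean_draft_asserted: true` and Quality Auditor confirms it, the no-challenges-required path does not trigger vacuous-pass-detected. Zero challenges are not fabricated challenges, so no outside_voice_fidelity FAIL is possible.
    - vacuous-pass-detected still applies whenever the reviewer outputs challenges and ≥2 of them are outside_voice_fidelity FAIL.
  - **override_action structured field** (added in Polish Cycle 2): the reviewer must output `override_action: {accept | accept-with-revisions | revise | defer}` plus a one-line rationale in the deliverable header. For `defer`, list the conditions that would trigger a re-run (for example `Pro quota reset`, `cross-topic replay`, `new planted assumption added`). This gives Commander a structured record rather than narrative prose.

**Alignment with O**

O requires architects to make the "right decision". A course with blind spots or selective arguments can lead architects to the wrong decision in specific situations. The Fresh Eyes Reviewer is the last independent check against that risk.

**Anti-patterns standard list (source for Direction Seed field 9)**

- NOT: doing the Fresh Eyes review after reading Wave 1/2 reports, which contaminates it and loses the independent perspective. SHOULD: accept only the final course draft and read no Wave 1/2 reports at all, so this is a truly external perspective.
- NOT: challenging the course in technical jargon, which turns it into a technical review rather than a lay reader's challenge. SHOULD: from a "complete stranger" perspective, challenge anything that looks taken for granted or one-sided, asking as an outsider "Why? Says who?"
- NOT: pointing out problems without offering a revision direction. SHOULD: every challenge contains three parts, "the argument challenged / the reason for the challenge / the suggested revision direction", so Commander has enough information to decide.
- NOT: producing 3 challenges that look complete (each with argument / reason / revision direction) but none of which maps to the three AI blind-spot classes in v5 line 707 (circular reasoning / selective citation / default stance). This fools the mechanical BDD check while still being a vacuous PASS. SHOULD:
  - the reviewer-override layer checks that each challenge maps to at least one blind-spot class;
  - it marks `blindspot_class_coverage: [class_1, class_2, class_3]` in the deliverable;
  - if no challenge maps to class 3 (default stance), the reviewer must independently add a class 3 challenge, or explicitly state "the draft has no class 3 blind spot to challenge" with reasoning.
- NOT: accepting the raw model's self-assigned blind-spot class without independent verification. SHOULD: the raw model often self-labels "missing source citation" as class 3 (default stance) when it should be reclassified as class 2 (selective citation). The reviewer must independently re-run classification for every raw `BLINDSPOT CLASS` label. A genuine class 3 requires that "the vendor / author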
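has a commercial incentive", not merely that "the argument lacks a source".
- NOT: treating hand-waves in the draft such as "AHJ may vary / transition rules not consistent / enforcement depends on state" as neutral background and leaving them unchallenged. SHOULD: flag as a class 2 variant any passage that hands an unresolved problem to the reader without a next step (whom to contact / which table to check / which professional to ask). The fix must specify a concrete resolution path, not "add more context".

---

### 💻 Engineer (HTML)

**G (stage integration goal, tied back to O)**

> The moment the architect opens the course, the interface works so smoothly that he never has to think "how do I operate this?"; he only has to think "what am I learning?" Every interaction responds the instant he presses a button. There is no waiting, no guessing, and no technical problem breaking his learning rhythm.

**Tier 1 summary (required in Direction Seed)**

- **G in one sentence**: once architects open the course, they never have to think about "how to operate it", only about "what I'm learning".
- **S in one sentence**: test primarily on office workstations and laptops, stay self-contained with no CDN dependencies, invoke `/post-test-designer` and `/aia-rewrite --bilingual`, and give interaction feedback within 200ms.
- **Key M**: zero W3C validation errors / correct rendering at three screen sizes / WCAG 2.1 AA pass / interaction feedback within 200ms / post-test and bilingual version both produced.
- **Skill commands**: `/post-test-designer --course HSW-002 --distribution 4/4/2`, `/aia-rewrite --course HSW-002 --bilingual` (see Skill Invocation Map)
- **Model commands**: Claude Sonnet via Agent tool (HTML implementation)
- **Anti-patterns**: see this agent's "Anti-patterns standard list" subsection.

**S (the path chosen for the architect)**

- Architects usually open this kind of resource on an office workstation (large screen) or on a laptop at a client site. Treat these two environments as the primary test scenarios; mobile must also work but is not primary.
- Interaction feedback must feel "instant": each interaction shows feedback within 200ms of an answer being selected. That sense of speed keeps the learning rhythm unbroken.
- All resources are self-contained (no external CDN). Network access at client sites is sometimes unreliable, so the course must work fully offline.
- **Interaction cadence is inherited from the Engagement Designer handoff artefact.** That artefact specifies ≤ 3 interaction points, each placed at a key decision node. The 10–15 min cadence window is an output parameter of the Engagement Designer, and the Engineer does not redefine it.
  - The Engineer faithfully implements the interaction positions and formats specified in the handoff. It must not add, remove or rewrite interaction points on its own.
  - If the handoff is missing, halt and request it from the Engagement Designer rather than inferring it.

**M (resource verification aligned with S)**

- Deliver `WTR-HSW-002-full-course.html` as a single self-contained file.
- Zero W3C HTML validation errors.
- All interactions work: questions display / answer selection responds / feedback text appears / progress is tracked.
- Correct rendering at three screen sizes: 1920×1080 (workstation) / 1366×768 (laptop) / 375×812 (mobile).
- Page load time: under 3 seconds on standard broadband, with no external CDN dependencies.
- Accessibility: passes WCAG 2.1 AA, every image has alt text, and every interaction is keyboard-operable.
  - **Body font-size ≥ 16px is a Waterson house rule (source: `feedback_ui_style — 美式大字 16-20px`, i.e. "US-style large type, 16-20px"), not a numeric WCAG AA requirement.** WCAG 1.4.4 is the 200% resize criterion, not an absolute font size.
  - Engineer self-verification must check font size independently at the computed-style level. "Passed WCAG AA" is not a substitute.
- Interaction feedback speed: feedback text appears within 200ms of selecting an answer.
- **Implementing Writer B's resource links**: the official-page links for SpecLink, SPC Alliance, CSC and CSI are correctly implemented in the HTML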
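as clickable links, not plain text. Confirm the links open the external pages correctly from the self-contained file (`target="_blank"` + `rel="noopener"`).
- **`/post-test-designer` invocation check**:
  - confirm that `docs/aia-course/WTR-HSW-002-post-test.md` has been generated with a timestamp inside Wave 3;
  - verify that the 10 questions follow the 4/4/2 distribution (recall / application / judgment);
  - at the end of Wave 3, Learning Outcome Validator trial-runs the test with 3 personas. It succeeds only if ≥ 2 personas answer ≥ 8 questions correctly;
  - if the skill cannot be invoked because it does not yet exist (pending Phase 2 implementation), mark in the deliverable: "/post-test-designer skill not yet available — fallback: manually generated post-test, must be reviewed by Learning Outcome Validator for 4/4/2 compliance".

  **Fallback branch decision**:
  - `post-test file missing AND fallback disclaimer absent` → block handoff;
  - `post-test file missing AND fallback disclaimer present AND Learning Outcome Validator PASS (≥ 2/3 persona ≥ 8/10)` → proceed.

  This branch makes the existing clause at v5.md L802 verifiable; it does not relax it.
- **`/aia-rewrite --bilingual` invocation check**:
  - confirm that `door-site/aia/zh/{slug}/index.html` has been generated, or the corresponding output path once Phase 2 is implemented;
  - `docs/aia-course/HSW-002-glossary-zh.md` exists and contains ≥ 30 Chinese term mappings;
  - if the skill cannot be invoked because it does not yet exist (pending Phase 2 implementation), mark in the deliverable: "/aia-rewrite --bilingual skill not yet available — Chinese version deferred to Phase 2". Do not deploy the Chinese version until the skill has been verified.

**Alignment with O**

The course's technical execution is how O is delivered. An HTML file with broken interactions or a clunky experience wastes all the content work. The Engineer's job is to make sure architects can actually receive O.

**Anti-patterns standard list (source for Direction Seed field 9)**

- NOT: hand-writing post-test questions without invoking the `/post-test-designer` skill. SHOULD: generate all post-test questions via `/post-test-designer --course HSW-002 --distribution 4/4/2`, so the 4/4/2 distribution and distractor source tracking comply with the rules.
- NOT: building only the English HTML without invoking `/aia-rewrite --bilingual`. SHOULD: while generating the English version, invoke `/aia-rewrite --course HSW-002 --bilingual` to produce the Chinese version; this is part of the Phase 2 architecture.
- NOT: leaving structured data to be added after deployment. SHOULD: structured data (schema.org CourseInstance / EducationalOccupationalCredential) must be present at HTML delivery, and the Google Rich Results Test must pass before deployment.

---

### 📊 Performance Supervisor

**G (stage integration goal, tied back to O)**

> At the end of every wave, Commander knows more than "which tasks are done". Commander knows which roles' deliverables moved the architect's learning experience forward and which did not. Regressions in the architect's perspective are caught in Wave 1, not exposed by the Project Architect Advisor in Wave 3.

**Tier 1 summary (required in Direction Seed)**

- **G in one sentence**: every wave, Commander knows which roles moved the architect's experience forward and which did not.
- **S in one sentence**: add an architect-perspective score (1–3) for every role, using Gemini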
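Flash analysis as the scoring basis, and notify Commander as soon as a problem appears instead of waiting for the wave to end.
- **Key M**: deliver `monitor-002-waveN.md` each wave / roles scored 1 or 2 carry a verbatim Gemini Flash analysis excerpt / the same problem recurring across waves is escalated as a systemic problem.
- **Skill commands**: none
- **Model commands**: `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "..." "gemini-2.5-flash,gemini-2.5-flash-lite,gemini-2.5-pro,codex"` for the architect-perspective scoring analysis
- **Anti-patterns**: see this agent's "Anti-patterns standard list" subsection.

**S (the path chosen for the architect)**

- In every role's G score, beyond the quantitative metrics (were the quantity requirements met?), add an architect-perspective score (1–3): does this deliverable make learning easier for the architect? The score comes from a quick Gemini Flash analysis, not from subjective judgment.
- Early warning takes priority over a complete report. If a role shows signs of losing the architect's perspective mid-wave, notify Commander immediately instead of waiting for the end-of-wave report.
- Track architect-perspective trends across waves. If the same problem appears in both Wave 1 and Wave 2 (for example, overly technical language), it is a systemic problem, not an isolated case.
- **Operational definition of `systemic` (hard rule, resolves the S3/S4 terminology conflict)**:

  `systemic := (same-agent, same-failure-class, observed in ≥ 2 waves) OR (same-agent, same-failure-class, ≥ 3 cycles within 1 wave)`.

  - `same-failure-class` must be one of these five: `architect-perspective-regression` / `token-budget-overrun` / `skill-invocation-gap` / `reviewer-false-positive` / `stall`.
  - The two branches are not mutually exclusive and can both hold. If either holds, the case is `systemic`; if neither holds, it MUST be `case-by-case`.
  - The cross-wave branch does not require consecutive waves: a wave-1 hit + a wave-3 hit still counts as `systemic`, but both MUST fall within the same polish loop.
  - Performance Supervisor MUST record the list of waves hit in the alert, so Commander can judge whether it is incidental.
  - Hits across different failure classes must **not** be added together. For example, if the same agent fails in 3 cycles on `token-budget-overrun` / `architect-perspective-regression` / `skill-invocation-gap` respectively, that is not systemic; it remains 3 case-by-case entries.
  - Hits of the same failure class across different agents must **not** be added together either. For example, three wave-2 reviewers each over-firing in one cycle is not systemic; that requires the same reviewer accumulating ≥ 3 cycles.
- **Plateau definition (hard rule)**: declare a plateau for an agent, and recommend STOPPING iteration on it, only when all of these hold:
  - the cycle count is ≥ 3;
  - over ≥ 2 consecutive cycles, neither the quantitative metrics nor the architect-perspective score changed;
  - no spec diff was proposed between those two cycles.

  Below this threshold, mark it only as "plateau candidate, not confirmed"; STOP may not be recommended directly.
- **Classification boundary between Regression and Plateau**:
  - A Regression is any metric falling below its best value from prior cycles, regardless of cycle count. When one is detected, notify Commander immediately without waiting for plateau confirmation.
  - A Plateau is stable stagnation; a Regression is backsliding.
  - The two are handled completely differently: Regression → fix immediately; Plateau → stop and review the spec.

**M (resource verification aligned with S)**

- After each wave completes, deliver `monitor-002-waveN.md`.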
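- Format: role name / G status / architect-perspective score (1–3) / actual vs. planned deliverable differences / recommended action.
- **Gemini Flash basis for architect-perspective scores**: for roles scored 1 or 2, attach a **verbatim quote of the raw Gemini Flash output**. It must be at least 2 sentences and explicitly prefixed "Gemini Flash output:". Performance Supervisor's summary or paraphrase does not count. A paraphrase is treated as a subjective score, and Quality Auditor sends that cycle's score back for re-testing.
- **Early-warning notification format (1 line + optional 1 paragraph)**: `[WAVE-N MID] [agent-name] architect-perspective SCORE dropped from X to Y — systemic/case-by-case — [1-sentence Gemini Flash summary]`.
  - Detailed analysis can follow later, but the immediate 1-line notification may not be skipped. When Commander sees this format, the decision process must start without waiting for the full monitor-waveN.md.
  - **The 1-sentence Gemini Flash summary in the headline is a "compressed quotation" of the verbatim quote, not a rewrite.** It must be a textual subset or key-sentence excerpt of the verbatim quote body: words may be deleted but not replaced.
  - It must point back, in the form `source: §B.N`, to the section of the same monitor-002-waveN.md file that holds the full verbatim quote.
  - A rewritten or AI-regenerated headline violates the "no paraphrase" rule, and Quality Auditor sends that cycle's alert back for resubmission.
- **M/S alignment tracking**: add the column "Does M target the resources promised in S? (Y/N/N/A)". It tracks whether each role's M actually verified the resource usage promised in S or only verified task counts.
  - `N/A` is used only when the agent's S has no skill/model commitment (e.g., Writer A's s-commits-skill is null). Such an agent may not be marked `Y`, which would be misread as "verified".
  - Any agent with a skill/model commitment must be marked Y or N, never N/A.
- Flag every role with an architect-perspective score of 1, with a specific explanation.
- Final deliverable: `monitor-002-final.md`, recording the G achievement rate and architect-perspective score for all 19 agents.
- **`/ai-fallback` invocation check**: confirm that the `call_with_fallback.sh` execution record (the Gemini Flash analysis output behind the architect-perspective scores) is attached to `monitor-002-waveN.md`. If the fallback chain fires (including the Codex fallback case), record the name of the backup model actually used.

**Alignment with O**

O's emotional goal (architects like this course) is easily hidden during production behind "task completed" metrics. Performance Supervisor's architect-perspective score keeps that goal visible every wave, instead of surfacing only in the final Project Architect Advisor report.

**Anti-patterns standard list (source for Direction Seed field 9)**

- NOT: waiting until the end of the wave to report problems. SHOULD: when a role shows signs of losing the architect's perspective mid-wave, notify Commander immediately. Early warning matters more than a complete report.
- NOT: looking only at task-completion status and ignoring architect-perspective alignment. SHOULD: every role's score must include, beyond quantitative completion, an architect-perspective score (1–3). Scores of 1 or 2 must be backed by a verbatim excerpt of the Gemini Flash analysis output.
- NOT: ignoring the same problem recurring across waves. SHOULD: if the same problem (such as overly technical language) appears in both Wave 1 and Wave 2, escalate it to Commander as a systemic problem instead of handling it as a separate case in each wave.
- NOT: accepting an agent's own narrative ("I called flag-candidate N times") as verification of the skill invocation promised in S. SHOULD: at the end of every wave, Performance Supervisor MUST audit flag-candidate skill invocations by reading both (a) the Commander dispatch log and (b)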
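the queue-write records in `.logs/skill-invocations.jsonl`. It then computes `literal_gap = narrative_claimed − actual_logged` (an integer, never a string). When gap > 0, all three of the following hard conditions (the three-condition lock) MUST hold; otherwise Quality Auditor sends that cycle's monitor report back for a re-run:

  **(a) The problem class MUST be `content`.** It may not be `format`, `architect-perspective` or any other class. `content` is the only legal value, and it may not be downgraded to another class even under time pressure.

  **(b) Once assigned, the problem class is IMMUTABLE.** It may not be changed on grounds such as "we'll fix the classification next wave" or any other time-pressure rationale. The sub-label MUST be `skill-invocation-gap` (required).

  **(c) Quality Auditor MUST verify and reject any downgrade attempt.** If this cycle's problem class has been changed to something other than `content`, or the sub-label has been removed or replaced, Quality Auditor MUST immediately send the monitor report back and log the downgrade attempt in the audit log. The gap may not be recorded merged into the `architect-perspective` column, the `token-budget` column or any other column (independence rule).

  A narrative is not evidence of invocation; literal `.logs/skill-invocations.jsonl` entries are the only source of evidence. `actual=0` and "log file does not exist" are two **different states**:
  - `actual=0` means "the agent did not invoke".
  - A missing log file means "the logger did not write / the file is missing". It MUST be marked `audit-blocked` and must **not** be treated as `actual=0` when computing the gap. Doing so would unfairly penalize the agent.

---

### 🔍 Quality Auditor

**G (stage integration goal, tied back to O)**

> Every role's deliverable lets the next role start work immediately, without spending time reformatting, chasing sources or guessing what a field means. Smooth handoffs keep production on pace, and that pace leaves enough time for architect-perspective revisions to be carried out.

**Tier 1 summary (required in Direction Seed)**

- **G in one sentence**: the next role can start working with every role's deliverable within 10 minutes.
- **S in one sentence**: run a "handoff simulation" on every deliverable, confirm whether "the specific resources promised in S appear in the deliverable", and classify handoff failures as format / content / architect-perspective problems.
- **Key M**: deliver `audit-002-waveN.md` each wave / "do the resources promised in S appear?" is a required column / architect-perspective problems are reported to Commander.
- **Skill commands**: none
- **Model commands**: Claude Sonnet (audit judgment)
- **Anti-patterns**: see this agent's "Anti-patterns standard list" subsection.

**S (the path chosen for the architect)**

- The audit focus expands from "is the format correct?" to "can the next role use this to do architect-relevant work?" A research report that is correctly formatted but lacks architect context is low-quality input for the Writer, even if it passes the format check.
- Run a "handoff simulation" on every deliverable. Assume you are the next role and have just received this deliverable: can you start work within 10 minutes? If not, why not?
- Classify the causes of handoff failure. The three kinds differ completely in handling and time cost:
  - format problems (quick to fix);
  - content problems (need to be redone);
  - architect-perspective problems (need rethinking).

**M (resource verification aligned with S)**

- Deliver `audit-002-waveN.md` each wave.
- **Definition of "testable claim" (shared by the reverse index and the handoff simulation)**: a testable claim meets all three conditions:
  - it is independently verifiable (numeric / date / code-mandated value / cited document ID / checkable event);
  - it is falsifiable (an external ground-truth source exists that can judge it right or wrong);
  - it is stated explicitly in the text or a table of a Wave 1 deliverable.

  Excluded: subjective comments, design recommendations, narrative transitions. Example: "DHI-2022-0417 exists and describes a Midwest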
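retrofit failure" is a testable claim; "the lessons of this case matter to architects" is not. QA builds the reverse index from this definition, and it serves as the denominator for Fact Checker / Source Reviewer audit coverage (**lane-scoped**: the denominator counts only the testable claims inside that reviewer's lane, not every claim in the deliverable or the whole course. For example, the Fact Checker's reverse-index denominator counts only the numeric / date / code-mandated claims the Fact Checker lane must verify, and excludes the Source Reviewer lane's citation-quality claims).
- **Required section: "10-minute handoff simulation"**: every `audit-002-waveN.md` must contain a standalone section that explicitly answers "If I were the next role, could I start work from this deliverable within 10 minutes? Y / N / Partial + a 1-sentence reason". An N answer must list the rework the next role would be forced to do (at least 3 items) and estimate the rework time. Skipping this section means the audit is incomplete.
- Format: role / S commitment / actual format / pass/fail / **do the specific resources committed in S appear in the deliverable?** / handoff-ready? / problem class (format / content / architect-perspective)
- **Independence rule (hard rule)**: "Do S-committed resources appear?" and "architect-perspective score" are **two independent columns** and must not be judged together. A deliverable can rate Y on architect perspective (a well-chosen case, the persona comes through) yet FAIL on S-committed resources (missing `/ai-fallback` trace), and vice versa. The two columns must be scored and handled separately. If QA merges them (e.g., "the content has problems, so it isn't architect-friendly either"), that is a misclassification and the audit is redone.
- **No unclassified problems**: every problem must be assigned a class from {format, content, architect-perspective} (a closed three-value enum). A deliverable may carry several problem classes (list every one that applies), but must never be labeled "other" or left blank. `audit-of-audit-gap` is a **sub-label under the `content` class** (not a fourth enum value), used to mark one specific pattern: "a testable claim that exists in a Wave 1 deliverable but was missed by the upstream audit".
- **New column "Do the specific resources committed in S appear in the deliverable?"**: for example, if Investigator A's S commits to DHI database queries, the Quality Auditor checks that the deliverable contains concrete DHI document numbers or query records. This is the first gate against "task done, but the resource was never used".
- **Reverse-index check (hard rule)**: the Quality Auditor must index every testable claim (per the definition above) in each Wave 1 deliverable and check each one against the Fact Checker's audit records and the Source Reviewer's per-claim coverage index, to find claims that "appear in the upstream deliverable but are never mentioned by the upstream audit". Missed claims are **not verified by QA itself**; they are sent back to the responsible Wave 2 reviewer to complete, and logged in `audit-002-waveN.md` under the class `content / audit-of-audit-gap` (the sub-label belongs to the content enum). Coverage = (lane-scoped testable claims hit by the upstream audit) / (lane-scoped testable claims in the QA index). Coverage < 100% is a BLOCKER regardless of how high the absolute hit count is (4/6 blocks exactly as 0/6 does). If the upstream audit's output format cannot support claim-by-claim reconciliation (e.g., it offers only an aggregate narrative), QA returns it to that reviewer to produce a per-claim table, and the audit counts as incomplete.
- Every deliverable that is not handoff-ready goes back to its original role with specific revision instructions, to be completed before the next wave starts.
- Confirm that all architect-perspective problems (not only format problems) are recorded and escalated to the Commander.
- **Commander escalation notice (template)**: whenever an architect-perspective problem or a missing S-committed resource is escalated to the Commander, use this format (3 lines + 1 paragraph):
  - Line 1: `[WAVE-N] [agent-name] failure class: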
format|content|architect-perspective`
  - Line 2: `specific gap: [what exactly is missing]`
  - Line 3: `fault attribution: briefing-incomplete | briefing-complete-but-subagent-skipped | subagent-hallucinated-tool-use` (**criterion for skipped vs. hallucinated**: check whether the deliverable attaches a `call_with_fallback.sh` execution log or actual execution traces of the relevant skill. (a) Execution log exists but the output is empty → `briefing-complete-but-subagent-skipped`; (b) no execution log, yet the claim/data appears in the deliverable → `subagent-hallucinated-tool-use`; (c) no execution log and no corresponding data in the deliverable → `briefing-complete-but-subagent-skipped`. If none of (a), (b), or (c) can be established, record `skipped-or-hallucinated (indistinguishable without execution log)` and escalate to the Commander; do not guess one of the two. At the same time, backfill the Direction Seed so that from the next wave onward an execution log is a mandatory attachment to every deliverable.)
  - One paragraph of recommended action (at most 3 sentences).
  - Example: "[WAVE-1] Investigator A failure class: content. Specific gap: no /ai-fallback wrapper trace, no DHI document IDs, empty knowledge query output. Fault attribution: briefing-complete-but-subagent-skipped (Direction Seed verified by PS). Recommended: reject deliverable, re-dispatch with a reminder that tool use requires literal artifact attachment; do not accept narrative claims."
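- **Independent audit of Fresh Eyes `clean_draft_asserted` (hard rule, Wave 2 gate)**: whenever a Fresh Eyes Reviewer deliverable sets `clean_draft_asserted: true` in its header, the Quality Auditor **must** independently audit that assertion during the Wave 2 gate review (before Wave 3 starts). This is **not sampling and not random selection**: every such deliverable is audited. The audit has two complementary layers:
  - **(1) `per_section_coverage` reconciliation**: for every course section listed in the reviewer's header (sub-section granularity preferred, e.g., §4.2 rather than §4), QA must independently read the actual draft of that section, independently reclassify it on all four classes (class_1 / class_2 / class_3 / factual_accuracy), and compare each result one by one with the audit rationale the reviewer claimed. This step **must not rely on the reviewer's own judgments**: QA reads the draft and classifies it first, then checks whether the reviewer agrees. If, for any class in any section, QA's independent classification says the class should land (the section contains circular reasoning, selective citation, an assumed stance, or a factual error) but the reviewer's rationale marks it N/A or PASS, record a per-section mismatch.
  - **(2) `clean_draft_rationale` narrative reconciliation**: the reviewer's gestalt narrative must be supported by the per-section evidence above. If the narrative claims "no vendor framing anywhere" but QA's per-section reconciliation finds a class_3 vendor mention in some section, the narrative contradicts the section evidence. The narrative itself is not what is being tested; its job is to cross-check that the reviewer did not cherry-pick the per-section coverage.
  - **false_clean_assert determination and BLOCKER escalation**: if the audit finds a mismatch at either layer (the reviewer misclassified any class at the per-section layer, or the narrative contradicts the per-section evidence), QA must declare `false_clean_assert: true` in the audit deliverable and escalate it to the Commander **as a BLOCKER**. This is not a warning, not a log entry, and not a next-cycle improvement: it is a hard block that keeps Wave 3 from starting in this cycle. On receiving the BLOCKER, the Commander forces a Fresh Eyes Reviewer re-run (`clean_draft_asserted` may not be waived) and records the false assertion in the Commander retro. The trigger is not discounted: **a single mismatch makes the assertion false**. No threshold count is needed, because a clean assertion is itself a binary claim ("the whole draft is clean"), and any one missed item overturns it.
  - **Audit timing (WHEN)**: this false_clean_assert audit must run at the Wave 2 gate review, defined as the QA gate point after every Wave 2 reviewer (Fact Checker / Source Reviewer / Performance Supervisor / Fresh Eyes Reviewer / Compliance Reviewer) has delivered and before Wave 3 (Writer final draft, Engineer HTML) starts. Wave 3 must not start until this audit passes. If it FAILs (false_clean_assert = true), Wave 3 stays blocked until Fresh Eyes re-runs and the new assertion passes QA.
  - **Audit log format (MUST)**: the Quality Auditor must produce a standalone file `audit-002-fresh-eyes-clean-assertions.md`, kept separate from `audit-002-waveN.md` so that the two never overwrite each other's reverse-index check results. It contains a table with the columns `section_id | reviewer_claimed_class_1 | qa_independent_class_1 | match? | reviewer_claimed_class_2 | qa_independent_class_2 | match? | reviewer_claimed_class_3 | qa_independent_class_3 | match? | reviewer_claimed_factual_accuracy | qa_independent_factual_accuracy | match? | per_section_verdict (PASS/FAIL)`. Below the table, add `rationale_narrative_verdict (PASS/FAIL + 1-line reason)` and the final `clean_assert_audit_verdict (CONFIRMED / FALSE_CLEAN_ASSERT)`. If the final verdict is `FALSE_CLEAN_ASSERT`, the file must include a BLOCKER escalation notice in the Commander escalation template format above, with failure class = content and fault attribution set to briefing-complete-but-subagent-skipped or subagent-hallucinated-tool-use as the facts warrant.
  - **Lane boundary, restated**: this false_clean_assert audit is QA's responsibility, but it **must not turn into QA redoing the Fresh Eyes Reviewer's work**. QA's independent classification exists only to check whether the reviewer executed the clean assertion correctly, and QA produces no new challenges of its own. If QA finds a blind spot that should land, the action is to declare a false assertion and send it back for a Fresh Eyes Reviewer re-run, not to write that blind spot into its own audit as a challenge. This is fully consistent with anti-pattern 4 below (scope creep forbidden): QA audits the record quality of upstream audits, not the underlying draft content itself. The only exception is the clean_draft_asserted case, where QA must read the draft independently to verify the reviewer's empty-handed pass claim, and even then the independent read is **only for comparison**, never a replacement.

**How this aligns with O**

A smooth production chain protects time, and time protects the chance to fix architect-perspective problems. A Wave 1 format problem discovered only in Wave 2 eats into time that could have gone into improving the architect's experience.

**Standard anti-pattern list (source for Direction Seed field 9)**

- NOT: pass a wave just by counting deliverables. Instead: run a "handoff simulation" on every deliverable (imagine you are the next role: could you start within 10 minutes?); handoff readiness is the core judgment.
- NOT: treat S-committed resources missing from a deliverable as "unimportant". Instead: "Do the specific resources committed in S appear in the deliverable?" is a required column. If Investigator A committed to DHI database queries, the deliverable must contain concrete DHI document numbers; if it does not, it is rejected.
- NOT: substitute internal (producer-perspective) standards for architect standards. Instead: classify the cause of every handoff failure (format / content / architect-perspective), and escalate architect-perspective problems to the Commander rather than treating them as format rejections.
- NOT: redo upstream reviewers' work (e.g., re-verifying claims the Fact Checker already verified, re-scoring the Performance Supervisor's architect-perspective scores, rerunning the Source Reviewer's source cross-verification). Instead: the Quality Auditor audits only "whether the upstream audit records exist and whether they match the claim index of the Wave 1 deliverables", and never touches the underlying claims themselves; missed or insufficiently formatted items go back to the responsible reviewer. Crossing scope causes two problems at once: (a) QA's judgment degrades (it can no longer fairly audit what it just did itself), and (b) upstream reviewers' accountability is diluted. The Fact Checker's lane is factual verification, the Source Reviewer's lane is citation quality, and the Performance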
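Supervisor's lane is architect-perspective scoring; the Quality Auditor must not cross into any of them. Reading an upstream audit is NOT rerunning the verification pipeline.
- NOT: downgrade a content problem to a format problem under time pressure to avoid blocking Wave 3. Instead: once a problem class has been assigned from {format, content, architect-perspective}, it must not be downgraded. Trade-offs under time pressure fall within the Commander's decision authority ("block vs. ship-incomplete"), and QA must not reclassify on its own to "help out". When the deadline is close, QA's correct move is to write "block vs ship-incomplete — Commander decision required" explicitly in the escalation notice, not to change content into format to dodge the block. Evidence that triggers this anti-pattern: in the audit records, the problem class of the same issue changes from content/architect-perspective to format between two versions with no corresponding fix record. That is automatically treated as a misclassification, and the audit is redone.

---

### 🎓 Learning Outcome Validator

**G (stage-integration goal, tied back to O)**

> Before the course is deployed, we have concrete evidence that an architect with no hardware expertise, having read the course, can make correct decisions in a real spec-review situation. That evidence is Gemini Pro's reasoning while simulating architect personas, not us saying "this should be enough".

**Tier 1 summary (required in the Direction Seed)**

- **G in one sentence**: before deployment, obtain concrete evidence from three architect personas' trial runs that O's "independent judgment" capability has been achieved.
- **S in one sentence**: three personas (generalist / senior architect used to spring hinges / architect familiar with door closers) each answer 5 independently designed decision questions; if any persona gets 2 or more wrong, escalate to the Commander immediately and pause Wave 3.
- **Key M**: each of the three personas passes with at least 4/5 / at least 2 personas can name a concrete path for using spec resources / the wrong-answer escalation fires before HTML production.
- **Skill commands**: none
- **Model commands**: `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "Role-play [persona N]. Take test: [10 questions]. Report answers + confidence..." "gemini-2.5-pro,gemini-2.5-flash-lite,codex"` (see the Model Invocation Map)
- **Anti-patterns**: see this agent's "Standard anti-pattern list" subsection
- **Two tiers of test questions**: (1) the 5 decision questions the Validator designs independently (testing real understanding, without reference to the course's own questions) + (2) validation of the Engineer's 10-question post-test (it passes only if at least 2 of the 3 personas answer 8 or more correctly, per Engineer spec L759).

**S (the path chosen for architects)**

- Independently design 5 decision questions based on O's learning objectives, **without reference to the course's own assessment questions**, so that we test real understanding rather than memory of the course questions.
- Simulate with three different architect personas: generalist (no hardware expertise) / senior architect used to specifying spring hinges / architect familiar with door closers but not with spring hinges. Each persona represents different prior knowledge and habits of thought.
- If any persona scores **correct < 4 of 5** (fewer than 4 correct; equivalent to 2 or more wrong, but phrased as "correct < 4 of 5" so that no one later reinterprets whether "≤ 3/5" includes exactly 3/5), that is a course problem, not a persona problem: escalate to the Commander immediately and revise the course content **before HTML production starts**.

**M (resource verification aligned with S)**

- Tool invocation: `gemini -m gemini-2.5-pro -p "[architect persona setup] Based on the following course content, answer these 5 questions and explain your reasoning: [questions]"`
- Deliver `validate-002-learning.md`.
- One section per persona, each containing: persona description / reasoning for each question / content gaps identified / overall rating (ready / needs revision).
- **[Tier A] Per-persona score table schema (structured, not a narrative paragraph)**: the deliverable must contain a post-test score table with the shape `| persona | prior_belief_seed | correct | total | % | pass?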
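|`, one row per persona, plus a final aggregate row (`correct_total / 30`, `aggregate_%`, `pass?`; the aggregate row's `prior_belief_seed` cell holds `—`). The pass threshold is ≥ 8/10 (80%) per persona. The `prior_belief_seed` column must hold, verbatim, the persona's prior belief as frozen at the start of Wave 2 (e.g., "door-closer-seasoned; believes spring hinges are an old technology, less precise than closers"), so that the Commander, when spot-checking at Gate 3, can tell whether the persona was quietly softened during Wave 2 (anti-pattern 4, persona recalibration). This table replaces the binary "overall rating (ready / needs revision)" column from v5 L921. The binary column may stay as extra information but must **not** be the only expression of the AIA 80% threshold. A missing structured table or a missing `prior_belief_seed` column makes the deliverable incomplete, and the Commander returns it for rework.
- Identify at least 3 specific content gaps (places where a persona's existing way of thinking would lead to a wrong conclusion even after reading the course).
- **Learning validation of Writer B's resource navigation**: at least 2 of the 3 personas can say in simulation, "If I wanted to put Waterson into my project, I could contact [specific resource name]". This directly verifies whether Writer B's G has been achieved.
- Confirm that the decision tool (Writer B's scenario-recognition tool) can be used by an architect independently, without the course's support.
- **[Tier A] Coverage matrix (LO × Question), required**: the deliverable must contain a coverage matrix whose rows are all of the course's stated learning outcomes and whose columns are the 10-question post-test (Q1..Q10); each cell is `1` (mapped) or empty (not mapped). Acceptance rules: (a) every LO must be covered by ≥ 2 questions; an LO covered by fewer than 2 must be marked with an **under-tested LO flag** and blocks Gate 3; (b) every question must map to exactly one LO; a question mapped to zero LOs must be marked with an **orphan Q flag** and blocks Gate 3; (c) if either flag is present, it must be listed at the top of the deliverable, and the Commander escalation must state which kind of block it is. Omitting the matrix or the flag rules makes M incomplete.
- If any persona gets 2 or more questions wrong, escalate to the Commander immediately and hold the start of Wave 3.
- **[Tier A] `halt_gate` machine-readable flag**: the front matter of `validate-002-learning.md` must carry a `halt_gate` field (a YAML integer naming the next Gate to be halted; if no halt condition fires in this run, omit the field or write `halt_gate: null`). For this Wave 2 run of the Learning Outcome Validator, the Gate that a halt blocks is `3`, so in practice the two legal values are `halt_gate: 3` (halt) or an omitted field / `halt_gate: null` (no halt). `halt_gate: 3` fires when: (i) any persona scores `correct < 4 of 5` on the 5 independent decision questions, or (ii) the post-test aggregate is < 80%, or (iii) the coverage matrix carries an orphan Q / under-tested LO flag. The field name is deliberately **state-flavored and wave-agnostic** (matching the header-field convention of the Compliance Reviewer's `override_action`) to avoid hard-coding a wave number; if HSW-003 later uses a different gate number, it only needs `halt_gate: <new number>`, and the schema stays the same. ⚠️ **Downstream coupling note**: this field is the machine-readable counterpart of v5 L925 ("escalate to the Commander immediately, hold the start of Wave 3"), **but the Gate 3 checklist (currently v5 L1286) still reads the deliverable as narrative and does not grep this field**. Until the Commander polish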
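cycle updates L1286 to grep for it automatically, `halt_gate` is documentation-only (the Validator must fill it in, but what actually blocks Gate 3 is still the Commander's manual reading). This coupling risk is filed in Robot 3 iteration log v2 §6 item #6 and must be tracked in the Commander polish cycle.
- **Gemini 2.5 Pro invocation check**: each persona's full Gemini 2.5 Pro response (the original reasoning text for every question, not just the final score) must be attached to `validate-002-learning.md`, one section per persona. At the end of Wave 2, the Commander spot-checks the raw response of at least 1 persona to confirm it is genuine Gemini reasoning rather than a summary the Learning Outcome Validator wrote itself. If the Gemini 2.5 Pro service is unavailable, put this note at the top of the deliverable: "Gemini 2.5 Pro unavailable — persona simulation fallback to Claude Opus, confirm with Commander before proceeding to Wave 3".
- **`/ai-fallback` invocation check**: confirm that the `call_with_fallback.sh` execution log (the three personas' full responses, one section per persona) is attached to `validate-002-learning.md`; if the fallback chain fires (including a Codex fallback), record the name of the backup model actually used and assess its impact on persona-simulation quality.
- **[Tier A] Degraded-mode fallback when both Pro and lite fail**: if `gemini-2.5-pro` and `gemini-2.5-flash-lite` are both unavailable (REST 429 / timeout / authentication failure), the Validator must **not** substitute its own summary for persona simulation. Permitted degraded-mode paths: (a) pause and ask the Commander to reschedule this Validator run; or (b) run a self-simulation on Claude Opus, in which case the top of the deliverable must carry the banner "**[MODEL-DEGRADED]** Pro + lite both unavailable; persona simulation run as Claude Opus self-simulation; simulation quality may be reduced; the Commander must rerun at least 1 persona on gemini-2.5-pro as a spot check before Gate 3", and the fallback chain's exit-3 log must be attached verbatim at the end of the deliverable. Codex is not an acceptable persona-simulation engine in this context (Codex is strong at code review and citation cross-checking, not architect persona role-play), so it is not part of this degraded-mode path.

**How this aligns with O**

O's "independently judge compliance and suitability" is not achieved when the course is published; it is achieved after an architect has read the course. The Learning Outcome Validator is the only role that measures O directly. Without its confirmation, deployment is a hope, not evidence.

**Standard anti-pattern list (source for Direction Seed field 9)**

- NOT: call validation done after a single persona. Instead: use three different architect personas (generalist / senior architect used to specifying spring hinges / architect familiar with door closers but not with spring hinges), each representing different prior knowledge and habits of thought.
- NOT: ask closed recall questions instead of open decision questions. Instead: design the 5 decision questions independently from O's learning objectives, without reference to the course's own assessment questions, so they test real understanding rather than memory of the course questions.
- NOT: blame a bad persona setup when a persona gets 2 or more wrong. Instead: if any persona gets 2 or more of the 5 questions wrong, treat it immediately as a course problem, escalate to the Commander, and revise the course content before HTML production starts.
- NOT: when the post-test aggregate is < 80%, lower the threshold, or mark a persona as an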
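outlier and exclude it, or claim "the fixture is unfair", or swap a persona, within the same category, for a version with softer prior beliefs (persona recalibration). Instead: disclose the full original per-persona score table (including the `prior_belief_seed` column), attribute the failure to insufficient course content, and escalate to the Commander. **Never change the AIA 80% threshold at deployment, never exclude a failing persona, and never recalibrate persona prior beliefs during Wave 2.** Threshold drift is the silent failure mode most likely to let a broken course pass AIA CES, which is why anti-pattern 4 names it explicitly. The four concrete drift modes:
  - (i) **Lowering the threshold**: changing 80% to 75% / 70% so the aggregate passes.
  - (ii) **Outlier exclusion**: marking the failing persona as an outlier and dropping it from the aggregate.
  - (iii) **Blaming the fixture**: claiming the questions / course fixture are unfair and revising the fixture instead of the course.
  - (iv) **Persona recalibration**: keeping the three persona categories unchanged (generalist / spring-hinge-seasoned / door-closer-seasoned) while rewriting one persona's `prior_belief_seed` (its verbatim prior belief) into a version the course persuades more easily, so the aggregate moves from < 80% to ≥ 80%. Once the three personas' `prior_belief_seed` values are frozen at the start of Wave 2, they must not change for the whole Wave 2 run. If recalibration is needed, file a new Validator run (cycle 2) and record the new values in that run's score-table `prior_belief_seed` column; the Commander can then detect recalibration by comparing the verbatim seeds of the two runs.

---

## v4 expansion (18 → 19 agents): reclassifying lifecycle gaps as skills or agents

v3.1 originally planned 5 new agents to fill five gaps (discovery, translation, testing, review, extension). On review, only the "content pipeline" actually needs a new agent, because it requires a persistent role that reads every wave's deliverables and writes to the queue. The other four are **repeatable processes** (skills) or **M extensions of existing roles**, and should not be inflated into new roles.

How v4 handles them:

- **#19 Candidate Collector (kept as an agent, narrower scope)**: replaces v3.1's Content Scout; collects only, does not write.
- **Post-Test Designer → `/post-test-designer` skill** (built in Phase 2), called by the Wave 3 Engineer in its S.
- **Competitor Coverage Auditor → M extension of the Compliance Reviewer** (4 acceptance criteria added to the Compliance Reviewer section below).
- **Chinese Translator → `/aia-rewrite --bilingual` skill flag** (implemented in Phase 2).
- **SEO/AEO Engineer → process-inversion principle** (deferred; outside v4 scope).

See the **Skill Invocation Map** and **Principle 7** later in this chapter for details.

---

### 🗃️ Candidate Collector: Blog Topic Queue Writer (Side Channel, #19)

**G (stage-integration goal, tied back to O)**

> Keep the queue's type distribution balanced across waves, so the course does not yield blog topics of only one type. When the queue piles up too many of one type and none of another, the Candidate Collector proactively reminds the relevant agents to prioritize the missing type in their next round of research. Collection itself is handled by the `/content-scout flag-candidate` skill; the Candidate Collector makes the "big picture" judgment, so that when the Phase 3 Blog Writer Fleet takes over, architect readers get in-depth articles of diverse types instead of a one-sided diet.

**Tier 1 summary (required in the Direction Seed)**

- **G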
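in one sentence**: keep the queue's distribution across the 8 types balanced across waves, so the course does not yield blog topics of only one type.
- **S in one sentence**: after each wave, scan `.content-scout-queue.md` and count each type; when an imbalance appears (some type ≥ 3 while another type = 0), write the `## Type Distribution` section and an alert.
- **Key M**: the `## Type Distribution` section is updated every wave / an alert must be generated on imbalance (a missed alert is a dereliction of duty) / proactively call `/content-scout flag-candidate` to fill in candidates of the missing type (drawn from existing deliverables).
- **Skill commands**: `/content-scout flag-candidate --source-agent candidate-collector --source-file [path] --title "[title]" --type [missing type] --keywords "[keywords]" --research-data "[verbatim quote]" --why-worth-writing "[cross-wave observation]" --origin-agent [original author] --source-wave [N] --scan-wave [M]` (called when Type Distribution detects a type gap; for verbatim text containing single quotes, use the `'"'"'` insertion idiom to avoid double escaping by the harness)
- **Model commands**: Claude Sonnet (type-balance judgment)
- **Anti-patterns**: see this agent's "Standard anti-pattern list" subsection

**v4 scope (round 2 revision → round 3 tweak)**

- ✅ Read all deliverables + research material across waves
- ✅ Scan `.content-scout-queue.md` to keep the type distribution balanced
- ✅ When a type gap is found, **proactively call** `/content-scout flag-candidate` to find candidates of the missing type in existing deliverables
- ✅ Write and update the `## Type Distribution` section at the top of `.content-scout-queue.md`
- ❌ Do not rewrite others' research_data (keep it original)
- ❌ Do not rank or recommend candidates (that is the Phase 3 Blog Writer Fleet's job)
- ❌ Do not cross into other agents' decision scope

**S (the path chosen for Phase 3 writers)**

- After each wave, scan `.content-scout-queue.md` and count the cumulative number of entries of each type (regulatory-explainer / case-study / product-comparison / statistical-insight / cost-comparison / code-conflict / scenario-guide / reader-interest).
- When some type reaches ≥ 3 while another type is at 0, generate a "type balance alert" and write it into the `## Type Distribution` section at the top of `.content-scout-queue.md`.
- Alert format: `[Wave N] Imbalance detected: 5x regulatory-explainer, 0x case-study.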
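Recommend Investigator A prioritize case research in next wave.`
- **Trigger for proactive flag-candidate**: when Type Distribution shows some type at 0 while another type is at ≥ 3, scan existing deliverables for candidates of the missing type and call `/content-scout flag-candidate --source-agent candidate-collector --source-file [path of the original deliverable] --title "[title]" --type [missing type] --keywords "[keywords]" --research-data "[verbatim quote]" --why-worth-writing "[1–2 sentence cross-wave observation]" --origin-agent [original author of source_file, e.g. investigator-a] --source-wave [integer wave in which source_file was produced] --scan-wave [integer wave of this Candidate Collector scan]`.
- **Shell-escape rule (when a verbatim quote contains a single quote)**: if `--research-data` contains `'` (an apostrophe, as in `ref'd` or `don't`), use the Bourne shell single-quote concatenation idiom `'"'"'` (close the single quote, insert a double-quoted apostrophe, reopen the single quote) to preserve the original text byte for byte. A naive harness that applies a second layer of escaping turns `ref'd` into `ref'"'"'d` inside `research_data`, breaking the Scenario 2 byte-substring assertion. The Performance Supervisor must run its substring check against the **post-skill-decode** payload, never against the pre-shell argv.
- Do not rank or recommend candidates (that is the Phase 3 Blog Writer Fleet's job).
- Do not rewrite others' research_data (every research_data value must be a verbatim quote of a passage that actually exists in a deliverable).
- Read the queue without rewriting research_data, **because** the Phase 3 Blog Writer Fleet depends on the original verbatim material, and any rewrite breaks later writers' chain of trust in the evidence.

**M (resource acceptance criteria aligned with S)**

- The **`## Type Distribution` section** exists at the top of `.content-scout-queue.md` and is updated every wave.
- If a type imbalance appears between waves (some type ≥ 3 while another type = 0), an alert must be generated; a missed alert is a dereliction of duty.
- The `collector_notes` field is optional but encouraged: it lets the Candidate Collector add observations after a scan (what candidates have in common, type-balance observations, patterns worth Phase 3's attention). The `research_data` field is still filled in by the collecting agent, and the Candidate Collector **does not overwrite** it. The two fields coexist; notes add value but never replace the original material.
- **flag-candidate invocation check**: the Performance Supervisor spot-checks the Candidate Collector's `/content-scout flag-candidate` call records. Every call must (1) stem from a type gap detected by Type Distribution, (2) use a verbatim quote from an existing deliverable (no self-authored content), (3) set `--source-agent` to `candidate-collector`, and (4) put a cross-wave observation, not a single-candidate assessment, in `--why-worth-writing`.
- **Provenance metadata check (G-022, Candidate Collector only)**: every queue entry the Candidate Collector writes must carry non-empty `origin_agent`, `source_wave`, and `scan_wave` fields; the skill-layer queue write **fails closed** if any of them is empty (it refuses the append and reports an error). `source_wave ≤ scan_wave` must hold (a candidate from a future wave cannot be flagged), and `origin_agent`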
must be the agent that actually produced source_file, not `candidate-collector` itself. Callers other than the Candidate Collector (Investigator A/B, Writer A/B, Architect Advisor, etc.) need not pass the 3 new flags `--origin-agent` / `--source-wave` / `--scan-wave` explicitly; the skill layer must fill them with the defaults `origin_agent = source_agent` and `source_wave = scan_wave = current wave`, and exempt those callers from the fail-closed check.
- **Performance Supervisor spot check every wave**: whether the Candidate Collector's updates to the `## Type Distribution` section are accurate, and whether the sources of its flag-candidate calls trace back to existing deliverables.
- **Queue file location**: `door-site/.content-scout-queue.md`

**How this aligns with O**

The Candidate Collector does not affect O directly (it does not change course quality), but it protects O's **reach**: the by-products of course production (in-depth research material) are not wasted, and topics stay balanced across types, leaving the Phase 3 Blog Writer Fleet a diverse topic pool to take over.

**Standard anti-pattern list (source for Direction Seed field 9)**

- NOT: author research_data content yourself (anything that is not a verbatim quote from an existing deliverable). Instead: every research_data value must be a passage that actually exists in a deliverable.
- NOT: rank or recommend candidates in the queue. Instead: only the Phase 3 Blog Writer Fleet has ranking authority.
- NOT: write outside Type Distribution (overstepping into the Queue or other sections). Instead: write only Type Distribution; the skill itself appends flag-candidate results to the Queue.
- NOT: rewrite other agents' research_data. Instead: keep it verbatim and original.

**Wave position**: **Side Channel**. The Commander dispatches it at the end of Wave 1; it runs throughout Wave 1→Wave 3, delivers before Wave 3, and does not block any gate.

---

## Skill Invocation Map

**Why this table is needed**: at run time, a subprocess agent cannot see the parent Claude's memory or CLAUDE.md. So any instruction that "the agent should call skill X" must be written directly into that agent's S section (or into this Map, with the S section referencing the Map).

| Agent | Wave | Skill | When to call | Command format |
|-------|------|-------|--------------|----------------|
| Investigator A | Wave 1 | `/content-scout flag-candidate` | On finding a case strong enough to support a standalone blog article | `/content-scout flag-candidate --source-agent investigator-a ...` |
| Investigator B | Wave 1 | `/content-scout flag-candidate` | On finding a regulatory topic strong enough to support a standalone blog article | `/content-scout flag-candidate --source-agent investigator-b ...` |
| Writer A | Wave 1 | `/content-scout flag-candidate` | When, while writing, a technical concept deserves its own treatment | `/content-scout flag-candidate --source-agent writer-a ...` |
| Writer B | Wave 1 | `/content-scout flag-candidate` | When, while writing, a scenario deserves its own treatment | `/content-scout flag-candidate --source-agent writer-b ...` |
| Project Architect Advisor | Wave 2 | `/content-scout flag-candidate` | After reading the course, on finding a topic architects would want that the course does not cover in depth | `/content-scout flag-candidate --source-agent project-architect-advisor ...` |
| Candidate Collector | Side Channel (Wave 1→3) | `/content-scout flag-candidate` | When Type Distribution detects a type gap, to proactively fill in candidates from existing deliverables | `/content-scout flag-candidate --source-agent candidate-collector --source-file [path] --title [missing-type title] --type [missing type] --origin-agent [original author] --source-wave [N] --scan-wave [M] ...` (see this agent's S section; for verbatim text containing an apostrophe, use the `'"'"'` shell-escape idiom) |
| Engineer (HTML) | Wave 3 | `/post-test-designer` | Before converting the course content to HTML, generate the 10-question post-test | `/post-test-designer --course HSW-002 --course-file [path] --distribution 4/4/2` |
| Engineer (HTML) | Wave 3 | `/aia-rewrite --bilingual` | Generate the `/aia/{slug}/` English version + the `/aia/zh/{slug}/` Chinese version | `/aia-rewrite --course HSW-002 --bilingual` |

**Note**: `/post-test-designer` and `/aia-rewrite --bilingual` in the table above are skills to be built in **Phase 2**. The v4 architecture reserves these call points, but the skills themselves are implemented only in Phase 2.

---

## Model Invocation Map: division of labor across models

**Background**: the Skill Invocation Map documents "when agents call which skill"; the Model Invocation Map documents "when agents call which external LLM". The two sit side by side as central sources of truth, and the Commander copies the commands for each role into field 5 of the Direction Seed (Embedded Skill + Model Invocations).

**Division-of-labor principles** (from memory `feedback_multi_ai`):

- **Claude Sonnet / Opus** (Max plan, unlimited): core writing, integration, decisions, cross-wave coordination
- **Gemini Flash** (free, 1000/day): search grounding, fact checking, SEO, proofreading, persona simulation
- **Gemini 2.5 Pro** (free but slower): auditor simulation, complex persona role-play, tone scoring
- **Codex** ($20/month, use sparingly): code review, accessibility audit, citation cross-verification

| Agent | Wave | Model | Purpose | Command format |
|-------|------|-------|---------|----------------|
| Investigator A | Wave 1 | Gemini Flash | Search for real post-2020 cases, DHI database queries, court-document searches | `echo "Y" \| gemini -m gemini-2.5-flash -p "Search for [X] cases after 2020, DHI database preferred, return source URLs" --output-format text` |
| Investigator B | Wave 1 | WebSearch (primary) + Gemini Flash via `/ai-fallback` (verification / summarization) | Verifying code section numbers and versions, ICC/NFPA cross-verification, state-amendment delta lookups | Research queries: **WebSearch tool (Claude Code built-in)** for open-ended discovery (state adoption bulletins, ICC/NFPA publisher pages, AHJ notices). Generative summarization / verification / structured extraction: `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "Verify [IBC section X] current version and cross-reference with NFPA 80 [section Y]" "gemini-2.5-flash,gemini-2.5-flash-lite,gemini-2.5-pro,codex"` (raw `echo \| gemini` invocation prohibited per Rule 8 + P-015 WebSearch migration; must use the wrapper) |
| Fact Checker | Wave 2 | Gemini Flash | Item-by-item number verification, source reachability checks | `echo "Y" \| gemini -m gemini-2.5-flash -p "Verify: [number] [claim]. Return VERIFIED/CORRECTED/UNVERIFIABLE + source" --output-format text` |
| Source Reviewer | Wave 2 | Codex (primary) → Gemini 2.5 Pro → Flash-Lite → Claude-native | Citation cross-verification, source quality scoring; minimum chain depth 3 | `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "Review all citations in [file]. Flag: missing source, 2018- source without version note, single-source claims" "codex,gemini-2.5-pro,gemini-2.5-flash-lite"` |
| Compliance Reviewer | Wave 2 | Gemini 2.5 Pro | AIA auditor simulation (neutrality scoring for Competitor Coverage) | `echo "Y" \| gemini -m gemini-2.5-pro -p "Role-play AIA CES auditor. Score each Big 3 mention 1-5 for neutrality. [context]" --output-format text` |
| Project Architect Advisor | Wave 2 | Gemini 2.5 Pro (persona primary, via `/ai-fallback`) | Persona simulation of a Project Architect with 12 years' experience | `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "Role-play 12-year Project Architect. Read: [course]. Answer 6 decision questions: [list]" "gemini-2.5-pro,gemini-2.5-flash-lite,codex"` |
| Sales Rep Advisor | Wave 2 | Gemini 2.5 Pro | Sales-rep gut check (does the course smell like a vendor?) | `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "Role-play independent hardware sales rep (8 years visiting architects). Does this read like education or advertising? Score 1-5 + reasons. Answer 5 questions from v5 spec." "gemini-2.5-pro,gemini-2.5-flash-lite,codex"` (raw `echo \| gemini` invocation prohibited because of G-012 Pro hang exposure; must use the wrapper) |
| Fresh Eyes Reviewer | Wave 2 | Gemini 2.5 Pro | Outside-perspective challenge (does not read Wave 1/2 reports) | `echo "Y" \| gemini -m gemini-2.5-pro -p "You have NO context. Read course cold. Challenge anything that looks taken-for-granted. Don't use jargon." --output-format text` |
| Learning Outcome Validator | Wave 2 end | Gemini 2.5 Pro | 3 personas take the post-test | `echo "Y" \| gemini -m gemini-2.5-pro -p "Role-play [persona N]. Take test: [10 questions]. Report answers + confidence" --output-format text` |
| Engineer (HTML) | Wave 3 | None (uses Claude Sonnet via the Agent tool) | HTML implementation, structured data | N/A |
| Candidate Collector | Side Channel | None (uses Claude Sonnet) | Cross-wave type-balance judgment | N/A |
| Commander | All waves | Claude Opus | Command, decisions, pilot-dispatch judgment | N/A |

**Extended rules**:

- Any agent whose S needs to call an external LLM must copy the command format from this table rather than improvising (to avoid inconsistent parameter formats).
- When adding a new external LLM call, update this table first, then reference it in the agent's S.
- Field 5 of the Commander's Direction Seed (Embedded Skill + Model Invocations) carries both the skill and the LLM call commands.

**Relationship to Principle 7**: Principle 7 originally said only that skill commands must be embedded. With the Model Invocation Map, LLM call commands must be embedded too, for the same reason (a subprocess cannot see the parent's memory). Principle 7's scope therefore expands to "both skill and model invocations must be embedded".

### P-015 Application: WebSearch as primary for research archetype

During the Batch 1-3 scale-up, research-archetype agents (Investigator A/B) found that `/ai-fallback`, before its REST rewrite, had a 0/5 success rate in production because of Gemini free-tier routing hangs. The WebSearch tool (natively available to all subagents) served as an effective escape hatch per P-015. After the REST API rewrite (2026-04-11), `/ai-fallback` should be reliable for paid-tier Gemini calls. However, **WebSearch remains the recommended primary** for open-ended research tasks because:

1. WebSearch returns curated results instantly (no LLM processing overhead)
2. Sources are traceable URLs, not generated text
3. Zero API cost
4.
   No quota concerns

**Recommended pattern for Investigator A/B**:

1. **WebSearch first** for open-ended discovery ("find post-2020 cases of X")
2. **`/ai-fallback` for verification** (cross-check specific facts, summarize findings, persona-simulate a skeptical reader)
3. **`/ai-fallback` for structured extraction** (parse court docs, extract citations)

**[Tier B] Learning Outcome Validator: P-015 pre-simulation scope**: persona simulation itself (three architect personas answering the decision questions and the 10-question post-test) **remains on Gemini 2.5 Pro as primary** via the `/ai-fallback` chain `gemini-2.5-pro,gemini-2.5-flash-lite,codex`. WebSearch cannot perform reasoning and is NOT a substitute for persona role-play. To keep the evidence chain attributable, the Learning Outcome Validator's WebSearch use is governed by **two explicit bright-line rules**:

**(a) Pre-simulation phase (before any persona simulation begins)**: WebSearch is ALLOWED for pure fact retrieval where the output is a URL + a quote, NOT a reasoning chain. Examples: looking up a published IBC or NFPA code section number, retrieving a third-party standard reference, pulling a statistic from a public data source. This phase is explicitly authoring-time research that the Validator does to anchor the 5 independent decision questions and the 10 post-test questions to verifiable sources. For an example lookup such as "what is the IBC 2021 section on double-egress means of egress clearance?", WebSearch-first is faster and produces traceable URLs.

**(b) During + post simulation phase (once persona simulation starts, and for all scoring / cross-checking / content-gap verification)**: ALL retrieval MUST go through the Model Invocation Map's designated persona-simulation chain (`gemini-2.5-pro,gemini-2.5-flash-lite,codex` via `/ai-fallback`). WebSearch is **FORBIDDEN** inside the persona evidence chain because it blurs attribution of which model produced which reasoning step.
If a persona's response cites a code section and the Validator wants to verify that the citation exists, that verification runs through the fallback chain (Gemini Pro verifying its own or a sibling model's citation), NOT through WebSearch. The "reasoning-shaped lookup" drift mode (re-framing a reasoning task as retrieval to offload it to WebSearch) is explicitly prohibited by rule (b). Rule (a) is the ONLY WebSearch-authorized window for the Learning Outcome Validator; rule (b) is a hard boundary.

This rule set is additive: it does NOT edit the Model Invocation Map entry for the Learning Outcome Validator at line ~1041, which remains `gemini-2.5-pro` primary. [Tier B: the Auditor should confirm during the Gate 3 deliverable review that the Validator's WebSearch usage timeline falls entirely within rule (a) and does not cross into rule (b) territory.]

WebSearch is not a full replacement: Gemini via REST is better for synthesis, persona simulation, and structured reasoning. But for raw discovery, WebSearch is faster, cheaper, and more reliable.
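The Tier B timeline check above reduces to a single comparison once every tool call carries a timestamp. Below is a minimal bash sketch under loudly stated assumptions: the tab-separated event log, the tool labels `WebSearch` and `ai-fallback`, the function name `rule_b_violations`, and the single simulation-start timestamp are all hypothetical, since this spec defines no export format for the Validator's tool usage. It flags every WebSearch call made at or after the moment persona simulation begins, which is exactly rule (b) territory.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Gate 3 WebSearch timeline check (rules (a)/(b)).
# Assumed input (NOT a format defined by this spec): one tool call per line,
#   <UTC ISO-8601 timestamp><TAB><tool label>
# e.g.  2026-04-12T09:30:00Z<TAB>WebSearch
# Timestamps in this fixed-width form differ only in digit positions, so a
# plain string comparison orders them chronologically.
set -u

# Usage: rule_b_violations SIMULATION_START < events.tsv
# Prints each WebSearch call made at or after SIMULATION_START.
# Returns 0 if the timeline stays inside rule (a), 1 if rule (b) was crossed.
rule_b_violations() {
  local start="$1"
  local ts="" tool="" found=0
  # "|| [[ -n $ts ]]" keeps a final line that lacks a trailing newline.
  while IFS=$'\t' read -r ts tool || [[ -n "$ts" ]]; do
    [[ -z "$ts" ]] && continue   # skip blank lines
    # A call stamped exactly at the start counts as inside the simulation.
    if [[ "$tool" == "WebSearch" && ! "$ts" < "$start" ]]; then
      printf 'rule (b) violation: %s at %s\n' "$tool" "$ts"
      found=1
    fi
  done
  return "$found"
}

# Demo with three hypothetical events; persona simulation starts at 10:00Z.
events=$'2026-04-12T09:30:00Z\tWebSearch\n2026-04-12T10:05:00Z\tai-fallback\n2026-04-12T10:20:00Z\tWebSearch\n'
if rule_b_violations "2026-04-12T10:00:00Z" <<< "$events"; then
  echo "WebSearch timeline stays within rule (a)"
else
  echo "FLAG for Gate 3: WebSearch used inside the persona evidence chain"
fi
```

Counting a call stamped exactly at the simulation start as a violation is a deliberately conservative choice: rule (b) is a hard boundary, so a tie resolves against WebSearch. The non-zero return status maps naturally onto a Gate 3 flag in whatever checklist tooling the Auditor uses.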
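
---

## Brief Layering (Tier 1 / Tier 2: reducing Direction Seed context size)

**Background**: if every dispatch copies each agent's full G/S/M (often 500–1,000 characters) into the 9-field Direction Seed briefing, the context wasted across 19 agents adds up to an alarming amount. This mirrors the "layered memory" philosophy of v2.html: keep always-loaded information lean and load details on demand.

**Layering principle**: each agent's G/S/M is split into two tiers.

**Tier 1 (required in the Direction Seed)**:

- G in one sentence
- S one-sentence summary
- Key M gates (2–3 acceptance criteria that "must never be missed")
- Embedded skill commands (copied from the Skill Invocation Map for this role)
- Embedded model commands (copied from the Model Invocation Map for this role)
- Standard anti-pattern list (3 items)

**Tier 2 (reference on demand)**:

- Detailed S strategy (why this path, and why other paths were dropped)
- Full M acceptance checklist (all edge cases)
- Historical context (lessons from the HSW-003 rejection, failure modes of past courses)
- Cross-course patterns for similar agents

**Implementation**: the Commander carries Tier 1 content in Direction Seed fields 3–4 (O, G/S/M); Tier 2 is referenced by file path (e.g., `see WTR-HSW-002-OGSM-v4.md#investigator-a-tier-2`), and the subagent Reads it only when needed.

**Acceptance**:

- In each agent's OGSM definition, add a `**Tier 1 summary**` subsection after G/S/M that lists the condensed Tier 1 content.
- Tier 2 content is simply the existing full G/S/M sections (nothing extra to write).
- The Commander's average dispatch briefing length drops from 2,000+ characters to 800 (actual measurements recorded by the Performance Supervisor).

**Out of scope**:

- No physical file split: Tier 1 and Tier 2 both live in the same v4.md file and are referenced by heading anchors.
- No automated tier splitting: hand-written Tier 1 summaries are enough, and over-engineering would defeat the purpose.

---

## Principle 7: Embedded Skill + Model Invocation Required

**Background**: OGSM v1–v3 assumed that an agent would "just know" which skills and which external LLM to call, based on CLAUDE.md, memory, or conversation context. That assumption fails for subprocess agents (child agents dispatched via the Agent tool). v4 Round 2 extends the scope from skill invocation to model invocation: a subprocess agent cannot see the parent's memory, so it knows neither whether to use Gemini Flash or Gemini 2.5 Pro nor the correct command format.

**Problem**:

- A subprocess agent's context is isolated.
- It cannot see the parent Claude's memory, CLAUDE.md, or conversation history.
- If S says only "use /content-scout flag-candidate to add candidates", the subprocess does not know the skill's command format, parameters, or trigger timing.
- Likewise, if S says only "verify numbers with Gemini", the subprocess does not know whether to use Gemini Flash or 2.5 Pro, or what the command format is.
- Result: the subprocess either never calls the skill/LLM or improvises its own procedure.

**Principle**: for any skill **and** external LLM an agent needs to call, **the S section must contain the full command format + when to call it + an example**, or else explicitly reference the central **Skill Invocation Map** and **Model Invocation Map** (the corresponding chapters of this document).

**How it is verified**:

- When dispatching each subprocess agent, the Commander passes in that agent's S section + the Skill Invocation Map + the Model Invocation Map as part of the task briefing.
- In every wave the Performance Supervisor checks: did each subprocess agent actually call the skills and LLM models it was supposed to? If not, was it a briefing mistake or did the agent refuse?

**Why it is Principle 7**: OGSM Principles 1–6 concern "goal alignment" and "audience workflow"; Principle 7 concerns the reproducibility of the "agent execution layer", a problem specific to multi-agent systems that traditional OGSM never had to handle.

---

## Direction Seed: Commander Dispatch Template

**Background**: the concept comes from the watersonusa.ai article "How to Build an Efficient AI Agent Team" (《如何組建有效率的 AI Agent 團隊》), specifically "Step zero: set the direction seed". With 12 agents working at once, the biggest risk is everyone running in a different direction, so before dispatching any subagent the Commander must pass along one unified task description that keeps every agent aimed at the same target.

The Direction Seed and Principle 7 are two sides of the same thing:

- The **Direction Seed** answers "**what** must a subagent know to align with the other agents?" (persona, O, tone, constraints).
- **Principle 7** answers "**how** must a subagent execute skill and model calls so that nothing is missed?" (embedded skill + model commands).

Together they form the **Commander Dispatch Template**: every time any subagent is dispatched via the Agent tool, the briefing must contain the following 9 fields. None may be omitted.

### Required Dispatch Template fields

1. **Course ID + Role Name** (e.g., `HSW-002 / Investigator A`)
2. **Target Audience Persona** (not an abstract "architect" but a full persona: a Project Architect with 12 years' experience, responsible for the drawing set + Division 08 + spec-writer coordination, with a short description of a typical day-to-day workflow)
3. **The full text of O (Objective)**: do not abbreviate; let the subagent see the emotional and practical goals in full
4. **This Agent's G/S/M**: copy and paste the agent's full G/S/M sections from this OGSM document
5.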
**Embedded Skill + Model Invocations**——從 Skill Invocation Map **和 Model Invocation Map** 複製本角色該呼叫的所有 skill + LLM 命令，含完整參數格式。 **Plus (mandatory — knowledge query commands from ogsm-framework skill):** Every Iteration Team subagent brief MUST include these query commands for knowledge transfer across runs: ```bash # Query patterns library for failure-relevant patterns bash ~/.claude/skills/ogsm-framework/scripts/get_patterns_for_failure.sh # Query gotchas library for context-relevant pitfalls bash ~/.claude/skills/ogsm-framework/scripts/get_gotchas_for_context.sh # Query skill invocation map for role-specific skill commands bash ~/.claude/skills/ogsm-framework/scripts/get_skills_for_role.sh ``` These queries satisfy Principle 7 (embedded skill invocation required) for knowledge transfer — subprocess agents can't see parent memory or the references/ directory, so the query commands MUST be in the briefing. **When to run**: - Factory bootstrap (Commander reads scaling-playbook.md fully via Read tool) - On any FAIL (Iterator runs both get_patterns_for_failure + get_gotchas_for_context) - Before proposing any diff (Iterator verifies no known gotcha applies) 6. **Hard Constraints**（例：促銷比例 < 20%、citation 必須兩個來源、不得虛構案例） 7. **Tone + Voice Requirements**（例：architect-peer，不是 marketing；直接、不迂迴；不預設讀者是初學者） 8. **Deliverable Format + File Path**（例：`WTR-HSW-002/investigator-a.md`，包含段落結構標準） 9. 
**Anti-patterns to avoid**（這個 agent **不該做的事**，反面清單） - 至少 3 條，明確寫出「不要做 X，應該做 Y」 - **來源規則（hard rule）**：所有條目必須從目標 agent 的 OGSM 「Anti-patterns 標準清單」子段落 **逐字複製貼上**（verbatim），不得改寫或摘要。課程特定情境可在複製之後追加額外條目，但不能取代原條目。 - **驗收**：Performance Supervisor 抽檢時，用字串比對確認至少 3 條與 source agent 的 Anti-patterns 清單完全一致；若被改寫（即使意思相同），視為 briefing 失誤。 - 為什麼必要：人類 briefing 最容易漏掉「顯而易見的反面」；強迫逐字複製，parent 才會被迫把子段落內的每一個反例都看過一次，不是「我理解意思所以意譯一下」。 ### 驗收方式 Commander 在派遣任一 subagent 時，briefing 的開頭必須列出這 9 個欄位的檢查清單。Performance Supervisor 在每個波次抽檢 ≥ 1 次 dispatch briefing，確認 9 個欄位都齊全——任何欄位缺失視為 briefing 失誤，該 subagent 的交付物不納入 gate review，必須重新派遣。 ### 為什麼不能省略 - 省略第 2 欄（persona）→ agent 寫出 generic 內容，失去 O 的情感目標 - 省略第 3 欄（O 全文）→ agent 看到自己的 G 但不知道 G 為什麼存在，容易偏離方向 - 省略第 5 欄（embedded skills）→ Principle 7 失效，skill 呼叫變成碰運氣 - 省略第 6 欄（constraints）→ 交付物 downstream 才發現違反硬門檻，浪費下一個 agent 的時間 - 省略第 7 欄（tone）→ 不同 agent 寫出來的語氣不一致，Copy Editor 要全部重寫 - 省略第 9 欄（anti-patterns）→ subagent 產出「技術正確但方向偏離」的交付物——因為 parent 覺得顯而易見的反面沒有明說，subagent 無從得知 ### Pilot Dispatch（先派一個試水溫） 9 個欄位再完整，第一次派遣仍可能漏掉某個盲點。Commander 的應對：**在每個波次開始時，先派一個 subagent**（pilot），讀交付物，確認符合 O 的精神後，才並行派遣該波次的其他 subagents。 **為什麼有效**：漏掉的盲點通常是系統性的（parent 的假設偏差），Pilot 的交付物一出來就看得見。如果 Pilot 正確，其他並行 subagents 使用同一份 briefing 也會正確。如果 Pilot 偏離，Commander 更新 briefing 再派，省下 4 個 subagents 走錯路的時間。 **實作規則**： - Wave 1：先派 Investigator A（pilot），通過後並行派 Investigator B + Writer A + Writer B + Engagement Designer - Wave 2：先派 Content Director（pilot），通過後並行派其他 internal reviewers + external reviewers - Wave 3：先派 Engineer HTML（pilot），通過後繼續整合 + 部署 - Pilot 交付物的 sanity check 由 Commander 親自做，不授權給 subagent——這是跨 subagent 的判斷，只有 parent 能做 ### 與 Principle 7 的整合 Principle 7 原本只說「skill 命令必須 embedded 在 S 中」。v4 Round 2 加入 Model Invocation Map 後，範圍擴展為「skill + model invocations 都必須 embedded」。整個機制變成： 1. **文件層**：每個 agent 的 S 段落包含該呼叫的 skill 命令和 LLM 命令（Principle 7） 2. **Map 層**：Skill Invocation Map + Model Invocation Map 作為雙中央事實來源，避免 S 段落之間重複維護 3. 
**派遣層**：Commander 在 Dispatch Template 的第 5 欄（Embedded Skill + Model Invocations），**從兩張 Map 各自複製貼上**該角色該呼叫的 skill 命令和 LLM 命令到 briefing 中這個三層設計確保 skill + model 呼叫資訊在 (1) agent 定義、(2) 中央 Map、(3) 實際派遣 briefing 三個地方都存在——任何一層遺漏，其他兩層都能補救。 --- ### 🗑️ v3.1 的其他「新 agent」已在 v4 降級 v3.1 曾把以下四項作為獨立 agent 列出（#20–#23），v4 已重分類： - **Post-Test Designer** → `/post-test-designer` skill（Phase 2 建立），Wave 3 由 Engineer (HTML) 呼叫。題目設計的 4/4/2 分配、distractor 來源追蹤、真實後果解析等規則保留在 skill 定義裡。 - **Competitor Coverage Auditor** → 併入 Compliance Reviewer 的 M 擴充（見下方 Compliance Reviewer section 的 4 條新增驗收）。三大廠提及 > 0、中立性評分、類別覆蓋、促銷比例 < 20% 全部由 Compliance Reviewer 負責。 - **Chinese Translator** → `/aia-rewrite --bilingual` skill flag（Phase 2 實作）。術語在地化、Writer B 資源段落重寫、台灣 Project Architect persona 驗證等規則保留在 skill 定義裡。 - **SEO/AEO Engineer** → 流程倒轉原則 deferred。AI 先寫 → 人類補缺的流程不在 v4 範圍內，留給未來 phase。這四項降級的原因詳見本章節開頭的「核心洞察」段落——**可重複流程屬於 skill，不屬於 agent**。 --- ## Alignment Verification Matrix（v4） | Agent | Primary G Output | O Emotional (喜歡) | O Practical (能判斷) | O Risk if G Fails | |-------|-----------------|-------------------|---------------------|-------------------| | Commander | 建築師視角貫穿 19 個 agent | 直接影響 | 直接影響 | 整體失敗 | | Investigator A | 建築師認識的場景中的真實案例（DHI + 法院雙來源） | 建立情感連結 | 提供決策依據 | 建築師感受不到課程和自己有關 | | Investigator B | 可引用的條號（ICC+NFPA 交叉驗證）+ 可捍衛的費用數字 | — | 核心 | 建築師無法在壓力下引用 | | Writer A | 從建築師決策點出發的概念框架 | 建立好奇心 | 建立理解基礎 | 後半段決策練習失去土壤 | | Writer B | 情境辨識能力 + SpecLink/SPC Alliance/CSC/CSI 資源導航 | — | **新：實現路徑** | 建築師知道 Waterson 但不知道怎麼落實 | | Engagement Designer | 3 個關鍵決策節點的線上 self-paced 互動（distractor 有案例來源） | 提升參與感 | 建立決策本能 | 被動閱讀，無能力轉移 | | Content Director | 建築師問題結構的敘事流（含 Writer B 資源介紹的邏輯審查） | 直接影響 | 間接影響 | 建築師迷失在技術細節中 | | Compliance Reviewer | AIA HSW 認證通過（含 Writer B 中立性合規審查） | 必要條件 | 必要條件 | 課程永遠到不了建築師 | | Copy Editor | 一次讀懂的語言 + 可用的術語表（含 spec 資源名稱一致性） | 降低摩擦 | 降低認知負擔 | 重讀造成學習中斷 | | Fact Checker | 每個數字都可被建築師引用（含 spec 資源組織描述驗證） | — | 直接影響 | 建築師在現場被糾正，信任崩潰 | | Source Reviewer | 建築師可以直接使用的參考清單（含 spec 資源 URL 可及性） | — | 
擴展工具包 | 建築師無法追蹤和引用 | | Engineer | 流暢、自包含的 HTML 體驗（含 spec 資源連結實作） | 直接影響 | 傳遞機制 | 技術障礙打斷學習 | | **Project Architect Advisor** | Project Architect 第一印象（含 spec 資源介紹的中立性感受） | **唯一直接測量** | 間接測量 | O 情感目標未被驗證 | | **Sales Rep Advisor** | 市場可行性（含 spec 資源「廠商氣味」測試） | 真實場景驗證 | 語言易懂性 | 課程在現實市場中無法流通 | | **Fresh Eyes Reviewer** | 論點獨立性（含「迷思破除」段落的客觀性審查） | — | 確保論點站得住腳 | 建築師在特殊情境下做出錯誤決策 | | Performance Supervisor | 每個波次的建築師視角評分（含 M/S 對準狀態追蹤） | 早期預警 | 早期預警 | O 偏移在最後才被發現 | | Quality Auditor | 交接品質（含 S 承諾資源是否實際出現在交付物的閘門） | 保護修改時間 | 保護修改時間 | 技術問題壓縮建築師體驗改善空間 | | Learning Outcome Validator | 直接測量 O（含 spec 資源導航的學習驗證） | — | **唯一直接測量** | 部署是希望，不是證據 | | **🗃️ Candidate Collector** | ≥2 blog 題目候選 + 完整原始資料寫入 `.content-scout-queue.md` | — | — (間接：不影響本課程 O) | 課程副產品被浪費，Phase 3 Blog Writer Fleet 無從接手 | --- ## Wave Gate Conditions（v4） ### Gate 0 → Wave 1 開始 - [ ] OGSM v4 文件確認 - [ ] 所有 Wave 1 角色收到包含建築師 persona 描述的任務簡報（不只是格式要求） - [ ] Writer B 收到包含 SpecLink / SPC Alliance / CSC/CSI 的介紹材料和「第三方中立角度」的寫作指示 ### Gate 1 → Wave 2 開始 - [ ] 所有 5 個 Wave 1 交付物完成 - [ ] Performance Supervisor `monitor-002-wave1.md` 無建築師視角評分 1 的角色（或已有處理計劃） - [ ] Quality Auditor `audit-002-wave1.md` 確認所有交付物交接就緒，**且 S 承諾的資源實際出現在交付物中** - [ ] Commander 確認每個交付物有通過「建築師視角存在嗎？」的 gate 問題 - [ ] **新增**：Writer B 交付物包含五個情境的視覺錨點描述，且三個 spec 資源都有獨立介紹段落 - [ ] **v4**：Candidate Collector 收到 Wave 1 交付物初稿並啟動掃描 ### Gate 2 → Wave 3 開始 - [ ] 所有 5 個內部 Wave 2 交付物完成 - [ ] **所有 3 個外部 reviewer 報告完成**（Project Architect Advisor + Sales Rep Advisor + Fresh Eyes） - [ ] Commander 對所有外部 reviewer 的負面回饋有明確的處理記錄 - [ ] Performance Supervisor `monitor-002-wave2.md` 無阻礙問題 - [ ] Learning Outcome Validator `validate-002-learning.md` 確認所有 3 個 persona 通過 4/5 題以上 - [ ] **新增**：Learning Outcome Validator 確認至少 2 個 persona 能說出具體 spec 資源使用路徑 - [ ] **新增**：Sales Rep Advisor 確認 Writer B 的 spec 資源介紹「聞起來像資訊提供」（不是「輕微廠商引導」或「明顯廣告」） - [ ] **v4**：Compliance Reviewer 通過 Competitor Coverage M（三大廠中立提及 > 0 + 促銷比例 < 20% + 中立性評分平均 ≥ 4） ### Gate 3 → HTML 部署前 - [ ] Engineer HTML 通過 W3C 驗證 - [ ] 
Engineer 確認 SpecLink / SPC Alliance / CSC / CSI 連結正確實作且可訪問 - [ ] Engineer (HTML) 已呼叫 `/post-test-designer` skill（Phase 2 建立）生成 10 題 post-test，題目分配 4/4/2，80% 通過率校準 ≥ 80% persona 試做通過 - [ ] `.content-scout-queue.md` 含 ≥ 2 個候選，每個候選 6 個欄位完整（id / source / title / type / keywords / research_data / why_worth_writing / timestamp） - [ ] Commander 最終審查完成 - [ ] `/security-check` 通過 - [ ] `git push` 由 Commander 授權 ### Gate 4 → Post-HTML 發布驗證 - [ ] Vercel 部署成功，課程 URL 可訪問（HTTP 200） - [ ] 所有 SpecLink / SPC Alliance / CSC / CSI 連結在線上版本中可正常點擊（target="_blank" 確認） --- ## v3 和 v2 的核心差異 | 面向 | v2 | v3 | |------|----|----| | Writer B 的定位 | 教建築師讀 spec 語言、做決策 | 教建築師情境辨識 + 知道獨立 spec 資源路徑 | | Writer B 的 G | 建築師有一個決策工具 | 建築師帶走情境辨識能力 + 實現路徑 | | Writer B 的 S | 決策樹、場景練習 | 五個情境 + SpecLink/SPC Alliance/CSC/CSI（中立角度） | | Writer B 的 M | 交付 24 張投影片 + 決策工具 | 驗證三個 spec 資源的實際介紹品質、中立性自評、迷思破除段落 | | M 的設計原則 | 任務清單（交付 X 個、記錄 Y 個） | 對準 S 的資源使用（S 承諾的資源是否真的被使用和驗證） | | Investigator A 的 M | 提供 5 個案例 | 驗證 DHI 資料庫實際查詢 + 每個案例的雙來源確認 | | Investigator B 的 M | 引用 4 個來源 | 驗證 ICC Digital Codes 實際查詢 + NFPA 交叉驗證 + 修正記錄 | | Sales Rep Advisor 的任務 | 評估課程是否適合拜訪建築師 | 加入 spec 資源「廠商氣味」測試（第 5 個問題） | | Learning Outcome Validator | 確認決策工具可獨立使用 | 加入 spec 資源導航的學習驗證（persona 能說出資源名稱） | | Gate 條件 | 交付物完成 + 建築師視角存在 | 加入 spec 資源介紹的中立性驗證 + 資源連結可訪問確認 | --- ## v3.1 擴充（本次新增，2026-04-10）：18 → 23 agents 五個新 agent 覆蓋課程生命週期的五個缺口。原版 v3 只覆蓋「課程內部生產鏈」，v3.1 擴充覆蓋「發現 → 翻譯 → 測驗 → 審查 → 延伸」的完整生命週期。 | 新增 Agent | Wave 位置 | 覆蓋的缺口 | 關鍵驗收 | |-----------|----------|-----------|---------| | 🔭 **Content Scout** | Side Channel（Wave 1-2 並行） | 課程影響力放大（blog 內容管線） | ≥ 2 候選寫入 content-plan.md，含 AEO 預覽 + Gemini Flash SEO | | 📖 **Post-Test Designer** | Wave 3（HTML Dev 之前） | CEU 學分關卡 + 最後學習機會 | 10 題 4/4/2 分配 + distractor 有案例來源 + persona ≥ 80% | | 🏭 **Competitor Coverage Auditor** | Wave 2 Internal | AIA 實質合規（HSW-003 退件預防） | 三大廠中立提及 > 0 + 促銷 < 20% + 類別覆蓋完整 | | 🌏 **Chinese Translator** | Wave 3（HTML Dev 之後） | 台灣市場在地化 | `/aia/zh/` 完整版 + 30+ 詞術語表 + Writer B 台灣資源重寫 | | 📈 
**SEO/AEO Engineer** | Wave 3（HTML Dev 之後） | 課程發現層（Google + AI 引擎） | Structured data + Rich Results 通過 + llms-full.txt 更新 | **新增 Gate 4**：Post-HTML 發布驗證——確保 Chinese Translator 和 SEO/AEO Engineer 的交付物在部署前完成並通過。 ### v3.1 關鍵設計洞見（新增原則） > **OGSM 不只看課程品質，還要看發現 → 留下影響的完整生命週期。** v3 把課程內部品質做到極致——Project Architect 讀到課程那一刻會覺得這是為他寫的。但 v3 漏掉了一個問題：**他怎麼找到這門課程？讀完後，課程怎麼變成持續的影響力？非英語市場怎麼落地？** v3.1 的五個新 agent 回答的是「課程如何從一次性學習事件變成持續的影響力系統」： - **發現層**（SEO/AEO Engineer）——建築師在搜尋旅程中找到課程 - **在地化層**（Chinese Translator）——台灣 Project Architect 讀到同樣品質的課程 - **驗證層**（Post-Test Designer）——CEU 學分讓建築師能拿走這份學習 - **實質合規層**（Competitor Coverage Auditor）——課程通過 AIA 的實質審查，不只形式審查 - **放大層**（Content Scout）——每次課程都產出 blog 內容，擴大觸及面 OGSM 的設計原則是「每個 agent 都要串回 O」。v3.1 的擴充讓這個原則延伸到課程生命週期的每一個環節——不只課程內部，還包括課程的發現、傳播、留存。 --- *Document maintained by A君. Update CHANGELOG.md after each wave completion.* *v2 preserved at WTR-HSW-002-OGSM-v2.md for comparison.* *v1 preserved at WTR-HSW-002-OGSM.md for reference.* --- ## Known Issues / To Monitor These are architectural concerns raised by Gemini Flash during round 2 review that we deliberately chose NOT to fix preemptively. They are recorded here as monitoring items to observe during the first real run (HSW-006). If any of these actually manifest as a problem, then we fix; otherwise we avoid premature abstraction. **Principle**: Don't fix what you haven't seen break yet. Over-engineering a fix for a hypothetical failure mode costs more than the failure itself would. ### Issue #1 — Pilot Dispatch only catches pilot-specific blind spots **Concern**: Pilot Dispatch (每波次先派一個 subagent 驗證 briefing) catches blind spots in the pilot agent's interpretation, but other parallel subagents may have their own blind spots that the pilot doesn't share. A correct pilot ≠ all agents will be correct. 
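Issue #1 is about interpretation blind spots, which only a real run can surface. The structural rules from the Direction Seed section are different: the 9-field completeness check and the verbatim anti-pattern check can be done mechanically before any Pilot is dispatched, leaving the Pilot to catch only what a checklist cannot. A minimal Python sketch, assuming a briefing is represented as a dict keyed by field name (a hypothetical representation; real briefings are free-form text, and these helper names are illustrative only):

```python
# Field names taken from the Dispatch Template; the dict representation is an assumption.
REQUIRED_FIELDS = [
    "Course ID + Role Name",
    "Target Audience Persona",
    "O (Objective)",
    "This Agent's G/S/M",
    "Embedded Skill + Model Invocations",
    "Hard Constraints",
    "Tone + Voice Requirements",
    "Deliverable Format + File Path",
    "Anti-patterns to avoid",
]


def missing_fields(briefing: dict[str, str]) -> list[str]:
    """Fields that are absent or blank; any entry here is a briefing error."""
    return [f for f in REQUIRED_FIELDS if not briefing.get(f, "").strip()]


def verbatim_anti_pattern_matches(briefing_items: list[str], source_items: list[str]) -> int:
    """Count distinct briefing anti-patterns that exactly equal an item in the source list.

    Exact string equality is deliberate: a paraphrase with the same meaning
    must not count, per the verbatim-copy rule.
    """
    return len(set(briefing_items) & set(source_items))


def briefing_errors(briefing, briefing_anti_patterns, source_anti_patterns, min_verbatim=3):
    """Collect every structural violation for one dispatch briefing."""
    errors = [f"missing field: {f}" for f in missing_fields(briefing)]
    if verbatim_anti_pattern_matches(briefing_anti_patterns, source_anti_patterns) < min_verbatim:
        errors.append(f"fewer than {min_verbatim} anti-patterns copied verbatim")
    return errors
```

An empty list from `briefing_errors` means the briefing is structurally complete; it says nothing about whether the persona or tone guidance is actually good, which is still the Pilot's job.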
**Possible fixes (do NOT implement now)**:
- **Shadow Pilot**: compare the pilot output against a simulated parallel execution of a different subagent with the same briefing, looking for divergent interpretations
- **Rotating Pilot**: use a different agent as pilot in each wave to expose different blind spots over time

**What to watch during HSW-006**: after each wave's Pilot passes, does the rest of the wave produce outputs that diverge from the pilot's interpretation in ways that weren't caught? If this happens ≥ 2 times in HSW-006, implement Shadow Pilot.

---

### Issue #2 — Tier 1 summary 150-word budget may bloat

**Concern**: Complex agents like Compliance Reviewer (which now has 4 new Competitor Coverage M items) may not fit into 150 words without losing critical information. Pressure to add "just one more critical gate" will gradually bloat Tier 1 back toward the original 2,000+ word size, defeating the purpose.

**Possible fixes (do NOT implement now)**:
- **Hard word-count enforcement**: an automated check that rejects Tier 1 briefs exceeding the limit, forcing the overflow into Tier 2
- **Objective Tier 1 criteria**: restrict Tier 1 to "immediate action triggers + decision points only", with no background, history, or rationale

**What to watch during HSW-006**:
- Measure the actual Tier 1 word count for all 19 agents at dispatch time
- If any agent's Tier 1 exceeds 250 words (about 67% over the 150-word budget), flag it
- If ≥ 3 agents are flagged, implement hard enforcement

---

### Issue #3 — Candidate Collector may still drift into recommendation

**Concern**: Even with the Q1 scope split (situational judgment only; collection is a skill), Candidate Collector's cross-wave type-balance judgment is itself a form of recommendation ("we need more case studies"). The line between "observe an imbalance" and "recommend an action" is thin.
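Part of the "observe vs recommend" line can be checked mechanically. Below is a minimal Python sketch of a prescriptive-language lint of the kind the first possible fix for this issue relies on. The word list is a hypothetical placeholder, not the list used by `/content-scout`'s `banned_word_lint.py`; only the exit-code convention (0 = clean, 2 = violations) is taken from that script's description in Improvement #1.

```python
import re

# Hypothetical placeholder patterns: phrases that turn an observation into a recommendation.
PRESCRIPTIVE_PATTERNS = [
    r"\bshould\b",
    r"\bmust\b",
    r"\bwe need\b",
    r"\brecommend(?:s|ed|ation)?\b",
    r"\bprioriti[sz]e\b",
]


def find_prescriptive(text: str) -> list[str]:
    """Return every prescriptive phrase found in text (case-insensitive), grouped by pattern order."""
    hits = []
    for pattern in PRESCRIPTIVE_PATTERNS:
        hits.extend(m.group(0).lower() for m in re.finditer(pattern, text, re.IGNORECASE))
    return hits


def lint_exit_code(text: str) -> int:
    """0 = descriptive only; 2 = prescriptive language found."""
    return 2 if find_prescriptive(text) else 0
```

A descriptive note such as "Wave 1 produced 3 case-study candidates and 0 how-to candidates" passes, while "We need more case-studies next wave" fails. A word list can only catch the obvious phrasing; a note can still imply a recommendation without any banned word, which is why this stays a monitored issue.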
**Possible fixes (do NOT implement now)**:
- Hard-limit `collector_notes` to descriptive language only (a banned-word lint is already planned for the /content-scout skill)
- Remove "alert" generation and keep only the "Type Distribution counts"; let downstream agents interpret them

**What to watch during HSW-006**: do the Type Distribution alerts that Candidate Collector produces actually change what Investigator A/B do in the next wave? If the alerts are ignored OR cause over-correction, rethink.

---

### Issue #4 — Post-test question quality may pass technical compliance while missing learning assessment

**Concern**: The `/post-test-designer` skill enforces the 4/4/2 distribution + distractor-from-real-cases rule, which catches procedural AIA failures. But technically compliant questions can still have:
- Ambiguous wording that rewards test-taking skill over content understanding
- Misinterpreted case details (the skill reads the case but subtly misrepresents it)
- Distractors that are real mistakes but wrong FOR THIS QUESTION
- Answer explanations where a "real-world consequence" exists but doesn't match this specific question's content

The biggest risk: CEU credit is granted for factually incorrect or misleading content, and architects act on flawed information in real projects.
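For contrast, the procedural rule really is the easy part: checking the 4/4/2 split takes a few lines of code, and nothing in such a check can detect the fidelity problems listed above. A minimal Python sketch, assuming each question is tagged with one of the three types named for `generate_question_template.py` in Improvement #2 (recall / application / judgment). Which count belongs to which type is an assumption here, so the expected mapping is a parameter; the real `validate_distribution.py` may work differently.

```python
from collections import Counter

# Assumed mapping of the 4/4/2 split onto question types; the source names the
# types but not which count goes with which.
EXPECTED_DISTRIBUTION = {"recall": 4, "application": 4, "judgment": 2}


def distribution_errors(question_types, expected=EXPECTED_DISTRIBUTION):
    """Return human-readable mismatches between actual and expected type counts."""
    actual = Counter(question_types)
    errors = []
    for qtype, want in expected.items():
        got = actual.get(qtype, 0)
        if got != want:
            errors.append(f"{qtype}: expected {want}, got {got}")
    for qtype in sorted(set(actual) - set(expected)):
        errors.append(f"unknown type: {qtype} ({actual[qtype]})")
    return errors


def exit_code(question_types) -> int:
    """0 = distribution matches; 2 = mismatch (the 0/2 convention noted for the real script)."""
    return 0 if not distribution_errors(question_types) else 2
```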
**Possible fixes (do NOT implement now)**:
- A human expert review layer for post-tests before first publication
- A second-pass reviewer (a different Gemini persona or a Claude subagent) specifically auditing question clarity + case fidelity
- A post-course architect feedback channel: log which questions architects complain about, then recalibrate

**What to watch during HSW-006**:
- After `/post-test-designer` produces the 10 questions, manually read all 10 before the Learning Outcome Validator persona run
- Flag any question where the "why wrong" explanation for a distractor doesn't match that distractor's claim, OR the case description has been compressed in a way that changes its meaning
- If 2 or more questions have fidelity issues in HSW-006, add a human review layer to the skill workflow

---

### Issue #5 — Bilingual validation via Gemini persona cannot catch cultural blind spots

**Concern**: `/aia-rewrite --bilingual` validates the Traditional Chinese version by running Gemini 2.5 Pro as a simulated Taiwan Project Architect with 12 years of experience. This catches technical correctness issues and obvious translation errors. But Gemini's simulation will miss:
- Unwritten norms of the Taiwan architecture industry (how things are actually done, not how they're documented)
- Cultural sensitivities specific to the Taiwan professional context
- Localized terminology nuances that feel wrong to native speakers but look correct in reference materials
- Regional practice variations (northern vs southern Taiwan, public-sector vs private-sector projects) that real architects recognize instinctively

The biggest risk: a technically accurate but culturally tone-deaf Chinese version is published, and real Taiwan architects dismiss it as "machine-translated", damaging Waterson's credibility in the very Taiwan market the bilingual effort was meant to build.
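The watch list for this issue uses a "30% of flagged issues" divergence trigger without defining divergence. One possible reading, sketched in Python under that assumption: treat each side's findings as a set of issue IDs and measure the share of all flagged issues that only one side raised. Both the ID matching and the formula are assumptions for illustration, not part of the `/aia-rewrite --bilingual` workflow.

```python
def divergence_ratio(architect_flags: set[str], persona_flags: set[str]) -> float:
    """Share of all flagged issues raised by exactly one side (0.0 = full agreement).

    Assumes issues from real reviewers and from the Gemini persona run have been
    normalized to comparable IDs (e.g. section + issue type).
    """
    all_flags = architect_flags | persona_flags
    if not all_flags:
        return 0.0
    return len(architect_flags ^ persona_flags) / len(all_flags)


def needs_mandatory_human_review(architect_flags, persona_flags, threshold=0.30) -> bool:
    """True when divergence reaches the 30% trigger named in the watch list."""
    return divergence_ratio(architect_flags, persona_flags) >= threshold
```

A symmetric measure counts persona-only flags as divergence too. If only the issues real architects catch and Gemini misses should count, replace the symmetric difference with `architect_flags - persona_flags`.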
**Possible fixes (do NOT implement now)**:
- Recruit 2–3 real Taiwan Project Architects (from Waterson's existing Taiwan network) as pilot feedback reviewers for the first bilingual course
- Add a human review gate after the Gemini persona validation that asks specifically: "Does this feel like it was written for you, or translated for you?"
- Build a feedback loop: log which sections Taiwan reviewers flag, and feed the corrections back into the `/aia-rewrite --bilingual` glossary and resource research

**What to watch during HSW-006**:
- After the first bilingual HSW-006 Chinese version is produced, request 2–3 real Taiwan architect reviews BEFORE publication
- Compare the real architect feedback with the Gemini persona simulation results; if they diverge significantly, Gemini simulation alone is insufficient
- If divergence reaches 30% or more of flagged issues, make human review a mandatory step in the `/aia-rewrite --bilingual` workflow

---

### Issue #6 — /ai-fallback per-model timeout insufficient in real production

**Concern**: The INT-001 fix (a per-model 60s timeout) was applied to `call_with_fallback.sh`, but a smoke test on a real Investigator A run showed that in 3 of 4 production invocations every Gemini model STILL timed out. Gemini 2.5 Flash with a 60s timeout failed every time; Gemini 2.5 Flash-Lite with 150s succeeded roughly 25% of the time.

**Evidence**: smoke-test wall clock of 35.5 min, 6 fallback events across 4 /ai-fallback invocations, all Gemini models exhausted in 3 of the 4.

**Proposed fix (defer to v6 or pre-scale)**:
- Change the default fallback chain: drop `gemini-2.5-flash` as primary and start from `gemini-2.5-flash-lite`
- Raise the default `OGSM_MODEL_TIMEOUT` from 60s to 120s
- Update the Model Invocation Map default commands accordingly

**When to revisit**: BEFORE the Batch 1 scale-up launch

---

### Issue #7 — Codex trust-check silent terminal failure in fallback chain

**Concern**: When the Gemini chain is exhausted, `call_with_fallback.sh` falls through to Codex.
The smoke test found that Codex refuses with a "Not inside a trusted directory" error, a silent terminal failure at the END of the fallback chain.

**Evidence**: the smoke test hit this during the heavy research phase. Codex never returned useful output, and the chain terminated with no LLM result.

**Proposed fixes (options)**:
- (a) Pre-approve Codex trust for the waterson-ai-growth-system directory
- (b) Add WebSearch as a final fallback layer in `call_with_fallback.sh`
- (c) Document in the runbook that the operator must manually pivot to WebSearch when this error occurs

**When to revisit**: BEFORE the Batch 1 scale-up launch (combined with Issue #6)

---

### Issue #8 — Quality Auditor reverse-index coverage is observation-only until G-022 lands

**Concern (G-022, cross-cutting, Quality Auditor only)**: The reverse-index check added to Quality Auditor's M (see the Quality Auditor section) requires upstream reviewers (Fact Checker, Source Reviewer) to emit per-claim coverage tables so that QA can reconcile them. In the HSW-002 Wave 2 polish simulation, Source Reviewer's aggregate output ("No single-source citations. PASS.") said nothing about per-claim coverage, which forced QA to mark the row "Partial" with a manual note. The new Quality Auditor M bullet depends on per-claim output; the new Source Reviewer M bullet requires producing it. Until HSW-006 runs with both changes live, we don't know whether reviewers will consistently emit the per-claim table, or whether QA's reverse-index mechanics work at scale.

**Scope note (G-022 cross-cutting)**: This cross-cutting change applies to **Quality Auditor only**. Fact Checker, Source Reviewer, and Performance Supervisor are NOT asked to run their own reverse-index checks; that would duplicate work and blur lane boundaries (see Quality Auditor anti-pattern 4: scope creep forbidden). G-022 exists to make sure QA's reverse index has data to work with; it does not extend to other audit agents.
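Mechanically, the reverse-index check is a set comparison between the testable claims in a Wave 1 deliverable and the claim IDs that appear in a reviewer's per-claim table. A minimal Python sketch, assuming claims already carry stable `claim_id`s (the possible fixes for this issue note that a structured schema with `claim_id` keys does not exist yet, so extracting IDs from free-form markdown is out of scope here; the names below are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class ReverseIndexResult:
    covered: set[str]    # testable claims that appear in the reviewer's per-claim table
    unaudited: set[str]  # testable claims the reviewer silently skipped
    unknown: set[str]    # audit-table IDs that match no claim in the deliverable

    @property
    def coverage_ratio(self) -> float:
        total = len(self.covered) + len(self.unaudited)
        return 1.0 if total == 0 else len(self.covered) / total


def reverse_index(deliverable_claims: set[str], audited_claims: set[str]) -> ReverseIndexResult:
    """Compare the testable claims in a deliverable against a reviewer's per-claim audit table."""
    return ReverseIndexResult(
        covered=deliverable_claims & audited_claims,
        unaudited=deliverable_claims - audited_claims,
        unknown=audited_claims - deliverable_claims,
    )
```

`coverage_ratio` is the per-wave number the watch list asks to record, and every ID in `unaudited` is a silently skipped claim. A non-empty `unknown` set is worth inspecting as well, since it can point to mistyped IDs or audits run against a different draft.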
**Evidence**: The polish-wave2/polish-quality-auditor simulation (robot-2-deliverable.md §1c) used the reverse index to catch 2 unaudited claims (the 5 lbf ADA opening-force figure and the $42K remediation cost) that Fact Checker had silently skipped. Without the new Source Reviewer per-claim output rule, SR's row could only be marked "Partial", because an aggregate "PASS" did not reveal which claims had been checked.

**What to watch during HSW-006**:
- Does Fact Checker's audit table cover 100% of the testable claims in each Wave 1 deliverable? Record the coverage ratio for every wave.
- Does Source Reviewer's per-claim coverage index (new M bullet) actually get produced? If Source Reviewer reverts to aggregate output, QA cannot reconcile; flag and fix immediately.
- How often does QA's reverse index catch silently skipped claims? Target: ≤ 1 per wave. More than 1 per wave for ≥ 2 waves means the upstream reviewer brief is under-specified; escalate.
- Does the `skipped-or-hallucinated (indistinguishable without execution log)` fault-attribution value actually get used? If yes, the Direction Seed needs tightening (make the execution log mandatory at briefing time).

**Possible fixes (do NOT implement now)**:
- **Structured audit schema**: move Fact Checker + Source Reviewer outputs from free-form markdown to a required JSON/YAML schema with `claim_id` keys, which makes the reverse index mechanical rather than manual.
- **QA helper script**: a small CLI that ingests a Wave 1 deliverable and the corresponding Wave 2 audit file and emits the reverse-index table automatically, removing human error from the reconciliation step.
- **Per-claim coverage metric in Performance Supervisor's architect-perspective score**: PS already aggregates scoring, so adding the coverage ratio is a small extension once reviewers emit per-claim tables.

**When to revisit**: after HSW-006 Wave 2 runs once end-to-end with the new Quality Auditor section live.
If coverage is below 100% in any wave, or if Source Reviewer skips the per-claim output ≥ 1 time, implement the structured-schema fix before HSW-007.

---

## Monitoring Protocol

Performance Supervisor is responsible for tracking these eight issues during HSW-006. At the end of HSW-006 production, Commander produces a `known-issues-observations.md` document summarizing:

1. Did Issues #1–#8 actually manifest? What is the evidence?
2. If yes, what was the impact?
3. Recommendation: fix now, fix later, or close as "theoretical concern only"

---

## Deferred Improvements

These are planned architectural improvements that have been **deliberately deferred** to a future phase. Unlike Known Issues (runtime risks to monitor during HSW-006), Deferred Improvements are **optimizations**: the system works without them but would run more efficiently with them. They are recorded here so they aren't forgotten, and so that cost vs benefit can be evaluated before HSW-007 or a later course.

**Principle**: Ship what works, improve what's slow. Don't retrofit existing skills until you have evidence (not intuition) that the optimization is worth the regression risk.
---

### Improvement #1 — Retrofit `/content-scout` with scripts/

**Status**: DONE (2026-04-11)

**Evidence**:
- `~/.claude/skills/content-scout/scripts/validate_candidate.py`: 9-field schema + type validation
- `~/.claude/skills/content-scout/scripts/banned_word_lint.py`: prescriptive-language lint (exit 0/2)
- `~/.claude/skills/content-scout/scripts/update_type_distribution.py`: recount + imbalance alerts
- `~/.claude/skills/content-scout/scripts/append_candidate.py`: full pipeline (validate + lint + append + distribute)
- `~/.claude/skills/content-scout/SKILL.md`: section 9 updated to reference the scripts, with usage examples
- All 4 scripts tested: valid/invalid inputs verified, exit codes confirmed

**Proposed changes** (all implemented):
- Add `scripts/validate_candidate.py`: 10-field schema validation (previously done procedurally in SKILL.md)
- Add `scripts/banned_word_lint.py`: prescriptive-language detection for the `collector_notes` field (previously done procedurally)
- Add `scripts/update_type_distribution.py`: Type Distribution counter update (previously done procedurally)
- Add `scripts/append_candidate.py`: queue-file append with auto-id + timestamp (previously done procedurally)

---

### Improvement #2 — Retrofit `/post-test-designer` with scripts/

**Status**: DONE (2026-04-11)

**Evidence**:
- `~/.claude/skills/post-test-designer/scripts/validate_distribution.py`: enforces 4/4/2 (exit 0/2)
- `~/.claude/skills/post-test-designer/scripts/generate_question_template.py`: blank template per type (recall/application/judgment)
- `~/.claude/skills/post-test-designer/scripts/parse_source_references.py`: Zone A/B/C extraction + case ID detection, JSON output
- `~/.claude/skills/post-test-designer/scripts/format_answer_key.py`: answer-key table from JSON
- `~/.claude/skills/post-test-designer/SKILL.md`: Scripts Reference section added with usage examples
- All 4 scripts tested: valid/invalid inputs verified against the real WTR-HSW-002-full-course.md

**Proposed changes** (all implemented):
- Add `scripts/validate_distribution.py`: 4/4/2 question distribution enforcement (previously done procedurally)
- Add `scripts/generate_question_template.py`: deterministic markdown template for each question type (previously done procedurally via the SKILL.md example)
- Add `scripts/parse_source_references.py`: extract the Writer A / Writer B / Investigator A content zones from the course file (previously done by the LLM reading the file)
- Add `scripts/format_answer_key.py`: deterministic answer-key table generation

---

### Improvement #3 — `/new-course` generate_ogsm.py template parameterization may be too rigid

**Status**: DONE (2026-04-11)

**Evidence**:
- `~/.claude/skills/new-course/scripts/generate_ogsm.py`: added the `--scope`, `--audience`, `--section-count`, `--base-template`, and `--flexible` flags
- Backward compatible: existing calls without the new flags produce identical output
- `~/.claude/skills/new-course/SKILL.md`: Scripts Reference updated with a flag documentation table
- Tested: `--scope hsw-lu --audience spec-writer --section-count 10 --flexible` produces correct output with a Template Variables block and persona substitution

**Flags added**:
- `--scope hsw|lu|hsw-lu`: updates credit-type references + injects a scope note
- `--audience project-architect|design-architect|principal|spec-writer`: replaces persona labels throughout
- `--section-count 5|7|10|12`: injects a section-count advisory, with a parallel-agent recommendation for 12-section courses
- `--base-template v4|v5|custom-path`: auto-resolves the template from known locations
- `--flexible`: prepends a `## Template Variables` block documenting all parameterization choices

**Original concern** (addressed): `scripts/generate_ogsm.py` parameterized the v4 template by string substitution (course code, title, slug). This worked for courses similar to HSW-002 but was too rigid for courses that differ in scope, audience, or section count.
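For anyone extending the script, here is a minimal Python sketch of how the documented flag surface could be declared with `argparse`. It is a hypothetical reconstruction, not the actual `generate_ogsm.py` code: the existing course code / title / slug arguments are omitted, and the defaults (including treating any `--base-template` value other than `v4`/`v5` as a path) are assumptions. Defaulting every new flag to "not given" is one simple way to get the backward-compatibility property listed above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical declaration of generate_ogsm.py's documented flags."""
    parser = argparse.ArgumentParser(description="Generate a course OGSM from a base template")
    # None means "flag not given": leave the template text untouched (backward compatible).
    parser.add_argument("--scope", choices=["hsw", "lu", "hsw-lu"], default=None)
    parser.add_argument(
        "--audience",
        choices=["project-architect", "design-architect", "principal", "spec-writer"],
        default=None,
    )
    parser.add_argument("--section-count", type=int, choices=[5, 7, 10, 12], default=None)
    # v4 / v5 resolve from known locations; any other value is treated as a path (assumption).
    parser.add_argument("--base-template", default="v4")
    parser.add_argument("--flexible", action="store_true")
    return parser


def wants_parallel_advisory(args: argparse.Namespace) -> bool:
    """12-section courses get the parallel-agent recommendation."""
    return args.section_count == 12
```

Declaring the allowed values with `choices` makes invalid combinations fail at parse time with a usage message instead of silently producing a malformed OGSM.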
---

### Improvement #4 — `/research-topic` Gemini CLI dependency (noted only)

**Status**: Flagged by the Gemini Flash Phase 2e review. **No fix planned**; recorded for awareness only.

**Concern**: `scripts/fetch_sources.py` depends on the Gemini CLI (`gemini -m gemini-2.5-flash`) for source grounding. This creates 3 dependencies:
- The Gemini CLI must be installed locally
- The network must be able to reach Google APIs
- Gemini grounding quality is variable

**Why NOT fixing**:
- The memory rule `feedback_multi_ai` explicitly names Gemini Flash as our search-grounding choice (free, 1000/day)
- The alternatives (direct API, traditional keyword search) either cost more or lose grounding capability
- The dependency is acceptable for our use case
- If Gemini becomes unreliable, `feedback_multi_ai` already defines a fallback: add a second Gemini-capable account or switch to Codex

**Just be aware** that this dependency exists; no preemptive abstraction is needed.

---

## Relationship to Known Issues

| Aspect | Known Issues | Deferred Improvements |
|--------|-------------|---------------------|
| **Nature** | Runtime risks (things that might go wrong) | Optimizations (things that could be more efficient) |
| **Trigger for action** | Issue manifests in production | Evidence of cost/benefit after measurement |
| **Urgency** | High if manifested | Low unless proven |
| **Risk of fixing** | Low (fixes are defensive) | Medium (a retrofit can cause regressions) |
| **Decision owner** | Commander during production | Architecture review between courses |

Both sections, and this document as a whole, feed into v5 planning after the HSW-006 retrospective.