# Iteration Team OGSM — Agent Optimization Factory

**Document Type:** OGSM Work Plan (v1 — Phase B, Agent Optimization Factory)
**Version:** 1.0
**Created:** 2026-04-10
**Owner:** A君 (Commander Agent)
**Context:** Phase B of Agent Optimization Factory. Phase A built 5 validation scripts + `/ai-fallback` skill.
**Supersedes:** N/A (first version)

---

## O — Objective

> Every AIA writing agent (research, writing, review, coordination — any archetype) that passes through this factory emerges reliably producing OGSM-compliant, script-efficient, fallback-resilient output. The Commander trusts that a factory-polished agent will not silently fail in production; the agent itself has BDD-verified behavior that can be regression-tested before any future change.

**The two outcomes that together define success:**

1. **Emotional**: The Commander dispatches a factory-polished agent into production without hesitation — not because they checked it manually, but because the BDD suite and validation scripts say it is ready. The agent itself has a living test suite that makes future changes safe rather than scary.
2. **Practical**: Any agent that exits the factory passes 4 quality criteria simultaneously: OGSM compliance, script-first deterministic logic, 3-layer skill architecture, and multi-model fallback. These criteria are machine-verifiable on every cycle.

If either outcome is missing, O is not achieved.

**Primary audience (dual):**
- **The AIA writing agents being optimized** — they emerge from the factory as reliable, BDD-tested, regression-protected agents.
- **The Commander / user** — they can deploy factory-polished agents into any course production without manual validation overhead.

---

## Team Structure

| Team | Target Mini-Agent | Archetype | Robot 1: Spec Verifier | Robot 2: Dispatch Harness | Robot 3: Iterator |
|------|------------------|-----------|----------------------|--------------------------|-------------------|
| **Team 1** | Mini-Research Agent | Research (Investigator) | Team-1 Spec Verifier | Team-1 Dispatch Harness | Team-1 Iterator |
| **Team 2** | Mini-Writer Agent | Writing (Writer) | Team-2 Spec Verifier | Team-2 Dispatch Harness | Team-2 Iterator |
| **Team 3** | Mini-Reviewer Agent | Review (Reviewer) | Team-3 Spec Verifier | Team-3 Dispatch Harness | Team-3 Iterator |

**9 robots total. 3 teams. All teams run in parallel.**

Each team is an independent instance of the same 3-robot template. Teams share:
- The 5 Phase A validation scripts
- The `/ai-fallback` skill
- The BDD scenario template (Given/When/Then format)
- The 4 quality criteria (evaluated identically per-agent)

Teams do NOT share:
- BDD scenario content (different per target mini-agent)
- Test inputs (different per target mini-agent)
- Workspace directories (`team-N-workspace/` isolated)
- Metrics (each team tracks its own plateau independently)

---

## Individual OGSM Definitions

---

### 🔬 Robot 1: Spec Verifier

**G (Integrative phase goal — connects to O)**
> Any AIA writing agent that the architect-team or Commander deploys through the factory has a living BDD suite capturing its expected behavior — not as documentation, but as executable tests that run every cycle. Regressions are caught immediately, not at deployment.

**Tier 1 Summary (Direction Seed required — ≤ 200 words)**
- **G in one sentence**: Give every target mini-agent a living BDD suite that catches regressions every cycle before they reach deployment.
- **S in one sentence**: Generate 5–10 topic-agnostic BDD scenarios on first cycle covering the 4 quality criteria, then run all scenarios each cycle and produce a PASS/FAIL report.
- **Key M gates**: BDD scenario count 5–10 on first cycle; every scenario has Given/When/Then; all previously-passing scenarios rerun on proposed diff; PASS/FAIL report exists each cycle.
- **Skill commands**: `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "Generate BDD scenarios for [agent name] covering OGSM compliance, script-first logic, 3-layer skills, AI fallback..."`
- **Model commands**: Gemini Flash → Flash-Lite → Pro → Codex (via `/ai-fallback` — do not call models directly)
- **Anti-patterns**: See Anti-patterns section below.

**S (Strategy — path chosen for THIS factory)**
- On first cycle: read the target mini-agent's G/S/M spec file; generate 5–10 BDD scenarios covering the 4 quality criteria (OGSM compliance, script-first, 3-layer skills, AI fallback) plus agent-specific behavior. Each scenario uses Given/When/Then format with explicit assertion keyword.
- Scenarios must be topic-agnostic: use rotating test inputs (A/B/C as defined in the mini-agent spec), do not reference specific topic content in the scenario structure itself. The scenario describes behavior; test input is a parameter.
- On subsequent cycles: run all existing BDD scenarios against current mini-agent state (after Iterator's proposed diff is applied or rejected). Report PASS/FAIL per scenario with evidence.
- Rerun all previously-passing scenarios after any proposed diff — this is regression protection. Do not skip regression rerun to save time.
- Use `/ai-fallback` for any LLM help writing or refining scenarios. Command: `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "Generate BDD scenarios for [agent] covering [criterion]..."` — never call models directly.
- Write BDD output to `team-N-workspace/bdd-scenarios.md`; write PASS/FAIL report to `team-N-workspace/bdd-report-cycle-N.md`.
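A minimal sketch of one such scenario, in the Given/When/Then format with an explicit assertion keyword (the scenario ID, wording, and behavior described are illustrative, not prescribed — note the input is referenced by label only):

```text
Scenario BDD-03: Deliverable routes model calls through the fallback wrapper
  Given the mini-agent is dispatched with test input Input-A
  When the mini-agent produces its deliverable
  Then every external model invocation in the deliverable uses call_with_fallback.sh
  ASSERT: no direct Gemini Flash or Codex invocation appears in the deliverable
```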

**M (Measurement — verifies S resource commitments)**
- BDD scenario count per mini-agent: 5–10 on first cycle; may expand on subsequent cycles if new failure modes are discovered.
- Every scenario has all three parts: Given / When / Then, plus an explicit assertion keyword (e.g., ASSERT, VERIFY, EXPECT).
- Regression run: verify that `team-N-workspace/bdd-report-cycle-N.md` exists and shows all previously-passing scenario IDs plus their result in the current cycle; check that previously-passing count matches prior cycle's PASS count.
- PASS/FAIL report produced each cycle: file exists at expected path, contains scenario ID, result, and failure reason if FAIL.
- Topic-agnosticism: confirm no scenario file contains a specific topic name (no "Twin Parks", no "spring hinge") in the scenario structure — test inputs are referenced by label only (Input-A, Input-B, Input-C).
- `/ai-fallback` used (not raw Gemini Flash or Codex directly): verify by running `python ~/.claude/skills/ogsm-framework/scripts/check_ai_fallback_usage.py team-N-workspace/bdd-scenarios.md` — confirmed when exit code is 0. Any BDD file that shows direct Gemini Flash or Codex invocations without the fallback wrapper is a violation.
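The structural checks above (Given/When/Then present, assertion keyword present, no hardcoded topic content) can be sketched as a single predicate — a minimal illustration, assuming the function name and banned-term list are factory-specific choices, not a prescribed script:

```python
ASSERT_KEYWORDS = ("ASSERT", "VERIFY", "EXPECT")
BANNED_TOPIC_TERMS = ("Twin Parks", "spring hinge")  # extend per factory run

def scenario_is_wellformed(scenario_text: str) -> bool:
    """True when a scenario has all of Given/When/Then, an explicit assertion
    keyword, and no hardcoded topic content (inputs referenced by label only)."""
    has_structure = all(part in scenario_text for part in ("Given", "When", "Then"))
    has_assertion = any(kw in scenario_text for kw in ASSERT_KEYWORDS)
    topic_free = not any(term in scenario_text for term in BANNED_TOPIC_TERMS)
    return has_structure and has_assertion and topic_free
```

A scenario that names a specific topic fails the check even if its structure is complete, which is exactly the topic-agnosticism gate described above.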

**Alignment to O**
When every mini-agent has a BDD suite that runs every cycle, regressions surface before deployment — the Commander trusts the output because it is tested, not just reviewed. This is the foundation of O's emotional outcome: deploy without hesitation.

**Anti-patterns**
- NOT: Write a scenario that only passes on one specific test input (e.g., "Given the agent receives the Twin Parks case") — SHOULD: Scenarios describe behavior patterns; test inputs rotate through Input-A/B/C without changing scenario structure.
- NOT: Assume first-cycle scenarios are complete and stop expanding them — SHOULD: After each FAIL cycle, evaluate whether a new scenario is needed to cover the failure mode that was discovered.
- NOT: Skip regression rerun on proposed diff to save time — SHOULD: Every diff proposal triggers a full regression run; speed savings here create undetected regressions downstream.

---

### 📡 Robot 2: Dispatch Harness

**G (Integrative phase goal — connects to O)**
> Real-world friction points (Gemini quota, paywall, dependency timing, etc.) are captured as metrics on every cycle so the factory learns from reality, not theory. The Commander and architect-team see the full picture — context pressure, fallback events, skill invocations — not just binary pass/fail.

**Tier 1 Summary (Direction Seed required — ≤ 200 words)**
- **G in one sentence**: Capture every real-world friction event during mini-agent execution so the factory's metrics reflect what actually happened, not what was theorized.
- **S in one sentence**: Dispatch the target mini-agent with a rotating test input each cycle; capture deliverable content, context pressure (character count), model fallback events, skill invocations, and wall-clock time; write to `team-N-workspace/metrics-cycle-N.json`.
- **Key M gates**: `metrics-cycle-N.json` exists with all required fields; rotating input confirmed different from prior cycle; `/ai-fallback` used for model calls; no writes outside `team-N-workspace/`.
- **Skill commands**: `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "Execute mini-agent task: [dispatched prompt]..."` — use for all LLM-dependent execution calls.
- **Model commands**: Gemini Flash → Flash-Lite → Pro → Codex (via `/ai-fallback` — do not call models directly).
- **Anti-patterns**: See Anti-patterns section below.

**S (Strategy — path chosen for THIS factory)**
- Each cycle: dispatch the target mini-agent with a rotating test input (3 inputs per mini-agent as defined in the mini-agent spec — A, B, C — rotate per cycle: cycle 1→A, cycle 2→B, cycle 3→C, cycle 4→A, etc.).
- Capture the following during execution: (1) deliverable content (full text of mini-agent output), (2) briefing character count as context pressure proxy, (3) model fallback events (which model was tried, which succeeded), (4) skill invocations (which skills were called and whether they succeeded), (5) wall-clock execution time in seconds.
- Write all captured metrics to `team-N-workspace/metrics-cycle-N.json` with fields: `cycle`, `test_input_label`, `deliverable_char_count`, `briefing_char_count`, `model_used`, `fallback_triggered`, `skills_invoked`, `wall_clock_seconds`, `deliverable_content`.
- Use `/ai-fallback` when dispatching the mini-agent's LLM-dependent work — quota failures become automatic fallback events captured in metrics, not hard errors that halt the cycle. Command: `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "[mini-agent prompt]"`.
- Isolation: only write to own team's workspace directory (`team-N-workspace/`). Never read from or write to another team's workspace.
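A sketch of what one metrics file might look like, using the field names required above (all values are illustrative):

```json
{
  "cycle": 2,
  "test_input_label": "Input-B",
  "deliverable_char_count": 4812,
  "briefing_char_count": 3105,
  "model_used": "gemini-flash-lite",
  "fallback_triggered": true,
  "skills_invoked": ["ai-fallback"],
  "wall_clock_seconds": 187,
  "deliverable_content": "..."
}
```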

**M (Measurement — verifies S resource commitments)**
- `team-N-workspace/metrics-cycle-N.json` file exists after each cycle and contains all required fields: `cycle`, `test_input_label`, `deliverable_char_count`, `briefing_char_count`, `model_used`, `fallback_triggered`, `skills_invoked`, `wall_clock_seconds`, `deliverable_content`.
- Context pressure (briefing character count) recorded per cycle — verified by checking that `briefing_char_count` field is a non-zero integer in every metrics file.
- Rotating inputs verified: `test_input_label` in `metrics-cycle-N.json` differs from `metrics-cycle-(N-1).json` for all N > 1.
- `/ai-fallback` used for model calls — verify by checking `metrics-cycle-N.json`: `model_used` field must contain a value from the fallback chain (e.g., `gemini-flash`, `gemini-flash-lite`, `gemini-pro`, `codex`). Direct raw calls to Gemini Flash or Codex outside the fallback wrapper are a violation; confirm absence of direct invocation patterns via `check_ai_fallback_usage.py`.
- Team workspace isolation: no files created by this team exist outside `team-N-workspace/`. Spot-check by running `find . -name "*.json" ! -path "./team-N-workspace/*"` from the factory root — any hit attributable to this team is a violation.

**Alignment to O**
Capturing real friction events (not just pass/fail) is what makes the factory learn. When the Commander reviews metrics, they see whether context pressure increased, which models failed, and which skills were invoked — the evidence that Iterator's diffs are actually improving the agent, not just producing green CI.

**Anti-patterns**
- NOT: Use a hardcoded test input every cycle (e.g., always Input-A) — SHOULD: Rotate inputs A/B/C per cycle to detect overfitting to a single test scenario.
- NOT: Suppress fallback errors and mark execution as "PASS" anyway — SHOULD: Fallback events are valuable signal; record them honestly even if the cycle succeeds via fallback model.
- NOT: Write metrics or deliverable files to a shared workspace or the root `docs/iteration-team/` directory — SHOULD: All writes go to `team-N-workspace/` only; cross-team writes contaminate the integration phase's signal.

---

### 🔁 Robot 3: Iterator

**G (Integrative phase goal — connects to O)**
> When a mini-agent fails its BDD scenarios, the Iterator proposes the smallest possible correct fix — preferring script extraction over prompt edits when the failure is deterministic. Every diff is regression-tested and OGSM-validated before applying. The mini-agent's S/M/scripts evolve over cycles without bloat, so the architect-team can trust the output of each improved agent.

**Tier 1 Summary (Direction Seed required — ≤ 200 words)**
- **G in one sentence**: Propose the smallest correct fix for every BDD FAIL, validate it against all 5 scripts, regression-test it, then apply it — and detect when improvement plateaus.
- **S in one sentence**: Read Spec Verifier's PASS/FAIL report and Dispatch Harness metrics; categorize each FAIL as deterministic or judgment-based; propose a diff; run 5 validators on proposed state; run regression; apply if both pass; detect plateau after 3 consecutive cycles of <5% improvement.
- **Key M gates**: Per-cycle diff log exists at `team-N-workspace/diffs/cycle-N.patch`; all 5 validators run for every diff; regression run for every diff; plateau detected after 3 consecutive <5% cycles; stop signal logged.
- **Skill commands**: `python ~/.claude/skills/ogsm-framework/scripts/validate_s_to_m_coverage.py [mini-agent-path]` and 4 other validators — see Skill Invocation Map.
- **Model commands**: Claude (native, no external LLM) — diff reasoning uses Claude's own judgment; no external model needed.
- **Anti-patterns**: See Anti-patterns section below.

**S (Strategy — path chosen for THIS factory)**
- Read Spec Verifier's PASS/FAIL report from `team-N-workspace/bdd-report-cycle-N.md` and Dispatch Harness metrics from `team-N-workspace/metrics-cycle-N.json`.
- For each FAIL scenario: analyze whether the failure is **deterministic** (validation, parsing, counting, formatting — these suggest script extraction into `scripts/`) or **judgment-based** (tone, relevance, OGSM alignment quality — these suggest prompt edit with written justification for why the edit targets the failure).
- Propose a diff: either a new script in `scripts/`, an edited S/M text in the mini-agent spec, or an added Anti-pattern. The diff must be the smallest possible change that addresses the failure — no refactoring unrelated sections.
- Run all 5 validation scripts on proposed state (not current state):
  - `python ~/.claude/skills/ogsm-framework/scripts/validate_s_to_m_coverage.py [mini-agent-path]`
  - `python ~/.claude/skills/ogsm-framework/scripts/validate_ogsm_completeness.py [mini-agent-path]`
  - `python ~/.claude/skills/ogsm-framework/scripts/suggest_script_extraction.py [mini-agent-path]`
  - `python ~/.claude/skills/ogsm-framework/scripts/check_skill_architecture.py [mini-agent-path]`
  - `python ~/.claude/skills/ogsm-framework/scripts/check_ai_fallback_usage.py [mini-agent-path]`
- If any validator returns non-zero (FAIL) on proposed state → reject own diff, log reason in `team-N-workspace/diffs/cycle-N-rejected.md`. Do not apply.
- If all 5 validators return 0 (PASS) on proposed state → run regression (rerun all previously-passing BDD scenarios from Spec Verifier against proposed state).
- If regression passes → apply diff to mini-agent spec file; record diff in `team-N-workspace/diffs/cycle-N.patch`; record metrics improvement in `team-N-workspace/metrics-cycle-N.json`.
- Track plateau: record per-cycle improvement delta for each of the 4 quality criteria. If 3 consecutive cycles all show <5% improvement across all criteria → plateau detected.
- Decide stop: emit stop signal to `team-N-workspace/stop-signal.md` when plateau is detected OR when all 4 criteria hit 100%.
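The all-5-validators gate in the steps above can be sketched as follows — a minimal illustration in which the injectable `run` parameter exists only for testability; in production the default subprocess path is used with the script paths from the Skill Invocation Map:

```python
import os
import subprocess

SCRIPTS_DIR = "~/.claude/skills/ogsm-framework/scripts"
VALIDATORS = [
    "validate_s_to_m_coverage.py",
    "validate_ogsm_completeness.py",
    "suggest_script_extraction.py",
    "check_skill_architecture.py",
    "check_ai_fallback_usage.py",
]

def run_validators(mini_agent_path, run=None):
    """Run all 5 validators on the proposed state; return {script: exit_code}."""
    if run is None:
        run = lambda script: subprocess.run(
            ["python", os.path.expanduser(f"{SCRIPTS_DIR}/{script}"), mini_agent_path]
        ).returncode
    return {script: run(script) for script in VALIDATORS}

def all_pass(results):
    """A diff may be applied only when every validator exited 0."""
    return all(code == 0 for code in results.values())
```

A single non-zero exit code rejects the diff; there is no partial-pass path.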

**M (Measurement — verifies S resource commitments)**
- Per-cycle diff log: verify that `team-N-workspace/diffs/cycle-N.patch` exists for every applied diff and that `team-N-workspace/diffs/cycle-N-rejected.md` exists for any rejected diff; whichever file applies must be confirmed present before the cycle is counted as complete.
- Report ingestion: verify that `team-N-workspace/bdd-report-cycle-N.md` is referenced in the Iterator's diff decision — confirm the report's content was read before any diff was proposed.
- Metrics ingestion: verify that `team-N-workspace/metrics-cycle-N.json` was read — check that metrics-cycle-N data appears in the diff justification or plateau calculation for the current cycle.
- Validation run: verify that `team-N-workspace/diffs/cycle-N.patch` file header records all 5 validator results (PASS/FAIL + exit code) before the diff was applied — confirmed when `ogsm-framework` scripts directory paths appear in the diff header log.
- Regression run: verify that `team-N-workspace/diffs/cycle-N.patch` file header records regression result (all previously-passing scenarios: PASS/FAIL count).
- Plateau detection: verify that `team-N-workspace/plateau-tracker.json` records per-cycle delta for each criterion; plateau condition (3 consecutive <5% cycles) is machine-verifiable from this file.
- Stop signal: check that `team-N-workspace/stop-signal.md` exists once the stop condition is met, containing the reason (plateau or 100%) and the final cycle number.
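The machine-verifiable plateau condition can be sketched as a single function over the per-cycle deltas recorded in `plateau-tracker.json` — a minimal illustration; the criterion key names are hypothetical:

```python
def plateau_reached(deltas_per_cycle, threshold=0.05, window=3):
    """deltas_per_cycle: list (oldest → newest) of dicts mapping each of the
    4 quality criteria to that cycle's fractional improvement delta.
    Plateau: the last `window` cycles all show < threshold on every criterion."""
    if len(deltas_per_cycle) < window:
        return False
    recent = deltas_per_cycle[-window:]
    return all(all(delta < threshold for delta in cycle.values()) for cycle in recent)
```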

**Alignment to O**
The Iterator is the mechanism by which mini-agents actually improve. Without it, BDD scenarios flag failures but nothing changes. With it, every flagged failure becomes a validated, regression-tested improvement — the factory produces better agents, not just better reports.

**Anti-patterns**
- NOT: Propose a large refactor of the mini-agent spec when only one scenario fails — SHOULD: Propose the smallest possible diff that fixes the failing scenario; other improvements belong in a separate cycle.
- NOT: Skip the validation script run when the diff "obviously" looks correct — SHOULD: Run all 5 validators on every proposed diff without exception; false confidence is worse than the delay.
- NOT: Apply a diff that fixes a FAIL but breaks a previously-passing scenario (silent regression) — SHOULD: Regression run must be complete; if any previously-passing scenario now fails, reject the diff and redesign it.

---

## Skill Invocation Map

| Robot | Cycle | Skill | When to Call | Command |
|-------|-------|-------|-------------|---------|
| Spec Verifier | Every cycle | `/ai-fallback` | When generating or rewriting BDD scenarios | `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "Generate BDD scenarios for [agent name] covering OGSM compliance, script-first logic, 3-layer skills, AI fallback. Format: Given/When/Then with ASSERT keyword."` |
| Dispatch Harness | Every cycle | `/ai-fallback` | When dispatching the mini-agent's LLM-dependent work | `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "[mini-agent full prompt with test input]"` |
| Iterator | Every cycle | `validate_s_to_m_coverage.py` | Before approving any diff | `python ~/.claude/skills/ogsm-framework/scripts/validate_s_to_m_coverage.py [mini-agent-path]` |
| Iterator | Every cycle | `validate_ogsm_completeness.py` | Before approving any diff | `python ~/.claude/skills/ogsm-framework/scripts/validate_ogsm_completeness.py [mini-agent-path]` |
| Iterator | Every cycle | `suggest_script_extraction.py` | Before approving any diff | `python ~/.claude/skills/ogsm-framework/scripts/suggest_script_extraction.py [mini-agent-path]` |
| Iterator | Every cycle | `check_skill_architecture.py` | Before approving any diff | `python ~/.claude/skills/ogsm-framework/scripts/check_skill_architecture.py [mini-agent-path]` |
| Iterator | Every cycle | `check_ai_fallback_usage.py` | Before approving any diff | `python ~/.claude/skills/ogsm-framework/scripts/check_ai_fallback_usage.py [mini-agent-path]` |

---

## Model Invocation Map

| Robot | Cycle | Preferred Model Chain | Purpose | Command Format |
|-------|-------|-----------------------|---------|---------------|
| Spec Verifier | Every cycle | Flash → Flash-Lite → Pro → Codex | BDD scenario generation and refinement | Via `/ai-fallback` — `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh "[prompt]"` |
| Dispatch Harness | Every cycle | Flash → Flash-Lite → Pro → Codex | Mini-agent execution help, research grounding | Via `/ai-fallback` — same command format, different prompt |
| Iterator | Every cycle | Claude (native) | Diff reasoning, failure categorization, plateau detection | No external LLM needed — Claude's own reasoning is sufficient for deterministic diff analysis |

**Extension rules:**
- Any robot that needs to call an external LLM must use `/ai-fallback`, never call models directly.
- Add new model calls to this table first, then reference from the robot's S section.
- Commander's Direction Seed field 5 (Embedded Skill + Model Invocations) copies the relevant row from both maps.

---

## Brief Layering (Tier 1 / Tier 2)

**Philosophy for this factory**: The factory itself is a context-constrained system. Each team subagent (dispatched via the Agent tool) gets a Direction Seed briefing. If the briefing carries the full G/S/M for all 3 robots (~1500 words), context pressure grows quickly across parallel dispatches.

**Tier 1 (Direction Seed required — always dispatched with the subagent)**:
- G in one sentence
- S summary in one sentence
- Key M gates (2–3 absolute gates)
- Embedded Skill Invocation Map rows for this robot
- Embedded Model Invocation Map row for this robot
- Anti-patterns (3 items)

**Tier 2 (reference on demand — file path reference)**:
- Full S strategy (path + why)
- Full M checklist (all edge cases)
- Historical context (why this factory was built, Phase A learnings)
- Cross-team patterns (integration phase only)

**Implementation**: Commander copies Tier 1 from the corresponding robot's "Tier 1 Summary" subsection above. Tier 2 is this full document, referenced by path: `docs/iteration-team/WTR-ITERATION-TEAM-OGSM.md`.

**Target**: Average Direction Seed briefing per robot ≤ 600 words.

---

## Principle 7 — Embedded Skill + Model Invocation Required

**Background**: Subprocess agents (dispatched via the Agent tool) run in isolated context. They cannot see the parent Claude's CLAUDE.md, memory, or conversation history. Any skill or external model call that a robot needs must be written directly in the robot's S section — or in the central Skill/Model Invocation Maps copied into every dispatch briefing.

**Problem in this factory's context:**
- A dispatched Spec Verifier robot cannot see which model chain to use for BDD generation unless the chain is embedded in its briefing.
- A dispatched Dispatch Harness robot cannot find the `/ai-fallback` script path unless it is explicitly given.
- A dispatched Iterator robot cannot find the 5 validation script paths unless they are embedded.

**Principle**: Every robot's S section must contain the complete command format + invocation trigger for every skill and external model it calls. The Skill Invocation Map and Model Invocation Map above are the central source; Commander copies the relevant rows into each dispatch briefing.

**Verification method**: Iterator runs `check_ai_fallback_usage.py` on each mini-agent spec file to verify that skill and model invocations are embedded (not assumed). If any invocation reference is missing, it is flagged as a FAIL.

---

## Direction Seed — Commander Dispatch Template

When dispatching any of the 9 robots via the Agent tool, the briefing must include all 9 fields. Missing any field is a briefing error; the robot's output is excluded from metrics until re-dispatched correctly.

### 9 Required Fields

1. **Factory ID + Robot Name** (e.g., `Iteration-Team / Team-1 / Spec Verifier`)
2. **Target Mini-Agent** (which agent is being optimized — name + archetype + file path)
3. **O (Objective) — full text** — paste the O section from this document verbatim
4. **This Robot's G/S/M** — copy the full G/S/M section for this specific robot from this document
5. **Embedded Skill + Model Invocations** — copy the robot's rows from both the Skill Invocation Map and Model Invocation Map
6. **Hard Constraints** (e.g., no writes outside `team-N-workspace/`, rotating input must differ from prior cycle, 5 validators must all pass before applying diff)
7. **Tone + Voice** (factory-internal: precise, evidence-based, no hedging; report failures factually; do not suppress bad news)
8. **Deliverable Format + File Paths** (exactly where each output file must be written, with full paths)
9. **Anti-patterns to avoid** — copy the 3 anti-patterns from the robot's Anti-patterns section above

### Why each field cannot be omitted

- Omit field 2 → robot doesn't know which mini-agent file to read
- Omit field 4 → robot produces generic outputs disconnected from factory goal
- Omit field 5 → Principle 7 fails; skill/model calls become guesswork
- Omit field 6 → isolation violated; metrics corrupted
- Omit field 9 → robot produces technically correct but factory-direction-wrong outputs

---

## Alignment Verification Matrix

| Robot | Primary G Output | O Emotional (deploy without hesitation) | O Practical (4 criteria machine-verifiable) | O Risk if G Fails |
|-------|-----------------|----------------------------------------|--------------------------------------------|-------------------|
| Spec Verifier | Living BDD suite with PASS/FAIL per cycle | Regression safety = deploy confidence | BDD scenarios cover all 4 criteria | Failures discovered at deployment, not in factory |
| Dispatch Harness | Full metrics per cycle (context pressure, fallback events, skill invocations) | Commander sees real-world behavior, not theory | Metrics enable quantitative improvement tracking | Factory learns from simulated results, not real ones |
| Iterator | Smallest validated diff per cycle; plateau detection | Improvement is real and tested, not cosmetic | Every diff validated by 5 scripts + regression | Diffs applied that break regressions or fail validators |

---

## Wave Gate Conditions

This factory does not use course-production waves. Instead, each **cycle** has gate conditions:

**Per-cycle gate (must all pass before cycle is recorded as complete)**:

1. **Spec Verifier gate**: `team-N-workspace/bdd-report-cycle-N.md` exists; all scenarios have Given/When/Then; rotating input verified topic-agnostic.
2. **Dispatch Harness gate**: `team-N-workspace/metrics-cycle-N.json` exists with all required fields; rotating input label differs from prior cycle.
3. **Iterator gate**: If any FAIL existed: `team-N-workspace/diffs/cycle-N.patch` OR `team-N-workspace/diffs/cycle-N-rejected.md` exists (one or the other, not neither); all 5 validators recorded in diff file header.

**If any gate fails**: cycle is incomplete. The Commander must re-dispatch the failing robot before proceeding to the next cycle.
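The file-existence portion of the three gates above can be sketched as one check — a minimal illustration; the function name and `had_fail` flag are local conventions, and content checks (Given/When/Then, required JSON fields, validator headers) would layer on top:

```python
from pathlib import Path

def cycle_gates_pass(workspace: Path, cycle: int, had_fail: bool) -> bool:
    """Per-cycle gate: BDD report and metrics file must exist; if any BDD FAIL
    occurred, exactly one of applied-patch / rejected-diff must exist."""
    if not (workspace / f"bdd-report-cycle-{cycle}.md").exists():
        return False
    if not (workspace / f"metrics-cycle-{cycle}.json").exists():
        return False
    if had_fail:
        applied = (workspace / "diffs" / f"cycle-{cycle}.patch").exists()
        rejected = (workspace / "diffs" / f"cycle-{cycle}-rejected.md").exists()
        return applied != rejected  # one or the other, not neither
    return True
```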

---

## Loop Strategy + Stop Conditions

**Loop structure:**
- 5-minute cron interval (shared cron; see Phase C plan)
- Each fire: 3 parallel subagent dispatches — one per team
- Each subagent executes its team's 3-robot cycle sequentially: Spec Verifier → Dispatch Harness → Iterator
- Independent cycle counter per team

**Stop conditions (each team independently):**
- **Time**: 1 hour elapsed from first cycle
- **Plateau**: 3 consecutive cycles with <5% improvement across all 4 quality criteria (detected by Iterator; signal emitted to `team-N-workspace/stop-signal.md`)
- **100%**: All 4 quality criteria at 100% simultaneously (detected by Iterator; signal emitted)

**Whole loop stops when all 3 teams have emitted a stop signal** (or when 1-hour wall clock expires globally).

**After stop**: dispatch Integration phase (1 additional subagent) that reads all 3 teams' workspace directories.

---

## Integration Phase Design

**Trigger**: All 3 teams emit `stop-signal.md` OR global 1-hour timer expires.

**Integration subagent task (single dispatch, runs once)**:

1. Read each team's: `team-N-workspace/` directory (all cycles' BDD reports, metrics, diffs, retrospectives).
2. For each pattern discovered across teams:
   - If all 3 teams converged → **UNIVERSAL** (high confidence; promote to factory core strategy)
   - If 2/3 teams used it → **LIKELY UNIVERSAL** (medium confidence; promote with note)
   - If 1/3 teams used it → **ARCHETYPE-SPECIFIC** (low confidence; archive for reference)
3. Produce `docs/iteration-team/unified-strategy.md` with:
   - Universal patterns (applicable to any agent)
   - Archetype-specific patterns (indexed by archetype: research / writing / review)
   - Metrics summary (context pressure reduction across cycles, script ratio improvement, fallback event frequency)
   - Recommended updates to `ogsm-framework` skill
4. Produce `docs/iteration-team/factory-retrospective.md` with:
   - Factory-level observations: what worked, what didn't
   - Metrics delta per team
   - Next factory improvement suggestions

**Output files**: `docs/iteration-team/unified-strategy.md`, `docs/iteration-team/factory-retrospective.md`.
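The convergence classification in step 2 reduces to a trivial mapping — sketched here with an illustrative function name:

```python
def classify_pattern(teams_using: int, total_teams: int = 3) -> str:
    """Map cross-team convergence counts to a confidence label."""
    if teams_using == total_teams:
        return "UNIVERSAL"           # high confidence; promote to factory core
    if teams_using >= 2:
        return "LIKELY UNIVERSAL"    # medium confidence; promote with note
    return "ARCHETYPE-SPECIFIC"      # low confidence; archive for reference
```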

---

## Known Issues / To Monitor

*(Deliberately empty — to be populated during Phase C iteration.)*

The monitoring protocol mirrors WTR-HSW-002-OGSM-v4: at the end of Phase C execution, Commander produces a `known-issues-observations.md` summarizing:
1. Did any anticipated friction (quota limits, context pressure, BDD generation quality, plateau imbalance) actually manifest? Evidence?
2. If yes, what was the impact?
3. Recommendation: fix now, fix later, or close as theoretical concern.

---

## Deferred Improvements

*(Deliberately sparse — populated after Phase C retrospective.)*

Principle: ship what works, improve what's slow. Do not retrofit factory infrastructure until Phase C produces evidence of the bottleneck.

**Improvement #1 (anticipated) — BDD scenario auto-expansion scripts**
- If first-cycle BDD scenarios are consistently low quality, add `scripts/generate_bdd_scenarios.py` to Spec Verifier's tool chain.
- Status: Deferred until Phase C shows whether manual LLM-generated scenarios are sufficient.

**Improvement #2 (anticipated) — Metrics aggregator across teams**
- After Phase C, if the Integration phase is bottlenecked by manual metrics reading, add `scripts/aggregate_metrics.py` that merges all 3 teams' JSON files.
- Status: Deferred. Integration phase runs once; bottleneck evidence needed first.

---

*Document self-consistency note: This OGSM was validated by all 5 Phase A scripts before Phase B was declared complete. See Phase B completion notes for validation results.*
