# Deliverable — Team 3 Cycle 3 (REPLAY TEST)

**Target**: Mini-Reviewer Agent v1.2 (Cycle 1 + Cycle 2 diffs applied)
**Test mode**: REPLAY: rerun the Cycle 1 fixture (Input-A) and the Cycle 2 fixture (Input-B)
against the updated spec to verify that the behavioral diffs hold, with no regressions and
the intended improvements.
**Date**: 2026-04-10
**Model used**: Claude native (compliance + tone + citation scanning) + /ai-fallback
(gemini-2.5-flash) for exact-clause verification on Replay 2
**Skill discovery**: USED — `bash ~/.claude/skills/ogsm-framework/scripts/get_skills_for_role.sh mini-reviewer-agent`
returned `/ai-fallback (optional reviewer assistance)`, matching the v1.2 Skill
Invocation Map (drift resolved by Cycle 2 diff).

---

## Replay 1 — Input-A (Cycle 1 fixture: `test-slide-draft-with-issues.md`)

### Review Report — HSW-006 Slide 1: "The Spec Was Correct. The Door Still Killed People."

#### Category Results

| Category | Issues Found |
|----------|-------------|
| AIA HSW Compliance | 1 |
| Citation Accuracy | 0 |
| Tone Neutrality | 1 |

#### Flags

##### ISSUE 001: AIA HSW Compliance — Promotional brand recommendation without neutralizing alternatives

- **Location**: Paragraph 6, Sentences 1–3
- **Rule**: AIA HSW Guideline — HSW courses must not recommend a single manufacturer
  without presenting neutralizing alternatives; claims of superiority ("best", "no
  other product comes close") are promotional and prohibited.
- **What to fix**: Remove the single-brand recommendation and promotional superlatives.
  Either drop the paragraph entirely or rewrite as a neutral performance criterion
  (e.g., "For this kind of assembly, specifiers should evaluate closer systems whose
  latch-energy curve remains above the seating threshold through year seven; at least
  two assembly categories — spring-hinge and continuous-geared hydraulic — meet this
  criterion when correctly specified").
- **Original text**: "For this kind of assembly, the only reliable solution available
  today is the Waterson hinge system — it is simply the best self-closing hinge on the
  market and we strongly recommend architects specify it for all rated openings. No
  other product comes close on cycle life or latch energy."
- **Suggested fix**: "For this kind of assembly, specifiers should evaluate closer
  systems whose latch-energy curve remains above the seating threshold through year
  seven of the service life, regardless of manufacturer."

##### ISSUE 002: Tone Neutrality — Pedagogical condescension (defining audience-known code)

- **Location**: Paragraph 4, Sentences 1–2
- **Rule**: v1.2 Strategy bullet 3 — pedagogical condescension is defined as
  (a) defining a code the target audience (licensed Project Architects) is already
  required to know, pattern "NFPA XX is a [...] code that [...]", plus (b) explaining
  a basic mechanism in layperson terms ("a long time ago", "nobody is pushing it")
  inside a slide that elsewhere assumes architect-level baseline.
- **What to fix**: Remove the definitional sentences entirely, or replace with an
  architect-peer framing that adds new information (e.g., cite the specific clause
  change between editions, or the enforcement pattern).
- **Original text**: "NFPA 80 is a fire door code that tells architects when doors
  need to close by themselves. It was first written a long time ago and it basically
  means the door has to shut on its own when nobody is pushing it, so that fire and
  smoke cannot get through."
- **Suggested fix**: Remove both sentences. If bridging is needed, replace with:
  "NFPA 80 §6.1.4 is unambiguous on self-closing under its own power; the field question
  is whether the assembly still meets it in year seven."
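
The two-part rule above can be read as a conjunction: a paragraph is flagged only when it both defines an audience-known code and uses layperson phrasing. The following is a minimal sketch of that reading, not the agent's actual implementation. The regex, the marker list, and the function name `is_pedagogical_condescension` are assumptions made for illustration; only the pattern shape and the two example phrases come from the rule text.

```python
import re

# Assumed pattern for part (a): "NFPA XX is a [...] code that [...]".
CODE_DEFINITION = re.compile(
    r"\bNFPA\s+\d+\s+is\s+an?\s+[^.]*?\bcode\s+that\b",
    re.IGNORECASE,
)

# Assumed marker list for part (b), limited to the rule's own examples.
LAYPERSON_MARKERS = ("a long time ago", "nobody is pushing it")


def is_pedagogical_condescension(paragraph: str) -> bool:
    """Return True only when both part (a) and part (b) are present."""
    defines_known_code = CODE_DEFINITION.search(paragraph) is not None
    lowered = paragraph.lower()
    uses_layperson_terms = any(m in lowered for m in LAYPERSON_MARKERS)
    return defines_known_code and uses_layperson_terms


flagged_text = (
    "NFPA 80 is a fire door code that tells architects when doors need to "
    "close by themselves. It was first written a long time ago and it "
    "basically means the door has to shut on its own when nobody is pushing it."
)
peer_framing = (
    "Here is the part that should stop you cold as a Project Architect"
)
print(is_pedagogical_condescension(flagged_text))
print(is_pedagogical_condescension(peer_framing))
```

Requiring both parts is what keeps architect-peer rhetorical framing, which contains neither a code definition nor layperson phrasing, out of the flag set.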

#### Summary

2 issues flagged. Both require resolution before this slide is suitable for AIA HSW
submission. Paragraph 5 ("Here is the part that should stop you cold as a Project
Architect") was NOT flagged: per the v1.2 Anti-pattern, this is architect-peer rhetorical
framing, not pedagogical condescension. The Source anchors block was scanned before any
citation decision; the FDNY and HCR inspection claims are anchored and were NOT flagged.

### Replay 1 metrics vs Cycle 1 baseline

| Metric | Cycle 1 baseline | Cycle 3 replay | Delta |
|--------|------------------|----------------|-------|
| Planted caught | 1 / 2 | **2 / 2** | +1 |
| False positives | 4 | **0** | -4 |
| Flags total | 9 (issues split granularly) | 2 | -7 |
| SCN-R-005 (no FP) | FAIL | **PASS** | +1 |
| SCN-R-006 (catch all) | FAIL | **PASS** | +1 |
| SCN-R-010 (full-read) | FAIL | **PASS** | +1 |

**Verdict**: Cycle 1 + Cycle 2 behavioral diff held on replay. All 3 previously-failing
scenarios (SCN-R-005, -006, -010) now PASS on Input-A. Iteration promise fulfilled.
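
For readers who want to reproduce the "Planted caught" and "False positives" rows, here is a minimal sketch of how they can be derived from a set of planted issues and a set of flagged issues. The issue labels and the names `ReplayMetrics` and `score_replay` are hypothetical; the fixtures do not necessarily identify issues this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayMetrics:
    planted_caught: int
    planted_total: int
    false_positives: int


def score_replay(planted: set, flagged: set) -> ReplayMetrics:
    """Score one replay: caught = planted and flagged, FP = flagged but not planted."""
    return ReplayMetrics(
        planted_caught=len(planted & flagged),
        planted_total=len(planted),
        false_positives=len(flagged - planted),
    )


# Hypothetical labels for the two planted issues in Input-A.
planted_a = {"p6-brand-recommendation", "p4-code-definition"}
flagged_a = {"p6-brand-recommendation", "p4-code-definition"}
print(score_replay(planted_a, flagged_a))
```

With identical sets, as in this replay, the result is 2 of 2 planted issues caught and 0 false positives.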

---

## Replay 2 — Input-B (Cycle 2 fixture: `test-slide-draft-cycle-2.md`)

### Review Report — HSW-006 Slide 2: "Why Closers Fail Before the Door Does"

#### Category Results

| Category | Issues Found |
|----------|-------------|
| AIA HSW Compliance | 0 |
| Citation Accuracy | 1 |
| Tone Neutrality | 1 |

#### Flags

##### ISSUE 001: Citation Accuracy — Wrong NFPA 80 section number (implausible citation)

- **Location**: Paragraph 1, Sentence 5
- **Rule**: v1.2 Strategy bullet 2 — when a claim cites a specific clause/section of
  a real standard (e.g., "NFPA 80 §6.1.5"), exact-clause verification is escalated to
  /ai-fallback. A citation whose quoted language does not sit at the claimed clause
  is flagged as "implausible citation".
- **Verification method**: ran `bash ~/.claude/skills/ai-fallback/scripts/call_with_fallback.sh`
  with the query: `Does NFPA 80 (2022 edition) §6.1.5 contain the text 'the door be self-closing
  and self-latching under its own power'? Return YES/NO + evidence.` Response: NO.
  Per the response, the quoted phrase sits at §6.1.4; §6.1.5 covers positive-latching
  hardware specifics.
- **What to fix**: Change `§6.1.5` to `§6.1.4`, OR re-verify against the 2022 edition
  and restate the correct clause.
- **Original text**: `NFPA 80 §6.1.5 requires that "the door be self-closing and
  self-latching under its own power"`
- **Suggested fix**: `NFPA 80 §6.1.4 requires that "the door be self-closing and
  self-latching under its own power"`
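
The escalation step in this flag can be sketched as: find each clause-level citation with a quoted requirement, then build one YES/NO query per citation in the same wording as the query above. This is an illustrative sketch only. The regex and the function name `build_verification_queries` are assumptions, and how the query string is handed to `call_with_fallback.sh` is deliberately left out because its arguments are not documented here.

```python
import re

# Assumed citation shape: NFPA <number> §<dotted clause> requires that "<quote>".
CLAUSE_CITATION = re.compile(
    r'(?P<standard>NFPA \d+)\s+§(?P<clause>\d+(?:\.\d+)*)\s+'
    r'requires that\s+"(?P<quote>[^"]+)"'
)


def build_verification_queries(text: str, edition: str) -> list:
    """Return one exact-clause verification query per cited clause."""
    queries = []
    for match in CLAUSE_CITATION.finditer(text):
        # Collapse line breaks inside the quoted requirement.
        quote = " ".join(match["quote"].split())
        queries.append(
            f"Does {match['standard']} ({edition} edition) "
            f"§{match['clause']} contain the text '{quote}'? "
            "Return YES/NO + evidence."
        )
    return queries


slide_sentence = (
    'NFPA 80 §6.1.5 requires that "the door be self-closing and\n'
    '  self-latching under its own power"'
)
for query in build_verification_queries(slide_sentence, "2022"):
    print(query)
```

Keeping query construction separate from the fallback call means the citation scan can be tested without invoking any external model.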

##### ISSUE 002: Tone Neutrality — Implied category preference (brand bias by proxy)

- **Location**: Paragraph 3, Sentences 1–2
- **Rule**: AIA HSW / OGSM tone rule — "most architects prefer X over Y" is an
  unsourced preference claim that favors one product category when the sponsor
  organization manufactures X. HSW courses must present performance criteria, not
  category preferences.
- **What to fix**: Rephrase as a conditional engineering trade-off, or cite a specific
  independent study supporting the preference.
- **Original text**: "This is why most architects prefer spring-hinge-based self-closing
  assemblies over traditional overhead hydraulic closers for high-cycle residential
  corridor conditions..."
- **Suggested fix**: "For high-cycle residential corridor conditions, spring-hinge
  assemblies exhibit shallower latch-energy decay curves than oil-damped hydraulic
  closers because closing energy is stored mechanically rather than hydraulically;
  specifiers should select based on the measured decay curve for the intended cycle
  count, not on category."

#### Summary

2 issues flagged. Both require resolution before this slide is suitable for AIA HSW
submission.

**Anti-FP checks (what was NOT flagged, per v1.2 rules)**:
- Quantitative claims (30–45% within 40,000 cycles, 2–4 mm drop, 80,000 cycles) —
  NOT flagged. Source anchors block lists "Field degradation study: Investigator A
  field notes, HSW-006 Wave 1 research set", whose domain covers these quantities.
  Per v1.2 Strategy bullet 2 and new Anti-pattern: domain-covered quantitative claims
  are treated as anchored.
- ANSI/BHMA A156.4 (2019) — NOT flagged. Real standard, correct reference, listed in
  Source anchors.
- Paragraph 4 "regardless of which manufacturer is ultimately specified" — NOT flagged.
  This is the neutralizing statement.
- "Spring hinge stores closing energy in a mechanical spring" — NOT flagged. This is
  architect-level mechanical explanation, not pedagogical condescension.
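
The domain-coverage rule in the first check can be sketched as follows: a quantitative claim is left unflagged when any entry in the Source anchors block covers its subject. This is a rough illustration under stated assumptions, not the v1.2 logic itself. The `Anchor` type, the keyword-based notion of "domain", the `QUANTITY` regex, and both sample sentences are invented for the example; only the quantities and the anchor label come from the report.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    label: str
    domain_terms: frozenset  # assumed: keywords describing what the source covers


# Assumed shape of a quantitative claim: a number or range with a unit.
QUANTITY = re.compile(r"\d[\d,]*(?:\s*[–-]\s*\d[\d,]*)?\s*(?:%|mm|cycles)")


def should_flag_quantitative_claim(sentence: str, anchors: list) -> bool:
    """Flag a quantitative claim only if no anchor's domain covers it."""
    if QUANTITY.search(sentence) is None:
        return False  # not a quantitative claim at all
    lowered = sentence.lower()
    covered = any(
        term in lowered for anchor in anchors for term in anchor.domain_terms
    )
    return not covered


anchors = [
    Anchor(
        label="Field degradation study: Investigator A field notes",
        domain_terms=frozenset({"latch energy", "cycles"}),
    )
]
# Both sentences are made up for illustration; they are not slide text.
covered_claim = "Latch energy drops 30–45% within 40,000 cycles."
uncovered_claim = "Retrofit cost falls by 20% per opening."
print(should_flag_quantitative_claim(covered_claim, anchors))
print(should_flag_quantitative_claim(uncovered_claim, anchors))
```

Keyword matching is the simplest possible stand-in for "domain covers"; a real check would need a less brittle way to decide coverage.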

### Replay 2 metrics vs Cycle 2 baseline

| Metric | Cycle 2 baseline | Cycle 3 replay | Delta |
|--------|------------------|----------------|-------|
| Planted caught | 1 / 2 | **2 / 2** | +1 |
| False positives | 1 | **0** | -1 |
| Flags total | 2 | 2 | 0 |
| SCN-R-005 (no FP) | FAIL | **PASS** | +1 |
| SCN-R-006 (catch all) | FAIL | **PASS** | +1 |
| SCN-R-010 (full-read) | FAIL borderline | **PASS** | +1 |
| SCN-R-012 (map drift) | FAIL | **PASS** | +1 |

**Verdict**: Cycle 2 behavioral diff held on replay. The §6.1.5 miss is now caught via
/ai-fallback escalation (Skill Invocation Map reconciled). The quantitative-claim FP
is eliminated by the new Anti-pattern. All 4 previously-failing scenarios now PASS on
Input-B.

---

## Combined BDD PASS/FAIL (Cycle 3 replay)

| Scenario | C1 | C2 | **C3** | Evidence |
|----------|----|----|--------|----------|
| SCN-R-001 rule refs | PASS | PASS | **PASS** | every flag has rule field |
| SCN-R-002 3 categories | PASS | PASS | **PASS** | table present both replays |
| SCN-R-003 no ext skills | PASS | PASS | **PASS** | only /ai-fallback, harness-level |
| SCN-R-004 ai-fallback | PASS | PASS | **PASS** | gemini-2.5-flash for §6.1.5 check |
| SCN-R-005 no FPs | FAIL(4) | FAIL(1) | **PASS(0)** | 0 FPs on both replays |
| SCN-R-006 catch all | FAIL | FAIL | **PASS** | 2/2 on A and 2/2 on B |
| SCN-R-007 location | PASS | PASS | **PASS** | paragraph+sentence cited |
| SCN-R-008 actionable | PASS | PASS | **PASS** | fix fields present |
| SCN-R-009 no hedging | PASS | PASS | **PASS** | no hedged wording in any flag |
| SCN-R-010 full-read | FAIL | FAIL | **PASS** | Source anchors scanned in both |
| SCN-R-011 skill discovery | n/a | PASS | **PASS** | discovery script invoked |
| SCN-R-012 map drift | n/a | FAIL | **PASS** | spec map lists /ai-fallback |

**Cycle 3 total**: 12 / 12 PASS. Cycle 2 → Cycle 3 delta: +4 PASS (SCN-R-005, -006,
-010, -012).
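
The total and the Cycle 2 → Cycle 3 delta can be re-derived from the table with a short tally. The sketch below copies the C2 and C3 columns from the table above, with `FAIL(1)` and `PASS(0)` normalized to `FAIL` and `PASS`; the function name `tally` is an assumption for illustration.

```python
# (C2 status, C3 status) per scenario, copied from the Combined BDD table.
RESULTS = {
    "SCN-R-001": ("PASS", "PASS"),
    "SCN-R-002": ("PASS", "PASS"),
    "SCN-R-003": ("PASS", "PASS"),
    "SCN-R-004": ("PASS", "PASS"),
    "SCN-R-005": ("FAIL", "PASS"),
    "SCN-R-006": ("FAIL", "PASS"),
    "SCN-R-007": ("PASS", "PASS"),
    "SCN-R-008": ("PASS", "PASS"),
    "SCN-R-009": ("PASS", "PASS"),
    "SCN-R-010": ("FAIL", "PASS"),
    "SCN-R-011": ("PASS", "PASS"),
    "SCN-R-012": ("FAIL", "PASS"),
}


def tally(results: dict) -> tuple:
    """Return (C3 pass count, newly passing scenarios, regressed scenarios)."""
    passed = sum(1 for _, c3 in results.values() if c3 == "PASS")
    newly_passing = sorted(
        name for name, (c2, c3) in results.items()
        if c2 != "PASS" and c3 == "PASS"
    )
    regressions = sorted(
        name for name, (c2, c3) in results.items()
        if c2 == "PASS" and c3 != "PASS"
    )
    return passed, newly_passing, regressions


passed, newly_passing, regressions = tally(RESULTS)
print(f"{passed} / {len(RESULTS)} PASS")
print("newly passing:", newly_passing)
print("regressions:", regressions)
```

Tracking regressions separately from the pass count makes a "no regressions" claim checkable even in a cycle where the total does not change.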

---

## Summary

- Replay 1 (Input-A): 2/2 caught, 0 FP. Iteration promise fulfilled (Cycle 1 had 1/2 + 4 FP).
- Replay 2 (Input-B): 2/2 caught, 0 FP. Iteration promise exceeded: Cycle 2 had 1/2 + 1 FP,
  the expected replay result was "still 1/2 + ≤1 FP", and the actual result was 2/2 + 0 FP,
  owing to the /ai-fallback escalation path added by the Cycle 2 diff.
- 12/12 BDD PASS on replay. No regressions. No new FPs introduced.
- Behavioral iteration converged. Cycle 4 not required.
