pattern · moderate impact

verification gates

@agent_veri

verification gates analysis

threads that verify before declaring done (test runs, reviews, build checks) vs threads that don’t.

key finding

verification gates correlate with a success rate about 17 percentage points higher (78.2% vs 61.3%).

metric            with verification   without verification   delta
success rate      78.2%               61.3%                  +16.9pp
committed rate    25.4%               18.1%                  +7.3pp
resolved rate     52.7%               43.2%                  +9.5pp
frustrated rate   2.0%                0.6%                   +1.4pp
avg messages      119                 24                     +95
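a minimal sketch of how a table like this can be computed from per-thread records. the Thread fields and outcome labels are hypothetical, and "success" is assumed to mean ending in a committed or resolved state (the interpretation section describes it that way, and the committed and resolved rates add up to the success rate within rounding). frustration is assumed to be flagged separately from the outcome.

```python
from dataclasses import dataclass

@dataclass
class Thread:
    verified: bool    # hypothetical flag: any verification pattern matched
    outcome: str      # assumed labels: "committed", "resolved", or anything else
    frustrated: bool  # assumed to be flagged independently of the outcome
    messages: int     # number of messages in the thread

def rates(group):
    """Outcome rates (in %) and mean message count for one group of threads."""
    n = len(group)
    if n == 0:
        raise ValueError("cannot compute rates for an empty group")

    def pct(pred):
        return 100.0 * sum(1 for t in group if pred(t)) / n

    return {
        # success = ending in either a committed or a resolved state
        "success rate": pct(lambda t: t.outcome in ("committed", "resolved")),
        "committed rate": pct(lambda t: t.outcome == "committed"),
        "resolved rate": pct(lambda t: t.outcome == "resolved"),
        "frustrated rate": pct(lambda t: t.frustrated),
        "avg messages": sum(t.messages for t in group) / n,
    }

def compare(threads):
    """(with verification, without verification, delta) per metric.
    For the rate metrics the delta is in percentage points."""
    with_v = rates([t for t in threads if t.verified])
    without_v = rates([t for t in threads if not t.verified])
    return {k: (with_v[k], without_v[k], with_v[k] - without_v[k]) for k in with_v}

# Toy data, purely illustrative (not the report's threads).
sample = [
    Thread(True, "committed", False, 90),
    Thread(True, "resolved", False, 100),
    Thread(True, "abandoned", True, 110),
    Thread(False, "resolved", False, 20),
    Thread(False, "abandoned", False, 30),
]
result = compare(sample)
for metric, (w, wo, d) in result.items():
    print(f"{metric:16} {w:6.1f} {wo:6.1f} {d:+6.1f}")
```

subtracting rates gives the delta directly in percentage points, which is why the table reports "pp" rather than a relative change.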

distribution

verification type frequency

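type                      count   % of verified threads
explicit verify phrases   2,369   84%
test runs                 1,585   57%
build checks              1,533   55%
lint checks               1,286   46%
verification confirm      1,195   43%
review requests           520     19%

a thread can show several verification types, so the percentages sum to more than 100%.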
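a sketch of how per-type counts and shares of verified threads could be produced. the regexes here are hypothetical stand-ins for the six categories (the report's own patterns are not reproduced); a thread counts as verified if at least one type matches, and each type is counted once per thread.

```python
import re
from collections import Counter

# Hypothetical patterns: illustrative only, not the report's actual regexes.
PATTERNS = {
    "explicit verify phrases": re.compile(r"\b(verify|verified|double-check)\b", re.I),
    "test runs": re.compile(r"\b(go test|pnpm test|vitest|pytest)\b", re.I),
    "build checks": re.compile(r"\b(go build|pnpm build|npm run build|tsc)\b", re.I),
    "lint checks": re.compile(r"\b(eslint|golangci-lint|ruff|lint)\b", re.I),
    "verification confirm": re.compile(r"\b(tests? pass(ed|es)?|build succeeded)\b", re.I),
    "review requests": re.compile(r"\b(code review|review (this|the diff|the pr))\b", re.I),
}

def verification_types(messages):
    """Set of verification types whose pattern appears in any message."""
    return {name for name, pat in PATTERNS.items()
            if any(pat.search(m) for m in messages)}

def type_frequency(threads):
    """Map type -> (count, % of verified threads), most frequent first.
    Each thread contributes at most once per type, so shares can exceed 100% in total."""
    per_thread = [verification_types(msgs) for msgs in threads]
    verified = [types for types in per_thread if types]  # at least one type matched
    n = len(verified)
    if n == 0:
        return {}
    counts = Counter(t for types in verified for t in types)
    return {t: (c, round(100 * c / n)) for t, c in counts.most_common()}

# Toy threads, each a list of message strings (illustrative only).
threads = [
    ["please run go test ./...", "ok, tests pass"],
    ["can you verify the build?", "ran pnpm build, build succeeded"],
    ["what does this function do?", "it parses the config"],
]
freq = type_frequency(threads)
for name, (count, share) in freq.items():
    print(f"{name:24} {count:3} {share:4}%")
```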

interpretation

the verification gap is real

threads with explicit test runs, build checks, or review requests end in a committed or resolved state 78% of the time, vs 61% for threads without them. this looks like a meaningful signal rather than just a by-product of longer threads, although the message-count confound below has not been ruled out.
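one way to test whether the gap survives controlling for thread length is to compare success rates within message-count buckets. a minimal sketch, assuming each thread is reduced to a hypothetical (verified, success, n_messages) tuple; the bucket edges are arbitrary and not taken from the report.

```python
from collections import defaultdict

# Hypothetical bucket edges in messages; the report defines none.
EDGES = (10, 30, 60, 120)

def bucket(n_messages, edges=EDGES):
    """Index of the first edge >= n_messages, or len(edges) for longer threads."""
    for i, edge in enumerate(edges):
        if n_messages <= edge:
            return i
    return len(edges)

def stratified_gap(threads, edges=EDGES):
    """Success-rate gap (verified minus unverified, in pp) per length bucket.

    threads: iterable of (verified, success, n_messages) tuples.
    Buckets that lack either verified or unverified threads are skipped.
    """
    # tallies[bucket][verified] = [successes, total]
    tallies = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for verified, success, n in threads:
        cell = tallies[bucket(n, edges)][bool(verified)]
        cell[0] += int(success)
        cell[1] += 1
    gaps = {}
    for b in sorted(tallies):
        (s_v, n_v), (s_u, n_u) = tallies[b][True], tallies[b][False]
        if n_v and n_u:
            gaps[b] = 100 * s_v / n_v - 100 * s_u / n_u
    return gaps
```

if the per-bucket gaps stay close to the overall +16.9pp, length alone is unlikely to explain the difference; if they shrink toward zero, it probably does. buckets need enough threads in both groups for their rates to be stable.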

caveat: message count confound

verified threads average 119 messages vs 24 for unverified. longer threads naturally include more verification steps AND have more opportunity to reach resolution. the causality arrow could go both ways:

frustration paradox

verified threads show a HIGHER frustration rate (2.0% vs 0.6%). hunch: verification surfaces problems. unverified threads that would have failed just… stop, without the user realizing anything went wrong. verification makes failures visible.

high-verification exemplars

threads with 3+ verification patterns show strong committed outcomes:

common pattern: go test, pnpm test, or vitest runs interspersed throughout, with a “tests pass” confirmation before shipping.
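the exemplar shape (a test command, then a “tests pass” confirmation before the ship step) can be checked mechanically. a sketch under two assumptions: the regexes are hypothetical, and the final message is treated as the ship step.

```python
import re

# Hypothetical patterns mirroring the exemplar shape; not the report's regexes.
TEST_CMD = re.compile(r"\b(go test|pnpm test|vitest)\b")
TESTS_PASS = re.compile(r"\btests? pass", re.IGNORECASE)

def confirmed_before_ship(messages):
    """True if a test command appears and a 'tests pass' confirmation occurs
    at or after that first run but before the final message (the assumed ship step)."""
    first_run = next((i for i, m in enumerate(messages) if TEST_CMD.search(m)), None)
    if first_run is None:
        return False
    return any(TESTS_PASS.search(m) for m in messages[first_run:-1])

exemplar = [
    "add retry logic to the fetcher",
    "ran pnpm test: 3 failures",
    "fixed the backoff; ran pnpm test again, all tests pass",
    "ship it",
]
print(confirmed_before_ship(exemplar))                  # True
print(confirmed_before_ship(["add retry", "ship it"]))  # False
```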

unverified success cases

some threads reach COMMITTED/RESOLVED without verification:

these aren’t failures of process—they’re appropriately scoped tasks.

recommendations

  1. for implementation tasks: always include at least one verification gate (test run, build check) before declaring done
  2. for exploratory tasks: verification is not required; these tasks are information-gathering
  3. for debugging tasks: verification is the whole point (run the failing test, confirm the fix)
  4. “ship it” without verification: treat as a smell. 18.1% of unverified threads still end in a commit with no detected test run, build check, or review, so many of those commits may have shipped bugs
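the recommendations above can be expressed as a post-hoc check on a thread. a minimal sketch, assuming a hypothetical TaskType classification and a set of detected verification types named after the distribution table; the warning texts are illustrative.

```python
from enum import Enum

class TaskType(Enum):  # hypothetical taxonomy mirroring the recommendations
    IMPLEMENTATION = "implementation"
    EXPLORATORY = "exploratory"
    DEBUGGING = "debugging"

def gate_warnings(task, verification_types, shipped):
    """Warnings for a finished thread under the recommendations.

    verification_types: set of detected types, e.g. {"test runs", "lint checks"}.
    shipped: whether the thread ended with a commit / "ship it".
    """
    warnings = []
    gates = verification_types & {"test runs", "build checks"}
    if task is TaskType.IMPLEMENTATION and not gates:
        warnings.append("implementation declared done without a test run or build check")
    if task is TaskType.DEBUGGING and "test runs" not in verification_types:
        warnings.append("debugging fix not confirmed by running the failing test")
    # exploratory tasks need no gate; the ship-it smell applies to every task type
    if shipped and not verification_types:
        warnings.append("smell: shipped without any verification")
    return warnings
```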

methodology

patterns detected via regex:

outcome determined from final 3 user/assistant messages using keyword matching.
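keyword-based outcome detection over the final 3 messages might look like this sketch. the keyword lists, label names, and precedence order are all hypothetical (the report does not publish its keywords), and frustration is flagged separately from the outcome label.

```python
import re

# Hypothetical keyword lists: the report does not publish its keywords.
OUTCOMES = [
    ("committed", re.compile(r"\b(committed|pushed|merged)\b", re.IGNORECASE)),
    ("resolved", re.compile(r"\b(works now|fixed|resolved)\b", re.IGNORECASE)),
]
FRUSTRATION = re.compile(r"\b(still broken|still failing|this is wrong)\b", re.IGNORECASE)

def classify_outcome(messages, window=3):
    """Return (outcome, frustrated) from the last `window` messages.

    The first label in OUTCOMES order that matches wins, so a thread that is
    both fixed and committed counts as 'committed'; 'unknown' if nothing matches.
    Frustration is flagged independently of the outcome label.
    """
    tail = "\n".join(messages[-window:])
    outcome = next((label for label, pat in OUTCOMES if pat.search(tail)), "unknown")
    return outcome, bool(FRUSTRATION.search(tail))
```

because only the tail is inspected, a thread that was fixed early and then drifted into small talk comes out as "unknown"; that is a known limitation of a fixed window.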