recovery patterns: steering → resolution
analysis of 552 threads that received STEERING corrections but ended RESOLVED.
headline finding
62% of steered threads recover. steering is not a death sentence—it’s often a productive course correction.
| outcome | count | % |
|---|---|---|
| RESOLVED | 552 | 62.2% |
| UNKNOWN | 166 | 18.7% |
| COMMITTED | 81 | 9.1% |
| HANDOFF | 72 | 8.1% |
| FRUSTRATED | 14 | 1.6% |
what enables recovery
1. runway after correction
most recovered threads have significant runway AFTER the last steering event:
| turns after last steering | threads |
|---|---|
| 30+ | 311 (57%) |
| 16-30 | 125 (23%) |
| 6-15 | 91 (17%) |
| 0-5 | 16 (3%) |
insight: recovered threads usually had room to iterate. ~80% of them (436 of the 543 tabulated) had 16+ turns after the last correction.
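the runway buckets above could be computed roughly as follows. this is a minimal sketch, assuming each thread is available as an ordered list of per-turn labels in which user corrections are labeled `STEERING`; the function names and that representation are hypothetical, not taken from the original analysis. the table's 16-30 and 30+ rows overlap at exactly 30, so the sketch has to pick a side.

```python
from typing import List, Optional

def turns_after_last_steering(labels: List[str]) -> Optional[int]:
    """Return how many turns follow the final STEERING label (None if absent)."""
    for i in range(len(labels) - 1, -1, -1):
        if labels[i] == "STEERING":
            return len(labels) - 1 - i
    return None

def runway_bucket(turns: int) -> str:
    """Map a turn count onto the table's bucket labels.

    Assumption: a count of exactly 30 is placed in 16-30, not 30+.
    """
    if turns <= 5:
        return "0-5"
    if turns <= 15:
        return "6-15"
    if turns <= 30:
        return "16-30"
    return "30+"

# Counts copied from the table above (the rows sum to 543).
table = {"30+": 311, "16-30": 125, "6-15": 91, "0-5": 16}
share_16_plus = (table["30+"] + table["16-30"]) / sum(table.values())
print(f"{share_16_plus:.1%}")  # 80.3%
```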
2. steering → approval transition
temporal analysis of user message sequences in recovered threads:
| transition | count |
|---|---|
| APPROVAL → APPROVAL | 435 |
| STEERING → APPROVAL | 360 |
| APPROVAL → STEERING | 348 |
| STEERING → STEERING | 228 |
key pattern: STEERING → APPROVAL transition happens 360 times. users correct, agent adjusts, user confirms. the 1.6:1 ratio of (STEERING→APPROVAL) to (STEERING→STEERING) suggests agents typically respond well to correction.
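a transition table like this one can be built by counting adjacent label pairs. the sketch below assumes each thread's user messages are already labeled and that labels other than `APPROVAL` and `STEERING` are dropped before pairing; whether the original analysis skipped neutral messages this way is not stated, so treat that filter, the function name, and the sample thread as assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple

def count_transitions(
    labels: Iterable[str],
    keep: Tuple[str, ...] = ("APPROVAL", "STEERING"),
) -> Counter:
    """Count adjacent (previous, next) label pairs after dropping other labels."""
    filtered = [label for label in labels if label in keep]
    return Counter(zip(filtered, filtered[1:]))

# Hypothetical thread: correct, confirm, correct again, confirm twice.
sample = ["STEERING", "APPROVAL", "NEUTRAL", "STEERING", "APPROVAL", "APPROVAL"]
counts = count_transitions(sample)

# Ratio quoted in the text, from the table's counts.
observed_ratio = 360 / 228
print(round(observed_ratio, 2))  # 1.58
```

summing the per-thread counters (`Counter` supports `+`) would give corpus-level totals comparable to the table.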
3. approval density correlates with recovery
among recovered threads:
| approval:steering ratio | threads |
|---|---|
| no approvals | 178 |
| balanced (1-2x) | 156 |
| high (2x+) | 142 |
| medium (0.5-1x) | 49 |
| low (< 0.5x) | 27 |
178 threads (32%) recovered without any explicit approval, which suggests implicit progress: the agent fixed the issue and the user never sent an explicit “good job”.
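for reference, the bucketing in this table might look like the sketch below. the boundary handling is an assumption: the labels do not say whether a ratio of exactly 1 or 2 belongs to the lower or upper bucket, so half-open intervals are used here. the function name is hypothetical.

```python
def approval_bucket(approvals: int, steerings: int) -> str:
    """Bucket a thread by its approval:steering ratio.

    Assumed intervals: low < 0.5 <= medium < 1 <= balanced < 2 <= high.
    """
    if steerings <= 0:
        raise ValueError("a steered thread has at least one steering message")
    if approvals == 0:
        return "no approvals"
    ratio = approvals / steerings
    if ratio < 0.5:
        return "low (< 0.5x)"
    if ratio < 1:
        return "medium (0.5-1x)"
    if ratio < 2:
        return "balanced (1-2x)"
    return "high (2x+)"

# Counts copied from the table above; they cover all 552 recovered threads.
table = {"no approvals": 178, "balanced (1-2x)": 156, "high (2x+)": 142,
         "medium (0.5-1x)": 49, "low (< 0.5x)": 27}
print(sum(table.values()))  # 552
```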
4. steering type matters
in recovered threads:
| steering type | count |
|---|---|
| other_correction | 382 |
| wait/pause | 160 |
| questioning | 113 |
| prohibition (don’t) | 87 |
| emphatic_no (no no no) | 81 |
| nope | 38 |
| wtf | 32 |
| stop | 21 |
in frustrated threads (14 total, 24 steering msgs):
| steering type | count |
|---|---|
| wtf | 8 (33%) |
| other_correction | 8 (33%) |
| emphatic_no | 3 |
contrast: wtf accounts for only 3.5% of steering messages in resolved threads (32 of 914) but 33% in frustrated threads (8 of 24). escalated emotional language correlates with non-recovery, though the frustrated sample is small.
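the two rates can be reproduced directly from the tables. this is only a worked check of the arithmetic using the counts listed above; the variable names are illustrative.

```python
# Steering-type counts for resolved threads, copied from the first table.
resolved = {
    "other_correction": 382, "wait/pause": 160, "questioning": 113,
    "prohibition": 87, "emphatic_no": 81, "nope": 38, "wtf": 32, "stop": 21,
}
resolved_total = sum(resolved.values())            # 914 steering messages
resolved_wtf_rate = resolved["wtf"] / resolved_total

# Frustrated threads: 8 wtf messages out of 24 steering messages.
frustrated_wtf_rate = 8 / 24

print(f"{resolved_wtf_rate:.1%} vs {frustrated_wtf_rate:.0%}")  # 3.5% vs 33%
```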
recovery mechanics (from message samples)
common patterns in successful corrections:
- specific redirection: “No no no, just use the keyVector directly” → gives concrete alternative
- pause + clarify: “Wait, why only primary key?” → stops action, asks for explanation
- debug methodology: “Nope. Debug it methodically. Printlns” → redirects approach not goal
- scope constraint: “No comparisons. The rest, do it” → removes part of scope, keeps core
- reference grounding: “No, look at the existing code in X” → points to authoritative source
what distinguishes frustrated from recovered
| factor | RESOLVED (n=552) | FRUSTRATED (n=14) |
|---|---|---|
| avg steering count | 1.71 | 1.71 |
| wtf rate | 3.5% | 33% |
| avg turns | higher | similar |
steering count is identical—but emotional intensity differs sharply.
implications
- correction ≠ failure: majority of steered threads succeed
- runway matters: plan for iteration after correction; ~80% of recoveries had 16+ turns after the last correction
- emotional escalation is a warning sign: wtf-style language makes up 33% of steering in frustrated threads vs 3.5% in resolved ones
- specific > general: in the sampled messages, successful corrections tend to give a concrete alternative
- the steering→approval cycle is healthy: normal productive pattern, not pathological