persistence vs abandonment analysis
what distinguishes threads that persist through difficulty vs those that abandon?
headline findings
the strongest predictor of persistence is approval frequency, not steering avoidance.
| approval pattern | threads | persist rate | avg turns |
|---|---|---|---|
| many (6+) | 59 | 96.6% | 231 |
| moderate (3-5) | 289 | 95.5% | 128 |
| few (1-2) | 1,103 | 93.7% | 69 |
| none | 3,205 | 49.4% | 26 |
threads with ANY approval signal persist ~94% of the time. threads with zero approvals—the user never said “ok”, “yes”, “proceed”, “good”—persist only 49%.
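the ~94% figure is a thread-weighted pool of the three approval buckets. a minimal sketch of that arithmetic, using only the counts and rates from the table (the `buckets` dict and `pooled_rate` helper are illustrative names, not part of the original analysis):

```python
# (threads, persist rate) per approval bucket, copied from the table above
buckets = {
    "many (6+)": (59, 0.966),
    "moderate (3-5)": (289, 0.955),
    "few (1-2)": (1103, 0.937),
    "none": (3205, 0.494),
}


def pooled_rate(names):
    """Thread-weighted persist rate across the named buckets."""
    threads = sum(buckets[n][0] for n in names)
    persisted = sum(buckets[n][0] * buckets[n][1] for n in names)
    return persisted / threads


any_approval = pooled_rate(["many (6+)", "moderate (3-5)", "few (1-2)"])
no_approval = pooled_rate(["none"])

print(f"any approval: {any_approval:.1%}")  # any approval: 94.2%
print(f"no approval:  {no_approval:.1%}")   # no approval:  49.4%
```

weighting by thread count matters here: the "few (1-2)" bucket holds most of the approving threads, so it dominates the pooled rate.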
the recovery ratio
when threads DO have steering (corrections), the ratio of approvals to steers predicts outcome:
| recovery pattern | threads | persist rate | description |
|---|---|---|---|
| strong_recovery | 224 | 94.6% | approvals ≥2x steers |
| recovered | 243 | 84.4% | approvals ≥ steers |
| partial_recovery | 111 | 78.4% | some approval, less than steers |
| no_recovery | 310 | 64.8% | steered but no approval after |
| no_steering | 3,768 | 59.6% | never steered |
key insight: steering with recovery (approval follows correction) has HIGHER persistence than never steering at all. the correction itself isn’t the problem—lack of recovery is.
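the recovery buckets above can be sketched as a small classifier. this is an assumption-laden sketch, not the original analysis code: it works from per-thread totals only, whereas the no_recovery description ("no approval after") implies the real classification looked at ordering. the function name `recovery_pattern` is hypothetical.

```python
def recovery_pattern(steers: int, approvals: int) -> str:
    """Map per-thread steer/approval counts to a recovery bucket.

    Assumption: totals stand in for ordering, so an approval that came
    before the first steer is still counted as recovery here.
    """
    if steers == 0:
        return "no_steering"
    if approvals == 0:
        return "no_recovery"
    if approvals >= 2 * steers:
        return "strong_recovery"
    if approvals >= steers:
        return "recovered"
    return "partial_recovery"


# example: 2 steers answered by 4 approvals
print(recovery_pattern(steers=2, approvals=4))  # strong_recovery
```

the checks run from most to least specific, so a thread that meets the 2x threshold is never labelled merely "recovered".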
length as persistence signal
longer threads persist more, but causation is tricky—maybe they’re long BECAUSE they persisted.
| length | persisted | abandoned | unclear | persist rate |
|---|---|---|---|---|
| 60+ turns | 1,130 | 14 | 106 | 90.4% |
| 31-60 | 619 | 4 | 93 | 86.5% |
| 16-30 | 484 | 1 | 151 | 76.1% |
| 6-15 | 533 | 2 | 513 | 50.9% |
| 1-5 | 183 | 2 | 821 | 18.2% |
short threads are mostly UNCLEAR outcome: 821 of 1,006 threads in the 1-5 bucket and about half (513 of 1,048) in the 6-15 bucket. these are likely exploratory questions where persistence isn't the right frame.
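the persist rates in the table are reproduced by dividing persisted threads by all threads in the bucket, unclear ones included. a short sketch of that calculation (the `rows` dict and helper names are illustrative):

```python
# (persisted, abandoned, unclear) per length bucket, copied from the table above
rows = {
    "60+": (1130, 14, 106),
    "31-60": (619, 4, 93),
    "16-30": (484, 1, 151),
    "6-15": (533, 2, 513),
    "1-5": (183, 2, 821),
}


def persist_rate(persisted, abandoned, unclear):
    """Persisted threads as a share of all threads in the bucket."""
    return persisted / (persisted + abandoned + unclear)


def unclear_share(persisted, abandoned, unclear):
    """Unclear threads as a share of all threads in the bucket."""
    return unclear / (persisted + abandoned + unclear)


for length, counts in rows.items():
    print(f"{length:>6}: persist {persist_rate(*counts):.1%}, "
          f"unclear {unclear_share(*counts):.1%}")
```

because UNCLEAR sits in the denominator, the low persist rate for short threads mostly reflects unclassifiable outcomes rather than explicit abandonment, which stays in single or low double digits in every bucket.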
steering timing matters
when does first steering occur? outcomes differ:
| first steer timing | RESOLVED | COMMITTED | HANDOFF | FRUSTRATED |
|---|---|---|---|---|
| early (1-5 turns) | 76 | 11 | 6 | 0 |
| mid (6-15 turns) | 82 | 13 | 11 | 0 |
| late (16-30 turns) | 100 | 19 | 17 | 0 |
| very late (30+) | 285 | 34 | 35 | 11 |
frustration clusters in very late steering (30+ turns). early steering doesn’t predict abandonment—it’s a course-correction that often leads to resolution.
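to put the clustering in proportion, the FRUSTRATED share per timing bucket can be computed from the table. a minimal sketch, counting only the four outcome columns shown (the `timing` dict and `frustrated_share` helper are illustrative names):

```python
# outcome counts by timing of the first steer, copied from the table above
timing = {
    "early (1-5)": {"RESOLVED": 76, "COMMITTED": 11, "HANDOFF": 6, "FRUSTRATED": 0},
    "mid (6-15)": {"RESOLVED": 82, "COMMITTED": 13, "HANDOFF": 11, "FRUSTRATED": 0},
    "late (16-30)": {"RESOLVED": 100, "COMMITTED": 19, "HANDOFF": 17, "FRUSTRATED": 0},
    "very late (30+)": {"RESOLVED": 285, "COMMITTED": 34, "HANDOFF": 35, "FRUSTRATED": 11},
}


def frustrated_share(bucket: str) -> float:
    """FRUSTRATED threads as a share of the listed outcomes in a bucket."""
    counts = timing[bucket]
    return counts["FRUSTRATED"] / sum(counts.values())


for bucket in timing:
    print(f"{bucket:<16} frustrated: {frustrated_share(bucket):.1%}")
```

all 11 frustrated threads fall in the very late bucket, but they are still only about 3% of it (11 of 365), so late steering is a weak warning sign rather than a reliable predictor.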
user traits and persistence
| user | threads | persist rate | avg turns | steers % | marathon % |
|---|---|---|---|---|---|
| @swift_solver | 36 | 97.2% | 46 | 44% | 36% |
| @precision_pilot | 90 | 87.8% | 73 | 30% | 63% |
| @concise_commander | 1,219 | 85.3% | 87 | 44% | 69% |
| @verbose_explorer | 875 | — | 39 | 17% | 21% |
| @steady_navigator | 1,171 | 68.7% | 37 | 9% | 23% |
| @patient_pathfinder | 150 | 54.7% | 20 | 16% | 6% |
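high-persistence users (@swift_solver, @concise_commander, @precision_pilot) share traits, though @swift_solver (36 threads, 46 avg turns, 36% marathon) fits the length-related ones less cleanly: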
- high marathon rate (60+ turn threads): willingness to push through
- higher steering rate: more active correction = more engagement
- longer avg threads: don’t quit early
shorter-thread users (@steady_navigator, @patient_pathfinder):
- shorter threads on average
- lower steering engagement
- possible explanation: different task types, delegation preferences, or lower tolerance for agent mistakes
NOTE: @verbose_explorer was previously listed here but that classification was based on corrupted spawn data. with corrected stats (83% resolution, 4.2% handoff), @verbose_explorer’s persistence profile is unclear and needs reanalysis.
engagement patterns by length
| length | engagement type | RESOLVED | COMMITTED | UNKNOWN |
|---|---|---|---|---|
| long (30+) | both steer+approve | 363 | 71 | 58 |
| long (30+) | approve only | 420 | 92 | — |
| long (30+) | steer only | 147 | — | 60 |
| long (30+) | no engagement | 451 | 13 | 81 |
| short (<10) | no engagement | 149 | 12 | 1,013 |
in long threads: COMMITTED outcomes concentrate where the user approves. approve-only threads (92) and steer+approve threads (71) account for nearly all listed commits. passive long threads (no signals) still resolve (451) but rarely commit (13), maybe because the user isn't confirming work is done.
in short threads: no-engagement is overwhelmingly UNKNOWN. short threads without user feedback simply don’t have enough signal to classify.
marathon thread (60+) outcomes
| outcome | count | avg steers | avg approvals | approve/steer ratio |
|---|---|---|---|---|
| RESOLVED | 889 | 0.88 | 1.67 | 1.91 |
| COMMITTED | 103 | 0.94 | 2.90 | 3.08 |
| HANDOFF | 155 | 0.59 | 1.26 | 2.12 |
| FRUSTRATED | 9 | 2.11 | 1.11 | 0.53 |
| UNKNOWN | 108 | 1.71 | 0.81 | 0.48 |
frustrated marathon threads have more than TWICE the steering rate of resolved ones (2.11 vs 0.88) and roughly a QUARTER of the approve/steer ratio (0.53 vs 1.91). the pattern: repeated correction without acknowledgment of progress.
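the ratio column can be recomputed from the two averages, which also makes it easy to see which outcomes sit below parity (fewer approvals than steers). a small sketch using the table's rounded values, so the last digit can differ slightly from the printed ratios, which were presumably computed from unrounded averages (`marathon` and `approve_steer_ratio` are illustrative names):

```python
# (avg steers, avg approvals) per outcome for 60+ turn threads, from the table above
marathon = {
    "RESOLVED": (0.88, 1.67),
    "COMMITTED": (0.94, 2.90),
    "HANDOFF": (0.59, 1.26),
    "FRUSTRATED": (2.11, 1.11),
    "UNKNOWN": (1.71, 0.81),
}


def approve_steer_ratio(outcome: str) -> float:
    """Average approvals divided by average steers for one outcome."""
    steers, approvals = marathon[outcome]
    return approvals / steers


# outcomes where steers outnumber approvals on average
below_parity = [o for o in marathon if approve_steer_ratio(o) < 1.0]
print(below_parity)  # ['FRUSTRATED', 'UNKNOWN']
```

only FRUSTRATED and UNKNOWN fall below 1.0; every outcome that ended well averages close to two or more approvals per steer.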
the frustrated 14
examining threads that ended in FRUSTRATED state (the table lists six of them):
| thread | user | turns | steers | approvals | title snippet |
|---|---|---|---|---|---|
| T-019b2dd2… | @verbose_explorer | 160 | 1 | 1 | scoped context isolation vs oracle |
| T-fa176ce5… | @concise_commander | 133 | 2 | 0 | debug TestService registration error |
| T-05aa706d… | @steady_navigator | 127 | 3 | 1 | resolve deploy_cli module import error |
| T-019b03ba… | @concise_commander | 124 | 2 | 2 | fix this |
| T-019b9a94… | @precision_pilot | 113 | 1 | 0 | fix concurrent append race conditions |
| T-ab2f1833… | @concise_commander | 109 | 4 | 3 | storage_optimizer trim race condition |
pattern: LONG threads (80-160 turns) on DIFFICULT debugging tasks. frustration comes at the end of marathon sessions on stubborn bugs, not from initial task misalignment.
persistence predictors (ranked)
- approval frequency — ANY approval signal predicts ~94% persistence
- recovery ratio — approval/steer ratio >1.0 predicts success after correction
- thread length — longer threads persist more (selection bias: they’re long because they persisted)
- user marathon rate — users who regularly run 60+ turn threads persist more
- steering WITH recovery — steering followed by approval = healthy engagement
anti-patterns
- steering without recovery: correction with no subsequent approval (64.8% persist vs 94.6% with strong recovery)
- no approval: zero approvals, whether or not the thread was steered (49.4% persist)
- late frustration: all 11 FRUSTRATED threads in the timing table had their first steer at 30+ turns
- low approve/steer ratio in marathons: a ratio around 0.5 or lower in 60+ turn threads signals trouble (FRUSTRATED 0.53, UNKNOWN 0.48)
recommendations
- prompt for explicit approval checkpoints — don’t assume silence is consent
- track approval/steer ratio — if ratio falls below 1.0, consider user friction intervention
- watch marathon threads — threads >100 turns with no recent approval are at risk
- early steering is GOOD — don’t treat corrections as failures; they predict engagement
- user-specific thresholds — @concise_commander persists through heavy steering; others may need lighter touch
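the ratio and marathon recommendations could be combined into a simple per-thread check. this is a hypothetical sketch: the `ThreadStats` fields, the `risk_flags` function, and the 30-turn definition of "recent" are all assumptions made for illustration, not values from the analysis.

```python
from dataclasses import dataclass
from typing import List, Optional

RECENT_WINDOW = 30  # assumed: an approval counts as "recent" within this many turns


@dataclass
class ThreadStats:
    turns: int
    steers: int
    approvals: int
    turns_since_last_approval: Optional[int]  # None means the user never approved


def risk_flags(t: ThreadStats) -> List[str]:
    """Return the recommendation-based warnings that apply to a thread."""
    flags = []
    # recommendation: intervene when the approve/steer ratio falls below 1.0
    if t.steers > 0 and t.approvals / t.steers < 1.0:
        flags.append("approve/steer ratio below 1.0")
    # recommendation: threads >100 turns with no recent approval are at risk
    stale = (t.turns_since_last_approval is None
             or t.turns_since_last_approval > RECENT_WINDOW)
    if t.turns > 100 and stale:
        flags.append("marathon thread with no recent approval")
    return flags


# example shaped like a long debugging thread: 133 turns, 2 steers, 0 approvals
print(risk_flags(ThreadStats(133, 2, 0, None)))
```

per-user thresholds would slot in by replacing the fixed 1.0 and 100 with values looked up for each user, since some users persist through heavier steering than others.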