thread lifecycle: phases, transitions, outcomes
analysis of 4,656 threads mapping the typical lifecycle of successful vs failed threads.
lifecycle model
every thread follows a lifecycle with identifiable phases. success and failure diverge at predictable transition points.
thread lifecycle diagram (phases, left to right: INITIATION → WORK → CORRECTION → RESOLUTION)
- main path: opener → execute → steer → resolve
- approval path: opener → approve (WORK) → approve (CORRECTION) → resolve; execute and steer each drop into the approve step below them
- failure path: approve (CORRECTION) → steer (loop) → FRUSTRATED
- delegation path: approve (WORK) → handoff
phase 1: INITIATION (turns 1-3)
the opening message sets the thread's trajectory. the two tables below group opener patterns by success rate:
successful initiation patterns
| pattern | success rate | characteristics |
|---|---|---|
| file-anchored | 66.7% | includes @path/to/file references |
| continuation | 57.2% | “Continuing from thread T-xxx…” |
| question-opener | 62.1% | starts with “how/what/why” |
| imperative | 58.9% | starts with “fix/add/create” |
failed initiation patterns
| pattern | success rate | characteristics |
|---|---|---|
| moderate-length | 42.8% | 150-500 chars (worst length bucket) |
| no file refs | 41.8% | no @mentions, no context anchors |
| vague opener | ~35% | “fix this”, “run and debug X” |
| inherited mess | ~30% | continuing from problematic parent |
key insight: openers with file references (@path/file) succeed about 25 percentage points more often than openers without them (66.7% vs 41.8%). this is the single strongest initiation predictor.
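the pattern tables above could be turned into a rough opener classifier. this is a minimal sketch: the source names the patterns but not the matching rules, so the regexes and the function name `classify_opener` are assumptions made for illustration.

```python
import re

# Hypothetical matching rules; the source gives only the pattern names and examples.
FILE_REF = re.compile(r"@[\w./-]+")
CONTINUATION_START = re.compile(r"^continuing\b", re.IGNORECASE)
QUESTION_START = re.compile(r"^(how|what|why)\b", re.IGNORECASE)
IMPERATIVE_START = re.compile(r"^(fix|add|create)\b", re.IGNORECASE)


def classify_opener(text: str) -> list[str]:
    """Return every initiation pattern label that matches the opening message."""
    stripped = text.strip()
    # Every opener is either file-anchored or not; the other labels are optional.
    labels = ["file-anchored" if FILE_REF.search(stripped) else "no file refs"]
    if CONTINUATION_START.match(stripped):
        labels.append("continuation")
    if QUESTION_START.match(stripped):
        labels.append("question-opener")
    if IMPERATIVE_START.match(stripped):
        labels.append("imperative")
    return labels
```

an opener can carry several labels at once, for example `classify_opener("fix the parser in @src/parser.py")` yields both `file-anchored` and `imperative`.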
length paradox
success follows a U-curve:
- brief (<150 chars): 62% success — simple, clear tasks
- moderate (150-500 chars): 43% success (LOWEST) — complex but undercontextualized
- extensive (1500+ chars): 65% success — front-loaded context pays off
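the length buckets above can be expressed as a small helper. a sketch only: the source reports no success rate for openers between 501 and 1499 chars, so that range is labeled `unspecified` here rather than guessed, and `length_bucket` is a hypothetical name.

```python
def length_bucket(opener: str) -> str:
    """Map an opening message to the length bucket used in the U-curve."""
    n = len(opener)
    if n < 150:
        return "brief"
    if n <= 500:
        return "moderate"
    if n >= 1500:
        return "extensive"
    # 501-1499 chars: the source gives no figure for this range.
    return "unspecified"
```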
phase 2: WORK (turns 4-N)
the productive phase, where the agent executes and the user monitors. characteristics of a healthy work phase:
approval distribution
successful threads maintain uniform approval distribution across phases:
| phase | approval density |
|---|---|
| early (0-33%) | 1.85 avg |
| middle (33-66%) | 1.91 avg |
| late (66-100%) | 1.87 avg |
insight: no front-loading or back-loading. consistent small approvals maintain momentum better than occasional large ones.
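one way to check a thread for front-loading or back-loading is to count approvals per third of the thread. a minimal sketch, assuming each message already carries a label such as `APPROVAL`, `STEERING`, or `NEUTRAL`; the labeling step and the name `approvals_by_third` are assumptions, not part of the analysis.

```python
def approvals_by_third(labels: list[str]) -> tuple[int, ...]:
    """Count APPROVAL labels in the early, middle, and late third of a thread."""
    n = len(labels)
    counts = [0, 0, 0]
    for i, label in enumerate(labels):
        if label == "APPROVAL":
            # Integer arithmetic maps position i to third 0, 1, or 2.
            counts[min(i * 3 // n, 2)] += 1
    return tuple(counts)
```

a roughly flat result such as `(2, 2, 2)` matches the uniform profile in the table; a result like `(5, 0, 0)` would indicate front-loading.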
optimal turn counts
| turn bucket | threads | success rate | frustration rate |
|---|---|---|---|
| 1-10 | 1,690 | 14.2% | 0.1% |
| 11-25 | 823 | 58.0% | 0.1% |
| 26-50 | 705 | 75.0% | 0.4% |
| 51-100 | 786 | 78.0% | 0.4% |
| 100+ | 652 | 79.1% | 0.9% |
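sweet spot: 26-50 turns. success climbs from 14.2% at 1-10 turns to 75.0% at 26-50 turns, then gains only about 4 points beyond that, while the frustration rate rises to 0.9% past 100 turns. short threads (10 turns or fewer) are usually abandoned queries, not completed work.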
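the turn buckets in the table can be sketched as a lookup. the rates are copied from the table; the function name `turn_bucket` is hypothetical, and placing a thread of exactly 100 turns in the `51-100` bucket is an assumption, since the table's last two labels overlap at 100.

```python
from bisect import bisect_left

# Upper bound of each closed bucket, taken from the table.
UPPER_BOUNDS = [10, 25, 50, 100]
BUCKETS = ["1-10", "11-25", "26-50", "51-100", "100+"]
SUCCESS_RATE = {
    "1-10": 0.142,
    "11-25": 0.580,
    "26-50": 0.750,
    "51-100": 0.780,
    "100+": 0.791,
}


def turn_bucket(turns: int) -> str:
    """Return the table bucket that a thread of the given length falls into."""
    if turns < 1:
        raise ValueError("a thread has at least one turn")
    # bisect_left finds the first upper bound that is >= turns.
    return BUCKETS[bisect_left(UPPER_BOUNDS, turns)]
```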
spawning behavior
threads that spawn subtasks have different profiles:
| metric | spawning threads | non-spawning |
|---|---|---|
| resolution rate | 43.8% | ~50% |
| handoff rate | 34.8% | 12% |
| optimal spawn depth | 4-7 levels | n/a |
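spawning is less about resolving the CURRENT thread than about decomposing complex work: spawning threads resolve in place less often (43.8% vs ~50%) but hand off far more often (34.8% vs 12%). chains with depth 4-7 have the highest overall resolution.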
phase 3: CORRECTION (optional)
when steering happens, the thread enters the correction phase. this is NOT failure: 62% of steered threads recover.
steering types (ordered by recovery rate)
| steering type | recovery rate | characteristics |
|---|---|---|
| wait/pause | ~70% | “wait, let me clarify” — user catches before damage |
| questioning | ~65% | “why did you…?” — prompts reflection |
| specific redirect | ~60% | “no, use X instead” — gives alternative |
| prohibition | ~50% | “don’t do X” — unclear what TO do |
| emphatic_no | ~40% | “no no no” — frustration emerging |
| wtf | ~20% | emotional escalation — recovery unlikely |
the steering→approval transition
in recovered threads:
- STEERING → APPROVAL: 360 occurrences (healthy recovery)
- STEERING → STEERING: 228 occurrences (doom loop risk)
ratio of 1.6:1 suggests agents typically respond well to single corrections. consecutive steering (STEERING→STEERING) is the danger signal.
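transition counts like these can be produced by tallying adjacent label pairs. a minimal sketch, assuming each user message has already been labeled; `count_transitions` is a hypothetical helper, not the tooling used for the analysis.

```python
from collections import Counter


def count_transitions(labels: list[str]) -> Counter:
    """Tally each (previous, next) label pair in a thread's message sequence."""
    return Counter(zip(labels, labels[1:]))


# The reported counts give the 1.6:1 ratio quoted in the text.
recovery_ratio = round(360 / 228, 1)
```

on a single thread, `count_transitions(labels)[("STEERING", "STEERING")]` gives the number of consecutive corrections, which is the danger signal described above.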
recovery runway
threads need runway after correction:
| turns after last steering | % of recovered threads |
|---|---|
| 30+ | 57% |
| 16-30 | 23% |
| 6-15 | 17% |
| 0-5 | 3% |
80% of recovered threads ran 16 or more turns after their last steering (57% + 23%). plan for iteration time.
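runway can be measured as the number of turns that follow the last steering message. a sketch under assumptions: the helper names are hypothetical, and because the table's `30+` and `16-30` rows overlap at 30, this version places exactly 30 turns in `16-30`.

```python
from typing import Optional


def runway_after_last_steering(labels: list[str]) -> Optional[int]:
    """Turns that follow the final STEERING message, or None if never steered."""
    for offset, label in enumerate(reversed(labels)):
        if label == "STEERING":
            return offset
    return None


def runway_bucket(turns: int) -> str:
    """Map a runway length to the row labels used in the table."""
    if turns <= 5:
        return "0-5"
    if turns <= 15:
        return "6-15"
    if turns <= 30:
        return "16-30"
    return "30+"
```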
phase 4a: RESOLUTION (successful termination)
threads terminate through several patterns:
COMMITTED (305 threads, 6.6%)
explicit ship ritual:
| signal | frequency |
|---|---|
| “ship it” | 12% |
| “commit and push” | 7% |
| “commit” | 4% |
| “lgtm” | <1% |
55% of final messages are under 50 chars. committed threads close with terse imperatives.
approval:steering ratio: 4.29:1 — strong agreement throughout.
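the ship signals in the table could be detected with a simple substring check on the final message. this is a rough sketch, not the classifier behind the numbers: `ship_signal` is a hypothetical name, and plain substring matching will also fire on words that merely contain a signal.

```python
from typing import Optional

# Longer phrases come first so "commit and push" is not reported as "commit".
SHIP_SIGNALS = ["commit and push", "ship it", "commit", "lgtm"]


def ship_signal(final_message: str) -> Optional[str]:
    """Return the first ship signal found in the final message, if any."""
    text = final_message.lower()
    for signal in SHIP_SIGNALS:
        if signal in text:
            return signal
    return None
```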
RESOLVED (2,070 threads, 44.5%)
implicit completion — user stops talking:
| final message pattern | frequency |
|---|---|
| unclassified | 48% |
| questions | 20% |
| imperatives | 15% |
| short approvals | 13% |
| thanks | <1% |
gratitude is rare (0.4%). threads don’t celebrate — they fade.
approval:steering ratio: 2.07:1 — healthy balance.
HANDOFF (75 threads, 1.6%)
explicit delegation to child thread:
- “Continuing work from thread T-xxx…”
- spawned agents with attached file context
- task decomposition
approval:steering ratio: 2.76:1 — reasonable progress before handoff.
EXPLORATORY (124 threads, 2.7%)
quick lookups that complete immediately:
- avg 5.8 turns
- zero steering, zero approval
- question asked → answer given → done
phase 4b: FAILURE (unsuccessful termination)
FRUSTRATED (14 threads, 0.3%)
thread ends on user frustration:
| characteristic | value |
|---|---|
| avg turns | 84.3 |
| steering rate | 1.71 (4x higher than resolved) |
| approval rate | 0.86 |
| wtf rate | 33% (vs 3.5% in resolved) |
| ratio | 0.50:1 (inverted) |
signature patterns:
- escalating ALL CAPS
- combined profanity + caps
- thread abandons mid-steering
- no resolution, just corrections
STUCK (1 thread)
complete failure:
- 128 turns
- 4 steerings, 0 approvals
- ratio: 0.00:1
- all steering, no approval = death
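the approval:steering ratio used throughout is a plain quotient. a minimal sketch: the handling of zero steering (infinity when approvals exist, 0.0 otherwise) is an assumption, since the source does not say how such threads are scored.

```python
def approval_steering_ratio(approvals: float, steerings: float) -> float:
    """Approvals per steering; the N in an 'N:1' ratio."""
    if steerings == 0:
        # Assumption: any approval with no steering counts as unbounded agreement.
        return float("inf") if approvals > 0 else 0.0
    return approvals / steerings
```

with the FRUSTRATED averages, `approval_steering_ratio(0.86, 1.71)` rounds to 0.50, and the STUCK thread (0 approvals, 4 steerings) gives 0.0.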
UNKNOWN (1,560 threads, 33.5%)
abandoned or ambiguous:
- avg 16 turns (short)
- 0.43:1 ratio
- likely early abandonment
transition probabilities
based on message sequence analysis:
healthy transitions (maintain or improve trajectory)
NEUTRAL → NEUTRAL [most common, work continues]
NEUTRAL → APPROVAL [progress acknowledged]
APPROVAL → APPROVAL [momentum building]
STEERING → APPROVAL [correction accepted, back on track]
warning transitions
NEUTRAL → STEERING [first correction, 50% recovery]
STEERING → STEERING [doom loop, 40% recovery]
APPROVAL → STEERING [regression after progress]
terminal transitions
STEERING → FRUSTRATED [emotional escalation, <20% recovery]
STEERING → STUCK [complete breakdown]
ANY → ABANDONED [user stops engaging]
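the three transition groups above can be encoded as a lookup. a sketch only: `classify_transition` is a hypothetical name, and pairs the source does not list (for example APPROVAL → NEUTRAL) are returned as `unlisted` rather than assigned a group.

```python
HEALTHY = {
    ("NEUTRAL", "NEUTRAL"),
    ("NEUTRAL", "APPROVAL"),
    ("APPROVAL", "APPROVAL"),
    ("STEERING", "APPROVAL"),
}
WARNING = {
    ("NEUTRAL", "STEERING"),
    ("STEERING", "STEERING"),
    ("APPROVAL", "STEERING"),
}
TERMINAL = {
    ("STEERING", "FRUSTRATED"),
    ("STEERING", "STUCK"),
}


def classify_transition(prev: str, nxt: str) -> str:
    """Return 'healthy', 'warning', 'terminal', or 'unlisted' for a label pair."""
    # ANY -> ABANDONED is terminal regardless of the previous state.
    if nxt == "ABANDONED" or (prev, nxt) in TERMINAL:
        return "terminal"
    if (prev, nxt) in WARNING:
        return "warning"
    if (prev, nxt) in HEALTHY:
        return "healthy"
    return "unlisted"
```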
outcome prediction formula
based on quantitative analysis:
success_probability =
base_rate (55%)
+ file_refs_in_opener * 25%
+ approval_steering_ratio * 10% (if >2:1)
- steering_steering_loop * 20%
- wtf_present * 30%
- moderate_opener_length * 10% (150-500 chars)
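the formula above can be sketched as a function. two readings here are assumptions: the ratio term is treated as a flat +10 points once the ratio exceeds 2:1 (the source could also mean a scaled term), and the result is clamped to 0-100 because the raw sum can go negative. the function works in whole percentage points to avoid floating-point noise.

```python
def success_probability(
    file_refs_in_opener: bool,
    approval_steering_ratio: float,
    steering_steering_loop: bool,
    wtf_present: bool,
    moderate_opener_length: bool,
) -> int:
    """Estimated success in percentage points, following the additive formula."""
    points = 55  # base rate
    if file_refs_in_opener:
        points += 25
    if approval_steering_ratio > 2.0:  # assumption: flat bonus above 2:1
        points += 10
    if steering_steering_loop:
        points -= 20
    if wtf_present:
        points -= 30
    if moderate_opener_length:  # opener of 150-500 chars
        points -= 10
    # Assumption: clamp, since every penalty together gives 55 - 60 = -5.
    return max(0, min(100, points))
```

a file-anchored opener with a 3:1 ratio and no warning signs scores 90; a thread with a steering loop, a wtf, and a moderate-length opener scores 0 after clamping.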
threshold alerts
| condition | action |
|---|---|
| ratio drops below 1:1 | yellow flag — suggest rephrasing |
| 2+ consecutive steerings | orange flag — meta-acknowledge |
| wtf/profanity appears | red flag — offer handoff/oracle |
| 15+ turns with 0 approvals | yellow flag — check engagement |
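the alert table maps directly onto a rule check. a minimal sketch, assuming the caller already tracks the ratio, the current run of consecutive steerings, and approval counts; the function name and parameters are hypothetical, and "15+ turns with 0 approvals" is read as a thread of at least 15 turns with no approvals so far.

```python
def threshold_alerts(
    ratio: float,
    consecutive_steerings: int,
    wtf_or_profanity: bool,
    turns: int,
    approvals: int,
) -> list[tuple[str, str]]:
    """Return (flag, action) pairs for every alert condition that currently holds."""
    alerts = []
    if ratio < 1.0:
        alerts.append(("yellow", "suggest rephrasing"))
    if consecutive_steerings >= 2:
        alerts.append(("orange", "meta-acknowledge"))
    if wtf_or_profanity:
        alerts.append(("red", "offer handoff/oracle"))
    if turns >= 15 and approvals == 0:
        alerts.append(("yellow", "check engagement"))
    return alerts
```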
user-specific lifecycle patterns
@concise_commander (marathoner)
- avg 85 turns, 71.8% success
- high steering (0.81) but recovers
- steers toward goal rather than abandoning
- lifecycle: long WORK phase, frequent small corrections, eventual RESOLUTION
@steady_navigator (efficient commander)
- avg 36 turns, 67% success
- minimal steering (0.10)
- steers so rarely that a single steering signals a serious problem
- lifecycle: short INITIATION → focused WORK → quick RESOLUTION
@verbose_explorer (context front-loader)
- avg 39 turns, 43% success
- high handoff rate (30%)
- threads designed to chain, not complete
- lifecycle: extensive INITIATION → WORK → HANDOFF (repeat)
@feature_lead (abandoner)
- avg 21 turns, 26% success
- low steering, low resolution
- lifecycle: INITIATION → brief WORK → UNKNOWN
summary: lifecycle stages
| stage | turns | healthy signal | warning signal |
|---|---|---|---|
| INITIATION | 1-3 | file refs, clear scope | vague, moderate length |
| WORK | 4-N | uniform approvals, spawning | long stretches without approval |
| CORRECTION | any | single steer, specific alternative | consecutive steering, escalation |
| RESOLUTION | final | terse imperative, silence | profanity, abandonment |
recommendations
- anchor with files: @mentions in the opener raise success by about 25 percentage points
- approve consistently: uniform small approvals beat occasional large ones
- break steering loops: consecutive corrections = pause and confirm understanding
- plan for runway: 80% of recoveries ran 16+ turns after the last correction
- recognize closure: “ship it” is explicit; silence after approval is implicit
- spawn strategically: depth 4-7 chains have highest resolution rates
- monitor ratio: below 1:1 approval:steering = intervention needed