thread lifecycle: phases, transitions, outcomes

analysis of 4,656 threads mapping the typical lifecycle of successful vs failed threads.

lifecycle model

every thread follows a lifecycle with identifiable phases. success and failure diverge at predictable transition points.

┌─────────────────────────────────────────────────────────────────────────────┐
│                           THREAD LIFECYCLE                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   INITIATION          WORK              CORRECTION         RESOLUTION       │
│   ──────────         ──────            ────────────        ──────────       │
│                                                                             │
│   ┌─────────┐       ┌─────────┐       ┌─────────┐        ┌─────────┐       │
│   │ opener  │──────►│ execute │──────►│ steer   │───────►│ resolve │       │
│   └─────────┘       └─────────┘       └─────────┘        └─────────┘       │
│        │                 │                 │                  ▲             │
│        │                 │                 │                  │             │
│        │                 ▼                 ▼                  │             │
│        │           ┌─────────┐       ┌─────────┐              │             │
│        └──────────►│ approve │──────►│ approve │──────────────┘             │
│                    └─────────┘       └─────────┘                            │
│                         │                 │                                 │
│                         │                 ▼                                 │
│                         │           ┌─────────┐        ┌──────────┐         │
│                         │           │ steer   │───────►│FRUSTRATED│         │
│                         │           │ (loop)  │        └──────────┘         │
│                         │           └─────────┘                             │
│                         │                                                   │
│                         ▼                                                   │
│                   ┌─────────┐                                               │
│                   │ handoff │                                               │
│                   └─────────┘                                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

phase 1: INITIATION (turns 1-3)

the opening message determines trajectory. openers fall into recognizable patterns:

successful initiation patterns

pattern	success rate	characteristics
file-anchored	66.7%	includes @path/to/file references
continuation	57.2%	“Continuing from thread T-xxx…”
question-opener	62.1%	starts with “how/what/why”
imperative	58.9%	starts with “fix/add/create”

failed initiation patterns

pattern	success rate	characteristics
moderate-length	42.8%	150-500 chars (worst category)
no file refs	41.8%	no @mentions, no context anchors
vague opener	~35%	“fix this”, “run and debug X”
inherited mess	~30%	continuing from problematic parent

key insight: file references (@path/file) raise success by about 25 percentage points (66.7% for file-anchored openers vs 41.8% with no file refs). this is the single strongest initiation predictor.

length paradox

success follows a U-curve:

brief (<150 chars): 62% success — simple, clear tasks
moderate (150-500 chars): 43% success (LOWEST) — complex but undercontextualized
extensive (1500+ chars): 65% success — front-loaded context pays off
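the two opener signals above (file references and length bucket) can be extracted mechanically. a minimal sketch, assuming a file reference looks like "@" followed by a token containing a "/" or "."; the function name and the regex are illustrative, not the method used in the analysis. openers of 501-1499 chars have no reported rate above, so the sketch labels them "unlisted".

```python
import re

# Assumption: a file reference is "@" plus a path-like token (contains "/" or ".").
# Plain @username mentions are deliberately not matched.
FILE_REF = re.compile(r"@[\w.-]*[/.][\w./-]+")


def opener_features(opener: str) -> dict:
    """Return the file-reference flag and length bucket for an opening message."""
    n = len(opener)
    if n < 150:
        bucket = "brief"
    elif n <= 500:
        bucket = "moderate"
    elif n >= 1500:
        bucket = "extensive"
    else:
        bucket = "unlisted"  # 501-1499 chars: no success rate reported
    return {
        "has_file_ref": bool(FILE_REF.search(opener)),
        "length_bucket": bucket,
    }
```

for example, `opener_features("fix the parser in @src/parser.py")` reports a file reference and the "brief" bucket.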

phase 2: WORK (turns 4-N)

the productive phase, where the agent executes and the user monitors. characteristics of a healthy work phase:

approval distribution

successful threads spread approvals evenly across the early, middle, and late thirds of the thread:

phase	approval density
early (0-33%)	1.85 avg
middle (33-66%)	1.91 avg
late (66-100%)	1.87 avg

insight: no front-loading or back-loading. consistent small approvals maintain momentum better than occasional large ones.
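one way to check a single thread for front-loading or back-loading is to count approvals per third. a minimal sketch, assuming each message already carries a label such as NEUTRAL, APPROVAL, or STEERING, and assuming "approval density" means the approval count within each third; the exact definition behind the table is not stated, so treat this as an approximation.

```python
def approvals_by_third(labels: list[str]) -> tuple[int, int, int]:
    """Count APPROVAL messages in the early, middle, and late third of a thread."""
    n = len(labels)
    counts = [0, 0, 0]
    for i, label in enumerate(labels):
        if label == "APPROVAL":
            # Map message position i (0-based) onto third 0, 1, or 2.
            counts[3 * i // n] += 1
    return counts[0], counts[1], counts[2]
```

a thread with roughly equal counts in all three positions matches the uniform profile described above.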

optimal turn counts

turn bucket	threads	success rate	frustration rate
1-10	1,690	14.2%	0.1%
11-25	823	58.0%	0.1%
26-50	705	75.0%	0.4%
51-100	786	78.0%	0.4%
100+	652	79.1%	0.9%

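sweet spot: 26-50 turns, where success reaches 75% and longer buckets add only about 4 points. short threads (1-10 turns) are usually abandoned queries, not completed work. beyond 100 turns, the frustration rate roughly doubles (0.4% to 0.9%).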

spawning behavior

threads that spawn subtasks have different profiles:

metric	spawning threads	non-spawning
resolution rate	43.8%	~50%
handoff rate	34.8%	12%
optimal spawn depth	4-7 levels	n/a

spawning isn’t about resolution in the CURRENT thread — it’s about decomposing complex work. chains with depth 4-7 have highest overall resolution.
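spawn depth can be measured by walking parent links back to the root. a minimal sketch, assuming a hypothetical `parent_of` mapping from child thread id to parent thread id, and assuming a root thread counts as depth 1; the analysis does not say how depth was counted.

```python
def chain_depth(thread_id: str, parent_of: dict[str, str]) -> int:
    """Number of threads on the path from the root to thread_id (root = 1)."""
    depth = 1
    seen = {thread_id}
    while thread_id in parent_of:
        thread_id = parent_of[thread_id]
        if thread_id in seen:
            # Guard against malformed data where parent links form a loop.
            raise ValueError("cycle in parent links")
        seen.add(thread_id)
        depth += 1
    return depth
```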

phase 3: CORRECTION (optional)

when steering happens, the thread enters the correction phase. this is NOT failure: 62% of steered threads recover.

steering types (ordered by recovery rate)

steering type	recovery rate	characteristics
wait/pause	~70%	“wait, let me clarify” — user catches before damage
questioning	~65%	“why did you…?” — prompts reflection
specific redirect	~60%	“no, use X instead” — gives alternative
prohibition	~50%	“don’t do X” — unclear what TO do
emphatic_no	~40%	“no no no” — frustration emerging
wtf	~20%	emotional escalation — recovery unlikely

the steering→approval transition

in recovered threads:

STEERING → APPROVAL: 360 occurrences (healthy recovery)
STEERING → STEERING: 228 occurrences (doom loop risk)

ratio of 1.6:1 suggests agents typically respond well to single corrections. consecutive steering (STEERING→STEERING) is the danger signal.
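the two counts above come from tallying adjacent message labels. a minimal sketch, assuming transitions are counted between consecutive messages (the analysis may have skipped NEUTRAL messages in between; that detail is not stated). function names are illustrative.

```python
from collections import Counter


def transition_counts(labels: list[str]) -> Counter:
    """Count every (previous label, next label) pair in a thread."""
    return Counter(zip(labels, labels[1:]))


def recovery_ratio(counts: Counter) -> float:
    """STEERING->APPROVAL occurrences divided by STEERING->STEERING occurrences."""
    recovered = counts[("STEERING", "APPROVAL")]
    looped = counts[("STEERING", "STEERING")]
    return recovered / looped if looped else float("inf")


# Reported corpus totals: 360 recoveries vs 228 repeated steerings.
corpus = Counter({("STEERING", "APPROVAL"): 360, ("STEERING", "STEERING"): 228})
ratio = recovery_ratio(corpus)  # about 1.58, reported above as 1.6:1
```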

recovery runway

threads need runway after correction:

turns after last steering	% of recovered threads
30+	57%
16-30	23%
6-15	17%
0-5	3%

80% of recoveries need 16+ turns after correction. plan for iteration time.
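runway is the number of turns that follow the last steering message. a minimal sketch under the same per-message label assumption; the table's "30+" and "16-30" rows overlap at exactly 30 turns, and this sketch assigns 30 to "30+", which is a choice made here rather than something the data specifies.

```python
from typing import Optional


def runway_after_last_steering(labels: list[str]) -> Optional[int]:
    """Turns after the last STEERING message, or None if the thread was never steered."""
    for offset, label in enumerate(reversed(labels)):
        if label == "STEERING":
            return offset
    return None


def runway_bucket(turns: int) -> str:
    """Map a runway length onto the buckets used in the table above."""
    if turns >= 30:
        return "30+"  # assumption: exactly 30 falls in the top bucket
    if turns >= 16:
        return "16-30"
    if turns >= 6:
        return "6-15"
    return "0-5"
```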

phase 4a: RESOLUTION (successful termination)

threads terminate through several patterns:

COMMITTED (305 threads, 6.6%)

explicit ship ritual:

signal	frequency
“ship it”	12%
“commit and push”	7%
“commit”	4%
“lgtm”	<1%

55% of final messages are under 50 chars. committed threads close with terse imperatives.

approval:steering ratio: 4.29:1 — strong agreement throughout.

RESOLVED (2,070 threads, 44.5%)

implicit completion — user stops talking:

final message pattern	frequency
unclassified	48%
questions	20%
imperatives	15%
short approvals	13%
thanks	<1%

gratitude is rare (0.4%). threads don’t celebrate — they fade.

approval:steering ratio: 2.07:1 — healthy balance.

HANDOFF (75 threads, 1.6%)

explicit delegation to child thread:

“Continuing work from thread T-xxx…”
spawned agents with attached file context
task decomposition

approval:steering ratio: 2.76:1 — reasonable progress before handoff.

EXPLORATORY (124 threads, 2.7%)

quick lookups that complete immediately:

avg 5.8 turns
zero steering, zero approval
question asked → answer given → done

phase 4b: FAILURE (unsuccessful termination)

FRUSTRATED (14 threads, 0.3%)

thread ends on user frustration:

characteristic	value
avg turns	84.3
steering rate	1.71 (4x higher than resolved)
approval rate	0.86
wtf rate	33% (vs 3.5% in resolved)
ratio	0.50:1 (inverted)

signature patterns:

escalating ALL CAPS
combined profanity + caps
thread abandons mid-steering
no resolution, just corrections

STUCK (1 thread)

complete failure:

128 turns
4 steerings, 0 approvals
ratio: 0.00:1
all steering, no approval = death

UNKNOWN (1,560 threads, 33.5%)

abandoned or ambiguous:

avg 16 turns (short)
0.43:1 ratio
likely early abandonment

transition probabilities

based on message sequence analysis:

healthy transitions (maintain or improve trajectory)

NEUTRAL → NEUTRAL     [most common, work continues]
NEUTRAL → APPROVAL    [progress acknowledged]
APPROVAL → APPROVAL   [momentum building]
STEERING → APPROVAL   [correction accepted, back on track]

warning transitions

NEUTRAL → STEERING    [first correction, 50% recovery]
STEERING → STEERING   [doom loop, 40% recovery]
APPROVAL → STEERING   [regression after progress]

terminal transitions

STEERING → FRUSTRATED [emotional escalation, <20% recovery]
STEERING → STUCK      [complete breakdown]
ANY → ABANDONED       [user stops engaging; likely lands in UNKNOWN]

outcome prediction formula

based on quantitative analysis:

success_probability = 
  base_rate (55%)
  + file_refs_in_opener     * 25%
  + approval_steering_ratio * 10%  (if >2:1)
  - steering_steering_loop  * 20%
  - wtf_present             * 30%
  - moderate_opener_length  * 10%  (150-500 chars)
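the formula can be read as a base rate plus flat adjustments. a minimal sketch, assuming every term is a yes/no indicator (including the ratio term, read here as a flat +10% when the approval:steering ratio exceeds 2:1) and assuming the result is clamped to the 0-1 range, since the raw sum can fall below zero. both readings are interpretations, not confirmed by the analysis.

```python
def success_probability(
    file_refs_in_opener: bool,
    ratio_above_2_to_1: bool,
    steering_steering_loop: bool,
    wtf_present: bool,
    moderate_opener_length: bool,
) -> float:
    """Heuristic success estimate built from the weights listed above."""
    p = 0.55  # base rate
    p += 0.25 * file_refs_in_opener
    p += 0.10 * ratio_above_2_to_1
    p -= 0.20 * steering_steering_loop
    p -= 0.30 * wtf_present
    p -= 0.10 * moderate_opener_length  # opener of 150-500 chars
    # Assumption: clamp so the estimate stays a valid probability.
    return min(max(p, 0.0), 1.0)
```

with file refs and a healthy ratio and no negative signals, the estimate is about 0.90; with all three negative signals and neither positive one, the raw sum is -0.05 and clamps to 0.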

threshold alerts

condition	action
ratio drops below 1:1	yellow flag — suggest rephrasing
2+ consecutive steerings	orange flag — meta-acknowledge
wtf/profanity appears	red flag — offer handoff/oracle
15+ turns with 0 approvals	yellow flag — check engagement
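the alert table maps directly onto a small rule check. a minimal sketch, assuming per-message labels and a separate `wtf_present` flag supplied by the caller (detecting profanity is out of scope here); the function name and return shape are illustrative.

```python
def threshold_alerts(labels: list[str], wtf_present: bool = False) -> list[tuple[str, str]]:
    """Return (flag color, suggested action) pairs for every rule that fires."""
    approvals = labels.count("APPROVAL")
    steerings = labels.count("STEERING")
    alerts = []
    # Ratio is only defined once at least one steering has happened.
    if steerings and approvals / steerings < 1:
        alerts.append(("yellow", "suggest rephrasing"))
    if any(a == b == "STEERING" for a, b in zip(labels, labels[1:])):
        alerts.append(("orange", "meta-acknowledge"))
    if wtf_present:
        alerts.append(("red", "offer handoff/oracle"))
    if len(labels) >= 15 and approvals == 0:
        alerts.append(("yellow", "check engagement"))
    return alerts
```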

user-specific lifecycle patterns

@concise_commander (marathoner)

avg 85 turns, 71.8% success
high steering (0.81) but recovers
steers toward goal rather than abandoning
lifecycle: long WORK phase, frequent small corrections, eventual RESOLUTION

@steady_navigator (efficient commander)

avg 36 turns, 67% success
minimal steering (0.10)
single steering = serious
lifecycle: short INITIATION → focused WORK → quick RESOLUTION

@verbose_explorer (context front-loader)

avg 39 turns, 43% success
high handoff rate (30%)
threads designed to chain, not complete
lifecycle: extensive INITIATION → WORK → HANDOFF (repeat)

@feature_lead (abandoner)

avg 21 turns, 26% success
low steering, low resolution
lifecycle: INITIATION → brief WORK → UNKNOWN

summary: lifecycle stages

stage	turns	healthy signal	warning signal
INITIATION	1-3	file refs, clear scope	vague, moderate length
WORK	4-N	uniform approvals, spawning	long stretches without approval
CORRECTION	any	single steer, specific alternative	consecutive steering, escalation
RESOLUTION	final	terse imperative, silence	profanity, abandonment

recommendations

anchor with files: @mentions in the opener raise success by about 25 percentage points
approve consistently: uniform small approvals beat occasional large ones
break steering loops: consecutive corrections = pause and confirm understanding
plan for runway: 80% of recoveries take 16+ turns after the last correction
recognize closure: “ship it” is explicit; silence after approval is implicit
spawn strategically: depth 4-7 chains have highest resolution rates
monitor ratio: below 1:1 approval:steering = intervention needed