thread title patterns and outcome prediction

analysis of 4,656 thread titles across outcome categories. do titles predict success?

summary

tldr: titles have WEAK predictive power. the strongest signals:

short titles (≤4 words) → 35% of UNKNOWN threads vs 6% of RESOLVED
“error” in title → 14% of FRUSTRATED threads vs 4% of RESOLVED
“fix” in title → 17% of COMMITTED threads vs 8% of RESOLVED
verb-first titles are slightly more common in action outcomes (COMMITTED 36%, HANDOFF 37%) than in RESOLVED (30%)

titles mostly reflect what the thread BECAME, not what it was ASKED to be. amp auto-generates titles from content, so causality is muddy.

outcome distribution

status	count	%
RESOLVED	2,745	59%
UNKNOWN	1,560	34%
COMMITTED	305	7%
EXPLORATORY	124	3%
HANDOFF	75	1.6%
FRUSTRATED	14	<1%
PENDING	8	<1%
STUCK	1	<1%

title length

status	avg chars
UNKNOWN	34.2
FRUSTRATED	41.4
EXPLORATORY	42.0
COMMITTED	43.8
RESOLVED	44.0
HANDOFF	44.8

short titles correlate with UNKNOWN outcomes. 35% of UNKNOWN threads have ≤4-word titles vs only 6% of RESOLVED. makes sense: vague asks → vague results.

FRUSTRATED threads also skew short (21% are ≤4 words, i.e. 3 of 14). sample titles:

“Fix this”
“Untitled”
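
the ≤4-word shares above could be reproduced along these lines. this is a minimal sketch, not the original script: it assumes words are whitespace-separated and that threads arrive as (title, status) pairs. the function name and the toy rows are hypothetical.

```python
from collections import defaultdict

def short_title_share(rows, max_words=4):
    """Return {status: fraction of titles with at most max_words words}."""
    totals = defaultdict(int)
    short = defaultdict(int)
    for title, status in rows:
        totals[status] += 1
        # assumption: a "word" is any whitespace-separated chunk
        if len(title.split()) <= max_words:
            short[status] += 1
    return {status: short[status] / totals[status] for status in totals}

# toy rows for illustration only, not the real dataset
rows = [
    ("Fix this", "UNKNOWN"),
    ("Untitled", "UNKNOWN"),
    ("Refactor auth middleware to use session tokens", "UNKNOWN"),
    ("Add overflow menu to prompts list", "RESOLVED"),
    ("Explain stream positioning", "RESOLVED"),
]
shares = short_title_share(rows)
```

the result is the share within each status (what fraction of UNKNOWN titles are short), which is the direction the 35% vs 6% figures are stated in.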

verb patterns

% of threads where title starts with common action verbs:

status	starts with verb
EXPLORATORY	12%
RESOLVED	30%
UNKNOWN	33%
FRUSTRATED	36%
COMMITTED	36%
HANDOFF	37%

verb-first doesn’t strongly predict outcome. all categories cluster around 30-37% except EXPLORATORY (12%), which makes sense: exploratory threads are often noun-phrase questions.
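
a rough sketch of a verb-first check. the verb list used for the table above is not given, so ACTION_VERBS below is a hypothetical stand-in, and the rates it produces will depend heavily on which verbs are included.

```python
import re

# hypothetical verb list; the list behind the table above is not documented
ACTION_VERBS = ("fix", "add", "update", "implement", "create", "refactor", "remove")

# match one of the verbs as a whole word at the start of the title
VERB_FIRST = re.compile(r"^\s*(?:" + "|".join(ACTION_VERBS) + r")\b", re.IGNORECASE)

def starts_with_verb(title):
    """True if the title begins with one of ACTION_VERBS as a whole word."""
    return VERB_FIRST.match(title) is not None

def verb_first_rate(titles):
    """Fraction of titles that start with an action verb (0.0 for no titles)."""
    titles = list(titles)
    if not titles:
        return 0.0
    return sum(starts_with_verb(t) for t in titles) / len(titles)

rate = verb_first_rate(["Fix this", "Untitled"])
```

the trailing `\b` keeps “Fixture loading order” from counting as a “fix” title.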

keyword signals

“error” in title

status	% with “error”
RESOLVED	3.7%
COMMITTED	1.0%
EXPLORATORY	9.7%
FRUSTRATED	14.3%

“error” is about 4x more common in FRUSTRATED titles than in RESOLVED ones (14.3% vs 3.7%), though with only 14 FRUSTRATED threads that is 2 titles. these are often debugging sessions that don’t resolve cleanly.

“fix” in title

status	% with “fix”
EXPLORATORY	1.6%
RESOLVED	8.4%
UNKNOWN	8.5%
HANDOFF	13.4%
FRUSTRATED	14.3%
COMMITTED	16.7%

“fix” appears in COMMITTED titles (threads with an explicit git push) at about 2x the RESOLVED rate (16.7% vs 8.4%). likely because “fix X” implies a discrete change that gets shipped.

“add” in title

status	% with “add”
RESOLVED	4.0%
COMMITTED	3.3%
FRUSTRATED	14.3%

“add” has unusually high incidence in FRUSTRATED threads, but that is 2 of 14 titles: “Add comprehensive tests for storage data reorganization” and “Add overflow menu to prompts list”. addition tasks may have more ambiguity/scope creep.
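
the keyword percentages can be sanity-checked against the 14 FRUSTRATED titles listed in this note. a minimal sketch, assuming whole-word, case-insensitive matching (the original regexes are not shown, so substring matching is equally possible); the function name is hypothetical.

```python
import re
from collections import Counter

def keyword_incidence(rows, keyword):
    """Return {status: (hits, total, percent)} for whole-word matches of keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    totals, hits = Counter(), Counter()
    for title, status in rows:
        totals[status] += 1
        if pattern.search(title):
            hits[status] += 1
    return {s: (hits[s], totals[s], round(100 * hits[s] / totals[s], 1)) for s in totals}

# the 14 FRUSTRATED titles from this note
FRUSTRATED_TITLES = [
    "Fix this",
    "Scoped context isolation vs oracle recommendation",
    "Click-to-edit Input controller for team-intelligence",
    "Hilbert clustering timestamp resolution and time-first tradeoffs",
    "Add comprehensive tests for storage data reorganization",
    "Untitled",
    "Fix concurrent append race conditions with Effect",
    "Optimize cuckoo filter construction with partitioned filters",
    "Resolve deploy_cli module import error",
    "Modify diff generation in GitDiffView component",
    "storage_optimizer trim race condition documentation",
    "Concurrent event fetching and decoupled I/O",
    "Add overflow menu to prompts list",
    "Debug TestService registration error",
]
rows = [(title, "FRUSTRATED") for title in FRUSTRATED_TITLES]

error_stats = keyword_incidence(rows, "error")
fix_stats = keyword_incidence(rows, "fix")
add_stats = keyword_incidence(rows, "add")
```

returning the raw hit count next to the percentage makes the small denominator visible: each 14.3% figure for FRUSTRATED is 2 titles.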

distinctive vocabulary by outcome

COMMITTED (high-lift words)

commit (5.7x lift), push (5.1x lift), lint (5.1x lift)
fix (3.8x lift), sizing (8.5x lift)
issue IDs like ISSUE-XXXX (8.7x lift)

these are narrow, well-scoped tasks with explicit git operations.

HANDOFF (high-lift words)

verification (7.3x lift), review-rounds (8.1x lift)
trpc, obsidian, plugin
agent coordination terms: dig, claims

handoff threads often involve spawning subagents or continuing elsewhere.

EXPLORATORY (high-lift words)

error (3.6x lift), type (3.4x lift)
configuration, diff, opentelemetry
import, json, typescript

quick lookups, usually about debugging/understanding rather than changing.

UNKNOWN (high-lift words)

hello (3.0x lift), analyses (3.0x lift)
various investigation compound words: fieldsmetamap-investigation, knowledge-gaps-resolved

many are ephemeral or incomplete threads.

RESOLVED (high-lift words)

explanation, breakdown, background
stream, editing, positioning, click

concrete nouns and actions that got addressed.

frustrated thread sample

all 14 FRUSTRATED titles:

Fix this
Scoped context isolation vs oracle recommendation
Click-to-edit Input controller for team-intelligence
Hilbert clustering timestamp resolution and time-first tradeoffs
Add comprehensive tests for storage data reorganization
Untitled
Fix concurrent append race conditions with Effect
Optimize cuckoo filter construction with partitioned filters
Resolve deploy_cli module import error
Modify diff generation in GitDiffView component
storage_optimizer trim race condition documentation
Concurrent event fetching and decoupled I/O
Add overflow menu to prompts list
Debug TestService registration error

patterns:

vague: “Fix this”, “Untitled”
complex concurrent/race condition work (3 of 14 titles name it explicitly)
optimization tasks that probably hit walls

predictive power: modest at best

titles can flag risk:

short + vague → likely UNKNOWN
“error” present → elevated frustration risk
“fix” present → higher commit rate

but titles are mostly DESCRIPTIVE, not PRESCRIPTIVE. amp generates them from conversation content, so they reflect what happened more than what was asked.
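
the three risk flags above could be expressed as a small heuristic. this is only a sketch: “vague” is not defined in this note, so the code checks length alone, and the flag names and whole-word matching are assumptions.

```python
import re

ERROR_RE = re.compile(r"\berror\b", re.IGNORECASE)
FIX_RE = re.compile(r"\bfix\b", re.IGNORECASE)

def title_risk_flags(title, max_short_words=4):
    """Return descriptive flags for a title; these are hints, not predictions."""
    flags = []
    if len(title.split()) <= max_short_words:
        flags.append("short")   # more common among UNKNOWN threads
    if ERROR_RE.search(title):
        flags.append("error")   # elevated share among FRUSTRATED threads
    if FIX_RE.search(title):
        flags.append("fix")     # higher share among COMMITTED threads
    return flags

flags_vague = title_risk_flags("Fix this")
flags_debug = title_risk_flags("Debug TestService registration error")
flags_none = title_risk_flags("Scoped context isolation vs oracle recommendation")
```

since titles are generated from the conversation, flags like these describe a thread after the fact more than they forecast it.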

better predictors (from other analyses):

file references at thread start → +25% success
steering without approval → poor outcomes
26-50 turn sweet spot → highest resolution rate

methodology

source: sqlite db with 4,656 threads labeled by label.js
tokenization: lowercase, split on whitespace, remove stopwords
lift calculation: (token freq in status) / (token freq across all threads), with min count 3
patterns: regex matching on title text
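
a minimal sketch of the tokenization and lift steps described above. several details are assumptions, since the note does not pin them down: frequency is taken as the fraction of titles containing a token, the min count of 3 is applied to the token's count within the status, and STOPWORDS is a hypothetical short list.

```python
from collections import Counter

# hypothetical stopword list; the one actually used is not documented
STOPWORDS = frozenset({"a", "an", "the", "and", "or", "of", "to", "in", "for", "with", "on", "vs"})

def tokenize(title):
    """Lowercase, split on whitespace, drop stopwords; returns a set of tokens."""
    return {word for word in title.lower().split() if word not in STOPWORDS}

def lift_by_status(rows, min_count=3):
    """Return {(status, token): lift}, where lift = freq in status / freq global."""
    global_counts = Counter()   # number of titles containing each token
    status_counts = {}          # status -> Counter of titles containing each token
    status_sizes = Counter()    # status -> number of titles
    for title, status in rows:
        tokens = tokenize(title)
        status_sizes[status] += 1
        global_counts.update(tokens)
        status_counts.setdefault(status, Counter()).update(tokens)
    n_total = sum(status_sizes.values())
    lifts = {}
    for status, counts in status_counts.items():
        for token, count in counts.items():
            if count < min_count:
                continue  # skip tokens too rare within this status
            freq_in_status = count / status_sizes[status]
            freq_global = global_counts[token] / n_total
            lifts[(status, token)] = freq_in_status / freq_global
    return lifts

# toy rows for illustration only
rows = [
    ("fix lint", "COMMITTED"), ("fix push", "COMMITTED"), ("fix sizing", "COMMITTED"),
    ("fix stream", "RESOLVED"), ("explain stream", "RESOLVED"), ("explain click", "RESOLVED"),
    ("explain editing", "RESOLVED"), ("explain background", "RESOLVED"),
]
lifts = lift_by_status(rows)
```

in the toy rows, “fix” appears in every COMMITTED title and in half of all titles, giving a lift of 2.0. with small categories like FRUSTRATED (14 threads), a min count of 3 filters out most tokens, which is worth keeping in mind when reading the high-lift lists.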