thread title patterns and outcome prediction
analysis of 4,656 thread titles across outcome categories. do titles predict success?
summary
tldr: titles have WEAK predictive power. the strongest signals:
- short titles (≤4 words) → 35% of UNKNOWN threads have one vs 6% of RESOLVED
- “error” in title → 14% of FRUSTRATED threads vs 4% of RESOLVED
- “fix” in title → 17% of COMMITTED threads vs 8% of RESOLVED
- verb-first titles slightly favor action outcomes (COMMITTED, HANDOFF)
titles mostly reflect what the thread BECAME, not what it was ASKED to be. amp auto-generates titles from content, so causality is muddy.
outcome distribution
| status | count | % |
|---|---|---|
| RESOLVED | 2,745 | 59% |
| UNKNOWN | 1,560 | 34% |
| HANDOFF | 75 | 1.6% |
| COMMITTED | 305 | 7% |
| EXPLORATORY | 124 | 3% |
| FRUSTRATED | 14 | <1% |
| PENDING | 8 | <1% |
| STUCK | 1 | <1% |
title length
| status | avg title length (chars) |
|---|---|
| UNKNOWN | 34.2 |
| EXPLORATORY | 42.0 |
| FRUSTRATED | 41.4 |
| RESOLVED | 44.0 |
| COMMITTED | 43.8 |
| HANDOFF | 44.8 |
short titles correlate with UNKNOWN outcomes. 35% of UNKNOWN threads have ≤4-word titles vs only 6% of RESOLVED. makes sense: vague asks → vague results.
FRUSTRATED threads also skew short (21% are ≤4 words). sample titles:
- “Fix this”
- “Untitled”
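a minimal sketch of the short-title check, assuming "words" means whitespace-separated tokens (the analysis doesn't say how words were counted). the function name `short_title_share` is made up for illustration. the input is the 14 FRUSTRATED titles listed later in this analysis.

```python
from collections import defaultdict

def short_title_share(threads, max_words=4):
    """Return {status: fraction of that status's titles with <= max_words words}."""
    totals = defaultdict(int)
    shorts = defaultdict(int)
    for title, status in threads:
        totals[status] += 1
        # Assumption: a "word" is any whitespace-separated token.
        if len(title.split()) <= max_words:
            shorts[status] += 1
    return {status: shorts[status] / totals[status] for status in totals}

FRUSTRATED_TITLES = [
    "Fix this",
    "Scoped context isolation vs oracle recommendation",
    "Click-to-edit Input controller for team-intelligence",
    "Hilbert clustering timestamp resolution and time-first tradeoffs",
    "Add comprehensive tests for storage data reorganization",
    "Untitled",
    "Fix concurrent append race conditions with Effect",
    "Optimize cuckoo filter construction with partitioned filters",
    "Resolve deploy_cli module import error",
    "Modify diff generation in GitDiffView component",
    "storage_optimizer trim race condition documentation",
    "Concurrent event fetching and decoupled I/O",
    "Add overflow menu to prompts list",
    "Debug TestService registration error",
]

share = short_title_share([(t, "FRUSTRATED") for t in FRUSTRATED_TITLES])
print(f"{share['FRUSTRATED']:.0%}")  # 3 of 14 titles are <= 4 words
```

under that word-counting assumption, 3 of the 14 titles qualify (“Fix this”, “Untitled”, “Debug TestService registration error”), which lines up with the 21% figure.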
verb patterns
% of threads where title starts with common action verbs:
| status | starts with verb |
|---|---|
| EXPLORATORY | 12% |
| RESOLVED | 30% |
| UNKNOWN | 33% |
| FRUSTRATED | 36% |
| COMMITTED | 36% |
| HANDOFF | 37% |
verb-first doesn’t strongly predict outcome. most categories cluster around 30-37%; the exception is EXPLORATORY (12%), which makes sense because exploratory threads are often noun-phrase questions.
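a rough sketch of what a verb-first check could look like. the verb list here is hypothetical: the analysis only says "common action verbs" and doesn't list them, so the percentages in the table won't necessarily reproduce with this list.

```python
import re

# Hypothetical verb list; the verbs actually matched in the analysis are not stated.
ACTION_VERBS = ("add", "fix", "update", "implement", "refactor", "debug")

# Anchor at the start of the title and require a word boundary after the verb,
# so "Fix this" matches but "Fixture cleanup" does not.
VERB_FIRST = re.compile(r"^\s*(?:%s)\b" % "|".join(ACTION_VERBS), re.IGNORECASE)

def starts_with_verb(title):
    """True if the title begins with one of ACTION_VERBS as a whole word."""
    return VERB_FIRST.match(title) is not None

print(starts_with_verb("Fix this"))
print(starts_with_verb("Scoped context isolation vs oracle recommendation"))
```

the word boundary matters: without it, noun-phrase titles that merely begin with the same letters as a verb would be counted as verb-first.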
keyword signals
“error” in title
| status | % with “error” |
|---|---|
| RESOLVED | 3.7% |
| COMMITTED | 1.0% |
| EXPLORATORY | 9.7% |
| FRUSTRATED | 14.3% |
“error” appears in roughly 4x as many FRUSTRATED titles as RESOLVED ones (14.3% vs 3.7%), though with only 14 FRUSTRATED threads that 14.3% is just 2 titles. these are often debugging sessions that don’t resolve cleanly.
“fix” in title
| status | % with “fix” |
|---|---|
| EXPLORATORY | 1.6% |
| RESOLVED | 8.4% |
| UNKNOWN | 8.5% |
| HANDOFF | 13.4% |
| FRUSTRATED | 14.3% |
| COMMITTED | 16.7% |
“fix” shows up in COMMITTED titles (explicit git push) about 2x as often as in RESOLVED ones (16.7% vs 8.4%). likely because “fix X” implies a discrete change that gets shipped.
“add” in title
| status | % with “add” |
|---|---|
| RESOLVED | 4.0% |
| COMMITTED | 3.3% |
| FRUSTRATED | 14.3% |
“add” has unusually high incidence in FRUSTRATED threads, but that 14.3% is only 2 of 14 titles: “Add comprehensive tests for storage data reorganization” and “Add overflow menu to prompts list”. addition tasks may have more ambiguity/scope creep.
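the keyword tables in this section boil down to a per-status incidence count. a sketch, assuming a case-insensitive whole-word match (the analysis only says "regex matching", so substring matching is equally plausible). `keyword_rate` and the two titles marked as made-up are illustrative only.

```python
import re
from collections import Counter

def keyword_rate(threads, keyword):
    """Percent of titles per status that contain `keyword` as a whole word."""
    pattern = re.compile(r"\b%s\b" % re.escape(keyword), re.IGNORECASE)
    totals, hits = Counter(), Counter()
    for title, status in threads:
        totals[status] += 1
        if pattern.search(title):
            hits[status] += 1
    return {status: 100.0 * hits[status] / totals[status] for status in totals}

sample = [
    ("Fix this", "FRUSTRATED"),
    ("Debug TestService registration error", "FRUSTRATED"),
    ("Resolve deploy_cli module import error", "FRUSTRATED"),
    ("Add overflow menu to prompts list", "FRUSTRATED"),
    ("Fix lint errors and push", "COMMITTED"),          # made-up title
    ("Explain background stream editing", "RESOLVED"),  # made-up title
]

print(keyword_rate(sample, "error"))
print(keyword_rate(sample, "fix"))
```

note that the whole-word match skips “errors” in the made-up COMMITTED title. whether plurals and compounds should count is a choice that moves these percentages, especially in a bucket as small as FRUSTRATED.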
distinctive vocabulary by outcome
COMMITTED (high-lift words)
- commit (5.7x lift), push (5.1x lift), lint (5.1x lift)
- fix (3.8x lift), sizing (8.5x lift)
- issue IDs like ISSUE-XXXX (8.7x lift)
these are narrow, well-scoped tasks with explicit git operations.
HANDOFF (high-lift words)
- verification (7.3x lift), review-rounds (8.1x lift)
- trpc, obsidian, plugin
- agent coordination terms: dig, claims
handoff threads often involve spawning subagents or continuing elsewhere.
EXPLORATORY (high-lift words)
- error (3.6x lift), type (3.4x lift)
- configuration, diff, opentelemetry
- import, json, typescript
quick lookups, usually about debugging/understanding rather than changing.
UNKNOWN (high-lift words)
- hello (3.0x lift), analyses (3.0x lift)
- various investigation compound words: fieldsmetamap-investigation, knowledge-gaps-resolved
many are ephemeral or incomplete threads.
RESOLVED (high-lift words)
- explanation, breakdown, background
- stream, editing, positioning, click
concrete nouns and actions that got addressed.
frustrated thread sample
all 14 FRUSTRATED titles:
- Fix this
- Scoped context isolation vs oracle recommendation
- Click-to-edit Input controller for team-intelligence
- Hilbert clustering timestamp resolution and time-first tradeoffs
- Add comprehensive tests for storage data reorganization
- Untitled
- Fix concurrent append race conditions with Effect
- Optimize cuckoo filter construction with partitioned filters
- Resolve deploy_cli module import error
- Modify diff generation in GitDiffView component
- storage_optimizer trim race condition documentation
- Concurrent event fetching and decoupled I/O
- Add overflow menu to prompts list
- Debug TestService registration error
patterns:
- vague: “Fix this”, “Untitled”
- complex concurrent/race condition work (4 of 14)
- optimization tasks that probably hit walls
predictive power: modest at best
titles can flag risk:
- short + vague → likely UNKNOWN
- “error” present → elevated frustration risk
- “fix” present → higher commit rate
but titles are mostly DESCRIPTIVE, not PRESCRIPTIVE. amp generates them from conversation content, so they reflect what happened more than what was asked.
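the three risk signals above could be wired into a simple flagging helper. this is a sketch, not a calibrated predictor: `title_risk_flags` is a made-up name, the whole-word matching is an assumption, and given the weak signal the flags are hints at most.

```python
import re

def title_risk_flags(title, max_short_words=4):
    """Return the list of title-based risk flags that apply to `title`."""
    flags = []
    # Short titles are more common among UNKNOWN threads.
    if len(title.split()) <= max_short_words:
        flags.append("short")
    # "error" is more common among FRUSTRATED threads.
    if re.search(r"\berror\b", title, re.IGNORECASE):
        flags.append("error")
    # "fix" is more common among COMMITTED threads.
    if re.search(r"\bfix\b", title, re.IGNORECASE):
        flags.append("fix")
    return flags

print(title_risk_flags("Fix this"))
print(title_risk_flags("Debug TestService registration error"))
print(title_risk_flags("Concurrent event fetching and decoupled I/O"))
```

since amp generates titles from conversation content, a helper like this describes a thread after the fact more than it forecasts one.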
better predictors (from other analyses):
- file references at thread start → +25% success
- steering without approval → poor outcomes
- 26-50 turn sweet spot → highest resolution rate
methodology
- source: sqlite db with 4,656 threads labeled by label.js
- tokenization: lowercase, remove stopwords, split on whitespace
- lift calculation: (freq in status) / (freq global) with min count 3
- patterns: regex matching on title text
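a sketch of the tokenization and lift steps described above, in python rather than whatever label.js does. several details are assumptions, because the methodology doesn't pin them down: "freq" is taken as the fraction of titles containing the token, the min count of 3 is applied to the token's count within the status, and the stopword list is a tiny placeholder.

```python
from collections import Counter, defaultdict

# Placeholder stopword list; the list used in the analysis is not given.
STOPWORDS = {"a", "an", "the", "and", "for", "in", "of", "to", "with", "vs"}

def tokenize(title):
    """Lowercase, split on whitespace, drop stopwords."""
    return [tok for tok in title.lower().split() if tok not in STOPWORDS]

def lift_by_status(threads, min_count=3):
    """Return {status: {token: lift}}, lift = (freq in status) / (freq global)."""
    global_df = Counter()             # titles containing token, all statuses
    status_df = defaultdict(Counter)  # titles containing token, per status
    status_n = Counter()              # titles per status
    for title, status in threads:
        tokens = set(tokenize(title))  # count each token once per title
        global_df.update(tokens)
        status_df[status].update(tokens)
        status_n[status] += 1
    total = sum(status_n.values())
    lifts = {}
    for status, df in status_df.items():
        lifts[status] = {
            tok: (cnt / status_n[status]) / (global_df[tok] / total)
            for tok, cnt in df.items()
            if cnt >= min_count  # assumption: threshold applies within the status
        }
    return lifts

# Toy data, not drawn from the real db.
toy = (
    [("push fix", "COMMITTED")] * 3
    + [("explain push", "RESOLVED")]
    + [("explain stream", "RESOLVED")] * 4
)
lifts = lift_by_status(toy)
print(lifts["COMMITTED"]["push"])  # (3/3) / (4/8)
```

in the toy data, push appears in all 3 COMMITTED titles and in 4 of 8 titles overall, so its lift is 1.0 / 0.5 = 2.0. the single RESOLVED occurrence falls below the min count and is dropped.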