first message patterns
analysis of 4,281 threads with first user messages.
length vs outcome
| length category | n | avg turns | avg steering | success rate |
|---|---|---|---|---|
| terse (<50 chars) | 199 | 52.0 | 0.49 | 60.8% |
| brief (50-150) | 612 | 47.5 | 0.42 | 62.6% |
| moderate (150-500) | 1,303 | 39.6 | 0.24 | 54.7% |
| detailed (500-1500) | 1,106 | 37.6 | 0.21 | 42.8% |
| extensive (1500+) | 1,061 | 71.8 | 0.55 | 64.6% |
observations
U-shaped success curve: terse, brief, and extensive messages (60.8% to 64.6%) outperform moderate (54.7%) and detailed (42.8%) ones.
- brief messages (62.6% success): likely simple tasks. “fix this typo” needs no elaboration.
- detailed messages (42.8% success, LOWEST): possibly over-specified but under-contextualized? enough complexity to require steering, not enough context to avoid it.
- extensive messages (64.6% success): front-loaded context pays off despite longer threads (71.8 avg turns).
steering correlates with length extremes: terse (0.49) and extensive (0.55) messages draw more steering than moderate (0.24) or detailed (0.21) ones. likely explanations: terse messages are underspecified, extensive ones accompany complex tasks.
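a minimal sketch of how the length buckets and the pooled baseline can be reproduced from the table. the names `length_category` and `pooled_success_rate` are hypothetical, and treating the ranges as half-open intervals is an assumption, since the table does not say which bucket a message of exactly 50, 150, 500, or 1500 chars belongs to.

```python
from bisect import bisect_right

# Bucket boundaries from the table. Half-open intervals are an assumption:
# a 150-char message is counted as "moderate" here, not "brief".
BOUNDS = [50, 150, 500, 1500]
LABELS = ["terse", "brief", "moderate", "detailed", "extensive"]


def length_category(first_message: str) -> str:
    """Map a first message to its length category by character count."""
    return LABELS[bisect_right(BOUNDS, len(first_message))]


# (n, success rate) per category, copied from the table above.
ROWS = {
    "terse": (199, 0.608),
    "brief": (612, 0.626),
    "moderate": (1303, 0.547),
    "detailed": (1106, 0.428),
    "extensive": (1061, 0.646),
}


def pooled_success_rate(rows: dict) -> float:
    """Thread-weighted success rate across all categories."""
    total = sum(n for n, _ in rows.values())
    return sum(n * rate for n, rate in rows.values()) / total


print(length_category("fix this typo"))          # terse
print(f"{pooled_success_rate(ROWS):.1%}")        # 55.5%
```

the pooled rate of about 55.5% is a useful baseline: only the detailed bucket sits well below it, while moderate messages are roughly average.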
specificity markers
| marker | n | avg turns | avg steering | success rate |
|---|---|---|---|---|
| with file mentions (@) | 2,349 | 56.6 | 0.39 | 66.7% |
| no file mentions | 1,932 | 39.2 | 0.29 | 41.8% |
| continuations | 1,239 | 62.9 | 0.47 | 57.2% |
| fresh starts | 3,042 | 43.0 | 0.29 | 54.8% |
key finding: FILE REFERENCES = +25 PERCENTAGE POINTS SUCCESS
threads starting with file references (@path/to/file.ts) have 66.7% success vs 41.8% without, a gap of 24.9 percentage points. this is the single strongest predictor in the dataset.
code blocks don’t help much (52.8% vs 56.5%) — possibly because pasting code without file context is less actionable than referencing files directly.
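the file-mention gap can be expressed either as percentage points or as a relative lift, and the two are easy to confuse. the arithmetic below uses the rates from the table. the `has_file_mention` helper and its regex are hypothetical: the analysis does not state how @ mentions were detected, so this is only one plausible approach.

```python
import re

with_files = 66.7     # success rate (%) with @file mentions, from the table
without_files = 41.8  # success rate (%) without file mentions

point_gap = with_files - without_files          # absolute gap, percentage points
relative_lift = with_files / without_files - 1  # relative improvement

print(f"{point_gap:.1f} percentage points")  # 24.9 percentage points
print(f"{relative_lift:.1%} relative lift")  # 59.6% relative lift

# Hypothetical detector: "@" followed by a path-like token with an extension.
# The pattern is an assumption, not the heuristic used in the analysis.
FILE_MENTION = re.compile(r"@[\w./-]+\.\w+")


def has_file_mention(first_message: str) -> bool:
    """Return True if the message contains something like @path/to/file.ts."""
    return FILE_MENTION.search(first_message) is not None
```

requiring a file extension keeps plain user mentions such as `@alice` from being counted as file references.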
opening style
| style | n | avg turns | avg steering | success rate |
|---|---|---|---|---|
| question (“how”, “what”, “why”) | 169 | 32.8 | 0.27 | 62.1% |
| imperative (“fix”, “add”, “create”) | 912 | 37.3 | 0.15 | 58.9% |
| continuation | 1,502 | 53.8 | 0.40 | 49.3% |
| declarative (“i want”, “i need”) | 54 | 53.4 | 0.52 | 53.7% |
| other | 1,644 | 52.1 | 0.41 | 58.6% |
observations
- questions have highest success (62.1%) and low steering (0.27). exploratory threads may have clearer success criteria (“did it answer the question?”).
- imperatives have lowest steering (0.15) — direct commands leave less room for misinterpretation.
- continuations underperform (49.3%) despite explicit context passing. possible factors: inherited complexity, context loss between threads, tasks that were already struggling.
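a rough sketch of how first messages could be sorted into the opening styles above. the keyword lists come from the table's examples only. the precedence order, the `opening_style` name, and the `is_continuation` flag are assumptions, since the analysis does not describe how continuations were detected.

```python
QUESTION_WORDS = ("how", "what", "why")
IMPERATIVE_VERBS = ("fix", "add", "create")
DECLARATIVE_PREFIXES = ("i want", "i need")


def opening_style(first_message: str, is_continuation: bool = False) -> str:
    """Classify a first message by its opening words.

    is_continuation must be supplied by the caller; how continuation
    threads are identified is not specified in the analysis.
    """
    if is_continuation:
        return "continuation"
    text = first_message.strip().lower()
    if text.startswith(DECLARATIVE_PREFIXES):
        return "declarative"
    words = text.split()
    first_word = words[0].strip(".,:;!?") if words else ""
    if first_word in QUESTION_WORDS:
        return "question"
    if first_word in IMPERATIVE_VERBS:
        return "imperative"
    return "other"


print(opening_style("fix the login bug"))        # imperative
print(opening_style("How does caching work?"))   # question
print(opening_style("i want X fixed"))           # declarative
```

a classifier this narrow would push most messages into "other", which is consistent with "other" being the largest row in the table (1,644 threads).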
per-user patterns
| user | threads | avg first msg len | avg turns | avg steering | success rate |
|---|---|---|---|---|---|
| @concise_commander | 1,218 | 1,274 | 86.6 | 0.81 | 71.8% |
| @steady_navigator | 1,171 | 1,255 | 36.5 | 0.10 | 67.0% |
| @verbose_explorer | 875 | 1,519 | 39.1 | 0.28 | 43.2% |
| @precision_pilot | 90 | 4,280 | 72.9 | 0.41 | 82.2% |
| @swift_solver | 36 | 1,447 | 45.5 | 0.69 | 88.9% |
| @patient_pathfinder | 150 | 608 | 20.3 | 0.20 | 54.0% |
| @feature_lead | 146 | 1,246 | 20.7 | 0.08 | 26.0% |
archetypes
@precision_pilot: marathon front-loader. avg 4,280 char first messages → 82.2% success. suggests extensive context works if you commit, at least for this user (n=90).
@steady_navigator: efficient imperative user. mid-range first messages (1,255 chars avg), minimal steering (0.10), 67% success. threads end quickly (36.5 turns).
@concise_commander: high-steering marathoner. long threads (86.6 turns), high steering (0.81), but still 71.8% success. steers toward goal rather than abandoning.
@verbose_explorer: context front-loader. 932 char avg messages (the table lists 1,519 chars for first messages) with extensive spawn orchestration. resolution rate corrected to 83% after fixing spawn misclassification; the table above still shows the uncorrected 43.2%. handoff rate only 4.2%.
@feature_lead: lowest success (26.0%) despite low steering. short threads (20.7 turns) suggest early abandonment rather than resolution.
recommendations
- include file references: threads with @mentions succeed 25 percentage points more often (66.7% vs 41.8%)
- brief OR extensive, not in between: if the task is complex enough to explain, explain it fully
- imperative > declarative: “fix X” (58.9% success) outperforms “i want X fixed” (53.7%, though only 54 threads)
- questions are underrated: exploratory threads may have clearer success criteria
caveats
- completion_status heuristics may misclassify some threads
- success = RESOLVED + COMMITTED, which conflates “answered” with “deployed”
- user sample sizes vary significantly (36 vs 1,218 threads)
- “extensive” messages may include automated context injection, inflating length
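to make the sample-size caveat concrete, here is a sketch that puts a Wilson score interval around a few of the per-user success rates. the interval method is a choice made for this illustration, not part of the original analysis, and the `wilson_interval` name is hypothetical. rates and thread counts are copied from the per-user table.

```python
from math import sqrt


def wilson_interval(rate: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    denom = 1 + z * z / n
    centre = (rate + z * z / (2 * n)) / denom
    half_width = z * sqrt(rate * (1 - rate) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# (threads, success rate) copied from the per-user table.
USERS = {
    "@concise_commander": (1218, 0.718),
    "@swift_solver": (36, 0.889),
    "@feature_lead": (146, 0.260),
}

for user, (n, rate) in USERS.items():
    low, high = wilson_interval(rate, n)
    print(f"{user}: {rate:.1%} (n={n}), interval {low:.1%} to {high:.1%}")
```

under these assumptions the interval for @swift_solver (36 threads) comes out about four times wider than the one for @concise_commander (1,218 threads), so the 88.9% figure should be read as much less certain than the 71.8% one.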