complexity estimation from opener characteristics
analysis of 4,281 threads to predict thread complexity (length, steering) from first-message features.
key finding: complexity is predictable from openers
opener characteristics correlate strongly with thread outcomes. specific signals predict both thread length and steering requirements.
strongest complexity predictors
| feature | avg turns WITH | avg turns WITHOUT | delta | signal direction |
|---|---|---|---|---|
| is_collaborative (“we”, “let’s”) | 91.9 | 47.4 | +44.5 | long threads |
| is_directive (“you”, “your”) | 69.1 | 48.4 | +20.7 | long threads |
| has_url | 35.1 | 50.8 | -15.7 | short threads |
| is_polite (“please”) | 36.4 | 51.1 | -14.7 | short threads |
| has_code_block | 61.7 | 47.7 | +14.1 | long threads |
| has_file_ref | 56.7 | 39.2 | +17.4 | long threads |
interpretation
- collaborative framing (“let’s”, “we”) predicts marathons: 91.9 avg turns vs 47.4 without. these openers imply iterative work.
- directive framing (“you are X”) predicts longer threads (69.1 avg). these are typically spawned sub-agents handed complex tasks.
- polite framing (“please X”) predicts SHORT threads (36.4 avg): simple requests, quick resolution.
- URL presence predicts shorter threads (35.1 avg): often research/reading tasks, not implementation.
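a minimal sketch of how these boolean opener features could be derived from a thread's first message - feature names match the tables in this doc, but the keyword lists and regexes are illustrative assumptions, not the actual extraction rules:

```python
import re

def extract_opener_features(opener: str) -> dict:
    """derive boolean opener features from a thread's first message (keyword lists are illustrative)."""
    text = opener.lower()
    return {
        "is_collaborative": bool(re.search(r"\b(we|let[’']?s)\b", text)),
        "is_directive": bool(re.search(r"\b(you|your)\b", text)),
        "is_polite": "please" in text,
        "is_question": "?" in opener,
        "has_url": bool(re.search(r"https?://", text)),
        "has_code_block": bool(re.search(r"`{3}", opener)),
        "has_file_ref": bool(re.search(r"\b[\w./-]+\.(py|ts|js|md|json|ya?ml|toml)\b", text)),
        "has_list": bool(re.search(r"^\s*([-*]|\d+\.)\s", opener, re.MULTILINE)),
        "mentions_test": "test" in text,
        "has_continuing": text.startswith("continuing"),
    }
```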
first word as complexity signal
| first word | count | avg turns | avg steering rate |
|---|---|---|---|
| we’re | 24 | 133.7 | 0.0135 |
| your | 20 | 129.3 | 0.0178 |
| let’s | 45 | 114.4 | 0.0175 |
| summarize | 41 | 83.2 | 0.0124 |
| implement | 35 | 74.1 | 0.0064 |
| continuing | 1,502 | 53.8 | 0.0100 |
| please | 667 | 36.4 | 0.0049 |
| migrate | 33 | 17.1 | n/a |
| using | 34 | 17.1 | n/a |
complexity tiers by first word
marathon signals (100+ avg turns):
- “we’re” (133.7) - session framing, extended work
- “your” (129.3) - spawned agent instructions
- “let’s” (114.4) - collaborative iteration
medium signals (50-100 avg turns):
- “summarize” (83.2) - research + synthesis
- “implement” (74.1) - feature work
- “review” (56.4) - review cycles
quick signals (<40 avg turns):
- “please” (36.4) - polite quick requests
- “migrate” (17.1) - scripted/scoped tasks
- “using” (17.1) - tool-specific queries
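for back-of-the-envelope estimates, the tiers can be folded into a lookup keyed on the opener's first word - the values are the averages reported above; the fallback default (roughly the corpus-wide average) and the function name are assumptions:

```python
# expected turns keyed by opener first word (averages from the table above)
FIRST_WORD_TURNS = {
    "we're": 133.7, "your": 129.3, "let's": 114.4,         # marathon signals
    "summarize": 83.2, "implement": 74.1, "review": 56.4,  # medium signals
    "continuing": 53.8, "please": 36.4,                    # quick-to-medium
    "migrate": 17.1, "using": 17.1,                        # very quick
}

def expected_turns(opener: str, default: float = 50.0) -> float:
    """look up expected thread length from the opener's first word (default ~ corpus average, assumed)."""
    words = opener.lower().split()
    return FIRST_WORD_TURNS.get(words[0].strip(".,:"), default) if words else default
```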
opener length vs complexity
| length bucket | count | avg turns | avg steering |
|---|---|---|---|
| tiny (<100 chars) | 504 | 49.9 | 0.0119 |
| short (100-300) | 925 | 44.5 | 0.0112 |
| medium (300-600) | 767 | 36.8 | 0.0058 |
| long (600-1500) | 956 | 35.6 | 0.0061 |
| verbose (1500+) | 1,129 | 71.0 | 0.0140 |
sweet spot: 300-1500 chars
- lowest steering rate (0.58-0.61%)
- shortest threads (35-37 avg turns)
- enough context to be clear, not so much to create confusion
u-shaped curve
- tiny prompts → medium threads + higher steering (ambiguous)
- medium prompts → shortest threads + lowest steering (goldilocks)
- verbose prompts → longest threads + highest steering (overwhelming context or complex tasks)
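the length buckets in the tables above are easy to reproduce; a small helper (bucket boundaries from the table, function name illustrative):

```python
def length_bucket(opener: str) -> str:
    """map opener length in characters onto the buckets used above."""
    n = len(opener)
    if n < 100:
        return "tiny"
    if n < 300:
        return "short"
    if n < 600:
        return "medium"
    if n < 1500:
        return "long"
    return "verbose"
```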
feature prevalence by complexity bucket
| feature | tiny (1-10 turns) | small (11-25) | medium (26-50) | large (51-100) | marathon (100+) |
|---|---|---|---|---|---|
| has_file_ref | 35.6% | 53.5% | 65.5% | 70.2% | 64.3% |
| has_continuing | 33.4% | 24.8% | 30.2% | 45.5% | 44.2% |
| is_polite | 15.1% | 19.0% | 22.8% | 14.0% | 6.4% |
| is_collaborative | 1.5% | 2.3% | 2.4% | 5.1% | 6.1% |
| mentions_test | 43.6% | 42.9% | 54.3% | 63.4% | 64.0% |
| has_list | 39.4% | 42.0% | 45.1% | 55.0% | 52.0% |
patterns
- file refs increase with complexity - peaks at large (70.2%), still high in marathon (64.3%)
- politeness drops off in the longest threads - peaks at 22.8% in medium, falls to 14.0% in large and 6.4% in marathon
- collaborative language increases with complexity - 1.5% tiny → 6.1% marathon
- test mentions increase with complexity - complex tasks involve more testing
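a sketch of how this prevalence table could be reproduced with pandas, assuming a per-thread dataframe with a `turns` column and one boolean column per opener feature (column names are assumptions):

```python
import pandas as pd

def prevalence_by_bucket(df: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """share of threads with each opener feature, grouped by thread-length bucket."""
    buckets = pd.cut(
        df["turns"],
        bins=[0, 10, 25, 50, 100, float("inf")],
        labels=["tiny", "small", "medium", "large", "marathon"],
    )
    # mean of a boolean column within a bucket = prevalence of that feature
    return df.groupby(buckets, observed=True)[features].mean().mul(100).round(1)
```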
steering predictors
| feature | steering WITH | steering WITHOUT | delta |
|---|---|---|---|
| is_collaborative | 0.0169 | 0.0097 | +74% |
| is_polite | 0.0049 | 0.0108 | -55% |
| is_directive | 0.0063 | 0.0100 | -37% |
| has_file_ref | 0.0116 | 0.0078 | +49% |
| is_question | 0.0137 | 0.0097 | +41% |
insights
- polite openers reduce steering by 55% - clear intent, agent knows what to do
- collaborative framing increases steering by 74% - implies back-and-forth, more intervention
- questions increase steering by 41% - exploratory threads need more guidance
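the with/without deltas come from a simple comparison of mean steering rates; a minimal sketch, assuming the same per-thread dataframe plus a `steering_rate` column (an assumption):

```python
import pandas as pd

def steering_delta(df: pd.DataFrame, feature: str) -> float:
    """percent change in mean steering rate for threads WITH a boolean feature vs WITHOUT."""
    with_rate = df.loc[df[feature], "steering_rate"].mean()
    without_rate = df.loc[~df[feature], "steering_rate"].mean()
    return (with_rate / without_rate - 1) * 100

# e.g. steering_delta(df, "is_collaborative") -> ~ +74
```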
practical complexity estimation heuristic
```python
def estimate_complexity(first_word, length, has_file_ref, is_collaborative, is_polite):
    """build a human-readable turn estimate from opener signals (deltas from the tables above)."""
    if first_word in ("we're", "your", "let's"):
        expect = "marathon (100+ turns)"
    elif first_word == "please":
        expect = "quick (30-40 turns)"
    elif first_word == "continuing":
        expect = "medium-long (50-60 turns)"
    elif first_word in ("migrate", "using"):
        expect = "very quick (<20 turns)"
    else:
        expect = "baseline (~50 turns)"  # assumed corpus-wide baseline for unlisted first words

    if length > 1500:
        expect += " +15 turns (verbose penalty)"
    elif 300 < length < 1500:
        expect += " -10 turns (sweet spot)"
    if has_file_ref:
        expect += " +17 turns (file ref)"
    if is_collaborative:
        expect += " +44 turns (collaborative)"
    if is_polite:
        expect += " -15 turns (polite)"
    return expect
```
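example use (input values are hypothetical):

```python
print(estimate_complexity("let's", length=820, has_file_ref=True,
                          is_collaborative=True, is_polite=False))
# -> "marathon (100+ turns) -10 turns (sweet spot) +17 turns (file ref) +44 turns (collaborative)"
```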
recommendations for prompt design
- want quick resolution? start with “please”, keep under 600 chars
- expect iteration? use collaborative language (“let’s”, “we”) and budget for marathon
- spawning agents? “your” framing predicts long threads (129 avg) - scope carefully
- sweet spot for context: 300-1500 chars, include file refs, structured lists
data quality notes
- 4,281 threads analyzed with opener extraction
- steering/approval counts from labeling pass
- some threads lack content files (excluded from analysis)
- “continuing” threads (35% of corpus) are continuations, which may inflate their turn counts