debug patterns analysis
analysis of 678 threads containing “debug”, “fix”, or “bug” keywords.
success rates by completion status
| status | count | % of total |
|---|---|---|
| RESOLVED | 298 | 44.0% |
| UNKNOWN | 175 | 25.8% |
| HANDOFF | 116 | 17.1% |
| COMMITTED | 77 | 11.4% |
| EXPLORATORY | 9 | 1.3% |
| FRUSTRATED | 3 | 0.4% |
steering intensity vs success
| steering count | threads | resolved | success rate |
|---|---|---|---|
| 0 steers | 525 | 200 | 38.1% |
| 1-2 steers | 129 | 84 | 65.1% |
| 3-5 steers | 21 | 13 | 61.9% |
| 6+ steers | 3 | 1 | 33.3% |
key insight: moderate steering (1-2 interventions) correlates with the highest success rate (65.1%). zero-steer threads resolve far less often (38.1%), likely because they include cases where the agent got stuck or went off-track without correction. heavy steering (6+) suggests fundamental confusion about the problem, though with only 3 such threads the 33.3% rate is weak evidence.
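to gauge whether the gap between 0 steers and 1-2 steers is larger than chance, a pooled two-proportion z-test can be run on the counts from the table. this is a minimal sketch, not part of the original analysis; the function name `two_proportion_z` is hypothetical, and the test assumes threads are independent observations.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled success rate under the null hypothesis of equal rates.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the steering table: 200/525 (0 steers) vs 84/129 (1-2 steers).
z, p = two_proportion_z(200, 525, 84, 129)
print(f"z = {z:.2f}, p = {p:.2g}")
```

on these counts z comes out at roughly 5.5, so the difference is unlikely to be noise. the test says nothing about cause: the data is observational, and zero-steer threads may differ from steered ones in other ways.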
keyword breakdown
| keyword | threads | success rate | avg turns | avg steers |
|---|---|---|---|---|
| bug | 42 | 69.0% | 76.3 | 0.69 |
| debug | 152 | 53.3% | 67.1 | 0.53 |
| fix | 484 | 38.8% | 47.9 | 0.32 |
insight: “bug” threads have the highest success rate (69.0%), likely because they are scoped investigations. “fix” threads are often ambiguous (“fix this”, “fix conflicts”) and underperform (38.8%). specificity matters.
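the keyword counts sum to 678, so each thread appears to be assigned to exactly one keyword. the sketch below shows one way that assignment could work. the precedence order (bug, then debug, then fix), matching on the title, and the name `keyword_of` are all assumptions, since the analysis does not say how multi-keyword threads were handled.

```python
import re
from typing import Optional

# Assumed precedence; the source does not state the real tie-breaking rule.
KEYWORDS = ("bug", "debug", "fix")


def keyword_of(title: str) -> Optional[str]:
    """Return the first keyword that starts a word in the title, else None."""
    lowered = title.lower()
    for kw in KEYWORDS:
        # \b anchors the match at a word start, so "bug" does not
        # match inside "debug", while "fix" still matches "fixes".
        if re.search(rf"\b{kw}", lowered):
            return kw
    return None


for title in ("Fix this", "Review diff and bug fixes",
              "Debug TestService registration error"):
    print(title, "->", keyword_of(title))
```

word-start matching matters here: a plain substring check for “bug” would also capture every “debug” thread.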
thread length vs outcome
| length | threads | success rate | avg steers |
|---|---|---|---|
| short (<20 turns) | 275 | 16.0% | 0.01 |
| medium (20-50) | 124 | 54.0% | 0.16 |
| long (51-100) | 156 | 62.8% | 0.52 |
| very long (101+) | 123 | 72.4% | 1.29 |
insight: longer threads correlate with higher success, rising from 16.0% for short threads to 72.4% for very long ones. short threads often represent abandoned attempts or simple queries that weren’t true debugging sessions.
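the length breakdown can be reproduced by bucketing each thread on its turn count and computing the share that ended RESOLVED. a minimal sketch, assuming each thread is a `(turns, status)` pair and that a thread of exactly 100 turns counts as long; both are assumptions, as are the function names.

```python
from collections import defaultdict


def length_bucket(turns: int) -> str:
    """Map a turn count to the length bucket used in the table."""
    if turns < 20:
        return "short"
    if turns <= 50:
        return "medium"
    if turns <= 100:  # assumption: exactly 100 turns counts as "long"
        return "long"
    return "very long"


def success_rate_by_length(threads):
    """threads: iterable of (turns, status). Returns {bucket: percent resolved}."""
    totals = defaultdict(int)
    resolved = defaultdict(int)
    for turns, status in threads:
        bucket = length_bucket(turns)
        totals[bucket] += 1
        if status == "RESOLVED":
            resolved[bucket] += 1
    return {b: round(100 * resolved[b] / totals[b], 1) for b in totals}


# Hypothetical sample data, not taken from the analysis.
sample = [(5, "UNKNOWN"), (10, "RESOLVED"), (30, "RESOLVED"),
          (100, "HANDOFF"), (150, "RESOLVED")]
print(success_rate_by_length(sample))
```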
frustrated cases (3 total)
| thread | turns | steers |
|---|---|---|
| Debug sort_optimization panic with constant columns | 252 | 9 |
| Fix this | 124 | 2 |
| Debug TestService registration error | 133 | 2 |
common pattern: high-churn threads (124-252 turns each) with unclear problem definitions.
high-steering threads (6+ steers)
| thread | steers | turns | outcome |
|---|---|---|---|
| Debug sort_optimization panic with constant columns | 9 | 252 | UNKNOWN |
| Review diff and bug fixes | 7 | 175 | RESOLVED |
| Investigating potential storage_optimizer brain code bug | 7 | 138 | UNKNOWN |
high steering often accompanies exploratory debugging without clear repro steps.
outcome by status (avg metrics)
| status | avg turns | avg steers |
|---|---|---|
| RESOLVED | 81.2 | 0.55 |
| COMMITTED | 43.2 | 0.22 |
| HANDOFF | 37.4 | 0.16 |
| FRUSTRATED | 123.3 | 1.67 |
| UNKNOWN | 24.5 | 0.34 |
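per-status averages like these can be computed by grouping threads on status and taking the mean of each metric. a minimal sketch, assuming each thread is a dict with hypothetical `status`, `turns`, and `steers` fields; the sample rows are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def averages_by_status(threads):
    """Return {status: (avg turns, avg steers)} rounded like the table."""
    grouped = defaultdict(list)
    for thread in threads:
        grouped[thread["status"]].append(thread)
    return {
        status: (
            round(mean([t["turns"] for t in rows]), 1),
            round(mean([t["steers"] for t in rows]), 2),
        )
        for status, rows in grouped.items()
    }


# Hypothetical rows, not taken from the analysis.
sample = [
    {"status": "RESOLVED", "turns": 80, "steers": 1},
    {"status": "RESOLVED", "turns": 60, "steers": 0},
    {"status": "HANDOFF", "turns": 35, "steers": 0},
]
print(averages_by_status(sample))
```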
recommendations
- steer early, steer once: threads with 1-2 steering interventions resolve far more often than unsteered ones (65% vs 38%)
- scope before starting: “bug” threads succeed at 69% vs “fix” at 39%. specific problem framing matters.
- don’t abandon early: short threads (<20 turns) have 16% success. debugging needs persistence.
- watch for thrash: 6+ steers signal that the agent is confused about the goal; consider reframing.
- avoid vague titles: “Fix this” threads underperform. clear problem statements improve outcomes.
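the recommendations above could be turned into simple per-thread checks. this is a rough sketch, not an existing tool. the function `thread_warnings`, the vague-title list (taken from the examples quoted earlier), and the "fewer than 3 words" rule are all assumptions; only the 6-steer and 20-turn thresholds come from the tables.

```python
# Vague titles quoted in the analysis; extend as needed (assumption).
VAGUE_TITLES = {"fix this", "fix conflicts"}


def thread_warnings(title, turns, steers, resolved):
    """Return a list of warning strings for one thread."""
    warnings = []
    normalized = title.strip().lower()
    # Hypothetical rule: known vague titles, or titles under 3 words.
    if normalized in VAGUE_TITLES or len(normalized.split()) < 3:
        warnings.append("vague title: state the specific problem")
    if steers >= 6:
        warnings.append("thrash: 6+ steers, consider reframing the goal")
    if turns < 20 and not resolved:
        warnings.append("short and unresolved: may have been abandoned early")
    return warnings


print(thread_warnings("Fix this", 124, 2, False))
print(thread_warnings(
    "Debug sort_optimization panic with constant columns", 252, 9, False))
```

the checks are deliberately independent, so a single thread can trigger more than one warning.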