debug patterns analysis

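analysis of 678 threads containing “debug”, “fix”, or “bug” keywords. “success rate” throughout means the share of threads whose final status is RESOLVED; COMMITTED and HANDOFF threads do not count as successes.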

threads by completion status

status	count	% of total
RESOLVED	298	44.0%
UNKNOWN	175	25.8%
HANDOFF	116	17.1%
COMMITTED	77	11.4%
EXPLORATORY	9	1.3%
FRUSTRATED	3	0.4%
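each share above is the status count divided by the 678-thread total. a minimal python sketch that recomputes the column from the counts in the table, assuming it was rounded to one decimal place:

```python
from collections import Counter

# status counts copied from the table above
status_counts = Counter({
    "RESOLVED": 298,
    "UNKNOWN": 175,
    "HANDOFF": 116,
    "COMMITTED": 77,
    "EXPLORATORY": 9,
    "FRUSTRATED": 3,
})


def status_shares(counts: Counter) -> dict:
    """Return each status's share of all threads as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {status: round(100 * n / total, 1) for status, n in counts.most_common()}


shares = status_shares(status_counts)
for status, pct in shares.items():
    print(f"{status}\t{status_counts[status]}\t{pct}%")
```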

steering intensity vs success

steering count	threads	resolved	success rate
0 steers	525	200	38.1%
1-2 steers	129	84	65.1%
3-5 steers	21	13	61.9%
6+ steers	3	1	33.3%

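key insight: moderate steering (1-2 interventions) correlates with the highest success rate (65.1%), with 3-5 steers close behind (61.9%). zero steering underperforms significantly (38.1%). it likely captures cases where the agent got stuck or went off-track without correction, but it also contains nearly all 275 short threads (avg 0.01 steers), which resolve only 16.0% of the time. heavy steering (6+) suggests fundamental confusion about the problem, though that bucket holds only 3 threads.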
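the success rates above count only RESOLVED threads as successes (the resolved column sums to the 298 RESOLVED threads). a minimal sketch of the bucketing, using a hypothetical `Thread` record whose field names are assumptions. the synthetic records reproduce the table's counts, with one representative steer value per bucket, so the output can be checked against it:

```python
from dataclasses import dataclass


@dataclass
class Thread:
    # hypothetical record shape; field names are assumptions, not the source schema
    turns: int
    steers: int
    status: str


def steering_bucket(steers: int) -> str:
    """Map a steer count to the buckets used in the steering table."""
    if steers == 0:
        return "0 steers"
    if steers <= 2:
        return "1-2 steers"
    if steers <= 5:
        return "3-5 steers"
    return "6+ steers"


def success_by_bucket(threads: list) -> dict:
    """Return bucket -> (threads, resolved, success rate %); only RESOLVED counts as success."""
    tallies = {}
    for t in threads:
        tally = tallies.setdefault(steering_bucket(t.steers), [0, 0])
        tally[0] += 1
        if t.status == "RESOLVED":
            tally[1] += 1
    return {b: (n, r, round(100 * r / n, 1)) for b, (n, r) in tallies.items()}


# synthetic records matching the table: steer value -> (threads, resolved); turns unused here
table = {0: (525, 200), 1: (129, 84), 4: (21, 13), 7: (3, 1)}
threads = [
    Thread(turns=0, steers=s, status="RESOLVED" if i < resolved else "UNKNOWN")
    for s, (n, resolved) in table.items()
    for i in range(n)
]
stats = success_by_bucket(threads)
# the bucket rates combine back into the overall rate: 298 / 678
overall = round(100 * sum(r for _, r, _ in stats.values()) / len(threads), 1)
print(stats, overall)
```

the weighted recombination is a quick way to confirm the steering table covers every thread exactly once.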

keyword breakdown

keyword	threads	success rate	avg turns	avg steers
bug	42	69.0%	76.3	0.69
debug	152	53.3%	67.1	0.53
fix	484	38.8%	47.9	0.32

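insight: “bug” threads have the highest success, likely because they’re scoped investigations. they also run longest (76.3 avg turns), and length correlates with success on its own, so part of the gap may reflect length rather than framing. “fix” threads are often ambiguous (“fix this”, “fix conflicts”) and underperform. specificity matters.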
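the keyword counts sum to 678, so each thread appears to be counted under exactly one keyword, but the source does not say how that was decided. a plain substring test would put every “debug” title in the “bug” bucket, since “bug” is a substring of “debug”. a minimal sketch that avoids this by matching each keyword at a word start, assuming (hypothetically) that titles are classified and that “bug” wins over “debug”, which wins over “fix”:

```python
import re
from typing import Optional

# assumed precedence when a title contains several keywords; not stated in the source
PRECEDENCE = ("bug", "debug", "fix")

# \b before each keyword: "bug" no longer matches inside "debug", while
# inflected forms such as "bugs", "debugging", and "fixes" still match
PATTERNS = {kw: re.compile(rf"\b{kw}", re.IGNORECASE) for kw in PRECEDENCE}


def classify(title: str) -> Optional[str]:
    """Return the first keyword (in PRECEDENCE order) found at a word start, else None."""
    for kw in PRECEDENCE:
        if PATTERNS[kw].search(title):
            return kw
    return None


for title in ("Debug TestService registration error", "Review diff and bug fixes", "Fix this"):
    print(f"{title!r}: naive={'bug' in title.lower()}, classified={classify(title)}")
```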

thread length vs outcome

length (turns)	threads	success rate	avg steers
short (<20)	275	16.0%	0.01
medium (20-50)	124	54.0%	0.16
long (51-100)	156	62.8%	0.52
very long (>100)	123	72.4%	1.29

insight: success rises steadily with length, from 16.0% under 20 turns to 72.4% past 100, and steering rises with it (0.01 to 1.29 avg steers). short threads often represent abandoned attempts or simple queries that weren’t true debugging sessions.
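the bucket edges matter at the boundaries: with long covering 51-100 turns, a 100-turn thread belongs in long, so very long starts at 101. a minimal sketch of that assignment, plus a consistency check that multiplies each bucket's thread count by its success rate and confirms the implied resolved counts add up to the 298 RESOLVED threads:

```python
def length_bucket(turns: int) -> str:
    """Assign a thread to a length bucket by turn count (edges: <20, 20-50, 51-100, >100)."""
    if turns < 20:
        return "short"
    if turns <= 50:
        return "medium"
    if turns <= 100:
        return "long"
    return "very long"


# bucket -> (threads, success rate %) copied from the length table
rows = {
    "short": (275, 16.0),
    "medium": (124, 54.0),
    "long": (156, 62.8),
    "very long": (123, 72.4),
}
# resolved threads implied by each row, rounded to whole threads
implied = {name: round(n * rate / 100) for name, (n, rate) in rows.items()}
print(implied, sum(implied.values()))
```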

frustrated cases (3 total)

thread	turns	steers
Debug sort_optimization panic with constant columns	252	9
Fix this	124	2
Debug TestService registration error	133	2

common pattern: long, high-churn threads (124 to 252 turns) with unclear problem definitions.

high-steering threads (6+ steers)

thread	steers	turns	outcome
Debug sort_optimization panic with constant columns	9	252	UNKNOWN
Review diff and bug fixes	7	175	RESOLVED
Investigating potential storage_optimizer brain code bug	7	138	UNKNOWN

with only 3 threads this is anecdotal: heavy steering tends to accompany exploratory debugging without clear repro steps, and only the diff review resolved.

outcome by status (avg metrics)

status	avg turns	avg steers
RESOLVED	81.2	0.55
COMMITTED	43.2	0.22
HANDOFF	37.4	0.16
FRUSTRATED	123.3	1.67
UNKNOWN	24.5	0.34
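each row above is a group-by over status followed by a mean of turns and steers. a minimal sketch, assuming threads are available as (title, turns, steers, status) tuples (an assumed layout), run here only on the three high-steering threads listed earlier, so its averages are not the table's. a plain group-by emits a row for every status present, which would include EXPLORATORY, the one status this table leaves out:

```python
from collections import defaultdict
from statistics import fmean

# (title, turns, steers, status) for the three high-steering threads listed earlier
threads = [
    ("Debug sort_optimization panic with constant columns", 252, 9, "UNKNOWN"),
    ("Review diff and bug fixes", 175, 7, "RESOLVED"),
    ("Investigating potential storage_optimizer brain code bug", 138, 7, "UNKNOWN"),
]


def averages_by_status(rows):
    """Return status -> (avg turns, avg steers), each rounded to 0.1."""
    groups = defaultdict(list)
    for _title, turns, steers, status in rows:
        groups[status].append((turns, steers))
    return {
        status: (
            round(fmean(t for t, _ in pairs), 1),
            round(fmean(s for _, s in pairs), 1),
        )
        for status, pairs in groups.items()
    }


print(averages_by_status(threads))
```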

recommendations

steer early, steer once: 1-2 steering interventions are associated with much better outcomes (65% resolved vs 38% with no steering)
scope before starting: “bug” threads succeed at 69% vs “fix” at 39%. specific problem framing matters.
don’t abandon early: short threads (<20 turns) have 16% success. debugging needs persistence.
watch for thrash: 6+ steers suggests the agent is confused about the goal (only 3 threads, so a weak signal). consider reframing.
avoid vague titles: “Fix this” is one of the 3 frustrated threads, and “fix” threads overall resolve only 38.8%. clear problem statements likely improve outcomes.
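the numeric thresholds in these recommendations can be applied mechanically to a thread. a minimal sketch with a hypothetical `review_flags` helper: the 6-steer and 20-turn cutoffs come from the tables above, while treating titles of two words or fewer as vague is an assumption standing in for “Fix this”:

```python
def review_flags(title: str, turns: int, steers: int) -> list:
    """Return warnings for one thread based on the thresholds in the recommendations."""
    flags = []
    if steers >= 6:
        flags.append("thrash: 6+ steers, consider reframing the problem")
    if turns < 20:
        flags.append("short: under 20 turns, where only 16% of threads resolved")
    # assumption: a title of two words or fewer (e.g. "Fix this") is too vague
    if len(title.split()) <= 2:
        flags.append("vague title: name the symptom and the component")
    return flags


print(review_flags("Fix this", 124, 2))
print(review_flags("Debug sort_optimization panic with constant columns", 252, 9))
```

the flags are advisory: a thread can trip one and still resolve, as the 7-steer diff review did.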