code quality signals analysis

analysis of 4,656 threads for lint errors, type errors, test failures, build errors, and runtime errors, and how those signals correlate with thread outcomes.

key findings

1. error presence correlates with SUCCESSFUL outcomes (counterintuitive)

outcome	threads	any error signal
RESOLVED	2,745	97.8%
COMMITTED	305	81.6%
HANDOFF	75	64.1%
FRUSTRATED	14	114.3%*
EXPLORATORY	124	12.1%
UNKNOWN	1,560	42.9%

*values above 100% occur because a thread with several error types is counted once per type

interpretation: threads that encounter and work through errors tend to reach resolution. EXPLORATORY threads (12.1% error rate) rarely hit errors, most likely because they are not attempting real changes.
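the footnote's >100% case can be reproduced with a small sketch. it assumes the column is computed as the total number of error-type signals divided by the number of threads, which the report does not state explicitly. the function name `signal_rate` and the sample data are hypothetical; the sample is built so that 14 threads carry 16 signals, which gives the 114.3% shown for FRUSTRATED.

```python
def signal_rate(error_types_per_thread: list[set[str]]) -> float:
    """Error signals per thread as a percentage, counting each error type separately.

    Assumption: a thread with two error types contributes two signals,
    so the result can exceed 100%.
    """
    if not error_types_per_thread:
        return 0.0
    total_signals = sum(len(types) for types in error_types_per_thread)
    return round(100 * total_signals / len(error_types_per_thread), 1)


# Hypothetical 14-thread sample: two threads have two error types each,
# the remaining twelve have one, for 16 signals in total.
sample = [{"test", "type"}, {"test", "build"}] + [{"test"}] * 7 + [{"lint"}] * 5

print(signal_rate(sample))  # 16 signals / 14 threads
```

under this reading, a value of 100% or less does not guarantee that every thread hit an error; it only bounds the average number of signals per thread.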

2. error type distribution

signal	threads affected	% of corpus
test failures	1,471	31.6%
type errors	798	17.1%
build errors	604	13.0%
lint errors	479	10.3%
runtime errors	136	2.9%

test failures are the DOMINANT signal - agents encounter them in 31.6% of all threads, nearly twice the share of type errors (17.1%).
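the "% of corpus" column can be recomputed from the thread counts. a minimal sketch, assuming each percentage is threads affected divided by the 4,656-thread corpus and rounded to one decimal (the helper name `share_of_corpus` is hypothetical):

```python
TOTAL_THREADS = 4656  # corpus size stated at the top of the report

# Threads affected per signal, copied from the table above.
THREADS_AFFECTED = {
    "test failures": 1471,
    "type errors": 798,
    "build errors": 604,
    "lint errors": 479,
    "runtime errors": 136,
}


def share_of_corpus(affected: int, total: int = TOTAL_THREADS) -> float:
    """Return affected/total as a percentage rounded to one decimal place."""
    return round(100 * affected / total, 1)


shares = {signal: share_of_corpus(n) for signal, n in THREADS_AFFECTED.items()}

for signal, pct in shares.items():
    print(f"{signal}\t{pct}%")
```

the per-signal counts sum to 3,488, more than the 2,221 threads with any error, so some threads carry more than one signal and the shares should not be added together.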

3. error resolution patterns (CONCERNING)

among 1,304 threads with errors in outcome-labeled categories:

resolution	count	rate
fixed properly	237	18.2%
workaround used	934	71.6%
unresolved	133	10.2%

71.6% workaround rate - agents use @ts-ignore, @ts-expect-error, eslint-disable, or similar suppressions roughly four times as often as they fix the underlying issue (71.6% vs 18.2%).

2,283 instances of error suppression directives found across threads.
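a count like this could be produced with a simple pattern scan over thread text. the sketch below is not the method used for this report; it only covers the three directives named above, and the function name `count_suppressions` and the sample text are hypothetical.

```python
import re
from collections import Counter

# Matches the three directives named in the report. The eslint pattern also
# accepts the -next-line and -line variants and folds them into one bucket.
SUPPRESSION_RE = re.compile(
    r"@ts-ignore|@ts-expect-error|eslint-disable(?:-next-line|-line)?"
)


def count_suppressions(text: str) -> Counter:
    """Count suppression directives in a block of thread text."""
    counts: Counter = Counter()
    for match in SUPPRESSION_RE.finditer(text):
        directive = match.group(0)
        if directive.startswith("eslint-disable"):
            directive = "eslint-disable"
        counts[directive] += 1
    return counts


# Hypothetical thread excerpt for illustration.
sample_text = """
// @ts-ignore
const total: number = compute();
// eslint-disable-next-line no-console
console.log(total);
/* eslint-disable */
// @ts-expect-error
legacyCall(total);
"""

print(count_suppressions(sample_text))
```

a scan like this counts mentions, not decisions: a directive quoted in discussion or already present in the codebase is counted the same as one the agent just added.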

4. steering correlation with errors

threads encountering errors by steering level:

steering	threads with errors (share of 1,304)
low (0-1)	1,100 (84.4%)
medium (2-3)	166 (12.7%)
high (4+)	38 (2.9%)

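most error encounters (84.4%) happen in LOW-steering threads, where agents attempt to fix autonomously. high-steering threads account for only 2.9% of error threads, possibly because users provide more guidance and avoid error-prone paths. these figures are shares of the 1,304 error threads, not error rates within each steering level, so the comparison depends on how many threads fall into each level overall.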
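the bucketing and the percentages in the table can be sketched as follows. this assumes the numbers in parentheses (0-1, 2-3, 4+) are a per-thread count of steering events, which the report does not define; `steering_level` and `share_by_level` are hypothetical names.

```python
def steering_level(steering_count: int) -> str:
    """Map a per-thread steering count to the report's three buckets."""
    if steering_count < 0:
        raise ValueError("steering_count must be non-negative")
    if steering_count <= 1:
        return "low"
    if steering_count <= 3:
        return "medium"
    return "high"


# Error-thread counts per bucket, copied from the table above.
ERROR_THREADS_BY_LEVEL = {"low": 1100, "medium": 166, "high": 38}


def share_by_level(counts: dict[str, int]) -> dict[str, float]:
    """Each bucket's share of all error threads, as a percentage."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


print(share_by_level(ERROR_THREADS_BY_LEVEL))
```

computing a true error rate per steering level would also need the total number of threads in each bucket, which is not given here.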

5. FRUSTRATED threads: the error story

the 14 FRUSTRATED threads show the highest test failure rate of any outcome (64.3%, 9 of 14 threads). pattern:

1. user encounters errors
2. agent attempts fix
3. fix creates more errors
4. frustration ensues

recommendations for AGENTS.md

## error handling guidelines

1. **run typecheck/lint BEFORE committing** - not after
2. **never suppress errors to pass checks** - fix root cause
3. **test failures require investigation** - don't just modify assertions
4. **escalate after 2 failed fix attempts** - ask user for guidance
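guideline 4 can be read as a bounded retry loop. the sketch below is one possible interpretation, not a prescribed implementation: `run_checks` and `attempt_fix` are hypothetical callbacks standing in for whatever typecheck/lint/test commands and fix steps a project uses.

```python
from typing import Callable

MAX_FIX_ATTEMPTS = 2  # from guideline 4: escalate after 2 failed fix attempts


def fix_or_escalate(
    run_checks: Callable[[], bool],
    attempt_fix: Callable[[], None],
    max_attempts: int = MAX_FIX_ATTEMPTS,
) -> str:
    """Try to get checks passing, then hand back to the user.

    run_checks returns True when typecheck/lint/tests pass (hypothetical).
    attempt_fix applies one root-cause fix attempt (hypothetical).
    Returns "passed", "fixed", or "escalate".
    """
    if run_checks():
        return "passed"
    for _ in range(max_attempts):
        attempt_fix()
        if run_checks():
            return "fixed"
    # Checks still fail after the allowed attempts: stop and ask the user.
    return "escalate"


# Usage with stand-in callbacks: checks never pass, so the loop escalates.
attempts = []
outcome = fix_or_escalate(lambda: False, lambda: attempts.append("fix"))
print(outcome, len(attempts))
```

keeping the attempt limit explicit is meant to interrupt the FRUSTRATED pattern above, where each fix creates more errors.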

signal quality assessment

test failures: HIGH SIGNAL - reliably indicates real issues
type errors: HIGH SIGNAL - catches actual bugs
lint errors: MEDIUM SIGNAL - often style, sometimes real issues
build errors: HIGH SIGNAL - blocks progress
runtime errors: LOW OCCURRENCE but HIGH SEVERITY when present

raw data

metric	value
total threads analyzed	4,656
threads with any error	2,221 (47.7%)
test failure mentions	1,471
type error mentions	798
suppression directives	2,283