code quality signals analysis
analysis of 4,656 threads for lint errors, type errors, build errors, test failures, and runtime errors, and how these signals correlate with thread outcomes.
key findings
1. error presence correlates with SUCCESSFUL outcomes (counterintuitive)
| outcome | threads | any error signal |
|---|---|---|
| RESOLVED | 2,745 | 97.8% |
| COMMITTED | 305 | 81.6% |
| HANDOFF | 75 | 64.1% |
| FRUSTRATED | 14 | 114.3%* |
| EXPLORATORY | 124 | 12.1% |
| UNKNOWN | 1,560 | 42.9% |
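*values above 100% arise because a thread can show more than one error type, so this column sums error signals per thread rather than counting each thread once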
interpretation: threads that encounter and work through errors tend to reach resolution. EXPLORATORY threads (12.1% error rate) rarely hit errors because they’re not attempting real changes.
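the starred figure in the table can be reproduced with a small sketch. it assumes each thread is represented by the set of error types detected in it (a hypothetical data shape, not necessarily how this analysis stored its data) and shows why summing signals can exceed 100% while counting threads with at least one error cannot.

```python
def signal_rates(threads: list[set[str]]) -> tuple[float, float]:
    """Return (summed_signal_rate, any_error_rate) as percentages.

    `threads` is an assumed representation: one set of error types per thread.
    """
    n = len(threads)
    if n == 0:
        return 0.0, 0.0
    # Every error type in every thread counts once, so this can exceed n.
    summed = sum(len(error_types) for error_types in threads)
    # Each thread counts at most once, so this is capped at n.
    any_error = sum(1 for error_types in threads if error_types)
    return 100 * summed / n, 100 * any_error / n


# Illustrative data only, not taken from the corpus.
sample = [{"test", "type"}, {"test", "build"}, {"lint"}, set()]
summed_rate, any_rate = signal_rates(sample)
print(f"{summed_rate:.1f}% summed, {any_rate:.1f}% any")  # 125.0% summed, 75.0% any
```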
2. error type distribution
| signal | threads affected | % of corpus |
|---|---|---|
| test failures | 1,471 | 31.6% |
| type errors | 798 | 17.1% |
| build errors | 604 | 13.0% |
| lint errors | 479 | 10.3% |
| runtime errors | 136 | 2.9% |
test failures are the DOMINANT signal - agents encounter them in 31.6% of threads (~1/3), nearly twice as often as type errors (17.1%).
3. error resolution patterns (CONCERNING)
among 1,304 threads with errors in outcome-labeled categories:
| resolution | count | rate |
|---|---|---|
| fixed properly | 237 | 18.2% |
| workaround used | 934 | 71.6% |
| unresolved | 133 | 10.2% |
71.6% workaround rate - agents reach for @ts-ignore, @ts-expect-error, eslint-disable, or similar suppressions roughly four times as often as they fix the underlying issue (71.6% vs 18.2%).
2,283 error suppression directives were found across threads.
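a minimal sketch of how suppression directives could be counted in thread text. the pattern covers only the directives named above plus the standard `eslint-disable-next-line` and `eslint-disable-line` variants; the function name and the sample are illustrative assumptions, not the detector used for this analysis.

```python
import re
from collections import Counter

# Directives named in the findings above; extend for other toolchains as needed.
SUPPRESSION_RE = re.compile(
    r"@ts-ignore|@ts-expect-error|eslint-disable(?:-next-line|-line)?"
)


def count_suppressions(text: str) -> dict[str, int]:
    """Count each suppression directive that appears in `text`."""
    return dict(Counter(m.group(0) for m in SUPPRESSION_RE.finditer(text)))


# Illustrative snippet only.
sample_text = """
// @ts-ignore
const x: number = foo();
// eslint-disable-next-line no-console
console.log(x);
/* eslint-disable */
"""
print(count_suppressions(sample_text))
```

`eslint-enable` is deliberately not matched, since it ends a suppression rather than starting one.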
4. steering correlation with errors
threads encountering errors by steering level:
| steering | threads with errors |
|---|---|
| low (0-1) | 1,100 (84.4%) |
| medium (2-3) | 166 (12.7%) |
| high (4+) | 38 (2.9%) |
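most error encounters (84.4%) happen in LOW-steering threads, where agents attempt fixes autonomously. high-steering threads account for only 2.9% of error threads, plausibly because users provide more guidance and steer away from error-prone paths. these figures are shares of the 1,304 error threads, not error rates within each steering level.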
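the bucketing and the percentages in the table can be sketched as follows. the thresholds come from the table labels; what counts as one unit of steering (here, an integer count per thread) is an assumption, and `steering_bucket` is a hypothetical helper.

```python
def steering_bucket(steering_count: int) -> str:
    """Map a per-thread steering count to the buckets used in the table."""
    if steering_count < 0:
        raise ValueError("steering_count must be non-negative")
    if steering_count <= 1:
        return "low"
    if steering_count <= 3:
        return "medium"
    return "high"


# Counts of error threads per bucket, copied from the table above.
error_threads = {"low": 1100, "medium": 166, "high": 38}
total = sum(error_threads.values())  # 1304
shares = {k: round(100 * v / total, 1) for k, v in error_threads.items()}
print(shares)  # {'low': 84.4, 'medium': 12.7, 'high': 2.9}
```

these are shares of all error threads. an error rate per steering level would additionally need the total number of threads in each bucket, which the table does not report.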
5. FRUSTRATED threads: the error story
the 14 FRUSTRATED threads show the highest test failure rate of any outcome (64.3%, i.e. 9 of 14). the typical pattern:
- user encounters errors
- agent attempts fix
- fix creates more errors
- frustration ensues
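one way to spot the "fix creates more errors" step is to compare error counts before and after each fix attempt. the sketch below assumes such counts are available per thread (a hypothetical input; the report does not say how the pattern was identified).

```python
def regressing_attempts(error_counts: list[int]) -> list[int]:
    """Return the 1-based fix attempts after which the error count went up.

    error_counts[0] is the count before any fix; error_counts[i] is the
    count observed after fix attempt i.
    """
    return [
        i
        for i in range(1, len(error_counts))
        if error_counts[i] > error_counts[i - 1]
    ]


# Illustrative sequence: 3 errors initially, then 5, 4, and 7 after each attempt.
print(regressing_attempts([3, 5, 4, 7]))  # [1, 3]
```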
recommendations for AGENTS.md
## error handling guidelines
1. **run typecheck/lint BEFORE committing** - not after
2. **never suppress errors to pass checks** - fix root cause
3. **test failures require investigation** - don't just modify assertions
4. **escalate after 2 failed fix attempts** - ask user for guidance
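guideline 4 can be sketched as a bounded retry loop. `run_checks` and `attempt_fix` are hypothetical callables standing in for whatever the agent uses to run typecheck/lint/tests and to apply a fix; the limit of 2 comes from the guideline itself.

```python
from typing import Callable

MAX_FIX_ATTEMPTS = 2  # from guideline 4


def fix_or_escalate(
    run_checks: Callable[[], bool],
    attempt_fix: Callable[[], None],
    max_attempts: int = MAX_FIX_ATTEMPTS,
) -> str:
    """Return 'passed', 'fixed', or 'escalate' (ask the user for guidance)."""
    if run_checks():
        return "passed"
    for _ in range(max_attempts):
        attempt_fix()
        if run_checks():
            return "fixed"
    return "escalate"


class FakeRepo:
    """Toy stand-in: checks pass once `fixes_needed` drops to zero."""

    def __init__(self, fixes_needed: int) -> None:
        self.fixes_needed = fixes_needed

    def run_checks(self) -> bool:
        return self.fixes_needed <= 0

    def attempt_fix(self) -> None:
        self.fixes_needed -= 1


easy, hard = FakeRepo(1), FakeRepo(3)
print(fix_or_escalate(easy.run_checks, easy.attempt_fix))  # fixed
print(fix_or_escalate(hard.run_checks, hard.attempt_fix))  # escalate
```

the loop never suppresses a failing check; when attempts run out, the only remaining path is to hand the decision back to the user.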
signal quality assessment
- test failures: HIGH SIGNAL - reliably indicate real issues
- type errors: HIGH SIGNAL - catch actual bugs
- lint errors: MEDIUM SIGNAL - often style, sometimes real issues
- build errors: HIGH SIGNAL - block progress
- runtime errors: LOW OCCURRENCE but HIGH SEVERITY when present
raw data
| metric | value |
|---|---|
| total threads analyzed | 4,656 |
| threads with any error | 2,221 (47.7%) |
| test failure mentions | 1,471 |
| type error mentions | 798 |
| suppression directives | 2,283 |