# Refactoring Patterns Analysis
analysis of 245 threads containing “refactor”, “migrate”, or “upgrade” in their titles.
## Success Rates by Task Type
| type | total | success | rate | avg turns | avg steering |
|---|---|---|---|---|---|
| refactor | 150 | 95 | 63.3% | 62.2 | 0.46 |
| upgrade | 8 | 3 | 37.5% | 26.0 | 0.63 |
| migrate | 87 | 18 | 20.7% | 33.3 | 0.05 |
key insight: refactoring succeeds ~3x as often as migration. migrations show the lowest steering but also the lowest success rate, which suggests agents report completion without verifying it.
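the table above can be reproduced with a grouping pass like the following (a minimal sketch; the record fields `task_type`, `resolved`, `turns`, and `steering_count` are assumed names, not taken from the source dataset):

```python
from collections import defaultdict

def summarize(threads):
    """Group thread records by task type and compute total/success/rate/averages."""
    groups = defaultdict(list)
    for t in threads:
        groups[t["task_type"]].append(t)
    rows = {}
    for task_type, ts in groups.items():
        total = len(ts)
        success = sum(1 for t in ts if t["resolved"])
        rows[task_type] = {
            "total": total,
            "success": success,
            "rate": round(100 * success / total, 1),
            "avg_turns": round(sum(t["turns"] for t in ts) / total, 1),
            "avg_steering": round(sum(t["steering_count"] for t in ts) / total, 2),
        }
    return rows

# toy input, not the real dataset
threads = [
    {"task_type": "refactor", "resolved": True, "turns": 70, "steering_count": 1},
    {"task_type": "refactor", "resolved": False, "turns": 30, "steering_count": 0},
    {"task_type": "migrate", "resolved": False, "turns": 20, "steering_count": 0},
]
print(summarize(threads)["refactor"]["rate"])  # 50.0
```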
## Completion Status Distribution
| status | count | percentage |
|---|---|---|
| RESOLVED | 102 | 41.6% |
| UNKNOWN | 90 | 36.7% |
| HANDOFF | 38 | 15.5% |
| COMMITTED | 14 | 5.7% |
combined success rate (RESOLVED + COMMITTED): 116/245 = 47.3%
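the 47.3% figure follows directly from the counts above:

```python
# status counts from the distribution table
resolved, committed = 102, 14
total = 245  # total threads in the analysis

combined_rate = 100 * (resolved + committed) / total
print(round(combined_rate, 1))  # 47.3
```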
## Turn Analysis
| outcome | avg turns | min | max | count |
|---|---|---|---|---|
| success | 75.5 | 3 | 433 | 116 |
| incomplete | 28.4 | 2 | 195 | 129 |
insight: successful threads take ~2.7x more turns than incomplete ones. short threads correlate with incomplete work: agents that bail early leave tasks unfinished.
## User Patterns
| user | total | success | rate | avg turns |
|---|---|---|---|---|
| @concise_commander | 71 | 49 | 69.0% | 87.4 |
| @steady_navigator | 54 | 40 | 74.1% | 40.0 |
| @verbose_explorer | 39 | — | — | 55.6 |
| @precision_pilot | 8 | 7 | 87.5% | 66.9 |
| @patient_pathfinder | 5 | 1 | 20.0% | 50.0 |
patterns:
- @concise_commander: a high-turn, high-steering Socratic approach yields 69% success
- @steady_navigator: balanced turns with strong success (74%)
NOTE: @verbose_explorer’s refactor success rate was previously reported as 28%, but that figure came from spawn-misclassified data. given the corrected overall stats (83% resolution), @verbose_explorer’s refactor-specific success rate is unknown and needs recomputation from clean data.
## Pitfall Categories
### 1. Batch Spawn Orphaning
migrations using parallel spawned agents show high HANDOFF rates with no terminal RESOLVED:
- Migrate LEGACY_FA_Iconseries: 8 HANDOFF threads, 0 COMMITTED
- pattern: a coordinator spawns N agents; the agents complete their work, but there is no verification/aggregation step
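the missing terminal step can be sketched as follows; `spawn_agent` and `verify` are hypothetical stand-ins for whatever orchestration API the coordinator actually uses, not real functions from the source:

```python
def spawn_agent(task):
    """Stand-in for spawning a migration agent; returns its handoff report."""
    return {"task": task, "status": "HANDOFF", "files": [f"{task}.ts"]}

def verify(reports):
    """Terminal verification/aggregation step: only mark the batch RESOLVED
    when every agent handed off and the combined result passes a check."""
    if any(r["status"] != "HANDOFF" for r in reports):
        return "INCOMPLETE"
    # a real check would run typecheck/build/tests over the union of files
    all_files = [f for r in reports for f in r["files"]]
    return "RESOLVED" if all_files else "INCOMPLETE"

tasks = ["icon-a", "icon-b", "icon-c"]
reports = [spawn_agent(t) for t in tasks]  # the orphaning pattern stops here
print(verify(reports))                     # the missing terminal step
```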
### 2. Underspecified Migration Scope
failed migrations often open with highly detailed first messages but omit:
- validation steps
- rollback criteria
- integration testing requirements
example from a failed migration:

```
Migrate Menu classnames to @internal_org/ui package.
Steps: 1. Copy... 2. Update import... 3. Update package.json
Return: Confirm the files were created/updated.
```
no build verification, no type checking, no import validation across consumers.
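for contrast, a hedged sketch of how the same prompt could carry the missing verification scope (the gate and rollback lines are illustrative additions, not from the source threads):

```
Migrate Menu classnames to @internal_org/ui package.
Steps: 1. Copy... 2. Update import... 3. Update package.json
Verify: run the typechecker and build; search consumers for stale imports;
        run the affected test suites.
Rollback: if any gate fails, revert and report which step broke.
Return: the gate outputs, not just confirmation that files changed.
```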
### 3. Steering Vocabulary in High-Churn Refactors
extracted steering messages reveal common friction points:
- “No” / “Not” prefix: the agent went in the wrong direction
- “Wait”: the user catching the agent mid-mistake
- design pushback: “That is not clean at all”, “Not simple enough”
- missing context: the agent lacks domain knowledge (Hilbert keys, column types)
- lazy execution (“You’re getting so lazy”): the agent cutting corners
### 4. Performance Regression Blindness
several threads show the same pattern:
- refactor code
- tests pass
- benchmarks regress (discovered later)
- requires additional steering to fix
example: Radix sort generic refactoring, performance regression caught only in later analysis (3 steering messages, 128 turns)
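a minimal sketch of the kind of benchmark gate that would catch this pattern before success is declared (the 10% threshold, the toy workload, and the timing approach are all assumptions):

```python
import timeit

def check_regression(baseline_s, current_s, tolerance=0.10):
    """Fail the refactor if the new timing is more than 10% slower than baseline."""
    return current_s <= baseline_s * (1 + tolerance)

# toy workload standing in for the real benchmark suite
def workload():
    return sorted(range(1000, 0, -1))

baseline = min(timeit.repeat(workload, number=200, repeat=3))
current = min(timeit.repeat(workload, number=200, repeat=3))  # re-run after the refactor
print(check_regression(baseline, current))
```

running this gate alongside the test suite turns “benchmarks regress (discovered later)” into an immediate failure instead of a delayed steering round.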
## Success Patterns
High-Success Refactors Share:
- explicit verification: “run benchmarks”, “typecheck”, “run tests”
- incremental scope: single-file or single-concept changes
- domain expertise: user provides context agent lacks
- iteration tolerance: willingness to spend 60+ turns
Successful Migration Characteristics:
- smaller scope (single file or utility)
- self-contained modules with few cross-cutting dependencies
- explicit success criteria in first message
## Recommendations
- migrations need verification gates: add explicit typecheck/build/test steps to migration prompts
- batch spawns need aggregation: when spawning N migration agents, include terminal verification agent
- expect high turn counts: successful threads average ~75 turns; bailing at 30 leaves work incomplete
- front-load domain context: the agent lacks knowledge of custom column types, encoding schemes, and performance characteristics
- benchmark before declaring success: include perf regression checks for algorithm/interface refactors