# Refactoring Patterns Analysis
analysis of 245 threads containing “refactor”, “migrate”, or “upgrade” in their titles.
## Success Rates by Task Type
| type | total | success | rate | avg turns | avg steering |
|---|---|---|---|---|---|
| refactor | 150 | 95 | 63.3% | 62.2 | 0.46 |
| upgrade | 8 | 3 | 37.5% | 26.0 | 0.63 |
| migrate | 87 | 18 | 20.7% | 33.3 | 0.05 |
key insight: refactoring succeeds ~3x as often as migration. migrations show the lowest steering but also the lowest success rate, which suggests agents report completion without verifying it.
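the table above can be reproduced with a grouping pass like the following (a minimal sketch; the record fields `task_type`, `resolved`, `turns`, and `steering_count` are assumed names, not taken from the source dataset):

```python
from collections import defaultdict

def summarize(threads):
    """Group thread records by task type and compute total/success/rate/averages."""
    groups = defaultdict(list)
    for t in threads:
        groups[t["task_type"]].append(t)
    rows = {}
    for task_type, ts in groups.items():
        total = len(ts)
        success = sum(1 for t in ts if t["resolved"])
        rows[task_type] = {
            "total": total,
            "success": success,
            "rate": round(100 * success / total, 1),
            "avg_turns": round(sum(t["turns"] for t in ts) / total, 1),
            "avg_steering": round(sum(t["steering_count"] for t in ts) / total, 2),
        }
    return rows

# toy input, not the real dataset
threads = [
    {"task_type": "refactor", "resolved": True, "turns": 70, "steering_count": 1},
    {"task_type": "refactor", "resolved": False, "turns": 30, "steering_count": 0},
    {"task_type": "migrate", "resolved": False, "turns": 20, "steering_count": 0},
]
print(summarize(threads)["refactor"]["rate"])  # 50.0
```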
## Completion Status Distribution
| status | count | percentage |
|---|---|---|
| RESOLVED | 102 | 41.6% |
| UNKNOWN | 90 | 36.7% |
| HANDOFF | 38 | 15.5% |
| COMMITTED | 14 | 5.7% |
combined success rate (RESOLVED + COMMITTED): 116/245 = 47.3%
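the 47.3% figure follows directly from the counts above:

```python
# status counts from the distribution table
resolved, committed = 102, 14
total = 245  # total threads in the analysis

combined_rate = 100 * (resolved + committed) / total
print(round(combined_rate, 1))  # 47.3
```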
## Turn Analysis
| outcome | avg turns | min | max | count |
|---|---|---|---|---|
| success | 75.5 | 3 | 433 | 116 |
| incomplete | 28.4 | 2 | 195 | 129 |
insight: successful threads take ~2.7x more turns than incomplete ones. short threads correlate with incomplete work: agents that bail early leave tasks unfinished.
## User Patterns
| user | total | success | rate | avg turns |
|---|---|---|---|---|
| @concise_commander | 71 | 49 | 69.0% | 87.4 |
| @steady_navigator | 54 | 40 | 74.1% | 40.0 |
| @verbose_explorer | 39 | — | — | 55.6 |
| @precision_pilot | 8 | 7 | 87.5% | 66.9 |
| @patient_pathfinder | 5 | 1 | 20.0% | 50.0 |
patterns:
- @concise_commander: a high-turn, high-steering Socratic approach yields 69% success
- @steady_navigator: balanced turns with strong success (74%)
NOTE: @verbose_explorer’s refactor success rate was previously reported as 28%, but that figure came from spawn-misclassified data. given the corrected overall stats (83% resolution), @verbose_explorer’s refactor-specific success rate is unknown and needs recomputation from clean data.
## Pitfall Categories
### 1. Batch Spawn Orphaning
migrations using parallel spawned agents show high HANDOFF rates with no terminal RESOLVED:
- Migrate LEGACY_FA_Iconseries: 8 HANDOFF threads, 0 COMMITTED
- pattern: a coordinator spawns N agents; the agents complete their work, but there is no verification/aggregation step
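the missing terminal step can be sketched as follows; `spawn_agent` and `verify` are hypothetical stand-ins for whatever orchestration API the coordinator actually uses, not real functions from the source:

```python
def spawn_agent(task):
    """Stand-in for spawning a migration agent; returns its handoff report."""
    return {"task": task, "status": "HANDOFF", "files": [f"{task}.ts"]}

def verify(reports):
    """Terminal verification/aggregation step: only mark the batch RESOLVED
    when every agent handed off and the combined result passes a check."""
    if any(r["status"] != "HANDOFF" for r in reports):
        return "INCOMPLETE"
    # a real check would run typecheck/build/tests over the union of files
    all_files = [f for r in reports for f in r["files"]]
    return "RESOLVED" if all_files else "INCOMPLETE"

tasks = ["icon-a", "icon-b", "icon-c"]
reports = [spawn_agent(t) for t in tasks]  # the orphaning pattern stops here
print(verify(reports))                     # the missing terminal step
```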
### 2. Underspecified Migration Scope
failed migrations often open with highly detailed first messages but omit:
- validation steps
- rollback criteria
- integration testing requirements
example from a failed migration:

```
Migrate Menu classnames to @internal_org/ui package.
Steps: 1. Copy... 2. Update import... 3. Update package.json
Return: Confirm the files were created/updated.
```
no build verification, no type checking, no import validation across consumers.
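for contrast, a hedged sketch of how the same prompt could carry the missing verification scope (the gate and rollback lines are illustrative additions, not from the source threads):

```
Migrate Menu classnames to @internal_org/ui package.
Steps: 1. Copy... 2. Update import... 3. Update package.json
Verify: run the typechecker and build; search consumers for stale imports;
        run the affected test suites.
Rollback: if any gate fails, revert and report which step broke.
Return: the gate outputs, not just confirmation that files changed.
```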
### 3. Steering Vocabulary in High-Churn Refactors
extracted steering messages reveal common friction points:
- “No” / “Not” prefix: the agent went in the wrong direction
- “Wait”: the user catching the agent mid-mistake
- design pushback: “That is not clean at all”, “Not simple enough”
- missing context: the agent lacks domain knowledge (Hilbert keys, column types)
- lazy execution (“You’re getting so lazy”): the agent cutting corners
### 4. Performance Regression Blindness
several threads show the same pattern:
- refactor code
- tests pass
- benchmarks regress (discovered later)
- requires additional steering to fix
example: Radix sort generic refactoring, performance regression caught only in later analysis (3 steering messages, 128 turns)
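a minimal sketch of the kind of benchmark gate that would catch this pattern before success is declared (the 10% threshold, the toy workload, and the timing approach are all assumptions):

```python
import timeit

def check_regression(baseline_s, current_s, tolerance=0.10):
    """Fail the refactor if the new timing is more than 10% slower than baseline."""
    return current_s <= baseline_s * (1 + tolerance)

# toy workload standing in for the real benchmark suite
def workload():
    return sorted(range(1000, 0, -1))

baseline = min(timeit.repeat(workload, number=200, repeat=3))
current = min(timeit.repeat(workload, number=200, repeat=3))  # re-run after the refactor
print(check_regression(baseline, current))
```

running this gate alongside the test suite turns “benchmarks regress (discovered later)” into an immediate failure instead of a delayed steering round.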
## Success Patterns
High-Success Refactors Share:
- explicit verification: “run benchmarks”, “typecheck”, “run tests”
- incremental scope: single-file or single-concept changes
- domain expertise: user provides context agent lacks
- iteration tolerance: willingness to spend 60+ turns
Successful Migration Characteristics:
- smaller scope (single file or utility)
- self-contained modules with few cross-cutting dependencies
- explicit success criteria in first message
## Recommendations
- migrations need verification gates: add explicit typecheck/build/test steps to migration prompts
- batch spawns need aggregation: when spawning N migration agents, include terminal verification agent
- expect high turn counts: successful threads average ~75 turns; bailing at 30 leaves work incomplete
- front-load domain context: the agent lacks knowledge of custom column types, encoding schemes, and performance characteristics
- benchmark before declaring success: include perf regression checks for algorithm/interface refactors