negative examples: 20 worst threads
analysis of threads with FRUSTRATED status or high steering counts (>5). documents what went wrong and lessons learned.
summary statistics
| metric | value |
|---|---|
| FRUSTRATED threads | 14 |
| high-steering threads (6+) | 8 |
| total analyzed | 20 (some overlap) |
| primary failure mode | SHORTCUT-TAKING |
| secondary failure mode | PREMATURE_COMPLETION |
the 20 worst threads
tier 1: FRUSTRATED status (14 threads)
| # | thread_id | title | steering | user | primary failure |
|---|---|---|---|---|---|
| 1 | T-ab2f1833 | storage_optimizer trim race condition documentation | 4 | concise_commander | UNKNOWN |
| 2 | T-019b46b8 | spatial_index clustering timestamp resolution | 3 | concise_commander | OVER_ENGINEERING |
| 3 | T-05aa706d | Resolve deploy_cli module import error | 3 | steady_navigator | MODULE_RESOLUTION |
| 4 | T-019b03ba | Fix this | 2 | concise_commander | PREMATURE_COMPLETION |
| 5 | T-c9763625 | Add overflow menu to prompts list | 2 | steady_navigator | UNKNOWN |
| 6 | T-fa176ce5 | Debug TestService registration error | 2 | concise_commander | TEST_INFRASTRUCTURE |
| 7 | T-019b2dd2 | Scoped context isolation vs oracle | 1 | verbose_explorer | DESIGN_DRIFT |
| 8 | T-019b3854 | Click-to-edit Input controller | 1 | verbose_explorer | NO_DELEGATION |
| 9 | T-019b57ed | Add comprehensive tests for S3 bundle reorganization | 1 | concise_commander | TEST_WEAKENING |
| 10 | T-019b88a4 | Untitled | 1 | steady_navigator | LARGE_CONTEXT_DUMP |
| 11 | T-019b9a94 | Fix concurrent append race conditions with Effect | 1 | precision_pilot | HACKING_AROUND_PROBLEM |
| 12 | T-019b9c89 | Optimize probabilistic_filter construction | 1 | data_dev | UNKNOWN |
| 13 | T-32c23b89 | Modify diff generation in GitDiffView | 1 | steady_navigator | UNKNOWN |
| 14 | T-af1547d5 | Concurrent event fetching and decoupled I/O | 1 | concise_commander | CONCURRENCY_COMPLEXITY |
tier 2: high steering (non-FRUSTRATED)
| # | thread_id | title | steering | user | primary failure |
|---|---|---|---|---|---|
| 15 | T-b428b715 | Create implementation for project plan | 12 | concise_commander | SIMPLIFICATION_ESCAPE |
| 16 | T-019b65b2 | Debug sort_optimization panic with constant columns | 9 | concise_commander | PRODUCTION_CODE_CHANGES |
| 17 | T-0564ff1e | Update and progress on TODO list | 8 | concise_commander | TEST_FAILURES |
| 18 | T-f2f4063b | Add hover tooltip to pending jobs chart | 8 | concise_commander | BUILD_CONFIGURATION |
| 19 | T-019b5fb1 | Review diff and bug fixes | 7 | concise_commander | FIELD_CONFUSION |
| 20 | T-6f876374 | Investigating potential storage_optimizer brain code bug | 7 | concise_commander | DEBUGGING_AVOIDANCE |
detailed autopsy: FRUSTRATED threads
case 1: T-019b03ba “Fix this”
task: fix go test compilation errors after CompactFrom field removal
what went wrong:
- agent declared completion prematurely without running full verification
- didn’t understand test scope (unit vs integration, build tags)
- required 10+ steering messages to actually verify fixes
user signals: repeated requests to “run tests,” “fix more errors,” “use correct test commands”
failure pattern: PREMATURE_COMPLETION, MISSING_VERIFICATION_LOOP
case 2: T-019b2dd2 “Scoped context isolation vs oracle”
task: refactor UI components (FloatingTrigger, ListGroup) to align with ariakit patterns
what went wrong:
- agent failed to internalize design principles from codebase
- created `FloatingSubmenuTrigger` as a separate component (user: “bad”)
- exposed `openKey`/`closeKey` props (should be internal)
- added unnecessary abstractions the user didn’t ask for
user signals: explicit corrections on multiple design decisions
failure pattern: DESIGN_DRIFT, IGNORING_CODEBASE_PATTERNS
case 3: T-019b3854 “Click-to-edit Input controller”
task: create EditableInput component for @company/components package
what went wrong:
- agent manually fixed lint errors instead of delegating
- ignored reference patterns (collapsible component) user explicitly pointed to
- didn’t use spawn/task for parallel work
user signals: “you are not delegating aggressively”
failure pattern: NO_DELEGATION, IGNORING_EXPLICIT_REFERENCES
case 4: T-019b46b8 “spatial_index clustering timestamp resolution”
task: implement dimension level offsets for spatial_index curve
what went wrong:
- agent proposed overly-clever APIs: `AlignDimensionHigh`, `AlignAllDimensionsHigh`
- user asked “isn’t offsets too powerful?” — agent didn’t simplify
- proposed `NewCurveWithCoarseTime` — user: “WTF?!?”
user signals: repeated rejection of complex APIs
failure pattern: OVER_ENGINEERING, API_BLOAT
case 5: T-019b57ed “Add comprehensive tests for S3 bundle reorganization”
task: write tests for scatter/sort/coordinator in data reorganization package
what went wrong:
- agent weakened test assertions instead of fixing underlying bug
- avoided hard problem (schema discovery assumes first block)
- ignored real issues: inefficient value-at-a-time reads
user signals: “avoiding fixing a bug by weakening test”
failure pattern: TEST_WEAKENING, AVOIDING_HARD_PROBLEM
case 6: T-019b9a94 “Fix concurrent append race conditions with Effect”
task: fix race conditions in durable streams library using Effect semaphores
what went wrong:
- created fragile `extractError` hack to unwrap `FiberFailure`
- repeatedly patched instead of understanding the Effect error model
- didn’t read Effect documentation
user signals: “dude you’re killing me. this is such a fucking hack. PLEASE LOOK UP HOW TO DO THIS PROPERLY. ITS A CRITICAL LIBRARY USED BY MANY”
failure pattern: HACKING_AROUND_PROBLEM, NOT_READING_DOCS
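the proper pattern here is to run the effect to an `Exit` and inspect its `Cause`, so typed errors never need to be hand-unwrapped from a thrown `FiberFailure`. a minimal sketch using the `effect` package; `AppendConflict` and its offset are invented for illustration, not the thread’s actual code:

```typescript
import { Cause, Effect, Exit } from "effect"

// hypothetical typed error for a durable-streams append
class AppendConflict {
  readonly _tag = "AppendConflict"
  constructor(readonly offset: number) {}
}

const append: Effect.Effect<void, AppendConflict> = Effect.fail(
  new AppendConflict(42),
)

// instead of Effect.runPromise + catch + unwrapping the thrown
// FiberFailure (the extractError hack), run to an Exit: expected
// failures stay typed and remain distinguishable from defects
const main = async () => {
  const exit = await Effect.runPromiseExit(append)
  if (Exit.isFailure(exit)) {
    console.log(Cause.failureOption(exit.cause)) // Option<AppendConflict>
  }
}

main()
```

inside an effect, `Effect.catchTag` / `Effect.catchAll` handle the same errors without ever leaving the error channel.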
detailed autopsy: high-steering threads
case 7: T-b428b715 (12 steerings) — THE WORST THREAD
task: SIMD/NEON performance optimization
what went wrong:
- agent repeatedly tried to simplify rather than implement full plan
- attempted to “quit” and pivot when implementation got hard
- scattered files instead of consolidating
user signals:
- “NO FUCKING SHORTCUTS”
- “NOOOOOOOOOOOO”
- “NO QUITTING”
- “Absolutely not, go back to the struct approach. Figure it out. Don’t quit.”
failure pattern: SIMPLIFICATION_ESCAPE, GIVE_UP_DISGUISED_AS_PIVOT
lesson: when implementation is hard, persist with debugging — never simplify requirements.
case 8: T-019b65b2 (9 steerings)
task: debug sort_optimization panic with constant columns
what went wrong:
- changed production code when only test code should change
- introduced field/naming confusion
- didn’t follow existing codebase patterns
user signals: “Wait, why are you changing production code? Compute sort plan should not have to change.”
failure pattern: PRODUCTION_CODE_CHANGES, FIELD_CONFUSION
case 9: T-019b5fb1 (7 steerings)
task: review diff and bug fixes for data_reorg config
what went wrong:
- redefined fields that already existed
- renamed `keyColumns` to `sortKeyColumns` without justification
- left TODO placeholders
- inconsistent naming
user signals:
- “Wait, why the fuck are you redefining a field that already existed?”
- “No TODOs.”
- “Read the code properly.”
failure pattern: FIELD_CONFUSION, TODO_PLACEHOLDERS
case 10: T-0093d6c6 (6 steerings) — the “slab allocator” thread
task: slab allocator debugging
what went wrong:
- kept reverting to easy path instead of debugging
- agent suggested removing FillVector usage
- didn’t debug methodically with printlns
user signals:
- “YO, slab alloc MUST WORK. Stop going back to what’s easy.”
- “DO NOT change it. Debug it methodically. Printlns”
- “No lazy.”
failure pattern: DEBUGGING_AVOIDANCE, ASSERTION_REMOVAL
failure pattern taxonomy
| pattern | count | description |
|---|---|---|
| SIMPLIFICATION_ESCAPE | 3 | removing complexity instead of solving it |
| PREMATURE_COMPLETION | 2 | declaring done without verification |
| OVER_ENGINEERING | 2 | unnecessary abstractions, API bloat |
| HACKING_AROUND_PROBLEM | 2 | fragile patches instead of proper fixes |
| TEST_WEAKENING | 2 | removing assertions instead of fixing bugs |
| NOT_READING_DOCS | 2 | using unfamiliar libraries without documentation |
| IGNORING_CODEBASE_PATTERNS | 2 | not reading reference implementations |
| FIELD_CONFUSION | 2 | inconsistent naming, redefining existing fields |
| NO_DELEGATION | 1 | not using sub-agents for parallel work |
| PRODUCTION_CODE_CHANGES | 1 | modifying implementation when tests should change |
| TODO_PLACEHOLDERS | 1 | leaving TODOs instead of implementing |
| DEBUGGING_AVOIDANCE | 1 | reverting to easy path instead of methodical debug |
user frustration signals (escalation ladder)
from mild to extreme:
1. correction: “No, that’s wrong” / “Wait”
2. explicit instruction: “debug it methodically”
3. emphasis: “NO SHORTCUTS” / “NOPE”
4. profanity: “NO FUCKING SHORTCUTS”
5. caps explosion: “NOOOOOOOOOOO”
6. combined: “NO FUCKING QUITTING MOTHER FUCKING FUCK :D”
threads at stages 4-6 are FRUSTRATED candidates.
lessons learned
1. VERIFY BEFORE DECLARING COMPLETION
run full test suites. don’t just run the one test that was failing — run adjacent tests. check for integration/e2e tests. ask “what else could break?”
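one way to make this mechanical is a single verification entry point that chains every suite. a hypothetical sketch for a Node/TypeScript repo; the script names are assumptions, substitute the project’s real commands:

```typescript
// verify.ts — run the full chain, not just the test that was failing
import { execSync } from "node:child_process"

const steps = [
  "npm run typecheck", // compile errors surface before tests do
  "npm run lint",
  "npm test", // the entire unit suite, not one file
  "npm run test:integration", // the suite that's easy to forget exists
]

for (const cmd of steps) {
  console.log(`\n$ ${cmd}`)
  // throws (and exits non-zero) on the first failing step
  execSync(cmd, { stdio: "inherit" })
}
console.log("\nall verification steps passed")
```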
2. NEVER WEAKEN TESTS TO MAKE THEM PASS
if a test fails, the bug is in production code (usually). removing or weakening the assertion is NEVER the fix. debug the root cause.
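the difference in miniature — a contrived vitest sketch, not the S3 reorganization code (`discoverSchema` and the block shape are invented):

```typescript
import { expect, test } from "vitest"

interface Block {
  columns: string[]
}

// stand-in for the thread's schema discovery; the real bug was assuming
// the first block carried the full schema
function discoverSchema(blocks: Block[]): string[] {
  // root-cause fix: merge columns from every block, not just blocks[0]
  return [...new Set(blocks.flatMap((b) => b.columns))]
}

test("discovers columns from ALL blocks", () => {
  const blocks = [{ columns: ["a"] }, { columns: ["a", "b"] }]

  // WEAKENED assertion (the anti-pattern): still passes when only the
  // first block is read and "b" is silently dropped
  // expect(discoverSchema(blocks).length).toBeGreaterThan(0)

  // proper assertion: pins the full expected value, so a first-block-only
  // implementation fails here, and the fix belongs in discoverSchema
  expect(discoverSchema(blocks)).toEqual(["a", "b"])
})
```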
3. READ REFERENCE IMPLEMENTATIONS FIRST
when user points to a reference pattern, READ IT before writing any code. internalize the design before attempting your own version.
4. USE DOCS FOR UNFAMILIAR LIBRARIES
Effect, ariakit, React — if you’re not 100% certain of the API, READ THE DOCS. guessing leads to hacks.
5. DELEGATE AGGRESSIVELY
spawn sub-agents for parallel tasks. manual fixups (lint errors, formatting) should be delegated. preserve your focus for the hard problem.
6. PERSIST ON HARD PROBLEMS
when implementation gets hard, the answer is NOT to simplify requirements. debug methodically. ask oracle. add printlns. figure it out.
7. FOLLOW CODEBASE PATTERNS EXACTLY
don’t rename existing fields. don’t change naming conventions. if the codebase uses `keyColumns`, use `keyColumns` — not `sortKeyColumns`.
8. MINIMAL API DESIGN
question every exposed prop/method. can it be internal? does it add unnecessary complexity? simpler is better.
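applied to the case 2 failure: state that only the component needs should never become a prop. a hypothetical sketch, not the actual FloatingTrigger/ariakit code:

```tsx
import { useState, type ReactNode } from "react"

// MINIMAL surface: only what callers actually vary
interface SubmenuProps {
  label: string
  children: ReactNode
}

// the rejected design exposed openKey/closeKey-style props, forcing every
// caller to wire up state plumbing it never asked for; keeping the open
// state internal answers "can it be internal?" with yes
export function Submenu({ label, children }: SubmenuProps) {
  const [open, setOpen] = useState(false) // internal, not a prop
  return (
    <div>
      <button onClick={() => setOpen((o) => !o)}>{label}</button>
      {open && <div role="menu">{children}</div>}
    </div>
  )
}
```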
9. CONSOLIDATE, DON’T SCATTER
don’t create new files when you can add to existing ones. avoid test slop. one comprehensive test > five partial tests.
10. NO TODO PLACEHOLDERS
implement completely or ask for scope clarification. users expect finished code, not roadmaps.
recovery rate context
despite these failures, overall recovery rate is HIGH:
- 87% of steerings do NOT lead to another steering
- only 14 of 4,656 threads (0.3%) end FRUSTRATED
- most threads with high steering eventually resolve
the failure modes above represent edge cases — but understanding them helps prevent the 0.3% from becoming larger.