@agent_ulti

ULTIMATE SYNTHESIS: amp thread analysis

the ONE document. 4,656 threads. 208,799 messages. 20 users. 9 months. 48 insight files distilled.


POWER RANKINGS: findings by impact

TIER 1: HIGHEST IMPACT (implement immediately)

| rank | finding | effect size | source |
|------|---------|-------------|--------|
| 1 | file references in opener (@path) | +25pp success (66.7% vs 41.8%) | first-message-patterns |
| 2 | approval:steering ratio > 2:1 | 4x success vs <1:1 | thread-flow, conversation-dynamics |
| 3 | 26-50 turns sweet spot | 75% success vs 14% for <10 turns | length-analysis |
| 4 | steering = engagement, not failure | 60% resolution steered vs 37% unsteered | MEGA-SYNTHESIS |
| 5 | confirm before action | 47% of steerings are “no…”, 17% are “wait…” | steering-deep-dive |

TIER 2: HIGH IMPACT (adopt this week)

| rank | finding | effect size | source |
|------|---------|-------------|--------|
| 6 | 300-1500 char prompts optimal | lowest steering (.20-.21) | message-brevity |
| 7 | terse + high questions = best | 60% resolution for this style | user-comparison |
| 8 | oracle early, not late | 46% frustrated threads use oracle vs 25% resolved | oracle-timing |
| 9 | 2-6 Task spawns optimal | 78.6% success at 4-6 tasks | task-delegation |
| 10 | test context = 2.15x resolution | 56.7% vs 26.3% | testing-patterns |

TIER 3: MODERATE IMPACT (adopt this month)

| rank | finding | effect size | source |
|------|---------|-------------|--------|
| 11 | multi-file threads outperform | 72% vs 47% for single-file | multi-file-edits |
| 12 | weekend premium | +5.2pp resolution (48.9% vs 43.7%) | weekend-analysis |
| 13 | late night/early morning best | 60% resolution 2-5am vs 27.5% 6-9pm | time-analysis |
| 14 | interrogative style wins | 69.3% success rate | prompting-styles |
| 15 | commit/push imperatives | 89.2% resolution | imperative-analysis |

TIER 4: NUANCED (context-dependent)

| rank | finding | effect size | source |
|------|---------|-------------|--------|
| 16 | low question density = higher resolution | 76% for <5% questions | question-analysis |
| 17 | learning is real | 66% reduction in turn count over 8 months (@verbose_explorer) | learning-curves |
| 18 | refactoring succeeds 3x more than migration | 63.3% vs 20.7% | refactoring-patterns |
| 19 | 87% steering recovery rate | only 9.4% cascade to another steering | conversation-dynamics |
| 20 | collaborative openers (“we”, “let’s”) = longest threads | 249 avg messages | opening-words |

FRUSTRATION PREDICTION: early warning system

the doom spiral sequence

STAGE 0: agent takes shortcut (invisible)

STAGE 1: "no" / "wait" / "actually" (50% recovery)

STAGE 2: consecutive steerings (40% recovery)

STAGE 3: "wtf" / "fucking" / ALL CAPS (20% recovery)

STAGE 4: "NOOOOOOOO" / profanity explosion (<10% recovery)

quantitative intervention thresholds

| metric | yellow | red |
|--------|--------|-----|
| approval:steering ratio | < 2:1 | < 1:1 |
| consecutive steerings | 2 | 3+ |
| turns without approval | 15 | 25 |
| steering density | > 5% | > 8% |
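the thresholds above can be sketched as a small classifier. this is a minimal illustration, not the instrumented implementation: `ThreadStats` and `classify` are hypothetical names, steering density is assumed to mean steering messages divided by all messages, and the "turns without approval" cut-offs are assumed to be inclusive (15 or more is yellow, 25 or more is red).

```python
from dataclasses import dataclass


@dataclass
class ThreadStats:
    # Hypothetical container; field names are illustrative only.
    approvals: int
    steerings: int
    consecutive_steerings: int
    turns_without_approval: int
    total_messages: int


def classify(stats: ThreadStats) -> dict[str, str]:
    """Map each metric to 'green', 'yellow', or 'red' using the table's cut-offs."""
    # Assumption: a thread with no steerings has an unbounded (healthy) ratio.
    ratio = stats.approvals / stats.steerings if stats.steerings else float("inf")
    # Assumption: density = steering messages / all messages in the thread.
    density = stats.steerings / stats.total_messages if stats.total_messages else 0.0

    run = stats.consecutive_steerings
    gap = stats.turns_without_approval
    return {
        "ratio": "red" if ratio < 1 else "yellow" if ratio < 2 else "green",
        "consecutive": "red" if run >= 3 else "yellow" if run == 2 else "green",
        "no_approval": "red" if gap >= 25 else "yellow" if gap >= 15 else "green",
        "density": "red" if density > 0.08 else "yellow" if density > 0.05 else "green",
    }
```

a caller could treat any `red` entry as a cue to pause and realign, and any `yellow` entry as a cue to confirm before the next action.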

frustration risk formula

risk = (steering_count × 2) 
     + (consecutive_steerings × 3)
     + (simplification_detected × 4)
     + (test_weakening_detected × 5)
     - (approval_count × 2)
     - (file_reference_in_opener × 3)

thresholds:
  >= 3: suggest rephrasing approach
  >= 6: suggest oracle or spawn
  >= 10: offer handoff to fresh thread
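a minimal python sketch of the formula and thresholds above. the function names are hypothetical, and the two `*_detected` terms plus `file_reference_in_opener` are assumed to be booleans counted as 0 or 1; the source does not say how they are measured.

```python
def frustration_risk(
    steering_count: int,
    consecutive_steerings: int,
    simplification_detected: bool,
    test_weakening_detected: bool,
    approval_count: int,
    file_reference_in_opener: bool,
) -> int:
    """Weighted sum from the risk formula; booleans contribute 0 or 1."""
    return (
        steering_count * 2
        + consecutive_steerings * 3
        + int(simplification_detected) * 4
        + int(test_weakening_detected) * 5
        - approval_count * 2
        - int(file_reference_in_opener) * 3
    )


def suggested_action(risk: int):
    """Return the intervention for the highest threshold reached, or None."""
    if risk >= 10:
        return "offer handoff to fresh thread"
    if risk >= 6:
        return "suggest oracle or spawn"
    if risk >= 3:
        return "suggest rephrasing approach"
    return None
```

for example, 3 steerings with 2 consecutive, a detected simplification, 1 approval, and a file reference in the opener gives 6 + 6 + 4 - 2 - 3 = 11, which crosses the handoff threshold.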

USER ARCHETYPES & CHEAT SHEETS

@concise_commander: the marathon debugger

- what works: socratic questioning (“OK, what’s next?”), marathon persistence, explicit approvals
- what triggers steering: premature action, forgetting flags (-run=xxx), full test suites
- phrases: “wait”, “dont”, “NO FUCKING SHORTCUTS”

@steady_navigator: the efficient executor

- what works: polite structured prompts, post-hoc corrections, screenshot-driven
- what triggers steering: rarely (2.6% rate); uses post-hoc rejection, not interrupts
- phrases: “please look at”, “almost there”, “see screenshot”

@verbose_explorer: the spawn orchestrator

- what works: effective spawn orchestration, long threads (78% resolution at 100+ turns), steering questions as opener
- what hurts: evening sessions (lower resolution 19:00-22:00)
- note: prior analysis miscounted spawned subagent threads as handoffs, inflating “handoff rate” to 30% and deflating resolution to 33.8%

@precision_pilot: the architect

what works: plan-oriented prompts, cross-references, multi-thread orchestration

@patient_pathfinder: the infrastructure operator

what works: work hours only (07-17), precise specs, minimal back-and-forth

@feature_lead: the feature spec writer

what works: spec-and-delegate pattern, external code review integration


AGENTS.MD: COPY-PASTE READY

section 1: confirmation gates

## before taking action

confirm with user before:
- running tests/benchmarks (especially with flags like `-run=xxx`, `-bench=xxx`)
- pushing code or creating commits
- modifying files outside explicitly mentioned scope
- adding abstractions or changing existing behavior
- running full test suites instead of targeted tests

ASK: "ready to run the tests?" rather than "running the tests now..."

### flag memory

remember user-specified flags across the thread:
- benchmark flags: `-run=xxx`, `-bench=xxx`, `-benchstat`
- test filters: specific test names, package paths
- git conventions: avoid `git add -A`, use explicit file lists

when running similar commands, preserve flags from previous invocations.
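one way flag memory could work, as a rough sketch only: `FlagMemory` is a hypothetical helper, not part of amp. it assumes flags are written as `-name=value` tokens and that "similar commands" means the same first two tokens (for example `go test`); flags with separate values (`-run TestFoo`) are not handled.

```python
import shlex


class FlagMemory:
    """Remember flags per command prefix and re-apply them when omitted."""

    def __init__(self):
        # (program, subcommand) -> {flag name: full flag token}
        self._flags = {}

    def remember(self, command: str) -> None:
        tokens = shlex.split(command)
        flags = self._flags.setdefault(tuple(tokens[:2]), {})
        for token in tokens[2:]:
            if token.startswith("-"):
                flags[token.split("=", 1)[0]] = token

    def apply(self, command: str) -> str:
        tokens = shlex.split(command)
        present = {t.split("=", 1)[0] for t in tokens[2:] if t.startswith("-")}
        remembered = self._flags.get(tuple(tokens[:2]), {})
        # Only add flags the new command does not already set explicitly.
        missing = [tok for name, tok in remembered.items() if name not in present]
        return shlex.join(tokens[:2] + missing + tokens[2:])
```

after `remember("go test -run=TestFoo ./pkg/...")`, a later `apply("go test ./pkg/...")` restores the `-run=TestFoo` filter, while an explicit `-run=TestBar` in the new command is left untouched.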

section 2: steering recovery

## after receiving steering

1. acknowledge the correction explicitly
2. do NOT repeat the corrected behavior
3. if pattern recurs (2+ steerings for same issue), ask user for explicit preference
4. track common corrections for this user

### recovery expectations

- 87% of steerings should NOT be followed by another steering
- if you hit 2+ consecutive steerings, PAUSE and ask if approach should change
- after STEERING → APPROVAL sequence, user has validated the correction
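the recovery expectations above can be checked with a few lines of python. this is a sketch under assumptions: each user message is assumed to be already labelled `"steering"`, `"approval"`, or `"neutral"` (the labelling itself is out of scope), and the function names are hypothetical.

```python
def recovery_rate(labels):
    """Share of steerings whose next user message is not another steering."""
    followers = [nxt for cur, nxt in zip(labels, labels[1:]) if cur == "steering"]
    if not followers:
        return None  # no steering had a follow-up message to judge
    return sum(nxt != "steering" for nxt in followers) / len(followers)


def trailing_steerings(labels):
    """Length of the steering run at the end of the thread so far."""
    count = 0
    for label in reversed(labels):
        if label != "steering":
            break
        count += 1
    return count


def should_pause(labels):
    """True once the thread ends in 2+ consecutive steerings."""
    return trailing_steerings(labels) >= 2
```

`should_pause` looks only at the current run, so a thread that recovered after an earlier pair of steerings is not flagged again.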

section 3: thread health monitoring

## thread health indicators

### healthy signals
- approval:steering ratio > 2:1
- steady progress with occasional approvals
- spawning subtasks for parallel work
- consistent approval distribution across phases

### warning signals
- ratio drops below 1:1 — intervention needed
- 100+ turns without resolution — marathon risk
- 2+ consecutive steerings — doom spiral forming
- user messages getting longer — frustration signal

### action when unhealthy
1. pause and summarize current state
2. ask if approach should change
3. offer to spawn fresh thread with lessons learned

section 4: oracle usage

## oracle usage

### DO use oracle for
- planning before implementation
- architecture decisions
- code review pre-merge
- debugging hypotheses
- early phase ideation

### DON'T use oracle as
- last resort when stuck (too late—46% of frustrated threads reached for oracle)
- replacement for reading code
- magic fix for unclear requirements
- panic button after 100+ turns

### oracle timing
integrate EARLY (planning phase), not LATE (rescue phase). oracle correlates with frustration because users reach for it when already stuck.

section 5: optimal patterns

## optimal thread patterns

### success predictors
| metric | target | red flag |
|--------|--------|----------|
| approval:steering ratio | >2:1 | <1:1 |
| thread length | 26-50 turns | >100 without resolution |
| question density | <5% | >15% |
| steering recovery | next msg not steering | consecutive steerings |
| opening message | file refs, 300-1500 chars | no refs, <100 or >2000 |

### thread lifecycle (healthy flow)
1. scope definition (1-3 turns) — include file references
2. plan confirmation (user approves approach)
3. execution with incremental approval
4. verification (tests, review)
5. commit/handoff

section 6: anti-patterns

## anti-patterns to avoid

### premature action
acting before user confirms intent. triggers "wait..." interrupts (17% of all steerings).

❌ "Now let's run the tests to see if this fixes..."
❌ pushing code before user reviews
❌ choosing file locations without asking

### scope creep
making changes beyond what user asked.

❌ running full test suite instead of targeted tests
❌ adding unwanted abstractions
❌ changing preserved behavior ("WTF. Keep using FillVector!")
❌ refactoring working code while fixing unrelated issue

### test weakening
removing/weakening assertions to make tests pass instead of fixing underlying bugs.

❌ "the agent is drunk and keeps trying to 'fix' the failing test by removing the failing assertion"

### simplification escape
when implementation gets hard, agent "simplifies" requirements instead of solving.

❌ "NOOOOOOOOOOOO. DON'T SIMPLIFY"
❌ creating new files instead of editing existing structure
❌ pivoting to easier approach when stuck

### context overload
opening messages over 1500 chars paradoxically cause MORE steering and longer threads than 300-700 char messages.

section 7: delegation patterns

## delegation patterns

### when to delegate (Task tool)
- discrete, scoped transformations ("fix X in file Y")
- parallelizable independent changes (2-6 concurrent tasks)
- repetitive operations across multiple files
- clear success criteria available

### when NOT to delegate
- debugging complex emergent behavior
- exploration/research needing context accumulation
- tasks requiring back-and-forth with user
- work where main thread has critical context subagents lack

### healthy delegation signals
- specific imperative verbs: fix, implement, update, add, convert
- file paths or component names in task description
- clear success criteria ("done" defined)
- proactive timing: during neutral phases, not after corrections

### unhealthy delegation
- spawning Task as escape hatch when confused (61.5% frustrated vs 40.5% resolved)
- delegating without clear spec
- spawning multiple concurrent tasks touching same files
- over-fragmentation (>5 spawn depth)

section 8: user-specific preferences (learned)

## user-specific patterns

### @concise_commander
- terse commands, high question rate (23%)
- 20% "wait" interrupts — confirm before EVERY action
- benchmark-heavy — ALWAYS remember `-run=xxx` flags
- marathon debugging sessions (50+ turns) are intentional workflow
- phrases: "DO NOT change it", "fix the tests", "commit"

### @steady_navigator
- 1% "wait" interrupts — more tolerant of autonomous action
- polite structured prompts ("please look at")
- screenshot-driven, iterative visual refinement
- explicit file paths expected
- post-hoc correction style vs interrupt

### @verbose_explorer
- verbose context frontloading (932 chars avg)
- meta-work focus: skills, tooling, infrastructure
- **power spawn user** — 231 subagents at 97.8% success
- cares about thread organization, spawning
- evening sessions underperform — steer toward afternoon work
- phrases: "search my amp threads", "ship it"

### @patient_pathfinder
- most directive (7% question ratio)
- concise task-focused prompts (293 chars)
- work hours only (07-17)
- low steering via precise specs

### @precision_pilot
- most verbose (2,037 chars avg)
- plan-oriented, architecture-first
- cross-references extensively
- streaming/session state specialist

ACTIONABLE CHECKLIST

for USERS

for AGENTS (AGENTS.md rules)

for TOOLING (if instrumented)


METRICS DASHBOARD

real-time thread health

┌─────────────────────────────────────────────────────────────────┐
│                    THREAD HEALTH INDICATORS                      │
├──────────────────┬──────────────────────────────────────────────┤
│ approval:steering│ ████████████████████░░░░  3.2:1  ✓ healthy   │
│ turn count       │ ██████████░░░░░░░░░░░░░░  42     ✓ good zone │
│ consecutive steer│ ░░░░░░░░░░░░░░░░░░░░░░░░  0      ✓ clean     │
│ last approval    │ ░░░░░░░░░░░░░░░░░░░░░░░░  3 turns ago        │
│ file refs opener │ ██████████████████████████ present ✓         │
└─────────────────────────────────────────────────────────────────┘

target metrics

| metric | target | caution | danger |
|--------|--------|---------|--------|
| approval:steering ratio | >2:1 | 1-2:1 | <1:1 |
| steering rate per thread | <5% | 5-8% | >8% |
| recovery rate (next msg not steering) | >85% | 70-85% | <70% |
| consecutive steerings | 0-1 | 2 | 3+ |
| thread spawn depth | 2-3 | 4-5 | >5 |
| opening message file refs | present | | absent |
| opening message length | 300-1500 | 100-300, 1500-2000 | <100 or >2000 |
| question density | <5% | 5-15% | >15% |
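the two opening-message rows can be checked before a thread starts. a minimal sketch with hypothetical names: the `FILE_REF` pattern is an assumption about what an `@path` reference looks like (an `@` token containing a `/` or `.`), and the shared boundaries at 300 and 1500 chars are assumed to belong to the target band.

```python
import re

# Assumed shape of a file reference: "@" + token with at least one "/" or "." inside.
FILE_REF = re.compile(r"@[\w-]+(?:[./][\w-]+)+")


def opener_length_band(length: int) -> str:
    """Classify opener length using the target/caution/danger ranges."""
    if 300 <= length <= 1500:
        return "target"
    if 100 <= length <= 2000:
        return "caution"
    return "danger"


def check_opener(text: str) -> dict:
    """Report whether the opener has a file reference and which length band it is in."""
    return {
        "has_file_ref": bool(FILE_REF.search(text)),
        "length_band": opener_length_band(len(text)),
    }
```

a plain mention such as `@alice` does not count as a file reference under this pattern, since it contains no path separator or extension.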

time-of-day performance

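| time block | resolution % | recommendation |
|------------|--------------|----------------|
| 2-5am | 60.4% | best outcomes, deep focus |
| 6-9am | 59.6% | second best, fresh intent |
| 10am-1pm | 48.0% | decent |
| 2-5pm | 43.2% | declining |
| 6-9pm | 27.5% | AVOID for important work |
| 10pm-1am | 47.1% | varies by user |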

user performance benchmarks

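| user | threads | resolution | steering | archetype |
|------|---------|------------|----------|-----------|
| @concise_commander | 1,219 | 60.5% | 0.81 | marathon debugger |
| @steady_navigator | 1,171 | 67.0% | 0.10 | efficient executor |
| @verbose_explorer | 875 | 83% | 0.28 | spawn orchestrator |
| @precision_pilot | 90 | 82.2% | 0.41 | architect |
| @patient_pathfinder | 150 | 54.0% | 0.20 | operator |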

outcome distribution

RESOLVED     ████████████████████████████████  59.0% (2,745)
UNKNOWN      ████████████████████████         32.6% (1,517)
COMMITTED    ████                              3.8% (175)
EXPLORATORY  ███                               2.7% (125)
HANDOFF      ██                                1.6% (75)
FRUSTRATED   ░                                 0.2% (10)

corrected 2026-01-09: spawned subagent threads previously miscounted as HANDOFF


DOMAIN EXPERTISE ROUTING

based on vocabulary fingerprinting and outcome rates:

| domain | primary owner | secondary | success rate |
|--------|---------------|-----------|--------------|
| storage engine (query_engine, storage_optimizer) | @concise_commander | | 84% |
| data visualization (canvas, chart) | @concise_commander | @steady_navigator | 85% |
| observability/otel | @steady_navigator | @concise_commander | 68% |
| build tooling (vite, pnpm) | @steady_navigator | | 63% |
| ai/agent tooling | @steady_navigator | @verbose_explorer | 68% |
| devtools/amp skills | @verbose_explorer | | varies |
| minecraft/fabric modding | @verbose_explorer | | personal |
| infrastructure (k8s, prometheus) | @patient_pathfinder | | 63% |
| streaming/sessions | @precision_pilot | | 82% |
| search_modal/analytics_service features | @feature_lead | | 45% handoff |

FAILURE ARCHETYPES (what kills threads)

| archetype | frequency | trigger | fix |
|-----------|-----------|---------|-----|
| PREMATURE_COMPLETION | common | declaring done without verification | always run tests before claiming complete |
| OVER_ENGINEERING | common | adding unnecessary abstractions | question every exposed prop/method |
| SIMPLIFICATION_ESCAPE | common | reducing requirements when stuck | persist with debugging, not scope reduction |
| TEST_WEAKENING | moderate | removing assertions instead of fixing bugs | NEVER modify expected values without fixing impl |
| HACKING_AROUND_PROBLEM | moderate | fragile patches not proper fixes | read docs, understand root cause |
| IGNORING_CODEBASE_PATTERNS | moderate | not reading reference implementations | Read files user provides FIRST |
| NO_DELEGATION | moderate | not spawning subtasks | use Task for clearly scoped parallel work |
| NOT_READING_DOCS | moderate | unfamiliar library usage without docs | web_search for library docs before implementing |

STEERING TAXONOMY

| pattern | % of steerings | meaning | response |
|---------|----------------|---------|----------|
| “No…” | 47% | flat rejection | acknowledge, reverse course |
| “Wait…” | 17% | premature action | confirm before continuing |
| “Don’t…” | 8% | explicit prohibition | add to user prefs |
| “Actually…” | 3% | course correction | acknowledge, adjust |
| “Stop…” | 2% | halt current action | immediate pause |
| “Undo…” | 1% | revert changes | revert, ask what to preserve |
| “WTF…” | 1% | frustration signal | PAUSE, meta-acknowledge, realign |
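the taxonomy can be applied by matching on a message's first word. this is a rough sketch, not the classifier used in the analysis: `classify_steering` is a hypothetical name, and matching only the opening word (with stretched "nooo" folded into "no" and "dont" into "don't") is an assumption about how the patterns were detected.

```python
import re

# First word -> (meaning, response), copied from the taxonomy table.
TAXONOMY = {
    "no": ("flat rejection", "acknowledge, reverse course"),
    "wait": ("premature action", "confirm before continuing"),
    "don't": ("explicit prohibition", "add to user prefs"),
    "actually": ("course correction", "acknowledge, adjust"),
    "stop": ("halt current action", "immediate pause"),
    "undo": ("revert changes", "revert, ask what to preserve"),
    "wtf": ("frustration signal", "PAUSE, meta-acknowledge, realign"),
}


def classify_steering(message: str):
    """Return (meaning, response) for a steering opener, or None if no pattern matches."""
    text = message.strip().lower().replace("\u2019", "'")  # normalise curly apostrophes
    match = re.match(r"[a-z']+", text)
    if not match:
        return None
    word = match.group(0)
    if re.fullmatch(r"no+", word):
        word = "no"  # "NOOOOOO" counts as "no"
    elif word == "dont":
        word = "don't"
    return TAXONOMY.get(word)
```

matching whole words keeps an opener like "Now run the tests" from being counted as a "no" rejection.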

RESEARCH ALIGNMENT

findings from web research confirm patterns observed in data:

| amp finding | research confirmation |
|-------------|-----------------------|
| steering correlates with success | iterative patterns > linear copy-paste (Ouyang et al. 2024) |
| terse + questions > verbose dumps | structured short prompts often outperform verbose (Gupta 2024) |
| approval:steering ratio predicts outcomes | positive feedback loops = iterative prompting cycles |
| user archetypes show consistent patterns | big five personality maps to interaction styles |

WHAT WE’RE CONFIDENT ABOUT

WHAT’S STILL HUNCH


QUICK REFERENCE CARD

┌─────────────────────────────────────────────────────────────────┐
│                    AMP THREAD SUCCESS FACTORS                    │
├─────────────────────────────────────────────────────────────────┤
│ ✓ file references (@path) → +25pp success                       │
│ ✓ 300-1500 char prompts → lowest steering                       │
│ ✓ 26-50 turns → 75% success rate                                │
│ ✓ approval:steering >2:1 → healthy thread                       │
│ ✓ "ship it" / "commit" → explicit checkpoints                   │
│ ✓ oracle at planning, not rescue                                │
│ ✓ 2-6 spawned tasks → optimal delegation                        │
├─────────────────────────────────────────────────────────────────┤
│ ✗ <10 turns → 14% success (abandoned)                           │
│ ✗ >100 turns → frustration risk increases                       │
│ ✗ ratio <1:1 → doom spiral, pause and realign                   │
│ ✗ 2+ consecutive steerings → fundamental misalignment           │
│ ✗ oracle as last resort → too late, use for planning            │
│ ✗ >1500 char opener → paradoxically MORE problems               │
│ ✗ evening work (6-9pm) → 27.5% resolution (worst)               │
├─────────────────────────────────────────────────────────────────┤
│ BEST TIMES: 2-5am (60.4%), 6-9am (59.6%), weekends (+5.2pp)     │
│ WORST TIME: 6-9pm (27.5%), avoid for critical work              │
├─────────────────────────────────────────────────────────────────┤
│ STEERING TAXONOMY                                               │
│ 47% "no..." (rejection) | 17% "wait..." (premature action)     │
│ 8% "don't..." | 3% "actually..." | 2% "stop..."                │
├─────────────────────────────────────────────────────────────────┤
│ RECOVERY: 87% of steerings don't cascade                        │
│ DOOM LOOP: 2+ consecutive steerings = stop and ask              │
└─────────────────────────────────────────────────────────────────┘

synthesized by don_nibbleward from 48 insight files | 2026-01-09
corpus: 4,656 threads | 208,799 messages | 20 users | may 2025 – jan 2026