amp thread analysis: executive synthesis
compiled from 10 analysis documents spanning 4,281 threads (208,799 messages) across 20 users.
top 10 actionable findings
1. the 26-50 turn sweet spot
threads resolving in 26-50 turns have the highest success rate (75%). below 10 turns, success drops to 14% (abandoned queries). above 100 turns, frustration risk climbs.
action: nudge users away from both extremes. short threads likely mean task mismatch; marathon threads need intervention.
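a minimal sketch of the bucketing behind this finding, assuming each thread record carries a turn count and a final status (field names and bucket edges between the reported ranges are illustrative):

```python
from collections import defaultdict

# illustrative turn-count buckets matching the finding above
BUCKETS = [(1, 10), (11, 25), (26, 50), (51, 100), (101, None)]

def bucket_label(turns: int) -> str:
    """Map a turn count to its bucket label, e.g. 42 -> '26-50'."""
    for lo, hi in BUCKETS:
        if hi is None:
            if turns >= lo:
                return f"{lo}+"
        elif lo <= turns <= hi:
            return f"{lo}-{hi}"
    return "unknown"

def success_rate_by_bucket(threads):
    """threads: iterable of dicts with 'turns' (int) and 'status' (str).
    Returns bucket label -> fraction of threads that resolved."""
    tally = defaultdict(lambda: [0, 0])  # label -> [resolved, total]
    for t in threads:
        label = bucket_label(t["turns"])
        tally[label][1] += 1
        if t["status"] in ("RESOLVED", "COMMITTED"):
            tally[label][0] += 1
    return {label: ok / n for label, (ok, n) in tally.items()}
```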
2. approval:steering ratio predicts outcomes
| approval:steering ratio | predicted outcome |
|---|---|
| >4:1 | COMMITTED — clean execution |
| 2-4:1 | RESOLVED — healthy balance |
| <1:1 | FRUSTRATED — agent lost user trust |
action: track the ratio live; when it crosses below 1:1, surface a “consider a new approach” suggestion.
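a sketch of the live check, assuming running counts of approvals and steerings per thread are available; note the 1-2:1 band is unlabeled in the table, so it is returned unclassified:

```python
def ratio_health(approvals: int, steerings: int) -> str:
    """Classify thread health from the approval:steering ratio,
    using the thresholds in the table above."""
    if steerings == 0:
        # no corrections yet; treat as healthy by default
        return "COMMITTED"
    ratio = approvals / steerings
    if ratio > 4:
        return "COMMITTED"
    if ratio >= 2:
        return "RESOLVED"
    if ratio < 1:
        # crossed the trust threshold: surface a
        # "consider a new approach" suggestion here
        return "FRUSTRATED"
    return "UNCLASSIFIED"  # 1-2:1 band is not labeled in the source data
```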
3. “wait” interrupts signal premature action
20% of @concise_commander’s steerings start with “wait” — agent acted before confirming intent. 47% of ALL steerings are flat rejections (“no…”).
action: ask for confirmation before running tests, pushing code, or expanding scope, especially when benchmark flags (-run=xxx) are involved.
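a rough sketch of how the steering taxonomy might tag these messages, assuming steerings arrive as plain strings (the patterns here are simplified; the underlying taxonomy is richer):

```python
import re

# illustrative prefixes for the two dominant steering types
WAIT_RE = re.compile(r"^\s*wait\b", re.IGNORECASE)
REJECT_RE = re.compile(r"^\s*no\b", re.IGNORECASE)

def classify_steering(message: str) -> str:
    """Tag a steering message as a premature-action interrupt,
    a flat rejection, or other."""
    if WAIT_RE.match(message):
        return "premature_action"  # agent acted before intent was confirmed
    if REJECT_RE.match(message):
        return "flat_rejection"
    return "other"
```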
4. low question density = higher resolution
counterintuitive: threads with <5% question density resolve at the highest rate (105.6 avg turns, 836 threads). high-density questioning does not improve execution.
action: focused work with occasional clarifying questions outperforms interrogative style.
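question density here can be read as the share of user messages that are questions; a minimal sketch under that assumption:

```python
def question_density(user_messages: list[str]) -> float:
    """Share of user messages that are questions, using a crude
    trailing-question-mark heuristic (the corpus analysis may use
    a richer classifier)."""
    if not user_messages:
        return 0.0
    questions = sum(1 for m in user_messages if m.rstrip().endswith("?"))
    return questions / len(user_messages)

# per the finding: threads where this stays below 0.05 resolved best
```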
5. oracle is a “stuck” signal, not a solution
46% of FRUSTRATED threads use oracle vs 25% of RESOLVED. oracle adoption correlates with already-stuck state.
action: integrate oracle EARLIER (at planning/architecture phase) rather than as last resort.
6. thread spawning correlates with success
productive users leverage thread spawning aggressively. max chain depth: 5 levels. top spawners produce 20-32 child threads.
action: encourage subtask delegation via Task tool. deep work benefits from context segmentation.
7. terse messages + high question rate = best outcomes
| user | avg message | question ratio | resolution rate |
|---|---|---|---|
| @concise_commander | 263 chars | 23% | 60% |
| @verbose_explorer | 932 chars | 26% | 83% (corrected) |
action: short, focused prompts with socratic follow-ups (“OK, and what is next?”) outperform context-heavy frontloading.
8. iterative collaboration outperforms linear
research confirms: users who treat AI as collaborative partner (steering, follow-up, refinement) outperform copy-paste workflows.
action: steering is healthy — it indicates active engagement, not failure.
9. tool adoption timeline matters
oracle adoption spiked in july 2025. librarian appeared in october 2025. october 2025 had the highest resolution rate (81.5%).
action: new tools need onboarding period. track adoption curves when releasing capabilities.
10. 98.6% of questions answered immediately
only 12 questions (0.26%) were left dangling across the entire corpus. assistant engagement is not the problem.
action: focus optimization on QUALITY of responses, not response rate.
user archetypes
the marathon debugger (@concise_commander)
- 69% of threads exceed 50 turns
- terse commands (263 char avg), high question rate (37%)
- heavy steering (8.2%) but also heavy approval (16%)
- domain: performance engineering, algorithm optimization
- works late (22-00), stays on problem until solved
effective pattern: socratic questioning (“OK, what is next?”) keeps agent aligned through long sessions.
the spawn orchestrator (@verbose_explorer)
- verbose messages (932 char avg), moderate length threads
- 83% resolution rate — power spawn user (231 subagents, 97.8% success)
- meta-work focus: skills, tooling, infrastructure
- night owl (18-21)
effective pattern: front-loading context enables effective spawn orchestration.
note: prior analysis miscounted spawned subagent threads as handoffs, showing 30% handoff rate. corrected 2026-01-09.
the visual iterator (@steady_navigator)
- highest question ratio (43%), polite structured prompts
- screenshot-driven workflow, visual precision refinement
- early bird (04-11)
- low steering (2.6%); prefers post-hoc rejection over mid-action interrupts
effective pattern: explicit file paths, iterative visual feedback loops.
the infrastructure operator (@patient_pathfinder)
- lowest question ratio (7%) — most directive
- concise task-focused prompts
- work hours only (07-17)
- clean operational patterns
effective pattern: knows exactly what’s needed, minimal back-and-forth.
the architect (@precision_pilot)
- most verbose (2037 char avg), plan-oriented
- generates plans to feed into other threads
- multi-thread orchestration patterns
- 82% resolution rate
effective pattern: architecture-first, cross-references extensively.
the delegator (@feature_lead)
- 45% handoff rate (highest)
- feature-spec oriented, detail-rich
- external code review integration
effective pattern: uses amp as first-pass, delegates to reviewers.
recommended AGENTS.md additions
confirmation gates
## before taking action
confirm with user before:
- running tests/benchmarks (especially with flags like `-run=xxx`)
- pushing code or creating commits
- modifying files outside explicitly mentioned scope
- adding abstractions or changing existing behavior
- running full test suites instead of targeted tests
steering recovery
## after receiving steering
1. acknowledge the correction explicitly
2. do NOT repeat the corrected behavior
3. if pattern recurs (2+ steerings for same issue), ask user for explicit preference
4. track common corrections for this user (flags, file locations, scope boundaries)
thread health monitoring
## thread health indicators
healthy signals:
- approval:steering ratio > 2:1
- steady progress with occasional approvals
- spawning subtasks for parallel work
warning signals:
- ratio drops below 1:1
- 100+ turns without resolution
- multiple consecutive steerings
- user messages getting longer (frustration signal)
action when unhealthy:
- pause and summarize current state
- ask if approach should change
- offer to spawn fresh thread with lessons learned
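a sketch combining these indicators into one check, assuming per-thread counters are maintained (field names and the “getting longer” heuristic are assumptions, not the pipeline’s actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class ThreadState:
    approvals: int = 0
    steerings: int = 0
    turns: int = 0
    consecutive_steerings: int = 0
    recent_user_msg_lengths: list[int] = field(default_factory=list)

def warning_signals(s: ThreadState) -> list[str]:
    """Return the warning signals listed above that currently fire."""
    fired = []
    if s.steerings and s.approvals / s.steerings < 1:
        fired.append("approval:steering ratio below 1:1")
    if s.turns >= 100:
        fired.append("100+ turns without resolution")
    if s.consecutive_steerings >= 2:
        fired.append("multiple consecutive steerings")
    last3 = s.recent_user_msg_lengths[-3:]
    if len(last3) == 3 and last3[0] < last3[1] < last3[2]:
        fired.append("user messages getting longer")
    return fired
```

any non-empty result would trigger the pause/summarize/ask sequence above.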
prompting best practices
## effective user patterns (learned from high performers)
1. terse messages + follow-up questions > verbose context dumps
2. "OK, and what is next?" keeps agent planning visible
3. explicit approvals ("ship it", "commit this") provide clear checkpoints
4. early handoffs (≤10 turns) often mean task mismatch, not failure
5. marathon threads (50+ turns) work for focused domains, not scattered work
oracle usage
## oracle usage
DO use oracle for:
- planning before implementation
- architecture decisions
- code review pre-merge
- debugging hypotheses
DON'T use oracle as:
- last resort when stuck (too late)
- replacement for reading code
- magic fix for unclear requirements
anti-patterns to avoid
1. premature action
acting before the user confirms intent; this triggers “wait…” interrupts.
signals: running tests immediately, pushing without review, choosing file locations without asking
fix: ask once before taking significant actions
2. scope creep
making changes beyond what user asked.
signals: running the full test suite instead of targeted tests, adding unwanted abstractions, changing behavior the user wanted preserved
fix: ask before expanding scope. “should I also…?”
3. forgetting flags
repeated failure to remember user-specific preferences.
signals: “you forgot -run=xxx AGAIN”, benchmark flags, filter params
fix: track per-user preferences, reference in context
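a minimal sketch of such a per-user store, with hypothetical names, assuming corrections arrive as (category, value) pairs:

```python
from collections import defaultdict

class PreferenceStore:
    """Remember user-specific corrections (flags, file locations,
    scope boundaries) so they can be re-injected into context."""

    def __init__(self) -> None:
        self._prefs: dict[str, dict[str, str]] = defaultdict(dict)

    def record(self, user: str, category: str, value: str) -> None:
        self._prefs[user][category] = value

    def context_for(self, user: str) -> str:
        """Render remembered preferences as a context snippet."""
        return "\n".join(
            f"- {cat}: {val}" for cat, val in self._prefs[user].items()
        )

# e.g. store.record("@concise_commander", "benchmark flags", "-run=xxx")
```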
4. oracle as panic button
reaching for oracle only when already stuck.
signals: oracle usage correlates with frustrated threads rather than preventing them
fix: use oracle at planning phase, not recovery phase
5. context overload
long messages that frontload too much context.
signals: 1000+ char messages, agent misses key points, user has to repeat
fix: terse prompts + follow-up questions work better
6. linear copy-paste workflow
treating agent as supplementary info source rather than collaborator.
signals: low steering, low approval, short threads that don’t resolve
fix: iterative refinement cycle, active coordination
7. abandoning prematurely
exiting threads before resolution without spawning follow-up.
signals: <10 turn threads with UNKNOWN status, no thread links
fix: either complete or explicitly spawn continuation
8. marathon without checkpoints
long threads without approval signals.
signals: 100+ turns, low approval:steering ratio, locked in single context
fix: explicit checkpoints every 20-30 turns, consider spawning subtasks
synthesis meta-notes
what we’re confident about
- structural patterns (turn counts, ratios) are statistically robust across 4k threads
- user archetype patterns are consistent within users across time
- steering taxonomy is empirically grounded (47% “no”, 17% “wait”)
what’s still hunch
- causal direction between oracle usage and frustration
- whether terse style causes success or reflects expertise
- optimal confirmation frequency (too much also annoys users)
research alignment
academic research on human-AI collaboration confirms:
- iterative patterns outperform linear
- active coordination (steering/follow-up) correlates with success
- prompt structure matters more than clever wording
- personality/work style affects optimal interaction pattern
synthesized by frances_petalbell | amp thread analysis pipeline