amp thread analysis: executive synthesis
compiled from 10 analysis documents spanning 4,281 threads (208,799 messages) across 20 users.
top 10 actionable findings
1. the 26-50 turn sweet spot
threads resolving in 26-50 turns have the highest success rate (75%). below 10 turns, success drops to 14% (abandoned queries). above 100 turns, frustration risk climbs.
action: nudge users away from both extremes. short threads likely mean task mismatch; marathon threads need intervention.
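a minimal sketch of the bucketing behind this finding, assuming each thread record carries a turn count and a final status (field names and bucket edges between the reported ranges are illustrative):

```python
from collections import defaultdict

# illustrative turn-count buckets matching the finding above
BUCKETS = [(1, 10), (11, 25), (26, 50), (51, 100), (101, None)]

def bucket_label(turns: int) -> str:
    """Map a turn count to its bucket label, e.g. 42 -> '26-50'."""
    for lo, hi in BUCKETS:
        if hi is None:
            if turns >= lo:
                return f"{lo}+"
        elif lo <= turns <= hi:
            return f"{lo}-{hi}"
    return "unknown"

def success_rate_by_bucket(threads):
    """threads: iterable of dicts with 'turns' (int) and 'status' (str).
    Returns bucket label -> fraction of threads that resolved."""
    tally = defaultdict(lambda: [0, 0])  # label -> [resolved, total]
    for t in threads:
        label = bucket_label(t["turns"])
        tally[label][1] += 1
        if t["status"] in ("RESOLVED", "COMMITTED"):
            tally[label][0] += 1
    return {label: ok / n for label, (ok, n) in tally.items()}
```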
2. approval:steering ratio predicts outcomes
| approval:steering ratio | predicted outcome |
|---|---|
| >4:1 | COMMITTED — clean execution |
| 2-4:1 | RESOLVED — healthy balance |
| <1:1 | FRUSTRATED — agent lost user trust |
action: track the ratio live; when it crosses below 1:1, surface a “consider a new approach” suggestion.
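a sketch of the live check, assuming running counts of approvals and steerings per thread are available; note the 1-2:1 band is unlabeled in the table, so it is returned unclassified:

```python
def ratio_health(approvals: int, steerings: int) -> str:
    """Classify thread health from the approval:steering ratio,
    using the thresholds in the table above."""
    if steerings == 0:
        # no corrections yet; treat as healthy by default
        return "COMMITTED"
    ratio = approvals / steerings
    if ratio > 4:
        return "COMMITTED"
    if ratio >= 2:
        return "RESOLVED"
    if ratio < 1:
        # crossed the trust threshold: surface a
        # "consider a new approach" suggestion here
        return "FRUSTRATED"
    return "UNCLASSIFIED"  # 1-2:1 band is not labeled in the source data
```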
3. “wait” interrupts signal premature action
20% of @concise_commander’s steerings start with “wait” — agent acted before confirming intent. 47% of ALL steerings are flat rejections (“no…”).
action: ask for confirmation before running tests, pushing code, or expanding scope, especially when benchmark flags (-run=xxx) are involved.
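a rough sketch of how the steering taxonomy might tag these messages, assuming steerings arrive as plain strings (the patterns here are simplified; the underlying taxonomy is richer):

```python
import re

# illustrative prefixes for the two dominant steering types
WAIT_RE = re.compile(r"^\s*wait\b", re.IGNORECASE)
REJECT_RE = re.compile(r"^\s*no\b", re.IGNORECASE)

def classify_steering(message: str) -> str:
    """Tag a steering message as a premature-action interrupt,
    a flat rejection, or other."""
    if WAIT_RE.match(message):
        return "premature_action"  # agent acted before intent was confirmed
    if REJECT_RE.match(message):
        return "flat_rejection"
    return "other"
```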
4. low question density = higher resolution
counterintuitive: threads with <5% question density resolve at the highest rate (105.6 avg turns, 836 threads). high-density questioning does not improve execution.
action: focused work with occasional clarifying questions outperforms interrogative style.
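question density here can be read as the share of user messages that are questions; a minimal sketch under that assumption:

```python
def question_density(user_messages: list[str]) -> float:
    """Share of user messages that are questions, using a crude
    trailing-question-mark heuristic (the corpus analysis may use
    a richer classifier)."""
    if not user_messages:
        return 0.0
    questions = sum(1 for m in user_messages if m.rstrip().endswith("?"))
    return questions / len(user_messages)

# per the finding: threads where this stays below 0.05 resolved best
```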
5. oracle is a “stuck” signal, not a solution
46% of FRUSTRATED threads use oracle vs 25% of RESOLVED. oracle adoption correlates with already-stuck state.
action: integrate oracle EARLIER (at planning/architecture phase) rather than as last resort.
6. thread spawning correlates with success
productive users leverage thread spawning aggressively. max chain depth: 5 levels. top spawners produce 20-32 child threads.
action: encourage subtask delegation via Task tool. deep work benefits from context segmentation.
7. terse messages + high question rate = best outcomes
| user | avg message | question ratio | resolution rate |
|---|---|---|---|
| @concise_commander | 263 chars | 23% | 60% |
| @verbose_explorer | 932 chars | 26% | 83% (corrected) |
action: short, focused prompts with socratic follow-ups (“OK, and what is next?”) outperform context-heavy frontloading.
8. iterative collaboration outperforms linear
research confirms: users who treat AI as collaborative partner (steering, follow-up, refinement) outperform copy-paste workflows.
action: steering is healthy — it indicates active engagement, not failure.
9. tool adoption timeline matters
oracle adoption spiked in july 2025. librarian appeared in october 2025. october 2025 had the highest resolution rate (81.5%).
action: new tools need onboarding period. track adoption curves when releasing capabilities.
10. 98.6% of questions answered immediately
only 12 questions (0.26%) were left dangling across the entire corpus. assistant engagement is not the problem.
action: focus optimization on QUALITY of responses, not response rate.
user archetypes
the marathon debugger (@concise_commander)
- 69% of threads exceed 50 turns
- terse commands (263 char avg), high question rate (37%)
- heavy steering (8.2%) but also heavy approval (16%)
- domain: performance engineering, algorithm optimization
- works late (22-00), stays on problem until solved
effective pattern: socratic questioning (“OK, what is next?”) keeps agent aligned through long sessions.
the spawn orchestrator (@verbose_explorer)
- verbose messages (932 char avg), moderate length threads
- 83% resolution rate — power spawn user (231 subagents, 97.8% success)
- meta-work focus: skills, tooling, infrastructure
- night owl (18-21)
effective pattern: front-loading context enables effective spawn orchestration.
note: prior analysis miscounted spawned subagent threads as handoffs, showing 30% handoff rate. corrected 2026-01-09.
the visual iterator (@steady_navigator)
- highest question ratio (43%), polite structured prompts
- screenshot-driven workflow, visual precision refinement
- early bird (04-11)
- low steering (2.6%); prefers post-hoc rejection over mid-action interrupts
effective pattern: explicit file paths, iterative visual feedback loops.
the infrastructure operator (@patient_pathfinder)
- lowest question ratio (7%) — most directive
- concise task-focused prompts
- work hours only (07-17)
- clean operational patterns
effective pattern: knows exactly what’s needed, minimal back-and-forth.
the architect (@precision_pilot)
- most verbose (2037 char avg), plan-oriented
- generates plans to feed into other threads
- multi-thread orchestration patterns
- 82% resolution rate
effective pattern: architecture-first, cross-references extensively.
the delegator (@feature_lead)
- 45% handoff rate (highest)
- feature-spec oriented, detail-rich
- external code review integration
effective pattern: uses amp as first-pass, delegates to reviewers.
recommended AGENTS.md additions
confirmation gates
## before taking action
confirm with user before:
- running tests/benchmarks (especially with flags like `-run=xxx`)
- pushing code or creating commits
- modifying files outside explicitly mentioned scope
- adding abstractions or changing existing behavior
- running full test suites instead of targeted tests
steering recovery
## after receiving steering
1. acknowledge the correction explicitly
2. do NOT repeat the corrected behavior
3. if pattern recurs (2+ steerings for same issue), ask user for explicit preference
4. track common corrections for this user (flags, file locations, scope boundaries)
thread health monitoring
## thread health indicators
healthy signals:
- approval:steering ratio > 2:1
- steady progress with occasional approvals
- spawning subtasks for parallel work
warning signals:
- ratio drops below 1:1
- 100+ turns without resolution
- multiple consecutive steerings
- user messages getting longer (frustration signal)
action when unhealthy:
- pause and summarize current state
- ask if approach should change
- offer to spawn fresh thread with lessons learned
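a sketch combining these indicators into one check, assuming per-thread counters are maintained (field names and the “getting longer” heuristic are assumptions, not the pipeline’s actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class ThreadState:
    approvals: int = 0
    steerings: int = 0
    turns: int = 0
    consecutive_steerings: int = 0
    recent_user_msg_lengths: list[int] = field(default_factory=list)

def warning_signals(s: ThreadState) -> list[str]:
    """Return the warning signals listed above that currently fire."""
    fired = []
    if s.steerings and s.approvals / s.steerings < 1:
        fired.append("approval:steering ratio below 1:1")
    if s.turns >= 100:
        fired.append("100+ turns without resolution")
    if s.consecutive_steerings >= 2:
        fired.append("multiple consecutive steerings")
    last3 = s.recent_user_msg_lengths[-3:]
    if len(last3) == 3 and last3[0] < last3[1] < last3[2]:
        fired.append("user messages getting longer")
    return fired
```

any non-empty result would trigger the pause/summarize/ask sequence above.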
prompting best practices
## effective user patterns (learned from high performers)
1. terse messages + follow-up questions > verbose context dumps
2. "OK, and what is next?" keeps agent planning visible
3. explicit approvals ("ship it", "commit this") provide clear checkpoints
4. early handoffs (≤10 turns) often mean task mismatch, not failure
5. marathon threads (50+ turns) work for focused domains, not scattered work
oracle usage
## oracle usage
DO use oracle for:
- planning before implementation
- architecture decisions
- code review pre-merge
- debugging hypotheses
DON'T use oracle as:
- last resort when stuck (too late)
- replacement for reading code
- magic fix for unclear requirements
anti-patterns to avoid
1. premature action
acting before the user confirms intent; this triggers “wait…” interrupts.
signals: running tests immediately, pushing without review, choosing file locations without asking
fix: ask once before taking significant actions
2. scope creep
making changes beyond what user asked.
signals: running the full test suite instead of targeted tests, adding unwanted abstractions, changing behavior the user wanted preserved
fix: ask before expanding scope. “should I also…?”
3. forgetting flags
repeated failure to remember user-specific preferences.
signals: “you forgot -run=xxx AGAIN”, benchmark flags, filter params
fix: track per-user preferences, reference in context
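a minimal sketch of such a per-user store, with hypothetical names, assuming corrections arrive as (category, value) pairs:

```python
from collections import defaultdict

class PreferenceStore:
    """Remember user-specific corrections (flags, file locations,
    scope boundaries) so they can be re-injected into context."""

    def __init__(self) -> None:
        self._prefs: dict[str, dict[str, str]] = defaultdict(dict)

    def record(self, user: str, category: str, value: str) -> None:
        self._prefs[user][category] = value

    def context_for(self, user: str) -> str:
        """Render remembered preferences as a context snippet."""
        return "\n".join(
            f"- {cat}: {val}" for cat, val in self._prefs[user].items()
        )

# e.g. store.record("@concise_commander", "benchmark flags", "-run=xxx")
```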
4. oracle as panic button
reaching for oracle only when already stuck.
signals: oracle usage correlates with frustrated threads rather than preventing them
fix: use oracle at planning phase, not recovery phase
5. context overload
long messages that frontload too much context.
signals: 1000+ char messages, agent misses key points, user has to repeat
fix: terse prompts + follow-up questions work better
6. linear copy-paste workflow
treating agent as supplementary info source rather than collaborator.
signals: low steering, low approval, short threads that don’t resolve
fix: iterative refinement cycle, active coordination
7. abandoning prematurely
exiting threads before resolution without spawning follow-up.
signals: <10 turn threads with UNKNOWN status, no thread links
fix: either complete or explicitly spawn continuation
8. marathon without checkpoints
long threads without approval signals.
signals: 100+ turns, low approval:steering ratio, locked in single context
fix: explicit checkpoints every 20-30 turns, consider spawning subtasks
synthesis meta-notes
what we’re confident about
- structural patterns (turn counts, ratios) are statistically robust across 4k threads
- user archetype patterns are consistent within users across time
- steering taxonomy is empirically grounded (47% “no”, 17% “wait”)
what’s still hunch
- causal direction between oracle usage and frustration
- whether terse style causes success or reflects expertise
- optimal confirmation frequency (too much also annoys users)
research alignment
academic research on human-AI collaboration confirms:
- iterative patterns outperform linear
- active coordination (steering/follow-up) correlates with success
- prompt structure matters more than clever wording
- personality/work style affects optimal interaction pattern
synthesized by frances_petalbell | amp thread analysis pipeline