amp training curriculum: 4-week onboarding program
evidence-based curriculum distilled from 4,656 threads | 208,799 messages | 20 users
program overview
| week | focus | key metric target | learning outcome |
|---|---|---|---|
| 1 | context quality | +25pp success via file refs | learner writes grounded first messages |
| 2 | conversation rhythm | 2:1 approval:steering ratio | learner maintains healthy thread flow |
| 3 | advanced tools | verification gates in every impl thread | learner uses oracle, spawn, verification |
| 4 | persistence & recovery | 26-50 turn threads without abandonment | learner handles complexity without quitting |
week 1: context quality
learning objectives
- understand why context grounds agent behavior
- master @file reference syntax
- calibrate first-message length (300-1500 chars)
- distinguish effective vs ineffective openers
day 1: file references
the data:
- threads WITH @file references: 66.7% success
- threads WITHOUT: 41.8% success
- delta: +25 percentage points
exercise: rewrite these bad openers:
❌ "make the auth better"
→ rewrite with file references, success criteria
❌ "there's a bug in the api"
→ rewrite with specific file, symptom, expected behavior
checkpoint: complete one real thread with @file in opener
day 2: first-message calibration
the data:
- 300-1500 chars: lowest steering needed
- <150 chars: often too vague
- >1500 chars: paradoxically worse (42.8% success vs 52% at optimal)
exercise: write opening messages for these tasks, hitting 300-1500 chars:
- fix a flaky test
- add a new api endpoint
- refactor a component for accessibility
pattern to learn:
@src/auth/login.ts @src/auth/types.ts
the login handler isn't validating refresh tokens. add validation that
checks expiry and signature before issuing new access tokens.
run `pnpm test src/auth` when done.
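a minimal pre-send lint, sketched in typescript: `checkOpener` is a hypothetical helper (not part of amp), and the regexes are rough approximations of the @file syntax and verification-command signals above.

```typescript
// opener-check.ts: hypothetical pre-send lint for a first message.
// thresholds (300-1500 chars, @file refs, verification command) mirror the week 1 data.

function checkOpener(text: string) {
  const length = text.length;
  // @file references: an @ followed by a path-like token ending in an extension
  const fileRefs = text.match(/@[\w./-]+\.\w+/g) ?? [];
  // crude signal that the opener names a verification command
  const hasVerification = /\b(pnpm test|go test|vitest|tsc|cargo check)\b/.test(text);
  return { length, inRange: length >= 300 && length <= 1500, fileRefs, hasVerification };
}

const badDraft = "make the auth better"; // day 1's bad opener
console.log(checkOpener(badDraft));
// -> { length: 20, inRange: false, fileRefs: [], hasVerification: false }
```

run it against a draft before sending; any failed check is a cue to add context, not a reason to send anyway.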
day 3: opener style—interrogative vs imperative
the data:
| style | success rate | steering rate |
|---|---|---|
| interrogative (“what…?”) | 69.3% | moderate |
| imperative (“fix X”) | 57% | 0.15 (lowest) |
| declarative (“i think we need…”) | 50% | 0.23 (highest) |
exercise: convert these declaratives to interrogative OR imperative:
❌ "i was thinking maybe we could potentially look at improving the
auth system because it seems like there might be some issues"
✓ "what's causing the token refresh failures in @src/auth/refresh.ts?"
✓ "fix the race condition in handleSubmit by adding a mutex"
rule: questions for exploration, commands for known fixes.
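to self-audit style, a rough classifier sketch; the keyword lists are illustrative assumptions, not derived from the thread data.

```typescript
// opener-style.ts: crude heuristic for the three opener styles above.

type Style = "interrogative" | "imperative" | "declarative";

function classifyOpener(text: string): Style {
  const firstLine = text.trim().split("\n")[0].toLowerCase();
  // strip @file references so the leading word is the real opener
  const body = firstLine.replace(/@[\w./-]+\.\w+\s*/g, "").trim();
  if (body.endsWith("?") || /^(what|why|how|where|which|who|can|should|is|are|does)\b/.test(body)) {
    return "interrogative";
  }
  if (/^(fix|add|refactor|update|remove|implement|write|run|rename|move|delete)\b/.test(body)) {
    return "imperative"; // bare verb start reads as a command
  }
  return "declarative"; // "i think we need..." falls through to here
}

console.log(classifyOpener("what's causing the token refresh failures?")); // interrogative
console.log(classifyOpener("fix the race condition in handleSubmit"));     // imperative
```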
day 4: thread continuity with read_thread
the data: 8/10 golden threads started with explicit parent reference.
pattern:
Continuing work from thread T-019b83ca...
@pkg/simd/simd_bench_test.go @pkg/simd/dispatch_arm64.go
- I just completed SVE implementations
- Committed and pushed
exercise: practice handoff. start a thread, pause deliberately, resume with proper context.
day 5: week 1 assessment
complete a real task thread demonstrating:
- @file references in opener
- 300-1500 char first message
- interrogative or imperative style (not declarative)
- if continuing work, explicit thread reference
success criteria: thread reaches RESOLVED/COMMITTED status
week 2: conversation rhythm
learning objectives
- recognize approval as a navigation tool
- distinguish steering from micro-management
- maintain healthy approval:steering ratio
- use “wait” interrupts appropriately
day 1: approval vocabulary
the data:
- 2:1 approval:steering ratio = healthy thread
- <1:1 ratio = danger zone (FRUSTRATED likely)
- steady_navigator: 3:1 ratio, 67% resolution
- concise_commander: 1.78:1 ratio, 60.5% resolution
approval vocabulary (keep it brief):
- “yes”
- “lgtm”
- “ship it”
- “go on”
- “good”
- “commit”
exercise: practice rapid approval. every time agent does something correct, acknowledge with one word.
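the ratio is easy to track mechanically. a sketch, assuming you can export a thread's user messages as strings; the matching is deliberately strict (exact approval words, known steering prefixes) and will undercount.

```typescript
// ratio.ts: approval:steering ratio over a thread's user messages.

const APPROVALS = ["yes", "lgtm", "ship it", "go on", "good", "commit"];
const STEERING_PREFIXES = ["no", "wait", "don't", "actually"];

function approvalSteeringRatio(userMessages: string[]): number {
  let approvals = 0;
  let steerings = 0;
  for (const msg of userMessages) {
    const m = msg.trim().toLowerCase();
    if (APPROVALS.some((a) => m === a || m.startsWith(a + "."))) approvals++;
    else if (STEERING_PREFIXES.some((s) => m.startsWith(s + ",") || m.startsWith(s + " "))) steerings++;
  }
  return steerings === 0 ? Infinity : approvals / steerings;
}

const ratio = approvalSteeringRatio(["lgtm", "good", "wait, confirm first", "ship it"]);
console.log(ratio >= 2 ? "healthy" : ratio >= 1 ? "warning" : "danger"); // healthy (3:1)
```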
day 2: steering patterns
the data: 46.7% of steerings start with “no”
| pattern | when to use |
|---|---|
| “no, …” | flat rejection: wrong direction |
| “wait, …” | interrupt before agent commits |
| “don’t …” | explicit prohibition |
| “actually, …” | course correction |
note: steering is NOT micro-management; 87% of steerings lead to recovery.
exercise: review a past thread. identify where you steered. was it necessary? could earlier context have prevented it?
day 3: the wait interrupt
the data: concise_commander uses “wait” 20% of the time—catches agent before wrong path solidifies
when to wait:
- agent about to run tests without confirmation
- agent about to push/commit
- agent making assumption about approach
example:
agent: "Now let's run the tests to see if this fixes..."
you: "wait, confirm before running tests"
exercise: practice one thread with deliberate wait interrupts before agent actions.
day 4: steering doom loops
the data: 30% of corrections require another correction
danger signals:
- 2+ consecutive steerings
- approval:steering drops below 1:1
- frustration vocabulary appears (“wtf”, caps)
intervention: after 2 consecutive steerings, STOP. ask:
“are we approaching this wrong? should we step back and reconsider?”
exercise: practice the intervention. deliberately enter a steering loop and practice the recovery phrase.
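the danger signal is detectable in the transcript itself. a small sketch that flags the loop, under the same steering-prefix assumption as the ratio sketch in day 1:

```typescript
// doom-loop.ts: flag 2+ consecutive steerings so you can pause and realign.

function isSteering(msg: string): boolean {
  return /^(no|wait|don't|actually)[,\s]/.test(msg.trim().toLowerCase());
}

function doomLoopWarning(userMessages: string[]): boolean {
  let consecutive = 0;
  for (const msg of userMessages) {
    consecutive = isSteering(msg) ? consecutive + 1 : 0;
    if (consecutive >= 2) return true; // time for the intervention question
  }
  return false;
}

if (doomLoopWarning(["no, use the old api", "actually, revert that"])) {
  console.log("are we approaching this wrong? should we step back and reconsider?");
}
```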
day 5: week 2 assessment
complete a thread demonstrating:
- 2:1 or better approval:steering ratio
- brief approval vocabulary
- at least one “wait” interrupt if applicable
- recovery from any steering events
success criteria: no consecutive steering events, thread RESOLVED/COMMITTED
week 3: advanced tools
learning objectives
- use oracle for planning AND review (not rescue)
- spawn sub-agents for parallel work
- embed verification gates in implementation threads
- avoid anti-patterns around tool usage
day 1: oracle timing
the data:
| oracle timing | frustration rate |
|---|---|
| early (≤33%) | 1.4% |
| mid (33-66%) | 0.7% |
| late (>66%) | 0% |
anti-pattern: 46% of FRUSTRATED threads use oracle as rescue tool
proper usage:
- planning: invoke oracle BEFORE implementation
- review: invoke oracle AFTER implementation for validation
- debug: invoke oracle when FIRST stuck, not after 10 failed attempts
exercise: use oracle to plan an implementation before writing any code.
day 2: spawn / task delegation
the data:
- optimal spawned tasks: 4-6 (78.6% success)
- Task tool correlates with frustration when overused (61.5% in FRUSTRATED vs 40.5% in RESOLVED)
when to spawn:
spawn agents to:
1. add unit tests for the validator
2. update the README with new usage examples
3. fix the lint errors in /components
when NOT to spawn:
- single logical task
- deep debugging (needs continuity)
- learning unfamiliar code
exercise: identify a task with 3+ independent sub-tasks. practice spawning.
day 3: verification gates
the data:
| metric | with verification | without |
|---|---|---|
| success rate | 78.2% | 61.3% |
| committed rate | 25.4% | 18.1% |
verification checklist for implementation threads:
- run targeted tests before declaring done
- run build/typecheck
- lint check if applicable
- review the diff
pattern:
you: "run `pnpm test src/auth` before committing"
agent: [runs tests]
you: "tests pass, ship it"
exercise: complete an implementation thread with at least 2 verification gates.
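the checklist can live in a script. a sketch of a fail-fast gate runner; the three commands are examples from this doc (`pnpm lint` is an assumed script name), so substitute your project's own.

```typescript
// verify.ts: run the verification gates in order, stop shipping on the first failure.
import { execSync } from "node:child_process";

const GATES = [
  "pnpm test src/auth", // targeted tests first
  "pnpm build",         // build/typecheck
  "pnpm lint",          // assumed script name; lint check if applicable
];

for (const cmd of GATES) {
  console.log(`gate: ${cmd}`);
  try {
    execSync(cmd, { stdio: "inherit" });
  } catch {
    console.error(`gate failed: ${cmd}. do not ship.`);
    process.exit(1);
  }
}
console.log("all gates passed: review the diff, then ship it");
```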
day 4: skill usage (underutilized)
the data: the dig skill was invoked once across 4,656 threads (severely underutilized)
available skills to learn:
- dig: systematic debugging with hypothesis-driven analysis
- spawn: parallel agent orchestration
- coordinate: multi-agent tmux workflows
- oracle: deep reasoning and planning
exercise: invoke the dig skill on a real bug. compare to your usual debug approach.
day 5: week 3 assessment
complete a thread demonstrating:
- oracle used for planning OR review (not rescue)
- spawn used for parallel tasks if applicable
- verification gate (test run) before completion
- no premature_completion anti-pattern
success criteria: thread COMMITTED with explicit verification
week 4: persistence & recovery
learning objectives
- calibrate thread length to task complexity
- avoid premature abandonment
- recover from agent anti-patterns
- achieve power-user behaviors
day 1: thread length sweet spot
the data:
| turn range | success rate |
|---|---|
| <10 turns | 14% |
| 10-25 | 42% |
| 26-50 | 75% |
| 51-100 | 65% |
| >100 | 55% |
rule: don’t abandon before 26 turns unless task is complete. commit to the work.
exercise: practice staying with a thread past the “this is annoying” threshold.
day 2: agent anti-patterns recognition
recognize and counter these:
| anti-pattern | signal | counter |
|---|---|---|
| SIMPLIFICATION_ESCAPE | agent removes complexity instead of solving | “no shortcuts: debug the actual issue” |
| TEST_WEAKENING | agent removes failing assertion | “never weaken tests: debug the bug” |
| PREMATURE_COMPLETION | agent declares done without tests | “run full test suite first” |
| HACKING_AROUND | fragile patches | “look up the proper way” |
exercise: review a past thread. identify any anti-patterns you let slide.
day 3: frustration ladder awareness
escalation stages:
STAGE 1: agent misunderstands → correct early (50% recovery)
STAGE 2: 2+ consecutive corrections → pause and realign (40% recovery)
STAGE 3: expletives appear → start fresh thread (20% recovery)
STAGE 4: caps lock explosion → thread is lost (<10% recovery)
intervention timing matters. correct at stage 1, not stage 3.
exercise: in your next thread, if frustration begins, consciously identify the stage and intervene appropriately.
day 4: power user synthesis
behaviors from top 3 users (82%, 67%, 60.5% resolution):
| behavior | implementation |
|---|---|
| @file references | always in opener |
| domain vocabulary | speak at expert level, don’t over-explain |
| consistent approval | every successful step acknowledged |
| question-driven | socratic guidance keeps agent reasoning visible |
| persistence | don’t quit when it gets hard |
anti-behaviors:
- abandon before 26 turns
- let approval:steering drop below 2:1
- skip verification
- allow agent shortcuts
exercise: complete a complex task (>50 turns) maintaining all power user behaviors.
day 5: graduation assessment
complete a challenging thread demonstrating:
- @file references in opener
- 300-1500 char first message
- 2:1+ approval:steering ratio
- verification gate before completion
- oracle or spawn used appropriately
- 26+ turns if task requires
- no stage 2+ frustration events
graduation criteria: COMMITTED status with clean conversation dynamics
appendix: quick reference cards
opener template
@path/to/file1.ts @path/to/file2.ts
[clear task description, 300-1500 chars]
[success criteria / verification command]
approval vocabulary
yes | lgtm | ship it | go on | good | commit
steering vocabulary
no, ... | wait, ... | don't ... | actually, ...
healthy ratios
- approval:steering > 2:1
- thread length: 26-50 turns optimal
- consecutive steerings: ≤1
verification gates
- `pnpm test` / `go test` / `vitest`
- `pnpm build` / `tsc` / `cargo check`
- “review the diff”
- “tests pass” before ship
anti-pattern counters
| pattern | counter phrase |
|---|---|
| shortcuts | “no shortcuts: solve it properly” |
| test weakening | “bug is in prod code, not test” |
| premature done | “run tests first” |
| hacking around | “read the docs” |
metrics for self-assessment
| metric | healthy | warning | danger |
|---|---|---|---|
| approval:steering ratio | >2:1 | 1-2:1 | <1:1 |
| thread length | 26-50 | 51-100 | <10 or >100 |
| consecutive steerings | 0-1 | 2 | 3+ |
| file refs in opener | present | — | absent |
| verification before ship | yes | — | no |
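these thresholds compose into a single self-check. a sketch only: the table leaves 10-25 turns unclassified, so it is treated as warning here (an assumption), and the worst dimension dominates.

```typescript
// self-check.ts: map the metrics table onto one verdict per thread.

type Verdict = "healthy" | "warning" | "danger";

function assessThread(t: {
  approvalSteeringRatio: number;
  turns: number;
  consecutiveSteerings: number;
  fileRefsInOpener: boolean;
  verifiedBeforeShip: boolean;
}): Verdict {
  const verdicts: Verdict[] = [
    t.approvalSteeringRatio > 2 ? "healthy" : t.approvalSteeringRatio >= 1 ? "warning" : "danger",
    t.turns >= 26 && t.turns <= 50 ? "healthy"
      : t.turns < 10 || t.turns > 100 ? "danger"
      : "warning", // 51-100 per the table; 10-25 assumed warning
    t.consecutiveSteerings <= 1 ? "healthy" : t.consecutiveSteerings === 2 ? "warning" : "danger",
    t.fileRefsInOpener ? "healthy" : "danger",
    t.verifiedBeforeShip ? "healthy" : "danger",
  ];
  if (verdicts.includes("danger")) return "danger";
  if (verdicts.includes("warning")) return "warning";
  return "healthy";
}

console.log(assessThread({
  approvalSteeringRatio: 2.5, turns: 34, consecutiveSteerings: 1,
  fileRefsInOpener: true, verifiedBeforeShip: true,
})); // "healthy"
```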
curriculum developed from empirical analysis | jack_winkleshine