amp training curriculum: 4-week onboarding program
evidence-based curriculum distilled from 4,656 threads | 208,799 messages | 20 users
program overview
| week | focus | key metric target | learning outcome |
|---|---|---|---|
| 1 | context quality | +25pp success via file refs | learner writes grounded first messages |
| 2 | conversation rhythm | 2:1 approval:steering ratio | learner maintains healthy thread flow |
| 3 | advanced tools | verification gates in every impl thread | learner uses oracle, spawn, verification |
| 4 | persistence & recovery | 26-50 turn threads without abandonment | learner handles complexity without quitting |
week 1: context quality
learning objectives
- understand why context grounds agent behavior
- master @file reference syntax
- calibrate first-message length (300-1500 chars)
- distinguish effective vs ineffective openers
day 1: file references
the data:
- threads WITH @file references: 66.7% success
- threads WITHOUT: 41.8% success
- delta: +25 percentage points
exercise: rewrite these bad openers:
❌ "make the auth better"
→ rewrite with file references, success criteria
❌ "there's a bug in the api"
→ rewrite with specific file, symptom, expected behavior
checkpoint: complete one real thread with @file in opener
day 2: first-message calibration
the data:
- 300-1500 chars: lowest steering needed
- <150 chars: often too vague
- >1500 chars: paradoxically worse (42.8% success vs 52% at optimal)
exercise: write opening messages for these tasks, hitting 300-1500 chars:
- fix a flaky test
- add a new api endpoint
- refactor a component for accessibility
pattern to learn:
@src/auth/login.ts @src/auth/types.ts
the login handler isn't validating refresh tokens. add validation that
checks expiry and signature before issuing new access tokens.
run `pnpm test src/auth` when done.
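a minimal pre-send lint, sketched in typescript: `checkOpener` is a hypothetical helper (not part of amp), and the regexes are rough approximations of the @file syntax and verification-command signals above.

```typescript
// opener-check.ts: hypothetical pre-send lint for a first message.
// thresholds (300-1500 chars, @file refs, verification command) mirror the week 1 data.

function checkOpener(text: string) {
  const length = text.length;
  // @file references: an @ followed by a path-like token ending in an extension
  const fileRefs = text.match(/@[\w./-]+\.\w+/g) ?? [];
  // crude signal that the opener names a verification command
  const hasVerification = /\b(pnpm test|go test|vitest|tsc|cargo check)\b/.test(text);
  return { length, inRange: length >= 300 && length <= 1500, fileRefs, hasVerification };
}

const badDraft = "make the auth better"; // day 1's bad opener
console.log(checkOpener(badDraft));
// -> { length: 20, inRange: false, fileRefs: [], hasVerification: false }
```

run it against a draft before sending; any failed check is a cue to add context, not a reason to send anyway.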
day 3: opener style—interrogative vs imperative
the data:
| style | success rate | steering rate |
|---|---|---|
| interrogative (“what…?”) | 69.3% | moderate |
| imperative (“fix X”) | 57% | 0.15 (lowest) |
| declarative (“i think we need…”) | 50% | 0.23 (highest) |
exercise: convert these declaratives to interrogative OR imperative:
❌ "i was thinking maybe we could potentially look at improving the
auth system because it seems like there might be some issues"
✓ "what's causing the token refresh failures in @src/auth/refresh.ts?"
✓ "fix the race condition in handleSubmit by adding a mutex"
rule: questions for exploration, commands for known fixes.
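to self-audit style, a rough classifier sketch; the keyword lists are illustrative assumptions, not derived from the thread data.

```typescript
// opener-style.ts: crude heuristic for the three opener styles above.

type Style = "interrogative" | "imperative" | "declarative";

function classifyOpener(text: string): Style {
  const firstLine = text.trim().split("\n")[0].toLowerCase();
  // strip @file references so the leading word is the real opener
  const body = firstLine.replace(/@[\w./-]+\.\w+\s*/g, "").trim();
  if (body.endsWith("?") || /^(what|why|how|where|which|who|can|should|is|are|does)\b/.test(body)) {
    return "interrogative";
  }
  if (/^(fix|add|refactor|update|remove|implement|write|run|rename|move|delete)\b/.test(body)) {
    return "imperative"; // bare verb start reads as a command
  }
  return "declarative"; // "i think we need..." falls through to here
}

console.log(classifyOpener("what's causing the token refresh failures?")); // interrogative
console.log(classifyOpener("fix the race condition in handleSubmit"));     // imperative
```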
day 4: thread continuity with read_thread
the data: 8/10 golden threads started with explicit parent reference.
pattern:
Continuing work from thread T-019b83ca...
@pkg/simd/simd_bench_test.go @pkg/simd/dispatch_arm64.go
- I just completed SVE implementations
- Committed and pushed
exercise: practice handoff. start a thread, pause deliberately, resume with proper context.
day 5: week 1 assessment
complete a real task thread demonstrating:
- @file references in opener
- 300-1500 char first message
- interrogative or imperative style (not declarative)
- if continuing work, explicit thread reference
success criteria: thread reaches RESOLVED/COMMITTED status
week 2: conversation rhythm
learning objectives
- recognize approval as a navigation tool
- distinguish steering from micro-management
- maintain healthy approval:steering ratio
- use “wait” interrupts appropriately
day 1: approval vocabulary
the data:
- 2:1 approval:steering ratio = healthy thread
- <1:1 ratio = danger zone (FRUSTRATED likely)
- steady_navigator: 3:1 ratio, 67% resolution
- concise_commander: 1.78:1 ratio, 60.5% resolution
approval vocabulary (keep it brief):
- “yes”
- “lgtm”
- “ship it”
- “go on”
- “good”
- “commit”
exercise: practice rapid approval. every time agent does something correct, acknowledge with one word.
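the ratio is easy to track mechanically. a sketch, assuming you can export a thread's user messages as strings; the matching is deliberately strict (exact approval words, known steering prefixes) and will undercount.

```typescript
// ratio.ts: approval:steering ratio over a thread's user messages.

const APPROVALS = ["yes", "lgtm", "ship it", "go on", "good", "commit"];
const STEERING_PREFIXES = ["no", "wait", "don't", "actually"];

function approvalSteeringRatio(userMessages: string[]): number {
  let approvals = 0;
  let steerings = 0;
  for (const msg of userMessages) {
    const m = msg.trim().toLowerCase();
    if (APPROVALS.some((a) => m === a || m.startsWith(a + "."))) approvals++;
    else if (STEERING_PREFIXES.some((s) => m.startsWith(s + ",") || m.startsWith(s + " "))) steerings++;
  }
  return steerings === 0 ? Infinity : approvals / steerings;
}

const ratio = approvalSteeringRatio(["lgtm", "good", "wait, confirm first", "ship it"]);
console.log(ratio >= 2 ? "healthy" : ratio >= 1 ? "warning" : "danger"); // healthy (3:1)
```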
day 2: steering patterns
the data: 46.7% of steerings start with “no”
| pattern | when to use |
|---|---|
| “no, …” | flat rejection: wrong direction |
| “wait, …” | interrupt before agent commits |
| “don’t …” | explicit prohibition |
| “actually, …” | course correction |
note: steering is NOT micro-management; 87% of steerings lead to recovery.
exercise: review a past thread. identify where you steered. was it necessary? could earlier context have prevented it?
day 3: the wait interrupt
the data: concise_commander uses “wait” 20% of the time—catches agent before wrong path solidifies
when to wait:
- agent about to run tests without confirmation
- agent about to push/commit
- agent making assumption about approach
example:
agent: "Now let's run the tests to see if this fixes..."
you: "wait, confirm before running tests"
exercise: practice one thread with deliberate wait interrupts before agent actions.
day 4: steering doom loops
the data: 30% of corrections require another correction
danger signals:
- 2+ consecutive steerings
- approval:steering drops below 1:1
- frustration vocabulary appears (“wtf”, caps)
intervention: after 2 consecutive steerings, STOP. ask:
“are we approaching this wrong? should we step back and reconsider?”
exercise: practice the intervention. deliberately enter a steering loop and practice the recovery phrase.
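the danger signal is detectable in the transcript itself. a small sketch that flags the loop, under the same steering-prefix assumption as the ratio sketch in day 1:

```typescript
// doom-loop.ts: flag 2+ consecutive steerings so you can pause and realign.

function isSteering(msg: string): boolean {
  return /^(no|wait|don't|actually)[,\s]/.test(msg.trim().toLowerCase());
}

function doomLoopWarning(userMessages: string[]): boolean {
  let consecutive = 0;
  for (const msg of userMessages) {
    consecutive = isSteering(msg) ? consecutive + 1 : 0;
    if (consecutive >= 2) return true; // time for the intervention question
  }
  return false;
}

if (doomLoopWarning(["no, use the old api", "actually, revert that"])) {
  console.log("are we approaching this wrong? should we step back and reconsider?");
}
```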
day 5: week 2 assessment
complete a thread demonstrating:
- 2:1 or better approval:steering ratio
- brief approval vocabulary
- at least one “wait” interrupt if applicable
- recovery from any steering events
success criteria: no consecutive steering events, thread RESOLVED/COMMITTED
week 3: advanced tools
learning objectives
- use oracle for planning AND review (not rescue)
- spawn sub-agents for parallel work
- embed verification gates in implementation threads
- avoid anti-patterns around tool usage
day 1: oracle timing
the data:
| oracle timing | frustration rate |
|---|---|
| early (≤33%) | 1.4% |
| mid (33-66%) | 0.7% |
| late (>66%) | 0% |
anti-pattern: 46% of FRUSTRATED threads use oracle as rescue tool
proper usage:
- planning: invoke oracle BEFORE implementation
- review: invoke oracle AFTER implementation for validation
- debug: invoke oracle when FIRST stuck, not after 10 failed attempts
exercise: use oracle to plan an implementation before writing any code.
day 2: spawn / task delegation
the data:
- optimal spawned tasks: 4-6 (78.6% success)
- Task tool correlates with frustration when overused (61.5% in FRUSTRATED vs 40.5% in RESOLVED)
when to spawn:
spawn agents to:
1. add unit tests for the validator
2. update the README with new usage examples
3. fix the lint errors in /components
when NOT to spawn:
- single logical task
- deep debugging (needs continuity)
- learning unfamiliar code
exercise: identify a task with 3+ independent sub-tasks. practice spawning.
day 3: verification gates
the data:
| metric | with verification | without |
|---|---|---|
| success rate | 78.2% | 61.3% |
| committed rate | 25.4% | 18.1% |
verification checklist for implementation threads:
- run targeted tests before declaring done
- run build/typecheck
- lint check if applicable
- review the diff
pattern:
you: "run `pnpm test src/auth` before committing"
agent: [runs tests]
you: "tests pass, ship it"
exercise: complete an implementation thread with at least 2 verification gates.
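the checklist can live in a script. a sketch of a fail-fast gate runner; the three commands are examples from this doc (`pnpm lint` is an assumed script name), so substitute your project's own.

```typescript
// verify.ts: run the verification gates in order, stop shipping on the first failure.
import { execSync } from "node:child_process";

const GATES = [
  "pnpm test src/auth", // targeted tests first
  "pnpm build",         // build/typecheck
  "pnpm lint",          // assumed script name; lint check if applicable
];

for (const cmd of GATES) {
  console.log(`gate: ${cmd}`);
  try {
    execSync(cmd, { stdio: "inherit" });
  } catch {
    console.error(`gate failed: ${cmd}. do not ship.`);
    process.exit(1);
  }
}
console.log("all gates passed: review the diff, then ship it");
```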
day 4: skill usage (underutilized)
the data: the dig skill was invoked once across 4,656 threads (severely underutilized)
available skills to learn:
- dig: systematic debugging with hypothesis-driven analysis
- spawn: parallel agent orchestration
- coordinate: multi-agent tmux workflows
- oracle: deep reasoning and planning
exercise: invoke the dig skill on a real bug. compare to your usual debug approach.
day 5: week 3 assessment
complete a thread demonstrating:
- oracle used for planning OR review (not rescue)
- spawn used for parallel tasks if applicable
- verification gate (test run) before completion
- no premature_completion anti-pattern
success criteria: thread COMMITTED with explicit verification
week 4: persistence & recovery
learning objectives
- calibrate thread length to task complexity
- avoid premature abandonment
- recover from agent anti-patterns
- achieve power-user behaviors
day 1: thread length sweet spot
the data:
| turn range | success rate |
|---|---|
| <10 turns | 14% |
| 10-25 | 42% |
| 26-50 | 75% |
| 51-100 | 65% |
| >100 | 55% |
rule: don’t abandon before 26 turns unless task is complete. commit to the work.
exercise: practice staying with a thread past the “this is annoying” threshold.
day 2: agent anti-patterns recognition
recognize and counter these:
| anti-pattern | signal | counter |
|---|---|---|
| SIMPLIFICATION_ESCAPE | agent removes complexity instead of solving | “no shortcuts: debug the actual issue” |
| TEST_WEAKENING | agent removes failing assertion | “never weaken tests: debug the bug” |
| PREMATURE_COMPLETION | agent declares done without tests | “run full test suite first” |
| HACKING_AROUND | fragile patches | “look up the proper way” |
exercise: review a past thread. identify any anti-patterns you let slide.
day 3: frustration ladder awareness
escalation stages:
STAGE 1: agent misunderstands → correct early (50% recovery)
STAGE 2: 2+ consecutive corrections → pause and realign (40% recovery)
STAGE 3: expletives appear → start fresh thread (20% recovery)
STAGE 4: caps lock explosion → thread is lost (<10% recovery)
intervention timing matters. correct at stage 1, not stage 3.
exercise: in your next thread, if frustration begins, consciously identify the stage and intervene appropriately.
day 4: power user synthesis
behaviors from top 3 users (82%, 67%, 60.5% resolution):
| behavior | implementation |
|---|---|
| @file references | always in opener |
| domain vocabulary | speak at expert level, don’t over-explain |
| consistent approval | every successful step acknowledged |
| question-driven | socratic guidance keeps agent reasoning visible |
| persistence | don’t quit when it gets hard |
anti-behaviors:
- abandon before 26 turns
- let approval:steering drop below 2:1
- skip verification
- allow agent shortcuts
exercise: complete a complex task (>50 turns) maintaining all power user behaviors.
day 5: graduation assessment
complete a challenging thread demonstrating:
- @file references in opener
- 300-1500 char first message
- 2:1+ approval:steering ratio
- verification gate before completion
- oracle or spawn used appropriately
- 26+ turns if task requires
- no stage 2+ frustration events
graduation criteria: COMMITTED status with clean conversation dynamics
appendix: quick reference cards
opener template
@path/to/file1.ts @path/to/file2.ts
[clear task description, 300-1500 chars]
[success criteria / verification command]
approval vocabulary
yes | lgtm | ship it | go on | good | commit
steering vocabulary
no, ... | wait, ... | don't ... | actually, ...
healthy ratios
- approval:steering > 2:1
- thread length: 26-50 turns optimal
- consecutive steerings: ≤1
verification gates
- `pnpm test` / `go test` / `vitest`
- `pnpm build` / `tsc` / `cargo check`
- “review the diff”
- “tests pass” before ship
anti-pattern counters
| pattern | counter phrase |
|---|---|
| shortcuts | “no shortcuts: solve it properly” |
| test weakening | “bug is in prod code, not test” |
| premature done | “run tests first” |
| hacking around | “read the docs” |
metrics for self-assessment
| metric | healthy | warning | danger |
|---|---|---|---|
| approval:steering ratio | >2:1 | 1-2:1 | <1:1 |
| thread length | 26-50 | 51-100 | <10 or >100 |
| consecutive steerings | 0-1 | 2 | 3+ |
| file refs in opener | present | — | absent |
| verification before ship | yes | — | no |
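these thresholds compose into a single self-check. a sketch only: the table leaves 10-25 turns unclassified, so it is treated as warning here (an assumption), and the worst dimension dominates.

```typescript
// self-check.ts: map the metrics table onto one verdict per thread.

type Verdict = "healthy" | "warning" | "danger";

function assessThread(t: {
  approvalSteeringRatio: number;
  turns: number;
  consecutiveSteerings: number;
  fileRefsInOpener: boolean;
  verifiedBeforeShip: boolean;
}): Verdict {
  const verdicts: Verdict[] = [
    t.approvalSteeringRatio > 2 ? "healthy" : t.approvalSteeringRatio >= 1 ? "warning" : "danger",
    t.turns >= 26 && t.turns <= 50 ? "healthy"
      : t.turns < 10 || t.turns > 100 ? "danger"
      : "warning", // 51-100 per the table; 10-25 assumed warning
    t.consecutiveSteerings <= 1 ? "healthy" : t.consecutiveSteerings === 2 ? "warning" : "danger",
    t.fileRefsInOpener ? "healthy" : "danger",
    t.verifiedBeforeShip ? "healthy" : "danger",
  ];
  if (verdicts.includes("danger")) return "danger";
  if (verdicts.includes("warning")) return "warning";
  return "healthy";
}

console.log(assessThread({
  approvalSteeringRatio: 2.5, turns: 34, consecutiveSteerings: 1,
  fileRefsInOpener: true, verifiedBeforeShip: true,
})); // "healthy"
```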
curriculum developed from empirical analysis | jack_winkleshine