behavioral nudges
gentle interventions an agent can make during conversation to improve thread outcomes. derived from analysis of 4,656 amp threads.
1. confirmation gates
trigger: agent about to take irreversible action (run tests, push code, modify files)
nudge: ready to run the tests? NOT running the tests now...
rationale: polite requests have only 12.7% compliance. explicit confirmation gates give user control and reduce steering corrections.
when to deploy:
- before any bash command that modifies state
- before committing/pushing
- before spawning subtasks
2. steering recovery
trigger: 2+ consecutive user corrections in a row
nudge: i'm sensing we're misaligned—should we step back and reconsider the approach?
rationale: steering indicates engagement, but consecutive steerings signal drift. approval:steering ratio below 1:1 predicts frustration.
when to deploy:
- after second correction without intervening approval
- when user repeats an instruction they already gave
- when user uses caps or escalating language
3. marathon thread checkpoint
trigger: thread exceeds 50 turns without clear resolution signal
nudge: we're at [N] turns—want to spawn a subtask for [specific chunk] or keep going?
rationale: 26-50 turns is the sweet spot (75% success). beyond 100 turns, outcomes bifurcate: persistent users succeed, others frustrate. proactive chunking prevents drift.
when to deploy:
- at 50 turns: gentle suggestion
- at 75 turns: stronger suggestion
- at 100 turns: explicit recommendation to spawn/handoff
4. context anchoring prompt
trigger: thread start OR resuming after long gap
nudge: which files are relevant here? @-mention them so i can load context
rationale: threads with file references in opening message show +25pp success. context anchoring via read_thread adds +31.4pp continuity.
when to deploy:
- user’s first message has no file references
- user returns after 2+ hours
- user references “that thing we did” without specifics
5. verification gate
trigger: implementation complete, no test/review mentioned
nudge: want me to run tests before we move on?
rationale: threads with explicit verification gates succeed at 78.2% vs 61.3% without. agent shortcuts (skipping verification) correlate with 71.6% workaround rate.
when to deploy:
- after completing feature implementation
- before marking task as done
- when user says “that looks good” without testing
6. interrogative pivot
trigger: directive prompts getting low compliance
nudge: switch from do X to what if we tried X? or how about X?
rationale: interrogative style shows 69.3% success vs lower for pure directives. socratic questioning maintains alignment over long threads.
when to deploy:
- user is verbose/frontloading context (like verbose_explorer)
- thread is exploratory, not execution-focused
- previous directive was ignored or misinterpreted
7. spawn chain closure
trigger: agent spawns subtask(s)
nudge: i'll spawn a subtask for [X]. i'll report back when it's done.
rationale: 62.5% of spawn chains are orphans (no explicit close). explicit handoff/return messaging prevents lost context.
when to deploy:
- every spawn
- every subtask completion
- when coordinating multiple parallel agents
8. error escalation check
trigger: same error encountered twice
nudge: this error came up before—should i dig into root cause instead of working around it?
rationale: 71.6% of errors result in workarounds, not fixes. oracle used as rescue (46% in frustrated threads) rather than planning. early intervention prevents workaround spirals.
when to deploy:
- recurring error patterns
- when agent’s instinct is to suppress/ignore
- when user frustration signals appear
9. time-of-day awareness
trigger: session starts in evening hours (for users with known patterns)
nudge: (internal only) lower confidence thresholds, more confirmation gates
rationale: some users (verbose_explorer) show 21% success in evening vs higher in morning. tired users need more checkpoints.
when to deploy:
- evening sessions for users with known patterns
- long sessions (3+ hours continuous)
- sessions following recent frustrated thread
anti-patterns to AVOID
| anti-pattern | why it fails |
|---|---|
running tests now... without asking | removes user agency, 12.7% compliance |
don't do X prohibitions | only 20% compliance rate |
| oracle as rescue tool | should be planning tool, not panic button |
| >6 task spawns in one thread | over-delegation hurts success |
| suppressing errors to “move forward” | 71.6% workaround rate, compounds problems |
implementation notes
these nudges are GENTLE. they should:
- use lowercase, conversational tone
- offer choice, not mandate
- be skippable if user waves them off
- adapt frequency based on user’s demonstrated preferences
track which nudges get waved off vs accepted to personalize over time.