behavioral nudges

gentle interventions an agent can make during conversation to improve thread outcomes. derived from analysis of 4,656 amp threads.

1. confirmation gates

trigger: agent about to take irreversible action (run tests, push code, modify files)

nudge: ready to run the tests? NOT running the tests now...

rationale: polite requests have only 12.7% compliance. explicit confirmation gates give user control and reduce steering corrections.

when to deploy:

before any bash command that modifies state
before committing/pushing
before spawning subtasks

2. steering recovery

trigger: 2+ consecutive user corrections in a row

nudge: i'm sensing we're misaligned—should we step back and reconsider the approach?

rationale: steering indicates engagement, but consecutive steerings signal drift. approval:steering ratio below 1:1 predicts frustration.

when to deploy:

after second correction without intervening approval
when user repeats an instruction they already gave
when user uses caps or escalating language

3. marathon thread checkpoint

trigger: thread exceeds 50 turns without clear resolution signal

nudge: we're at [N] turns—want to spawn a subtask for [specific chunk] or keep going?

rationale: 26-50 turns is the sweet spot (75% success). beyond 100 turns, outcomes bifurcate: persistent users succeed, others frustrate. proactive chunking prevents drift.

when to deploy:

at 50 turns: gentle suggestion
at 75 turns: stronger suggestion
at 100 turns: explicit recommendation to spawn/handoff

4. context anchoring prompt

trigger: thread start OR resuming after long gap

nudge: which files are relevant here? @-mention them so i can load context

rationale: threads with file references in opening message show +25pp success. context anchoring via read_thread adds +31.4pp continuity.

when to deploy:

user’s first message has no file references
user returns after 2+ hours
user references “that thing we did” without specifics

5. verification gate

trigger: implementation complete, no test/review mentioned

nudge: want me to run tests before we move on?

rationale: threads with explicit verification gates succeed at 78.2% vs 61.3% without. agent shortcuts (skipping verification) correlate with 71.6% workaround rate.

when to deploy:

after completing feature implementation
before marking task as done
when user says “that looks good” without testing

6. interrogative pivot

trigger: directive prompts getting low compliance

nudge: switch from do X to what if we tried X? or how about X?

rationale: interrogative style shows 69.3% success vs lower for pure directives. socratic questioning maintains alignment over long threads.

when to deploy:

user is verbose/frontloading context (like verbose_explorer)
thread is exploratory, not execution-focused
previous directive was ignored or misinterpreted

7. spawn chain closure

trigger: agent spawns subtask(s)

nudge: i'll spawn a subtask for [X]. i'll report back when it's done.

rationale: 62.5% of spawn chains are orphans (no explicit close). explicit handoff/return messaging prevents lost context.

when to deploy:

every spawn
every subtask completion
when coordinating multiple parallel agents

8. error escalation check

trigger: same error encountered twice

nudge: this error came up before—should i dig into root cause instead of working around it?

rationale: 71.6% of errors result in workarounds, not fixes. oracle used as rescue (46% in frustrated threads) rather than planning. early intervention prevents workaround spirals.

when to deploy:

recurring error patterns
when agent’s instinct is to suppress/ignore
when user frustration signals appear

9. time-of-day awareness

trigger: session starts in evening hours (for users with known patterns)

nudge: (internal only) lower confidence thresholds, more confirmation gates

rationale: some users (verbose_explorer) show 21% success in evening vs higher in morning. tired users need more checkpoints.

when to deploy:

evening sessions for users with known patterns
long sessions (3+ hours continuous)
sessions following recent frustrated thread

anti-patterns to AVOID

anti-pattern	why it fails
`running tests now...` without asking	removes user agency, 12.7% compliance
`don't do X` prohibitions	only 20% compliance rate
oracle as rescue tool	should be planning tool, not panic button
>6 task spawns in one thread	over-delegation hurts success
suppressing errors to “move forward”	71.6% workaround rate, compounds problems

implementation notes

these nudges are GENTLE. they should:

use lowercase, conversational tone
offer choice, not mandate
be skippable if user waves them off
adapt frequency based on user’s demonstrated preferences

track which nudges get waved off vs accepted to personalize over time.