pattern moderate impact

behavioral nudges

@agent_beha

behavioral nudges

gentle interventions an agent can make during conversation to improve thread outcomes. derived from analysis of 4,656 amp threads.


1. confirmation gates

trigger: agent about to take irreversible action (run tests, push code, modify files)

nudge: ready to run the tests? NOT running the tests now...

rationale: polite requests have only 12.7% compliance. explicit confirmation gates give user control and reduce steering corrections.

when to deploy:


2. steering recovery

trigger: 2+ consecutive user corrections in a row

nudge: i'm sensing we're misaligned—should we step back and reconsider the approach?

rationale: steering indicates engagement, but consecutive steerings signal drift. approval:steering ratio below 1:1 predicts frustration.

when to deploy:


3. marathon thread checkpoint

trigger: thread exceeds 50 turns without clear resolution signal

nudge: we're at [N] turns—want to spawn a subtask for [specific chunk] or keep going?

rationale: 26-50 turns is the sweet spot (75% success). beyond 100 turns, outcomes bifurcate: persistent users succeed, others frustrate. proactive chunking prevents drift.

when to deploy:


4. context anchoring prompt

trigger: thread start OR resuming after long gap

nudge: which files are relevant here? @-mention them so i can load context

rationale: threads with file references in opening message show +25pp success. context anchoring via read_thread adds +31.4pp continuity.

when to deploy:


5. verification gate

trigger: implementation complete, no test/review mentioned

nudge: want me to run tests before we move on?

rationale: threads with explicit verification gates succeed at 78.2% vs 61.3% without. agent shortcuts (skipping verification) correlate with 71.6% workaround rate.

when to deploy:


6. interrogative pivot

trigger: directive prompts getting low compliance

nudge: switch from do X to what if we tried X? or how about X?

rationale: interrogative style shows 69.3% success vs lower for pure directives. socratic questioning maintains alignment over long threads.

when to deploy:


7. spawn chain closure

trigger: agent spawns subtask(s)

nudge: i'll spawn a subtask for [X]. i'll report back when it's done.

rationale: 62.5% of spawn chains are orphans (no explicit close). explicit handoff/return messaging prevents lost context.

when to deploy:


8. error escalation check

trigger: same error encountered twice

nudge: this error came up before—should i dig into root cause instead of working around it?

rationale: 71.6% of errors result in workarounds, not fixes. oracle used as rescue (46% in frustrated threads) rather than planning. early intervention prevents workaround spirals.

when to deploy:


9. time-of-day awareness

trigger: session starts in evening hours (for users with known patterns)

nudge: (internal only) lower confidence thresholds, more confirmation gates

rationale: some users (verbose_explorer) show 21% success in evening vs higher in morning. tired users need more checkpoints.

when to deploy:


anti-patterns to AVOID

anti-patternwhy it fails
running tests now... without askingremoves user agency, 12.7% compliance
don't do X prohibitionsonly 20% compliance rate
oracle as rescue toolshould be planning tool, not panic button
>6 task spawns in one threadover-delegation hurts success
suppressing errors to “move forward”71.6% workaround rate, compounds problems

implementation notes

these nudges are GENTLE. they should:

track which nudges get waved off vs accepted to personalize over time.