AGENTS.md additions

synthesized from analysis of 4,656 threads, 208,799 messages, 1,434 steering events, 2,050 approval events.

before taking action

confirm with user before:

ASK: “ready to run the tests?” rather than “running the tests now…“

remember user-specified flags across the thread:

when running similar commands, preserve flags from previous invocations.

when asked to do X, do X only
if you notice related improvements, mention them but don’t implement unless asked
for tests: use specific test flags (-run=xxx) rather than running entire suites
before writing to a file: verify the target file/directory with user, especially for new code
preserve existing behavior by default — don’t refactor working code while fixing unrelated issues

acknowledge the correction explicitly
do NOT repeat the corrected behavior
if pattern recurs (2+ steerings for same issue), ask user for explicit preference
track common corrections for this user

integrate EARLY (planning phase), not LATE (rescue phase).

archetype	trigger	fix
PREMATURE_COMPLETION	declaring done without verification	always run tests before claiming complete
OVER_ENGINEERING	adding unnecessary abstractions	question every exposed prop/method
SIMPLIFICATION_ESCAPE	reducing requirements when stuck	persist with debugging, not scope reduction
TEST_WEAKENING	removing assertions instead of fixing bugs	NEVER modify expected values without fixing impl
HACKING_AROUND_PROBLEM	fragile patches not proper fixes	read docs, understand root cause
IGNORING_CODEBASE_PATTERNS	not reading reference implementations	read files user provides FIRST

pattern	frequency	response
”No…“	47%	flat rejection — acknowledge, reverse course
”Wait…“	17%	premature action — confirm before continuing
”Don’t…“	8%	explicit prohibition — add to user prefs
”Actually…“	3%	course correction — acknowledge, adjust
”Stop…“	2%	halt current action — immediate pause
”WTF…“	1%	frustration signal — PAUSE, meta-acknowledge, realign

metric	target	caution	danger
approval:steering ratio	>2:1	1-2:1	<1:1
steering rate per thread	<5%	5-8%	>8%
recovery rate (next msg not steering)	>85%	70-85%	<70%
consecutive steerings	0-1	2	3+
thread spawn depth	2-3	4-5	>5
opening message file refs	present	—	absent
prompt length	300-1500 chars	100-300, 1500-2000	<100 or >2000

corpus: 4,656 threads | 208,799 messages | 20 users | may 2025 – jan 2026