agent compliance analysis
analysis of how often the agent follows explicit user instructions, across a sample of 500 threads (drawn from 4,656 available).
key findings
overall compliance rates
| outcome | count | percentage |
|---|---|---|
| COMPLIED | 1,090 | 16.0% |
| DEVIATED | 726 | 10.7% |
| CLARIFIED | 46 | 0.7% |
| AMBIGUOUS | 4,949 | 72.7% |

percentages are over all 6,811 classified exchanges (table total).
baseline: 82.8% of threads (414/500) contain at least one explicit instruction.
deviation ratio: restricting to exchanges with a clear signal (COMPLIED or DEVIATED), the agent deviates 40.0% of the time (726 / (726 + 1,090)).
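for reference, a minimal sketch in Python showing how the headline numbers above fall out of the outcome counts (counts hardcoded from the table; this is not the original pipeline):

```python
# outcome counts copied from the table above
counts = {"COMPLIED": 1090, "DEVIATED": 726, "CLARIFIED": 46, "AMBIGUOUS": 4949}
total = sum(counts.values())  # 6,811 exchanges

for outcome, n in counts.items():
    print(f"{outcome}: {n} ({n / total:.1%})")

# the deviation ratio restricts the base to exchanges with a clear signal
clear = counts["COMPLIED"] + counts["DEVIATED"]  # 1,816
print(f"deviation ratio: {counts['DEVIATED'] / clear:.1%}")  # 40.0%
```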
compliance by instruction type
| type | total | complied | deviated | compliance rate (complied / total) |
|---|---|---|---|---|
| ACTION | 10,281 | 2,344 | 909 | 22.8% |
| PROHIBITION | 3,137 | 627 | 371 | 20.0% |
| DIRECTIVE | 2,773 | 549 | 363 | 19.8% |
| SUGGESTION | 2,092 | 738 | 217 | 35.3% |
| CONSTRAINT | 1,569 | 258 | 196 | 16.4% |
| SIMPLIFICATION | 390 | 67 | 65 | 17.2% |
| REQUEST | 245 | 31 | 21 | 12.7% |
| STYLE | 163 | 30 | 5 | 18.4% |
| OUTPUT_DIRECTIVE | 12 | 1 | 1 | 8.3% |
instruction strength distribution
- medium: 15,055 (72.8%)
- strong: 5,607 (27.2%)

(total: 20,662 instructions, matching the type table above.)
patterns
high-deviation areas
- OUTPUT_DIRECTIVE (8.3% compliance): “write to X”, “save to Y”; the agent often forgets or deviates on output location (though n = 12, so treat with caution)
- REQUEST (12.7% compliance): polite requests (“please X”) get the lowest follow-through of any high-volume type
- CONSTRAINT (16.4% compliance): “only X” constraints frequently violated
relatively better areas
- SUGGESTION (35.3% compliance): “should” statements get the highest compliance of any type
- ACTION (22.8% compliance): direct verbs (“fix”, “update”) are moderately well followed
- STYLE (18.4% compliance, only 3.1% deviation): formatting instructions are rarely contradicted outright
prohibition handling
prohibitions (“don’t”, “never”, “avoid”) show 20.0% compliance and only 11.8% deviation; the remaining ~68% of prohibition instructions are ambiguous. plausible explanations:
- agent often proceeds without acknowledging the prohibition explicitly
- prohibition context is lost in multi-step reasoning (see the sketch after this list)
- prohibition may conflict with perceived “helpfulness”
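one concrete shape for prohibition tracking (also listed under agent improvements below): extract “don’t”-style statements once, then re-check them at each step so they survive multi-step reasoning. a purely hypothetical sketch; the pattern and helper are illustrative assumptions:

```python
import re

# illustrative prohibition cues -- an assumption, not the analysis's pattern set
PROHIBITION = re.compile(r"(?:don'?t|do not|never|avoid)\s+([^.;\n]+)", re.I)

def extract_prohibitions(user_message: str) -> list[str]:
    """Pull out the clause following each prohibition cue."""
    return [m.group(1).strip() for m in PROHIBITION.finditer(user_message)]

# hypothetical usage: carry the list through every step of a plan
msg = "refactor the parser, but don't touch the public API. never delete tests."
for p in extract_prohibitions(msg):
    print(f"active prohibition: {p}")
# active prohibition: touch the public API
# active prohibition: delete tests
```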
interpretation caveats
- high ambiguity rate (72.7%): many exchanges lack a clear verbal compliance signal; the agent often acts via tools without confirming in prose (see the signal-detection sketch after this list)
- false negatives: tool calls may constitute compliance even without verbal confirmation, so true compliance is likely undercounted
- context bleeding: instructions from earlier turns may carry forward but aren’t detected per-exchange
- code vs prose: instructions embedded in code blocks or technical context are harder to parse
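to make the ambiguity caveat concrete, here is a minimal sketch of the kind of verbal signal detection the method note at the end describes; the phrase lists and function name are illustrative assumptions, not the actual patterns used:

```python
import re

# illustrative signal phrases -- assumptions, not the real pattern set
POSITIVE = re.compile(r"\b(done|fixed|updated|as requested|i've (made|kept))\b", re.I)
NEGATIVE = re.compile(r"\b(instead|however,? i|went with|rather than)\b", re.I)
CLARIFY = re.compile(r"\b(did you mean|should i|which (one|file)|can you confirm)\b", re.I)

def classify_exchange(agent_reply: str) -> str:
    """Label one agent reply by its verbal compliance signal only.

    This sketch looks only at prose, which is exactly why silent
    tool-based compliance (action taken, nothing said) lands in AMBIGUOUS.
    """
    if CLARIFY.search(agent_reply):
        return "CLARIFIED"
    if NEGATIVE.search(agent_reply):
        return "DEVIATED"
    if POSITIVE.search(agent_reply):
        return "COMPLIED"
    return "AMBIGUOUS"  # no verbal signal either way
```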
recommendations for users
- use direct verbs: “fix X” outperforms “please fix X”
- repeat constraints: the agent is better at following constraints that are restated than those given once
- avoid negatives: “use A” works better than “don’t use B”
- verify output locations: explicitly check file destinations were followed
- steering works: threads with active steering show higher resolution rates (per prior analysis)
recommendations for agent improvement
- prohibition tracking: explicit acknowledgment of “don’t” statements before proceeding
- output verification: confirm the file path matches the user’s specification before and after writing (see the path-check sketch after this list)
- constraint echoing: repeat back constraints to confirm understanding
- polite request parity: treat “please X” the same as “X” for action priority
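one way to act on the output-verification item: compare every write destination against the path the user named before the write happens. a minimal sketch; the helper name and usage are hypothetical, not part of any existing agent framework:

```python
from pathlib import Path

def verify_output_path(user_specified: str, actual_write: str) -> bool:
    """Hypothetical pre-write guard: flag writes that drift from the
    destination the user explicitly named (the OUTPUT_DIRECTIVE case)."""
    expected = Path(user_specified).expanduser().resolve()
    actual = Path(actual_write).expanduser().resolve()
    if actual != expected:
        # surface the mismatch instead of silently writing elsewhere
        print(f"warning: user asked for {expected}, about to write {actual}")
        return False
    return True

# usage: check before a tool-driven write
verify_output_path("~/reports/summary.md", "/tmp/summary.md")
```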
analysis method: regex pattern matching for instruction types, compliance signal detection (positive / negative / clarifying language), and tool-use counting. raw data: agent-compliance-raw.json
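a minimal sketch of the regex classification step described above; the patterns, type precedence, and strength cue list are illustrative assumptions, far simpler than whatever the real pattern set was:

```python
import re

# illustrative patterns per instruction type -- assumptions, not the real set
TYPE_PATTERNS = [
    ("PROHIBITION", re.compile(r"\b(don'?t|never|avoid|do not)\b", re.I)),
    ("CONSTRAINT", re.compile(r"\b(only|must|exactly|no more than)\b", re.I)),
    ("OUTPUT_DIRECTIVE", re.compile(r"\b(write to|save (to|as)|output to)\b", re.I)),
    ("REQUEST", re.compile(r"\bplease\b", re.I)),
    ("SUGGESTION", re.compile(r"\b(should|consider|maybe)\b", re.I)),
    ("ACTION", re.compile(r"^\s*(fix|update|add|remove|refactor)\b", re.I)),
]

# illustrative strength cues
STRONG = re.compile(r"\b(must|never|always|do not)\b", re.I)

def classify_instruction(text: str) -> tuple[str, str]:
    """Return (type, strength) for one instruction; first pattern wins."""
    itype = next((t for t, pat in TYPE_PATTERNS if pat.search(text)), "DIRECTIVE")
    strength = "strong" if STRONG.search(text) else "medium"
    return itype, strength

print(classify_instruction("please don't touch the config"))  # ('PROHIBITION', 'medium')
```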
limitations: the pipeline is heuristic, and ~73% of exchanges were classified as ambiguous. manual review of a sample of flagged deviations suggests classification accuracy is moderate.