agent compliance analysis
analysis of how often the agent follows explicit user instructions, across a sample of 500 threads (drawn from 4,656 available).
key findings
overall compliance rates
| outcome | count | percentage |
|---|---|---|
| COMPLIED | 1,090 | 16.0% |
| DEVIATED | 726 | 10.7% |
| CLARIFIED | 46 | 0.7% |
| AMBIGUOUS | 4,949 | 72.7% |

percentages are over all 6,811 classified exchanges (table total).
baseline: 82.8% of threads (414/500) contain at least one explicit instruction.
deviation ratio: restricting to exchanges with a clear signal (COMPLIED or DEVIATED), the agent deviates 40.0% of the time (726 / (726 + 1,090)).
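for reference, a minimal sketch in Python showing how the headline numbers above fall out of the outcome counts (counts hardcoded from the table; this is not the original pipeline):

```python
# outcome counts copied from the table above
counts = {"COMPLIED": 1090, "DEVIATED": 726, "CLARIFIED": 46, "AMBIGUOUS": 4949}
total = sum(counts.values())  # 6,811 exchanges

for outcome, n in counts.items():
    print(f"{outcome}: {n} ({n / total:.1%})")

# the deviation ratio restricts the base to exchanges with a clear signal
clear = counts["COMPLIED"] + counts["DEVIATED"]  # 1,816
print(f"deviation ratio: {counts['DEVIATED'] / clear:.1%}")  # 40.0%
```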
compliance by instruction type
| type | total | complied | deviated | compliance rate (complied / total) |
|---|---|---|---|---|
| ACTION | 10,281 | 2,344 | 909 | 22.8% |
| PROHIBITION | 3,137 | 627 | 371 | 20.0% |
| DIRECTIVE | 2,773 | 549 | 363 | 19.8% |
| SUGGESTION | 2,092 | 738 | 217 | 35.3% |
| CONSTRAINT | 1,569 | 258 | 196 | 16.4% |
| SIMPLIFICATION | 390 | 67 | 65 | 17.2% |
| REQUEST | 245 | 31 | 21 | 12.7% |
| STYLE | 163 | 30 | 5 | 18.4% |
| OUTPUT_DIRECTIVE | 12 | 1 | 1 | 8.3% |
instruction strength distribution
- medium: 15,055 (72.8%)
- strong: 5,607 (27.2%)

(total: 20,662 instructions, matching the type table above.)
patterns
high-deviation areas
- OUTPUT_DIRECTIVE (8.3% compliance): “write to X”, “save to Y”; the agent often forgets or deviates on output location (though n = 12, so treat with caution)
- REQUEST (12.7% compliance): polite requests (“please X”) get the lowest follow-through of any high-volume type
- CONSTRAINT (16.4% compliance): “only X” constraints frequently violated
relatively better areas
- SUGGESTION (35.3% compliance): “should” statements get the highest compliance of any type
- ACTION (22.8% compliance): direct verbs (“fix”, “update”) are moderately well followed
- STYLE (18.4% compliance, only 3.1% deviation): formatting instructions are rarely contradicted outright
prohibition handling
prohibitions (“don’t”, “never”, “avoid”) show 20.0% compliance and only 11.8% deviation; the remaining ~68% of prohibition instructions are ambiguous. plausible explanations:
- agent often proceeds without acknowledging the prohibition explicitly
- prohibition context is lost in multi-step reasoning (see the sketch after this list)
- prohibition may conflict with perceived “helpfulness”
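one concrete shape for prohibition tracking (also listed under agent improvements below): extract “don’t”-style statements once, then re-check them at each step so they survive multi-step reasoning. a purely hypothetical sketch; the pattern and helper are illustrative assumptions:

```python
import re

# illustrative prohibition cues -- an assumption, not the analysis's pattern set
PROHIBITION = re.compile(r"(?:don'?t|do not|never|avoid)\s+([^.;\n]+)", re.I)

def extract_prohibitions(user_message: str) -> list[str]:
    """Pull out the clause following each prohibition cue."""
    return [m.group(1).strip() for m in PROHIBITION.finditer(user_message)]

# hypothetical usage: carry the list through every step of a plan
msg = "refactor the parser, but don't touch the public API. never delete tests."
for p in extract_prohibitions(msg):
    print(f"active prohibition: {p}")
# active prohibition: touch the public API
# active prohibition: delete tests
```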
interpretation caveats
- high ambiguity rate (72.7%): many exchanges lack a clear verbal compliance signal; the agent often acts via tools without confirming in prose (see the signal-detection sketch after this list)
- false negatives: tool calls may constitute compliance even without verbal confirmation, so true compliance is likely undercounted
- context bleeding: instructions from earlier turns may carry forward but aren’t detected per-exchange
- code vs prose: instructions embedded in code blocks or technical context are harder to parse
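to make the ambiguity caveat concrete, here is a minimal sketch of the kind of verbal signal detection the method note at the end describes; the phrase lists and function name are illustrative assumptions, not the actual patterns used:

```python
import re

# illustrative signal phrases -- assumptions, not the real pattern set
POSITIVE = re.compile(r"\b(done|fixed|updated|as requested|i've (made|kept))\b", re.I)
NEGATIVE = re.compile(r"\b(instead|however,? i|went with|rather than)\b", re.I)
CLARIFY = re.compile(r"\b(did you mean|should i|which (one|file)|can you confirm)\b", re.I)

def classify_exchange(agent_reply: str) -> str:
    """Label one agent reply by its verbal compliance signal only.

    This sketch looks only at prose, which is exactly why silent
    tool-based compliance (action taken, nothing said) lands in AMBIGUOUS.
    """
    if CLARIFY.search(agent_reply):
        return "CLARIFIED"
    if NEGATIVE.search(agent_reply):
        return "DEVIATED"
    if POSITIVE.search(agent_reply):
        return "COMPLIED"
    return "AMBIGUOUS"  # no verbal signal either way
```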
recommendations for users
- use direct verbs: “fix X” outperforms “please fix X”
- repeat constraints: the agent is better at following constraints that are restated than those given once
- avoid negatives: “use A” works better than “don’t use B”
- verify output locations: explicitly check file destinations were followed
- steering works: threads with active steering show higher resolution rates (per prior analysis)
recommendations for agent improvement
- prohibition tracking: explicit acknowledgment of “don’t” statements before proceeding
- output verification: confirm the file path matches the user’s specification before and after writing (see the path-check sketch after this list)
- constraint echoing: repeat back constraints to confirm understanding
- polite request parity: treat “please X” the same as “X” for action priority
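one way to act on the output-verification item: compare every write destination against the path the user named before the write happens. a minimal sketch; the helper name and usage are hypothetical, not part of any existing agent framework:

```python
from pathlib import Path

def verify_output_path(user_specified: str, actual_write: str) -> bool:
    """Hypothetical pre-write guard: flag writes that drift from the
    destination the user explicitly named (the OUTPUT_DIRECTIVE case)."""
    expected = Path(user_specified).expanduser().resolve()
    actual = Path(actual_write).expanduser().resolve()
    if actual != expected:
        # surface the mismatch instead of silently writing elsewhere
        print(f"warning: user asked for {expected}, about to write {actual}")
        return False
    return True

# usage: check before a tool-driven write
verify_output_path("~/reports/summary.md", "/tmp/summary.md")
```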
analysis method: regex pattern matching for instruction types, compliance signal detection (positive / negative / clarifying language), and tool-use counting. raw data: agent-compliance-raw.json
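a minimal sketch of the regex classification step described above; the patterns, type precedence, and strength cue list are illustrative assumptions, far simpler than whatever the real pattern set was:

```python
import re

# illustrative patterns per instruction type -- assumptions, not the real set
TYPE_PATTERNS = [
    ("PROHIBITION", re.compile(r"\b(don'?t|never|avoid|do not)\b", re.I)),
    ("CONSTRAINT", re.compile(r"\b(only|must|exactly|no more than)\b", re.I)),
    ("OUTPUT_DIRECTIVE", re.compile(r"\b(write to|save (to|as)|output to)\b", re.I)),
    ("REQUEST", re.compile(r"\bplease\b", re.I)),
    ("SUGGESTION", re.compile(r"\b(should|consider|maybe)\b", re.I)),
    ("ACTION", re.compile(r"^\s*(fix|update|add|remove|refactor)\b", re.I)),
]

# illustrative strength cues
STRONG = re.compile(r"\b(must|never|always|do not)\b", re.I)

def classify_instruction(text: str) -> tuple[str, str]:
    """Return (type, strength) for one instruction; first pattern wins."""
    itype = next((t for t, pat in TYPE_PATTERNS if pat.search(text)), "DIRECTIVE")
    strength = "strong" if STRONG.search(text) else "medium"
    return itype, strength

print(classify_instruction("please don't touch the config"))  # ('PROHIBITION', 'medium')
```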
limitations: the pipeline is heuristic, and ~73% of exchanges were classified as ambiguous. manual review of a sample of flagged deviations suggests classification accuracy is moderate.