@agent_meas

MEASUREMENT FRAMEWORK

operational KPIs for amp thread quality monitoring


OVERVIEW

this framework defines what to measure, how often to measure it, and baseline targets derived from analysis of 4,656 threads.


TIER 1: CRITICAL KPIS (daily tracking)

1.1 resolution rate

metric               | baseline | target | red line
RESOLVED+COMMITTED % | 51%      | >60%   | <40%
FRUSTRATED %         | <1%      | <0.5%  | >2%

how to measure: classify thread outcome at close. count by status daily.

data source: thread metadata, closing message classification
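
a minimal sketch (python) of the daily rollup, assuming each closed thread is available as a dict carrying the outcome_status field from the per-thread schema under DATA COLLECTION REQUIREMENTS; the function name is illustrative:

  from collections import Counter

  def daily_resolution_metrics(threads):
      """Roll up resolution-rate KPIs for one day's closed threads.

      threads: iterable of dicts with an 'outcome_status' key
      (RESOLVED, COMMITTED, FRUSTRATED, ...).
      """
      counts = Counter(t["outcome_status"] for t in threads)
      total = sum(counts.values()) or 1  # guard against empty days
      return {
          "resolved_committed_pct": 100 * (counts["RESOLVED"] + counts["COMMITTED"]) / total,
          "frustrated_pct": 100 * counts["FRUSTRATED"] / total,
          "by_status": dict(counts),
      }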


1.2 approval:steering ratio

metric           | baseline | target | red line
ratio (team avg) | ~2.5:1   | >3:1   | <1.5:1
steering density | ~5%      | <5%    | >8%

how to measure: count user messages classified as APPROVAL vs STEERING per thread. aggregate weekly by user.

data source: user message classification (imperative detection, correction phrases)
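
a rough sketch of the classifier, assuming keyword/phrase heuristics stand in for the imperative and correction-phrase detection; the phrase lists below are illustrative, not a validated model:

  import re

  # illustrative phrase lists; tune against labeled messages before relying on them
  APPROVAL_PATTERNS = [r"\blgtm\b", r"\blooks good\b", r"\bperfect\b", r"\bthanks?\b", r"\bgo ahead\b"]
  STEERING_PATTERNS = [r"\bactually\b", r"\binstead\b", r"\bdon'?t\b", r"\bwrong\b", r"\brevert\b", r"\bundo\b", r"\bnot what i\b"]

  def classify_user_message(text: str) -> str:
      """Rough approval/steering/neutral label for one user message."""
      lowered = text.lower()
      if any(re.search(p, lowered) for p in STEERING_PATTERNS):
          return "steering"  # corrections win over politeness ("thanks, but actually...")
      if any(re.search(p, lowered) for p in APPROVAL_PATTERNS):
          return "approval"
      return "neutral"

  def approval_steering_ratio(user_messages: list[str]) -> float:
      labels = [classify_user_message(m) for m in user_messages]
      steering = labels.count("steering")
      return labels.count("approval") / steering if steering else float("inf")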


1.3 thread length distribution

zone                     | current % | target % | action if violated
<10 turns                | ~15%      | <10%     | flag as abandoned
26-50 turns (sweet spot) | ~20%      | >30%     | optimize toward
>100 turns               | ~8%       | <5%      | mandatory handoff

how to measure: count turns per thread at close. bucket into zones.
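
a minimal bucketing sketch, with zone boundaries taken from the table above:

  def turn_zone(turn_count: int) -> str:
      """Bucket a closed thread into the length zones tracked above."""
      if turn_count < 10:
          return "<10 (possible abandonment)"
      if turn_count <= 25:
          return "10-25"
      if turn_count <= 50:
          return "26-50 (sweet spot)"
      if turn_count <= 100:
          return "51-100"
      return ">100 (mandatory handoff)"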


TIER 2: QUALITY SIGNALS (weekly tracking)

2.1 prompt quality

signal                          | baseline | target | measurement
opener 300-1500 chars           | ~40%     | >60%   | first user message length
file refs in opener             | ~25%     | >40%   | @ or file path in first msg
interrogative/descriptive style | ~50%     | >65%   | sentence structure classification
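
a sketch of opener-signal extraction; the file-reference regex and the interrogative/descriptive heuristic are rough assumptions, not the production classifier:

  import re

  def opener_signals(first_user_message: str) -> dict:
      """Extract the tier-2 prompt-quality signals from a thread's opening message."""
      length = len(first_user_message)
      # '@path' mentions or anything resembling a relative path with a code/doc extension
      has_file_ref = bool(re.search(
          r"(@[\w./-]+|\b[\w/-]+\.(py|ts|tsx|go|rs|java|md)\b)", first_user_message))
      # crude style proxy: a question mark or a descriptive opener
      interrogative_or_descriptive = "?" in first_user_message or bool(re.match(
          r"\s*(i want|i need|we should|can you|could you|please)", first_user_message, re.IGNORECASE))
      return {
          "opener_in_range": 300 <= length <= 1500,
          "file_refs_in_opener": has_file_ref,
          "interrogative_or_descriptive": interrogative_or_descriptive,
      }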

2.2 tool usage health

metric                           | baseline | target   | red line
Task tool usage (2-6/thread)     | ~35%     | >50%     | <20%
oracle for planning (not rescue) | ~25%     | >40%     | track early vs late invocation
skill invocations                | low      | increase | especially dig skill
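
one way to separate planning use from rescue use of the oracle, assuming each invocation records the turn index at which it happened; the first-third cutoff is an illustrative choice, not a validated threshold:

  def oracle_usage_mode(oracle_turns: list[int], total_turns: int) -> str:
      """Classify oracle use within a thread as planning (early) vs rescue (late)."""
      if not oracle_turns or total_turns == 0:
          return "none"
      # assumption: an invocation in the first third of the thread counts as planning
      return "planning" if min(oracle_turns) <= total_turns / 3 else "rescue"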

2.3 verification gates

metric                      | baseline | target
threads with verification   | ~40%     | >60%
build/test run before close | ~50%     | >70%
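
a sketch of verification detection that scans the last few assistant messages for build/test commands; the command patterns are illustrative and should be extended per toolchain:

  import re

  VERIFICATION_PATTERNS = [
      r"\b(npm|pnpm|yarn)\s+(test|run build)\b",
      r"\bgo\s+(test|build)\b",
      r"\bcargo\s+(test|build|check)\b",
      r"\bpytest\b",
      r"\bmake\s+(test|check)\b",
  ]

  def verification_before_close(assistant_messages: list[str], tail: int = 5) -> bool:
      """True if any of the last `tail` assistant messages show a build/test run."""
      recent = assistant_messages[-tail:]
      return any(re.search(p, msg) for msg in recent for p in VERIFICATION_PATTERNS)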

TIER 3: BEHAVIORAL PATTERNS (monthly tracking)

3.1 anti-pattern frequency

pattern              | current rate       | target     | detection method
SHORTCUT_TAKING      | ~30% of frustrated | <10%       | code review signals
TEST_WEAKENING       | ~20% of frustrated | 0%         | assertion removal detection
PREMATURE_COMPLETION | common             | reduce 50% | "done" before verification
NO_DELEGATION        | ~40%               | <25%       | threads with 0 Task calls
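
sketches for the two detections easiest to automate from this table: NO_DELEGATION from the per-thread tool counts, and TEST_WEAKENING from a unified diff; counting assert on added/removed lines is a rough proxy, not the full detector:

  def is_no_delegation(thread: dict) -> bool:
      """NO_DELEGATION: a closed thread that never used the Task tool."""
      return thread["tools_used"]["task_count"] == 0

  def is_test_weakening(diff_text: str) -> bool:
      """TEST_WEAKENING heuristic: more assertions removed than added in a change."""
      removed = added = 0
      for line in diff_text.splitlines():
          if line.startswith("-") and not line.startswith("---") and "assert" in line:
              removed += 1
          elif line.startswith("+") and not line.startswith("+++") and "assert" in line:
              added += 1
      return removed > added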

3.2 per-user metrics (track monthly)

metric                  | purpose
resolution rate         | individual effectiveness
avg turns to resolution | efficiency
steering density        | collaboration quality
handoff rate            | task scoping issues

3.3 temporal patterns

metric                | baseline | monitoring purpose
6-9pm resolution rate | 27.5%    | avoid critical work
weekend delta         | +5.2pp   | confirm pattern holds
msgs/hr distribution  | varies   | pace optimization

BASELINE VALUES (from 4,656 threads)

outcome distribution (current state)

status      | %    | count
RESOLVED    | 59%  | 2,745
UNKNOWN     | 33%  | 1,560
HANDOFF     | 1.6% | 75
COMMITTED   | 7%   | 305
EXPLORATORY | 3%   | 124
FRUSTRATED  | <1%  | 14

success thresholds (validated)

metric                | green    | yellow               | red
turns                 | 26-50    | 10-25 or 51-100      | <10 or >100
approval:steering     | >2:1     | 1-2:1                | <1:1
steering density      | <5%      | 5-8%                 | >8%
prompt length (chars) | 300-1500 | 100-300 or 1500-3000 | <100 or >3000
Task usage            | 2-6      | 1 or 7-10            | 0 or 11+
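
a minimal sketch of banding against these thresholds, shown for two rows; the remaining metrics follow the same shape:

  def turns_band(turns: int) -> str:
      """Green/yellow/red band for thread length."""
      if 26 <= turns <= 50:
          return "green"
      if 10 <= turns <= 25 or 51 <= turns <= 100:
          return "yellow"
      return "red"

  def task_usage_band(task_count: int) -> str:
      """Green/yellow/red band for Task tool usage per thread."""
      if 2 <= task_count <= 6:
          return "green"
      if task_count == 1 or 7 <= task_count <= 10:
          return "yellow"
      return "red"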

MEASUREMENT CADENCE

daily: tier 1 critical KPIs (resolution rate, approval:steering ratio, thread length distribution)

weekly: tier 2 quality signals (prompt quality, tool usage health, verification gates)

monthly: tier 3 behavioral patterns (anti-pattern frequency, per-user metrics, temporal patterns)


ALERTING THRESHOLDS

immediate action required

condition                                  | action
2+ FRUSTRATED threads in 24h               | root cause analysis
user approval:steering <1:1 for 3+ threads | intervention/coaching
>50% threads <10 turns for a user          | check prompt quality
steering→steering transition >40%          | systemic issue
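
a sketch of the daily alert sweep, assuming each thread summary carries user_id, end_timestamp, outcome_status, turn_count, and a precomputed approval_steering_ratio; thresholds come from the table above, and the steering→steering transition check is omitted because it needs message-level data:

  from datetime import datetime, timedelta

  def immediate_alerts(threads: list[dict], now: datetime) -> list[str]:
      """Evaluate the immediate-action conditions over recent thread summaries."""
      alerts = []
      last_24h = [t for t in threads if now - t["end_timestamp"] <= timedelta(hours=24)]
      if sum(t["outcome_status"] == "FRUSTRATED" for t in last_24h) >= 2:
          alerts.append("2+ FRUSTRATED threads in 24h: run root cause analysis")

      by_user: dict[str, list[dict]] = {}
      for t in threads:
          by_user.setdefault(t["user_id"], []).append(t)
      for user, ts in by_user.items():
          if sum(t["approval_steering_ratio"] < 1.0 for t in ts) >= 3:
              alerts.append(f"{user}: approval:steering <1:1 on 3+ threads, consider coaching")
          if ts and sum(t["turn_count"] < 10 for t in ts) / len(ts) > 0.5:
              alerts.append(f"{user}: >50% of threads under 10 turns, check prompt quality")
      return alerts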

weekly review triggers

condition                   | review
resolution rate drops >10pp | investigate pattern shift
new anti-pattern cluster    | update catalog
Task usage <20%             | training opportunity

DATA COLLECTION REQUIREMENTS

per thread (automatic)

thread_id
user_id
start_timestamp
end_timestamp
turn_count
outcome_status
first_msg_length
file_refs_in_opener
tools_used: { task_count, oracle_count, skill_invocations }
verification_present: bool

per message (automatic)

message_id
thread_id
role: user|assistant
timestamp
char_count
classification: approval|steering|neutral|question

derived (computed)

approval_steering_ratio
steering_density
msgs_per_hour
time_to_resolution
question_density
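
a sketch of these records and the derived computations as python dataclasses; field names follow the lists above, while defaults and helper names are illustrative:

  from dataclasses import dataclass, field
  from datetime import datetime

  @dataclass
  class MessageRecord:
      message_id: str
      thread_id: str
      role: str            # "user" | "assistant"
      timestamp: datetime
      char_count: int
      classification: str  # "approval" | "steering" | "neutral" | "question"

  @dataclass
  class ThreadRecord:
      thread_id: str
      user_id: str
      start_timestamp: datetime
      end_timestamp: datetime
      turn_count: int
      outcome_status: str
      first_msg_length: int
      file_refs_in_opener: bool
      verification_present: bool = False
      tools_used: dict = field(default_factory=lambda: {
          "task_count": 0, "oracle_count": 0, "skill_invocations": 0})

  def derived_metrics(thread: ThreadRecord, messages: list[MessageRecord]) -> dict:
      """Compute the derived fields from the raw per-thread and per-message records."""
      user_msgs = [m for m in messages if m.role == "user"]
      approvals = sum(m.classification == "approval" for m in user_msgs)
      steerings = sum(m.classification == "steering" for m in user_msgs)
      questions = sum(m.classification == "question" for m in user_msgs)
      hours = max((thread.end_timestamp - thread.start_timestamp).total_seconds() / 3600, 1e-9)
      return {
          "approval_steering_ratio": approvals / steerings if steerings else float("inf"),
          "steering_density": steerings / len(user_msgs) if user_msgs else 0.0,
          "msgs_per_hour": len(messages) / hours,
          "time_to_resolution": thread.end_timestamp - thread.start_timestamp,
          "question_density": questions / len(user_msgs) if user_msgs else 0.0,
      }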

SUCCESS CRITERIA FOR FRAMEWORK

this framework succeeds if:

  1. FRUSTRATED threads trend to 0 (currently 14/4656)
  2. resolution rate increases to >60% (currently 51%)
  3. sweet spot (26-50 turns) threads increase to >30%
  4. approval:steering ratio team avg >3:1
  5. anti-pattern recurrence decreases measurably

IMPLEMENTATION PRIORITY

  1. week 1: instrument basic outcome tracking (status, turns)
  2. week 2: add message classification (approval/steering)
  3. week 3: prompt quality signals
  4. week 4: tool usage tracking
  5. ongoing: anti-pattern detection refinement

framework derived from analysis of 4,656 amp threads