signal strength ranking
predictive power for thread resolution, ranked by effect size and reliability.
tier 1: STRONG PREDICTORS (>20pp effect)
| signal | effect | evidence |
|---|---|---|
| approval:steering ratio | >4:1 → COMMITTED, <1:1 → FRUSTRATED | clearest single predictor; maps directly to outcome buckets |
| file references in opener | +25pp success (66.7% vs 41.8%) | high n, consistent across users |
| verification gates present | +17pp success (78.2% vs 61.3%) | causal mechanism clear (catches errors early) |
| wtf/profanity rate | 33% in FRUSTRATED vs 3.5% in RESOLVED | ~10x difference; lagging indicator but strong |
| consecutive steerings | 2+ = doom spiral predictor | precedes frustration by 2-5 turns; actionable |
tier 2: MODERATE PREDICTORS (10-20pp effect)
| signal | effect | evidence |
|---|---|---|
| interrogative prompting style | 69.3% vs 46.4% (directive) | +23pp but confounded with user skill |
| thread length 26-50 turns | 75% success (sweet spot) | shorter or longer both hurt; inverted-U curve |
| task delegation 2-6 per thread | 77-79% resolution | 11+ tasks → 58%; diminishing returns |
| agent shortcut detection | earliest frustration signal (2-5 turns ahead) | LEADING indicator, hard to operationalize |
| steering presence (any) | 60% vs 37% without steering | steering = engagement, not failure |
tier 3: WEAK BUT CONSISTENT (5-10pp effect)
| signal | effect | evidence |
|---|---|---|
| time of day | 60%+ (2-5am, 6-9am) vs 27.5% (6-9pm) | +33pp spread, but confounded with user/task type |
| weekend premium | +5.2pp vs weekday | consistent but small |
| prompt length 300-1500 chars | 0.20-0.21 steering rate (lowest) | optimal information density |
| question density <5% | 76% success | low question density suggests clear task framing |
tier 4: CONTEXTUAL SIGNALS (effect depends on situation)
| signal | context | notes |
|---|---|---|
| oracle usage | higher in FRUSTRATED (46% vs 25%) | rescue tool, not planning tool; signal of struggle |
| thread length >100 turns | marathon debugging | increases frustration risk but not deterministic |
| opening word patterns | "please" → 100% resolution, "im"/"following:" → frustration | high variance, small n on some |
| user archetype | @concise_commander 60.5%, @verbose_explorer 83% (corrected) | user skill confounds task difficulty |
tier 5: TRAILING/DIAGNOSTIC (post-hoc only, not predictive)
| signal | use case |
|---|---|
| closing ritual type | post-hoc classification only |
| COMMITTED thread length | 40% shorter than RESOLVED; confirms efficiency |
| orphaned spawn rate (62.5%) | process smell, not resolution predictor |
| error suppression rate (71.6%) | agent behavior audit, not live prediction |
actionable hierarchy
for REAL-TIME intervention:
- watch approval:steering ratio (tier 1)
- detect consecutive steerings (tier 1)
- check for verification gates (tier 1)
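the three tier-1 monitors above can be sketched as one streaming check. a minimal sketch, assuming a per-thread event stream with hypothetical labels (`approval`, `steering`, `verification_gate`); the thresholds come from the tiers above, the labels and function name do not.

```python
from collections import Counter

def realtime_flags(events, ratio_floor=1.0, streak_limit=2):
    """Scan a thread's event stream (hypothetical labels: 'approval',
    'steering', 'verification_gate') and emit tier-1 warning flags."""
    counts = Counter(events)
    flags = []
    # approval:steering ratio — <1:1 tracks FRUSTRATED outcomes
    if counts["steering"] and counts["approval"] / counts["steering"] < ratio_floor:
        flags.append("low approval:steering ratio")
    # consecutive steerings — 2+ precedes frustration by 2-5 turns
    streak = 0
    for e in events:
        streak = streak + 1 if e == "steering" else 0
        if streak >= streak_limit:
            flags.append("doom spiral: consecutive steerings")
            break
    # verification gates — absence costs ~17pp success
    if counts["verification_gate"] == 0:
        flags.append("no verification gates")
    return flags
```

running this per turn (rather than at thread end) is what makes the consecutive-steerings signal actionable: it fires 2-5 turns before frustration shows up in the text.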
for PROMPT ENGINEERING:
- include file references (tier 1)
- use interrogative style (tier 2)
- target 300-1500 chars (tier 3)
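the prompt-engineering levers reduce to a pre-send lint. a sketch only: the file-reference regex, the function name, and treating any `?` as interrogative framing are illustrative assumptions, not part of the analysis.

```python
import re

def lint_prompt(prompt: str) -> list[str]:
    """Hypothetical pre-send checks against the tier 1-3 prompt signals."""
    warnings = []
    # tier 1: file references in the opener (+25pp success)
    if not re.search(r"[\w./-]+\.\w{1,4}\b", prompt):
        warnings.append("no file reference")
    # tier 3: 300-1500 chars had the lowest steering rate
    if not 300 <= len(prompt) <= 1500:
        warnings.append("length outside 300-1500 chars")
    # tier 2: interrogative style outperformed directive (+23pp)
    if "?" not in prompt:
        warnings.append("no interrogative framing")
    return warnings
```

note the regex will false-positive on abbreviations like "e.g."; a production version would want a stricter path pattern.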
for AGENT CONFIGURATION:
- enforce verification gates
- limit task delegation to 2-6
- discourage oracle as rescue tool
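the delegation limit can be enforced mechanically. a sketch, assuming a per-thread counter; the class name and the default cap of 6 are assumptions drawn from the 2-6 sweet spot above (resolution fell to 58% at 11+ tasks).

```python
class DelegationBudget:
    """Illustrative per-thread guard for the 2-6 task delegation band."""

    def __init__(self, cap: int = 6):
        self.cap = cap
        self.spawned = 0

    def try_spawn(self) -> bool:
        # returns False once the thread hits the cap, nudging the agent
        # to finish in-context instead of delegating further
        if self.spawned >= self.cap:
            return False
        self.spawned += 1
        return True
```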
confidence notes
- tier 1 signals have both high effect size AND mechanistic explanation
- tier 2 signals have effect size but potential confounds
- tier 3-4 require larger n or controlled experiments to confirm causality
- user archetype effects likely confounded with task complexity selection