oracle timing analysis

overview

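analyzed oracle mentions in assistant messages across 757 threads that invoked the oracle tool. the four outcome categories below account for 648 of them; the remaining 109 are not broken down here.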

key findings

oracle usage correlates with task complexity, not success rate

outcome	threads w/ oracle	total threads	% using oracle
RESOLVED	518	2745	18.9%
COMMITTED	68	305	22.3%
HANDOFF	56	75	74.7%
FRUSTRATED	6	14	42.9%

note: HANDOFF oracle count (56) appears inflated — may include misclassified subagent threads. 74.7% rate is suspect.
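A minimal sketch of the arithmetic behind the "% using oracle" column. The counts are copied from the table above; the `COUNTS` dict and the `oracle_usage_rates` helper are hypothetical names used only for illustration.

```python
# Counts copied from the outcome table: (threads with oracle, total threads)
COUNTS = {
    "RESOLVED": (518, 2745),
    "COMMITTED": (68, 305),
    "HANDOFF": (56, 75),
    "FRUSTRATED": (6, 14),
}


def oracle_usage_rates(counts):
    """Percentage of threads per outcome that invoked oracle, rounded to one decimal."""
    return {
        outcome: round(100 * with_oracle / total, 1)
        for outcome, (with_oracle, total) in counts.items()
    }


rates = oracle_usage_rates(COUNTS)
for outcome, pct in rates.items():
    print(f"{outcome}\t{pct}%")

# Oracle threads covered by the four outcome categories
oracle_threads = sum(with_oracle for with_oracle, _ in COUNTS.values())
print(oracle_threads)  # 648
```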

hunch: high oracle usage in FRUSTRATED threads (42.9%, though only 6 of 14 threads) suggests oracle is invoked when tasks are genuinely difficult, not that oracle causes frustration.

timing: early oracle use slightly correlates with frustration

first oracle position (% through thread)	RESOLVED	COMMITTED	HANDOFF	FRUSTRATED
early (≤33%)	78.8%	12.5%	7.3%	1.4%
mid (33-66%)	80.3%	7.2%	11.8%	0.7%
late (>66%)	82.8%	8.6%	8.6%	0.0%

early oracle → 1.4% frustration rate
mid oracle → 0.7% frustration rate
late oracle → 0% frustration rate

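interpretation: a late first invocation (plausibly for review/validation) has the lowest frustration rate and the highest RESOLVED share (82.8%). an early first invocation (plausibly for planning) carries a slightly higher frustration rate, likely because early invocation happens on harder tasks. all three frustration rates rest on only 6 frustrated oracle threads, so this gradient is weak evidence.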
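The timing buckets can be sketched as follows, assuming "position" means the first oracle turn divided by the thread's total turns and that a value exactly on a cut-off falls into the earlier bucket. Neither detail is stated in this analysis, and `position_bucket` is a hypothetical helper. The example inputs are the first oracle turns of the four frustrated threads listed later in this report.

```python
def position_bucket(first_oracle_turn: int, total_turns: int) -> str:
    """Bucket the first oracle call by how far through the thread it occurs.

    Assumption: position = turn index / total turns, with the 33% / 66%
    cut-offs from the timing table; boundary values go to the earlier bucket.
    """
    if total_turns <= 0 or not 1 <= first_oracle_turn <= total_turns:
        raise ValueError("first_oracle_turn must fall within the thread")
    position = first_oracle_turn / total_turns
    if position <= 0.33:
        return "early"
    if position <= 0.66:
        return "mid"
    return "late"


# (first oracle turn, total turns) for the frustrated threads in this report
examples = {
    "Scoped context isolation": (1, 160),
    "Hilbert clustering": (4, 80),
    "Debug TestService": (8, 133),
    "GitDiffView": (6, 47),
}
for name, (turn, total) in examples.items():
    print(name, position_bucket(turn, total))
```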

oracle frequency vs outcome

oracle calls per thread (cells: thread counts)	RESOLVED	COMMITTED	HANDOFF	FRUSTRATED
1	89	12	18	0
2-3	209	35	23	2
4-6	105	10	11	3
7+	115	11	4	1

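moderate oracle use (2-3 calls) is the most common bucket overall (269 of 648 threads) and among RESOLVED threads (209 of 518). high frequency (7+) plausibly indicates complex tasks but doesn't hurt outcomes: the RESOLVED share rises with call count, from 74.8% at 1 call to 77.7% at 2-3, 81.4% at 4-6 and 87.8% at 7+.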
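A short sketch that converts the thread counts above into the share of each call-count bucket ending in a given outcome. `FREQ` and `outcome_share` are illustrative names, and the counts are copied from the table.

```python
OUTCOMES = ("RESOLVED", "COMMITTED", "HANDOFF", "FRUSTRATED")

# Thread counts per call-count bucket, in OUTCOMES order (copied from the table)
FREQ = {
    "1": (89, 12, 18, 0),
    "2-3": (209, 35, 23, 2),
    "4-6": (105, 10, 11, 3),
    "7+": (115, 11, 4, 1),
}


def outcome_share(rows, outcome):
    """Percentage of threads in each bucket that ended with `outcome`."""
    col = OUTCOMES.index(outcome)
    return {bucket: round(100 * row[col] / sum(row), 1) for bucket, row in rows.items()}


print(outcome_share(FREQ, "RESOLVED"))
# → {'1': 74.8, '2-3': 77.7, '4-6': 81.4, '7+': 87.8}
print(outcome_share(FREQ, "FRUSTRATED"))
```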

frustrated threads: oracle patterns

thread	turns	oracle count	oracle turns	pattern
Scoped context isolation	160	6	1,2,10,11,24,25	early
Hilbert clustering	80	5	4,5,30,31,33	early+mid
Debug TestService	133	8	8,9,33,34,35,40,103,104	spread
GitDiffView	47	6	6,7,10,34,39,40	spread

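8/14 frustrated threads never invoked oracle. of the 6 that did, the 4 listed above all make their first call within the first 13% of the thread and repeatedly call it in back-to-back turns (e.g. 1,2 / 10,11 / 24,25), suggesting they were stuck and repeatedly sought guidance.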
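One way to check the "repeated invocation" reading is to group each thread's oracle turns into runs of consecutive turns and list which thirds of the thread they touch. This is a sketch under the same assumed 33%/66% position cut-offs as the timing table; `consecutive_runs` and `thirds_touched` are hypothetical helpers, and the turn lists are copied from the table above.

```python
ORDER = ("early", "mid", "late")


def consecutive_runs(turns):
    """Group oracle turn indices into runs of consecutive turns."""
    runs = []
    for t in sorted(turns):
        if runs and t == runs[-1][-1] + 1:
            runs[-1].append(t)  # extends the current back-to-back run
        else:
            runs.append([t])  # gap: start a new run
    return runs


def thirds_touched(turns, total_turns):
    """Which position buckets contain at least one oracle call (assumed cut-offs)."""
    labels = {
        "early" if t / total_turns <= 0.33 else "mid" if t / total_turns <= 0.66 else "late"
        for t in turns
    }
    return [label for label in ORDER if label in labels]


# thread -> (total turns, oracle turns)
FRUSTRATED = {
    "Scoped context isolation": (160, [1, 2, 10, 11, 24, 25]),
    "Hilbert clustering": (80, [4, 5, 30, 31, 33]),
    "Debug TestService": (133, [8, 9, 33, 34, 35, 40, 103, 104]),
    "GitDiffView": (47, [6, 7, 10, 34, 39, 40]),
}

for name, (total, turns) in FRUSTRATED.items():
    repeated = [run for run in consecutive_runs(turns) if len(run) > 1]
    print(name, thirds_touched(turns, total), repeated)
```

Under these cut-offs every Scoped context isolation call falls before turn 26 of 160, while Debug TestService and GitDiffView reach into the final third.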

conclusions

oracle timing matters less than task difficulty: frustrated threads likely invoke oracle heavily because they're hard, not because oracle makes them harder
late oracle = likely code review: 82.8% RESOLVED for late-first invocations vs 78.8% for early-first, though RESOLVED + COMMITTED is about 91% for both. use it for validation
early oracle = planning on hard tasks: the slight frustration correlation is probably selection bias
no oracle ≠ safety: 8/14 frustrated threads never used oracle; lack of oracle didn't prevent frustration

recommendations

no evidence to avoid early oracle invocation
oracle usage looks like a reasonable proxy for task complexity, though this analysis does not measure complexity directly
threads with 0 oracle calls on complex tasks may benefit from invoking it