oracle timing analysis
overview
analyzed oracle mentions in assistant messages across 757 threads that invoked the oracle tool. the outcome breakdowns below account for 648 of them (518 + 68 + 56 + 6).
key findings
oracle usage correlates with task complexity, not success rate
| outcome | threads w/ oracle | total threads | % using oracle |
|---|---|---|---|
| RESOLVED | 518 | 2745 | 18.9% |
| COMMITTED | 68 | 305 | 22.3% |
| HANDOFF | 56 | 75 | 74.7% |
| FRUSTRATED | 6 | 14 | 42.9% |
note: HANDOFF oracle count (56) appears inflated — may include misclassified subagent threads. 74.7% rate is suspect.
hunch: high oracle usage in FRUSTRATED threads (42.9%) suggests oracle is invoked when tasks are genuinely difficult—not that oracle causes frustration.
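the percentages in the table can be reproduced from the raw counts. the sketch below does that and also computes a Wilson score interval, a standard binomial interval that the original analysis does not report, to show how much uncertainty sits behind a rate at n=14. the names `COUNTS` and `wilson_interval` are illustrative, not from the analysis tooling.

```python
import math

# (threads with oracle, total threads) per outcome, copied from the table above
COUNTS = {
    "RESOLVED": (518, 2745),
    "COMMITTED": (68, 305),
    "HANDOFF": (56, 75),
    "FRUSTRATED": (6, 14),
}


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


for outcome, (with_oracle, total) in COUNTS.items():
    rate = with_oracle / total
    lo, hi = wilson_interval(with_oracle, total)
    print(f"{outcome:<10} {rate:6.1%}  (95% interval {lo:.1%} to {hi:.1%})")
```

for FRUSTRATED the interval spans roughly 21% to 67%, so the 42.9% figure is best read as a rough indication rather than a precise rate.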
timing: early oracle use slightly correlates with frustration
| first oracle position | RESOLVED | COMMITTED | HANDOFF | FRUSTRATED |
|---|---|---|---|---|
| early (≤33%) | 78.8% | 12.5% | 7.3% | 1.4% |
| mid (33-66%) | 80.3% | 7.2% | 11.8% | 0.7% |
| late (>66%) | 82.8% | 8.6% | 8.6% | 0.0% |
early oracle → 1.4% frustration rate
mid oracle → 0.7% frustration rate
late oracle → 0% frustration rate
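interpretation: each row of the table sums to 100%, so the FRUSTRATED column reads as a per-position frustration rate. late first invocation (typically review/validation) shows the lowest rate and early first invocation (typically planning) the highest, likely because early invocation happens on harder tasks. all three rates rest on the 6 frustrated threads that used oracle at all, so the differences are too small to rank the positions reliably.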
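a minimal sketch of how a thread could be assigned to one of the three position buckets. the 33% and 66% cutoffs come from the table; the 1-indexed turns and the `turn / total_turns` formula are assumptions, since the analysis does not state how position was computed. `first_oracle_bucket` is a hypothetical helper name.

```python
EARLY_MAX = 0.33  # upper bound of "early", from the table
MID_MAX = 0.66    # upper bound of "mid", from the table


def first_oracle_bucket(first_oracle_turn: int, total_turns: int) -> str:
    """Bucket a thread by where its first oracle call falls.

    Assumes turns are 1-indexed and position = turn / total turns.
    """
    if total_turns <= 0 or not 1 <= first_oracle_turn <= total_turns:
        raise ValueError("first_oracle_turn must be within 1..total_turns")
    position = first_oracle_turn / total_turns
    if position <= EARLY_MAX:
        return "early"
    if position <= MID_MAX:
        return "mid"
    return "late"


print(first_oracle_bucket(4, 80))   # early
print(first_oracle_bucket(40, 80))  # mid
print(first_oracle_bucket(70, 80))  # late
```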
oracle frequency vs outcome
| oracle calls | RESOLVED | COMMITTED | HANDOFF | FRUSTRATED |
|---|---|---|---|---|
| 1 | 89 | 12 | 18 | 0 |
| 2-3 | 209 | 35 | 23 | 2 |
| 4-6 | 105 | 10 | 11 | 3 |
| 7+ | 115 | 11 | 4 | 1 |
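2-3 calls is the most common bucket, both overall (269 of 648 threads) and among RESOLVED threads (209 of 518). high frequency (7+) may indicate complex tasks but doesn't hurt outcomes: 115 of 131 such threads (87.8%) were RESOLVED, the highest share of any bucket.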
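the table reports raw counts, which makes buckets of different sizes hard to compare. the sketch below converts each row to outcome shares; the counts are copied from the table, and the names `FREQUENCY` and `row_shares` are illustrative.

```python
OUTCOMES = ("RESOLVED", "COMMITTED", "HANDOFF", "FRUSTRATED")

# thread counts per oracle-call bucket, copied from the table above
FREQUENCY = {
    "1": (89, 12, 18, 0),
    "2-3": (209, 35, 23, 2),
    "4-6": (105, 10, 11, 3),
    "7+": (115, 11, 4, 1),
}


def row_shares(counts):
    """Return each outcome's share of the row total."""
    total = sum(counts)
    return {name: count / total for name, count in zip(OUTCOMES, counts)}


for bucket, counts in FREQUENCY.items():
    shares = row_shares(counts)
    cells = "  ".join(f"{name} {share:5.1%}" for name, share in shares.items())
    print(f"{bucket:>3} calls (n={sum(counts):3d}): {cells}")
```

normalizing by row keeps the large 2-3 bucket from dominating the comparison simply because it contains the most threads.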
frustrated threads: oracle patterns
| thread | turns | oracle count | oracle turns | pattern |
|---|---|---|---|---|
| Scoped context isolation | 160 | 6 | 1,2,10,11,24,25 | early+mid |
| Hilbert clustering | 80 | 5 | 4,5,30,31,33 | early+mid |
| Debug TestService | 133 | 8 | 8,9,33,34,35,40,103,104 | spread |
| GitDiffView | 47 | 6 | 6,7,10,34,39,40 | spread |
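8/14 frustrated threads never invoked oracle. of the 6 that did, the table above lists four; the other two fall in the 2-3 call bucket of the frequency table. the listed threads show repeated early invocations, often on consecutive turns, suggesting they were stuck and repeatedly sought guidance.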
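one way to quantify "repeated early invocations" for the four listed threads is sketched below. it applies the 33% cutoff from the timing table to every oracle call (position = turn / total turns, which is an assumption) and counts calls made on consecutive turns. the pattern labels in the table may have been assigned by a different rule, so these numbers need not line up with them. `early_share` and `back_to_back_pairs` are hypothetical helper names.

```python
# (total turns, oracle turns) for the four threads in the table above
FRUSTRATED_THREADS = {
    "Scoped context isolation": (160, [1, 2, 10, 11, 24, 25]),
    "Hilbert clustering": (80, [4, 5, 30, 31, 33]),
    "Debug TestService": (133, [8, 9, 33, 34, 35, 40, 103, 104]),
    "GitDiffView": (47, [6, 7, 10, 34, 39, 40]),
}


def early_share(total_turns, oracle_turns, cutoff=0.33):
    """Fraction of oracle calls made within the first `cutoff` of the thread."""
    early = [t for t in oracle_turns if t / total_turns <= cutoff]
    return len(early) / len(oracle_turns)


def back_to_back_pairs(oracle_turns):
    """Number of oracle calls that directly follow another oracle call."""
    ordered = sorted(oracle_turns)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a == 1)


for name, (turns, oracle_turns) in FRUSTRATED_THREADS.items():
    print(
        f"{name:<26} early share {early_share(turns, oracle_turns):.0%}, "
        f"back-to-back pairs {back_to_back_pairs(oracle_turns)}"
    )
```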
conclusions
- oracle timing matters less than task difficulty: frustrated threads invoke oracle heavily because they're hard, not because oracle makes them harder
- late oracle = code review: 82.8% of late-first threads were RESOLVED (91.4% RESOLVED or COMMITTED). use for validation
- early oracle = planning on hard tasks: the slight frustration correlation is likely selection bias
- no oracle ≠ safety: 8/14 frustrated threads never used oracle; lack of oracle didn't prevent frustration
recommendations
- no evidence to avoid early oracle invocation
- oracle usage is a reasonable proxy for task complexity
- threads with 0 oracle calls on complex tasks may benefit from invoking it