PRReviewETraceMiddleware attacks a different cost pool than prompt caching or tool-output compression. Where those levers shrink per-call cost, this one cuts the number of calls per review — going from a multi-turn investigation loop to a single synthesis call when the new PR is similar to one you’ve already reviewed.
It’s provider-agnostic. The mechanism is just “inject a prescriptive hint into the first user message,” which works the same way on GPT, Claude, and any model that follows instructions.
How it works
The middleware ties a trajectory store to your agent loop:
- Every successful PR review is persisted into a store (PRReviewETraceStore: local JSON for prototypes, CodebaseMemory for production).
- On a new PR, the middleware embeds the PR’s title + description + diff signature and retrieves the top-K similar past reviews above a cosine threshold.
- If a hit clears the gate, the middleware injects the past review as a prescriptive hint before the first model call — “verify the diff matches, emit this JSON, don’t open files.”
- The agent recognizes the pattern and emits the review in a single synthesis call.
- On a cache miss (cold corpus, low similarity), the middleware is a no-op — zero penalty.
The hint text is rendered by render_skeleton_hint.
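The retrieve-then-gate flow described above can be sketched in a few lines. This is a minimal illustration, not the SDK's implementation: `PastReview`, `retrieve`, `maybe_inject`, the 0.85 threshold, and the choice to prepend the hint to the first user message are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class PastReview:
    # Hypothetical record shape; the real store's schema may differ.
    pr_id: str
    embedding: list[float]
    review_json: str


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, store, top_k=3, threshold=0.85):
    """Return up to top_k (score, review) pairs that clear the threshold."""
    scored = [(cosine(query, r.embedding), r) for r in store]
    hits = [pair for pair in scored if pair[0] >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:top_k]


def maybe_inject(first_user_message, query, store, top_k=3, threshold=0.85):
    """Prepend a prescriptive hint on a hit; return the message unchanged on a miss."""
    hits = retrieve(query, store, top_k=top_k, threshold=threshold)
    if not hits:
        return first_user_message  # miss: no-op
    _, best = hits[0]
    hint = (
        "A similar PR was reviewed before. Verify the diff matches, "
        "emit this JSON, and do not open files:\n" + best.review_json
    )
    return hint + "\n\n" + first_user_message
```

On a miss the function returns the original message object untouched, which is what makes the no-op path cheap: nothing changes in the prompt the agent sees.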
Quickstart
Three modes
Pass etrace_mode=... to for_code_review(...) to control how prescriptive the injected hint is:
| mode | hint shape | when to use |
|---|---|---|
| skeleton (default) | Full past review JSON + “verify and emit, don’t investigate” | Best on exact-or-near-exact matches. Most prescriptive. |
| pattern | Just the bug-pattern tags (e.g. "silent_except", "math_edge_case") from past reviews | Safer on partial matches. Agent verifies each pattern against the current diff. Smaller per-hit savings but lower regression risk. |
| adapt | A cheap model (gpt-4o-mini) adapts the past review to the new PR before injection | Two-stage; costs ~$0.001 per hit. Use when retrieved trajectories are semantically similar but textually different. |
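To make the difference between the three modes concrete, here is a rough sketch of how the hint shape could vary by mode. Everything in it is hypothetical: `build_hint`, the `findings`/`patterns` layout of the past review, and the `adapt_fn` callable (standing in for the cheap-model call) are illustrative names, not the SDK's API.

```python
import json
from typing import Callable, Optional


def build_hint(
    mode: str,
    past_review: dict,
    adapt_fn: Optional[Callable[[dict], str]] = None,
) -> str:
    """Build a hint string whose prescriptiveness depends on the mode."""
    if mode == "skeleton":
        # Most prescriptive: hand over the full past review.
        return (
            "Verify the diff matches, then emit this review. "
            "Do not investigate further.\n"
            + json.dumps(past_review, sort_keys=True)
        )
    if mode == "pattern":
        # Only the bug-pattern tags; the agent still checks each one.
        tags = sorted(
            {
                tag
                for finding in past_review.get("findings", [])
                for tag in finding.get("patterns", [])
            }
        )
        return (
            "Past similar reviews flagged these patterns; "
            "verify each against the current diff: " + ", ".join(tags)
        )
    if mode == "adapt":
        # Two-stage: a separate (cheap) model rewrites the past review first.
        if adapt_fn is None:
            raise ValueError("adapt mode needs an adapt_fn")
        return adapt_fn(past_review)
    raise ValueError(f"unknown mode: {mode}")
```

The pattern branch deliberately drops everything except the tags, which is why it saves less per hit but carries less risk of replaying a stale conclusion.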
Why the lever holds across providers
E-trace doesn’t shrink per-call tokens; it eliminates calls. The percentage saving from going from N calls to 1 call is roughly the same regardless of which model makes those calls. The dollar saving scales with how expensive your model is, not with which provider it comes from.
This makes E-trace compose cleanly with provider caching, which targets a disjoint pool (per-call input):
| lever | what it cuts | typical impact |
|---|---|---|
| Provider prompt caching | per-call input tokens | varies by provider, see Prompt caching |
| E-trace skeleton injection | number of LLM calls per review | depends on cache hit rate against your past-review corpus |
Expected effective savings
Real-world impact is cache_hit_rate × per-hit saving. Both vary by workload:
- Bot-generated PRs (Dependabot / Renovate / similar template-heavy bots) — high repeat structure, high hit rate, big effective savings.
- Mature codebase with recurring patterns — moderate hit rate; lever pays off as the corpus grows.
- Brand-new repo / fully diverse PRs — low hit rate; lever is a no-op, no penalty.
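The product above is easy to work through by hand. The sketch below treats the per-hit saving as the fraction of calls removed when N calls collapse to 1, which is a simplifying assumption (it ignores that calls differ in token count). The function names and the example numbers are illustrative, not measured results.

```python
def per_hit_call_saving(calls_without_hint: int) -> float:
    """Fraction of calls removed when N calls collapse to a single call."""
    if calls_without_hint < 1:
        raise ValueError("need at least one call")
    return 1 - 1 / calls_without_hint


def effective_saving(cache_hit_rate: float, per_hit_saving: float) -> float:
    """Expected saving across all reviews: hit rate times per-hit saving."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be in [0, 1]")
    return cache_hit_rate * per_hit_saving


# Illustrative only: a 5-call loop and a 50% hit rate.
saving = effective_saving(0.5, per_hit_call_saving(5))
print(f"{saving:.0%} of calls avoided on average")
```

With a hit rate of zero the result is zero, which matches the no-penalty behavior described for brand-new repos.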
When NOT to use it
- No history yet — empty store, no retrieval hits, middleware is a no-op. Safe to leave on while you accumulate data.
- Adversarial / security-critical reviews — you probably want the agent to do fresh investigation rather than rely on prior conclusions.
- PRs that are structurally one-of-a-kind — every retrieval misses the gate, middleware never fires.
What pairs with it
Two complementary levers in the SDK that target disjoint cost pools:
- Prompt caching: cuts per-call input cost on the turns that DO happen.
- Code-review mode — the validated SWE-bench Pro D-arm monitor + tool-saving stack. E-trace stacks on top of it.
See also
- Code-review mode — the validated SWE-bench Pro D-arm stack
- Codebase memory — the production-grade store that backs E-trace persistence
- Prompt caching — provider caching that composes cleanly with E-trace

