Skip to main content
PRReviewETraceMiddleware attacks a different cost pool than prompt caching or tool-output compression. Where those levers shrink per-call cost, this one cuts the number of calls per review — going from a multi-turn investigation loop to a single synthesis call when the new PR is similar to one you’ve already reviewed. It’s provider-agnostic. The mechanism is just “inject a prescriptive hint into the first user message,” which works the same way on GPT, Claude, and any model that follows instructions.

How it works

The middleware ties a trajectory store to your agent loop:
  1. Every successful PR review is persisted into a store (PRReviewETraceStore — local JSON for prototypes, CodebaseMemory for production).
  2. On a new PR, the middleware embeds the PR’s title + description + diff signature and retrieves the top-K similar past reviews above a cosine threshold.
  3. If a hit clears the gate, the middleware injects the past review as a prescriptive hint before the first model call — “verify the diff matches, emit this JSON, don’t open files.”
  4. The agent recognizes the pattern and emits the review in a single synthesis call.
  5. On a cache miss (cold corpus, low similarity), the middleware is a no-op — zero penalty.
The phrasing of the hint matters: soft hints (“similar PRs found bugs in X”) get ignored; prescriptive hints (“emit this JSON directly, don’t call tools”) consistently trigger single-turn behavior. The validated phrasing lives in render_skeleton_hint.

Quickstart

from reasonblocks import for_code_review, PRReviewETraceStore
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI
# or: from langchain_anthropic import ChatAnthropic

store = PRReviewETraceStore(save_path="./.rb/pr_etrace.json")

agent = create_agent(
    model=ChatOpenAI(model="gpt-5.2"),       # or any LangChain-compatible model
    tools=[bash, search_codebase, read_file],
    middleware=for_code_review(etrace_store=store),
)

# After each successful review, persist the trajectory so future similar
# PRs can hit the cache.
store.store(
    pr_signature=current_pr_diff,
    review=review_json,
    files_inspected=files_read_during_run,
)

Three modes

Pass etrace_mode=... on for_code_review(...) to control how prescriptive the injected hint is:
modehint shapewhen to use
skeleton (default)Full past review JSON + “verify and emit, don’t investigate”Best on exact-or-near-exact matches. Most prescriptive.
patternJust the bug-pattern tags (e.g. "silent_except", "math_edge_case") from past reviewsSafer on partial matches. Agent verifies each pattern against the current diff. Smaller per-hit savings but lower regression risk.
adaptA cheap-model (gpt-4o-mini) adapts the past review to the new PR before injectionTwo-stage; costs ~$0.001 per hit. Use when retrieved trajectories are semantically similar but textually different.
middleware = for_code_review(etrace_store=store, etrace_mode="adapt")

Why the lever holds across providers

E-trace doesn’t shrink per-call tokens — it eliminates calls. The percentage saving from going N calls → 1 call is roughly the same regardless of which model is doing those calls. The dollar saving scales with how expensive your model is, not which provider it’s from. This makes E-trace compose cleanly with provider caching, which targets a disjoint pool (per-call input):
leverwhat it cutstypical impact
Provider prompt cachingper-call input tokensvaries by provider, see Prompt caching
E-trace skeleton injectionnumber of LLM calls per reviewdepends on cache hit rate against your past-review corpus
On Claude with both enabled, the compound effect on cache-hit PRs is meaningfully larger than either lever alone:
agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5"),
    tools=[...],
    middleware=for_code_review(
        prompt_caching=True,
        etrace_store=store,
    ),
)

Expected effective savings

Real-world impact is cache_hit_rate × per-hit-saving. Both vary by workload:
  • Bot-generated PRs (Dependabot / Renovate / similar template-heavy bots) — high repeat structure, high hit rate, big effective savings.
  • Mature codebase with recurring patterns — moderate hit rate; lever pays off as the corpus grows.
  • Brand-new repo / fully diverse PRs — low hit rate; lever is a no-op, no penalty.
Measure on your own workload before sizing the headline. The store grows with usage, so cold-start savings are 0% and ramp up as the corpus matures.

When NOT to use it

  • No history yet — empty store, no retrieval hits, middleware is a no-op. Safe to leave on while you accumulate data.
  • Adversarial / security-critical reviews — you probably want the agent to do fresh investigation rather than rely on prior conclusions.
  • PRs that are structurally one-of-a-kind — every retrieval misses the gate, middleware never fires.

What pairs with it

Two complementary levers in the SDK that target disjoint cost pools:
  • Prompt caching — cuts per-call input cost on the turns that DO happen.
  • Code-review mode — the validated SWE-bench Pro D-arm monitor + tool-saving stack. E-trace stacks on top of it.

See also