rb.middleware() attaches to a LangChain 1.0 agent. The OpenAI Agents and Claude Agent SDK integrations share the codebase memory toolset and (where applicable) telemetry, but the FSM scoring, monitor evaluation, E-trace injection, and model routing described below currently live in the LangChain middleware path.
The middleware runs four lifecycle hooks. On every step it scores the agent’s last thought, advances the difficulty FSM, asks rb-api whether the trajectory needs steering, retrieves up to three tiers of E-traces, and assembles any guidance into the system message before the model call. Your agent sees one system message in and one response out. The middleware does its work in the gap.
The four hook points
| Hook | When | What it does |
|---|---|---|
| before_agent | Once, at run start | Emits the run_start telemetry event. |
| before_model | Before every LLM call | Scores the last thought, advances the FSM, evaluates monitors via rb-api, queues E-trace injections. |
| wrap_model_call | Around every LLM call | Applies model routing, renders queued injections into the system message, calls the handler, records tokens and latency. |
| after_agent | Once, at clean return | Emits the run_finish telemetry event. Exception paths emit it from __exit__. |
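To make the hook order in the table concrete, here is a toy sketch that does not use LangChain at all. `SketchMiddleware`, `run_agent`, and the handler signature are hypothetical stand-ins invented for illustration; only the four hook names come from the table.

```python
from typing import Callable, List


class SketchMiddleware:
    """Hypothetical stand-in that records the order in which hooks fire."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def before_agent(self) -> None:
        self.events.append("before_agent")

    def before_model(self, step: int) -> None:
        self.events.append(f"before_model[{step}]")

    def wrap_model_call(self, step: int, handler: Callable[[int], str]) -> str:
        # Wraps the model call: work can happen before and after the handler.
        self.events.append(f"wrap_model_call[{step}]")
        return handler(step)

    def after_agent(self) -> None:
        self.events.append("after_agent")


def run_agent(mw: SketchMiddleware, steps: int) -> List[str]:
    """Toy driver: one start/finish pair per run, one hook pair per LLM call."""

    def fake_model(step: int) -> str:
        return f"response {step}"

    mw.before_agent()
    outputs = []
    for step in range(steps):
        mw.before_model(step)
        outputs.append(mw.wrap_model_call(step, fake_model))
    mw.after_agent()
    return outputs


mw = SketchMiddleware()
outputs = run_agent(mw, steps=2)
print(mw.events)
```

The driver only models the clean-return path; the exception path that emits run_finish from __exit__ is left out of this sketch.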
Step-by-step lifecycle
run_start
before_agent fires once and emits a run_start event with run_id, agent_name, task, framework, model, codebase_id, org_id, project_id, task_profile, and any caller-provided metadata. Telemetry is fire-and-forget: the streaming worker is a daemon thread.
First call: E3 only
On the very first agent step there is no prior thought to score, so the FSM stays at INIT. Only E3Injection runs (universal standing rules, scroll-all, up to 32 patterns). It fires exactly once per run, and its result is queued for the upcoming model call.
Score the thought
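As a rough picture of the scoring this step performs, the sketch below combines four signals and squashes them through a sigmoid. The marker lists, weights, and bias are invented for illustration and are not the values used by ReasonBlocks.score_step; entity density is approximated here by the share of capitalized or underscore-containing tokens, which is also an assumption.

```python
import math
import re

HEDGES = ("maybe", "perhaps", "not sure", "might", "i think")  # assumed markers
ERRORS = ("error", "exception", "traceback", "failed")         # assumed markers


def sketch_score_step(thought: str) -> float:
    """Toy difficulty score in (0, 1); higher means the step looks harder."""
    text = thought.lower()
    words = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", thought)
    n = max(len(words), 1)  # avoid division by zero on empty thoughts

    hedging = sum(text.count(h) for h in HEDGES) / n
    errors = sum(text.count(e) for e in ERRORS) / n
    length = min(len(words) / 200.0, 1.0)
    entities = sum(1 for w in words if "_" in w or w[0].isupper()) / n

    # Assumed weights and bias: a linear combination squashed by a sigmoid.
    z = 8.0 * hedging + 8.0 * errors + 1.5 * length + 1.0 * entities - 2.0
    return 1.0 / (1.0 + math.exp(-z))


calm = sketch_score_step("Read the file and apply the patch.")
shaky = sketch_score_step(
    "Maybe the error is here, not sure, it might have failed with an exception."
)
print(calm < shaky)
```

A thought full of hedges and error language scores higher than a short, confident one, which is the property the FSM in the next step relies on.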
From step two onward, before_model extracts the last AIMessage content and calls ReasonBlocks.score_step. The heuristic combines hedging density, response length, error language, and entity density, then squashes the result through a sigmoid into a float in [0, 1]. This score is heuristic-only; it will be replaced by a Pilot model later.
Advance the FSM
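A minimal sketch of a windowed difficulty FSM of the kind this step uses. The window size, the thresholds, and the mapping of low scores to FAST, high scores to SLOW, and extreme scores to SKIP are assumptions made for the example; the actual rules are the ones documented under FSM states. `SketchFSM` is a hypothetical name, not the SDK's DifficultyFSM.

```python
from collections import deque
from enum import Enum


class State(Enum):
    INIT = "INIT"
    NORMAL = "NORMAL"
    FAST = "FAST"
    SLOW = "SLOW"
    SKIP = "SKIP"


class SketchFSM:
    """Toy windowed FSM; thresholds and window size are assumptions."""

    def __init__(self, window: int = 3, low: float = 0.3,
                 high: float = 0.7, extreme: float = 0.9) -> None:
        self.state = State.INIT
        self.history = deque(maxlen=window)  # keeps only the recent window
        self.low, self.high, self.extreme = low, high, extreme

    def transition(self, score: float) -> State:
        self.history.append(score)
        if len(self.history) < self.history.maxlen:
            self.state = State.NORMAL  # not enough evidence yet
        elif all(s >= self.extreme for s in self.history):
            self.state = State.SKIP
        elif all(s >= self.high for s in self.history):
            self.state = State.SLOW
        elif all(s <= self.low for s in self.history):
            self.state = State.FAST
        else:
            self.state = State.NORMAL
        return self.state


fsm = SketchFSM()
states = [fsm.transition(s) for s in (0.1, 0.1, 0.1)]
print([s.value for s in states])
```

Requiring the whole window to agree is one simple way to express "consistently" low or high; a single outlier score drops the toy FSM back to NORMAL.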
The score is appended to the difficulty history and fed to DifficultyFSM.transition. The FSM moves between NORMAL, FAST, SLOW, and SKIP based on whether the recent window is consistently low, high, or extreme. See FSM states for the transition rules.
Evaluate monitors server-side
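The client-side guardrails that apply to this step (a cap of 5 injections per run and a per-state cooldown) can be sketched as follows. `MonitorGuardrail` is a hypothetical helper, and reading "every N steps" as "at least N steps since the last injection" is an assumption; the check that the intervention text is new is left out.

```python
from dataclasses import dataclass
from typing import Optional

MAX_MONITOR_INJECTIONS = 5
COOLDOWN_STEPS = {"SLOW": 2, "SKIP": 2, "NORMAL": 3, "FAST": 5}


@dataclass
class MonitorGuardrail:
    """Hypothetical per-run guard applied after the server's decision."""
    injections: int = 0
    last_step: Optional[int] = None

    def allow(self, step: int, state: str, server_inject: bool) -> bool:
        if not server_inject:
            return False  # the server did not ask for an injection
        if self.injections >= MAX_MONITOR_INJECTIONS:
            return False  # hard cap for the run
        # Unknown states fall back to the NORMAL cooldown (an assumption).
        cooldown = COOLDOWN_STEPS.get(state, COOLDOWN_STEPS["NORMAL"])
        if self.last_step is not None and step - self.last_step < cooldown:
            return False  # still cooling down
        self.injections += 1
        self.last_step = step
        return True


guard = MonitorGuardrail()
allowed_slow = [s for s in range(1, 20) if guard.allow(s, "SLOW", True)]
print(allowed_slow)
```

With the server asking for an injection on every step in SLOW, the toy guard lets through every second step until the cap of five is reached.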
MonitorSteeringInjection.retrieve posts the current trace to rb-api’s POST /monitors/evaluate endpoint. The server runs the trajectory monitor suite under the configured task_profile, applies the two-path gate, optionally falls back to an E2 lookup, and returns {inject, intervention_text, fired, scores, composite, failure_type, intervention_source}. When inject is true and the text is new, the middleware queues the rendered intervention text as a PendingInjection.
Per-run guardrails are enforced client-side after the server’s decision: a hard cap of 5 monitor injections per run, plus state-dependent cooldowns (every 2 steps in SLOW/SKIP, every 3 in NORMAL, every 5 in FAST).
Gate E1 retrieval
Before E1Injection queries the pattern store, a client-side gate checks the recent monitor history. If no monitor has ever run, E1 is allowed unconditionally for backward compatibility.
Retrieve E-traces
With the gate decided, the middleware runs the remaining injections in order:
- E1Injection: instance-level pattern, customer-scoped, top_k=1. Capped at max_calls=1 per run.
- E2Injection: pattern-level guidance, top_k=2, narrowed by failure_type if the monitor evaluation produced one. Capped at max_calls=1 per run.
- E3Injection: already fired on step 0; subsequent calls skip it.
In the FAST state the entire E-trace pipeline is bypassed (no Qdrant query, no embeddings). Monitors still run, because loop detection matters most when the agent is moving quickly and might miss that it’s looping.
Route the model
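A small sketch of state-based routing for this step. `SketchRequest` is a hypothetical stand-in for the framework's request object, its `override` method only imitates the `request.override(model=...)` call, and the model ids are placeholders.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class SketchRequest:
    """Hypothetical stand-in for the framework's model request object."""
    model: str
    system_message: str

    def override(self, **changes) -> "SketchRequest":
        # Returns a modified copy; the original request is left untouched.
        return replace(self, **changes)


def route(request: SketchRequest, state: str,
          model_routing: Dict[str, str]) -> SketchRequest:
    """Swap the model only when the current FSM state has a mapping."""
    target = model_routing.get(state)
    return request.override(model=target) if target else request


routing = {"FAST": "small-model", "SLOW": "large-model"}  # placeholder ids
base = SketchRequest(model="default-model", system_message="You are helpful.")
print(route(base, "FAST", routing).model)
```

States without a mapping leave the request as it was, so routing is opt-in per state.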
wrap_model_call checks the current FSM state against model_routing. If a mapping exists, the request is overridden via request.override(model=...) before the handler is called. Routing happens before injection rendering so the format-routing layer sees the final post-routing model id.
Render injections into the system message
Pending injections are rendered (plain text passes through unchanged; (tier, fields) pairs go through render_pattern(tier, model_id, fields)) and joined with blank lines. The base system prompt is wrapped in a content block with cache_control: {"type": "ephemeral"} so Anthropic’s prompt cache hits across steps. The rendered guidance rides as a separate uncached block prefixed with [REASONBLOCKS] so its varying content does not bust the base cache key.
cache_control is ignored on non-Anthropic providers, so this is safe under OpenAI or Gemini models.
Call the model and record telemetry
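A rough sketch of the bookkeeping in this step: time the handler, read token usage, and fill in a log entry. `SketchStepLog` and `timed_call` are hypothetical; the field names mirror the ones listed for StepLogEntry, but their types, and the use of `total_tokens` as the token count, are assumptions. The response is modelled as a plain dict rather than an AIMessage.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class SketchStepLog:
    """Hypothetical log entry; field types are assumed."""
    model_id: str
    tokens: int = 0
    latency_ms: float = 0.0
    injections: List[str] = field(default_factory=list)
    intervention_texts: List[str] = field(default_factory=list)
    monitors_fired: List[str] = field(default_factory=list)
    failure_type: Optional[str] = None


def timed_call(model_id: str, handler: Callable[[], Dict[str, Any]],
               injections: List[str]) -> SketchStepLog:
    """Run the handler once and record latency and token usage."""
    start = time.perf_counter()
    response = handler()
    latency_ms = (time.perf_counter() - start) * 1000.0
    usage = response.get("usage_metadata") or {}  # may be missing
    return SketchStepLog(
        model_id=model_id,
        tokens=usage.get("total_tokens", 0),
        latency_ms=latency_ms,
        injections=list(injections),
    )


def fake_handler() -> Dict[str, Any]:
    return {
        "content": "ok",
        "usage_metadata": {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
    }


log = timed_call("default-model", fake_handler, ["E3"])
print(log.tokens)
```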
The handler runs. After the response arrives, the middleware extracts the AIMessage, reads usage_metadata for token counts, captures tool-call names, and stamps the StepLogEntry with model_id, tokens, latency_ms, injections, intervention_texts, monitors_fired, and failure_type. If live_streaming_enabled is True, a step event is emitted to rb-api.
What the agent sees
The agent’s message history is never modified. Steering appears only as an additional content block inside the system message, formatted as [REASONBLOCKS]\n<rendered guidance>. The agent’s conversation turns stay clean.
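The two-block system message described above could be assembled roughly like this. `build_system_blocks` is a hypothetical helper; the dict layout follows the Anthropic-style text content block, and the assumption is that the middleware's actual structure is similar.

```python
from typing import Any, Dict, List

PREFIX = "[REASONBLOCKS]"


def build_system_blocks(base_prompt: str,
                        rendered: List[str]) -> List[Dict[str, Any]]:
    """Cached base prompt first, then one uncached guidance block if needed."""
    blocks: List[Dict[str, Any]] = [
        {
            "type": "text",
            "text": base_prompt,
            "cache_control": {"type": "ephemeral"},  # stable, cacheable part
        }
    ]
    if rendered:
        guidance = "\n\n".join(rendered)  # injections joined with blank lines
        # No cache_control here: this text changes from step to step.
        blocks.append({"type": "text", "text": f"{PREFIX}\n{guidance}"})
    return blocks


blocks = build_system_blocks("You are a coding agent.", ["Rule A", "Rule B"])
print(blocks[1]["text"])
```

Keeping the guidance in its own block means the base prompt text is identical on every step, which is what lets a prefix cache keep hitting.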
The [REASONBLOCKS] block is rebuilt fresh on every step and contains only the injections relevant to that step. Nothing accumulates across turns.
Local vs server-side monitors
The SDK ships two monitor surfaces; do not conflate them.
- Server-side monitor evaluation (POST /monitors/evaluate) drives the steering intervention you see appended to the system message. This is the path described above.
- Local monitor suite in reasonblocks/monitors/suite.py: six pure-Python heuristics with weighted aggregation. Used by TokenSavingMiddleware for early-exit decisions and by SQLiteMonitorSink for offline trace inspection. It does not drive the steering injection.
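For a sense of what "weighted aggregation" over pure-Python heuristics can look like, here is a toy example with two made-up monitors. The monitor names, their formulas, and the weights are all hypothetical and are not the six heuristics shipped in reasonblocks/monitors/suite.py.

```python
from typing import Callable, Dict, List, Tuple

Monitor = Callable[[List[str]], float]  # maps a trace to a score in [0, 1]


def repetition(steps: List[str]) -> float:
    """Share of steps that duplicate an earlier step."""
    return 0.0 if not steps else 1.0 - len(set(steps)) / len(steps)


def verbosity(steps: List[str]) -> float:
    """Mean step length in words, scaled so 100 words maps to 1.0."""
    if not steps:
        return 0.0
    mean_words = sum(len(s.split()) for s in steps) / len(steps)
    return min(mean_words / 100.0, 1.0)


def composite(steps: List[str],
              suite: Dict[str, Tuple[Monitor, float]]) -> float:
    """Weighted mean of the monitor scores."""
    total = sum(weight for _, weight in suite.values())
    if total == 0:
        return 0.0
    return sum(fn(steps) * weight for fn, weight in suite.values()) / total


suite = {"repetition": (repetition, 3.0), "verbosity": (verbosity, 1.0)}
score = composite(["ls", "ls", "ls", "cat a"], suite)
print(round(score, 6))
```

A local score like this could feed an early-exit decision or an offline report without any network call, which matches the role the local suite plays next to the server-side evaluation.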

