This page describes the steering pipeline that rb.middleware() attaches to a LangChain 1.0 agent. The OpenAI Agents and Claude Agent SDK integrations share the codebase memory toolset and (where applicable) telemetry, but the FSM scoring, monitor evaluation, E-trace injection, and model routing described below currently live in the LangChain middleware path.

The middleware runs four lifecycle hooks. On every step it scores the agent’s last thought, advances the difficulty FSM, asks rb-api whether the trajectory needs steering, retrieves up to three tiers of E-traces, and assembles any guidance into the system message before the model call. Your agent sees one system message in and one response out. The middleware does its work in the gap.

The four hook points

| Hook | When | What it does |
| --- | --- | --- |
| before_agent | Once, at run start | Emits the run_start telemetry event. |
| before_model | Before every LLM call | Scores the last thought, advances the FSM, evaluates monitors via rb-api, queues E-trace injections. |
| wrap_model_call | Around every LLM call | Applies model routing, renders queued injections into the system message, calls the handler, records tokens and latency. |
| after_agent | Once, at clean return | Emits the run_finish telemetry event. Exception paths emit it from __exit__. |

Step-by-step lifecycle

1

run_start

before_agent fires once and emits a run_start event with run_id, agent_name, task, framework, model, codebase_id, org_id, project_id, task_profile, and any caller-provided metadata. Telemetry is fire-and-forget — the streaming worker is a daemon thread.
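To make "fire-and-forget" concrete, here is a minimal sketch of a daemon-thread emitter plus a run_start payload builder. TelemetryWorker, build_run_start, the "type" key, and the send callable are hypothetical names chosen for illustration; only the payload field names come from this page.

```python
from __future__ import annotations

import queue
import threading
import uuid
from typing import Any, Callable


class TelemetryWorker:
    """Hypothetical fire-and-forget emitter: emit() only enqueues, and a
    daemon thread drains the queue and hands each event to a transport."""

    def __init__(self, send: Callable[[dict[str, Any]], None]) -> None:
        self._send = send  # assumed transport, e.g. an HTTP POST to rb-api
        self._queue: queue.Queue[dict[str, Any] | None] = queue.Queue()
        # daemon=True: a stuck transport can never block interpreter exit.
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def emit(self, event: dict[str, Any]) -> None:
        self._queue.put(event)  # returns immediately; the agent never waits

    def _drain(self) -> None:
        while True:
            event = self._queue.get()
            if event is None:  # sentinel pushed by close()
                return
            try:
                self._send(event)
            except Exception:
                pass  # telemetry errors are swallowed so they cannot fail the run

    def close(self, timeout: float = 2.0) -> None:
        """Push a sentinel and wait (bounded) for queued events to drain."""
        self._queue.put(None)
        self._thread.join(timeout)


def build_run_start(agent_name: str, task: str, framework: str, model: str,
                    codebase_id: str | None = None, org_id: str | None = None,
                    project_id: str | None = None, task_profile: str | None = None,
                    metadata: dict[str, Any] | None = None) -> dict[str, Any]:
    """Assemble a run_start payload with the fields listed above."""
    return {
        "type": "run_start",  # event-type key name is an assumption
        "run_id": str(uuid.uuid4()),
        "agent_name": agent_name,
        "task": task,
        "framework": framework,
        "model": model,
        "codebase_id": codebase_id,
        "org_id": org_id,
        "project_id": project_id,
        "task_profile": task_profile,
        "metadata": dict(metadata or {}),  # caller-provided extras
    }
```

The bounded join in close() mirrors the idea that draining the worker should not hang shutdown; a real transport would also need retries or batching, which this sketch leaves out.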
2

First call — E3 only

On the very first agent step there is no prior thought to score, so the FSM stays at INIT. Only E3Injection runs (universal standing rules, scroll-all, up to 32 patterns). It fires exactly once per run and its result is queued for the upcoming model call.
3

Score the thought

From the second agent step onward, before_model extracts the content of the last AIMessage and calls ReasonBlocks.score_step. The heuristic combines hedging density, response length, error language, and entity density, then squashes the result through a sigmoid into a float in [0, 1]. This score is heuristic-only; it will be replaced by a Pilot model later.
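As an illustration only, the sketch below combines the four named signals into a weighted sum and passes it through a logistic sigmoid. The cue word lists, the weights, the bias, the entity regex, and the name toy_score_step are all assumptions; the real score_step values are not documented on this page.

```python
from __future__ import annotations

import math
import re

# Hypothetical cue lists and weights; the real heuristic's values are not documented here.
HEDGES = ("maybe", "perhaps", "might", "possibly", "not sure", "i think", "unclear")
ERROR_CUES = ("error", "exception", "failed", "traceback", "cannot", "undefined")
WEIGHTS = {"hedging": 8.0, "errors": 6.0, "length": 1.0, "entities": -3.0}
BIAS = -1.0
# A token "looks like an entity" if it resembles a path, call, or camelCase identifier.
ENTITY_RE = re.compile(r"[_./()]|[a-z][A-Z]")


def toy_score_step(thought: str) -> float:
    """Map one agent thought to a difficulty score in (0, 1)."""
    tokens = thought.split()
    n = max(len(tokens), 1)  # avoid division by zero on empty thoughts
    lowered = thought.lower()
    features = {
        "hedging": sum(lowered.count(h) for h in HEDGES) / n,
        "errors": sum(lowered.count(e) for e in ERROR_CUES) / n,
        "length": min(len(tokens) / 400, 1.0),  # saturates for very long thoughts
        # Strip sentence punctuation so a trailing "." is not mistaken for a path.
        "entities": sum(1 for t in tokens if ENTITY_RE.search(t.strip(".,;:!?"))) / n,
    }
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic sigmoid squashes z into (0, 1)
```

With these made-up weights, a hedging, error-laden thought such as "maybe the error is in the config, i think it might have failed, not sure" scores well above 0.5, while a concrete one such as "Open src/app.py and call load_config() then run pytest" scores well below it, because concrete references pull the score down.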
4

Advance the FSM

The score is appended to the difficulty history and fed to DifficultyFSM.transition. The FSM moves between NORMAL, FAST, SLOW, and SKIP based on whether the recent window is consistently low, high, or extreme. See FSM states for the transition rules.
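A window-based FSM of the kind described can be sketched as follows. ToyDifficultyFSM, the window size, the thresholds, and especially the mapping (consistently low to FAST, consistently high to SLOW, consistently extreme to SKIP) are assumptions for illustration; the FSM states page is the authority on the real transition rules.

```python
from __future__ import annotations

from collections import deque
from enum import Enum


class FSMState(str, Enum):
    INIT = "INIT"
    NORMAL = "NORMAL"
    FAST = "FAST"
    SLOW = "SLOW"
    SKIP = "SKIP"


class ToyDifficultyFSM:
    """Window-based FSM sketch. Thresholds and the low->FAST, high->SLOW,
    extreme->SKIP mapping are assumptions, not the documented rules."""

    def __init__(self, window: int = 3, low: float = 0.3,
                 high: float = 0.7, extreme: float = 0.9) -> None:
        self.state = FSMState.INIT  # stays here until the first scored step
        self.history: deque[float] = deque(maxlen=window)  # recent scores only
        self.low, self.high, self.extreme = low, high, extreme

    def transition(self, score: float) -> FSMState:
        self.history.append(score)
        if len(self.history) < self.history.maxlen:
            # Not enough evidence for a "consistent" window yet.
            self.state = FSMState.NORMAL
        elif all(s >= self.extreme for s in self.history):
            self.state = FSMState.SKIP
        elif all(s >= self.high for s in self.history):
            self.state = FSMState.SLOW
        elif all(s <= self.low for s in self.history):
            self.state = FSMState.FAST
        else:
            self.state = FSMState.NORMAL  # mixed window: no strong signal
        return self.state
```

Requiring the whole window to agree is what makes the FSM resist flapping on a single unusual score; one outlier drops it back to NORMAL rather than jumping straight to another extreme.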
5

Evaluate monitors server-side

MonitorSteeringInjection.retrieve posts the current trace to rb-api’s POST /monitors/evaluate. The server runs the trajectory monitor suite under the configured task_profile, applies the two-path gate, optionally falls back to an E2 lookup, and returns {inject, intervention_text, fired, scores, composite, failure_type, intervention_source}. When inject is true and the text is new, the middleware queues the rendered intervention text as a PendingInjection.

Per-run guardrails are enforced client-side after the server’s decision: a hard cap of 5 monitor injections per run, plus a state-dependent cooldown of at most one monitor injection every 2 steps in SLOW or SKIP, every 3 steps in NORMAL, and every 5 steps in FAST.
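The client-side guardrails can be sketched as a small stateful check that runs after the server has said inject=true. MonitorGuard and should_queue are hypothetical names; the cap of 5 and the per-state cooldowns come from this page, while reading "new" as "not injected before in this run" and the default cooldown for unlisted states such as INIT are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_MONITOR_INJECTIONS = 5  # hard per-run cap from the docs
# Minimum steps between monitor injections, keyed by FSM state (from the docs).
COOLDOWN_STEPS = {"SLOW": 2, "SKIP": 2, "NORMAL": 3, "FAST": 5}
DEFAULT_COOLDOWN = 3  # assumption for states not listed (e.g. INIT)


@dataclass
class MonitorGuard:
    """Client-side check applied after rb-api returns inject=True."""
    injected: int = 0
    last_step: int | None = None
    seen_texts: set[str] = field(default_factory=set)

    def should_queue(self, step: int, state: str, inject: bool, text: str | None) -> bool:
        if not inject or not text or text in self.seen_texts:
            return False  # server declined, empty text, or text already injected
        if self.injected >= MAX_MONITOR_INJECTIONS:
            return False  # per-run hard cap reached
        cooldown = COOLDOWN_STEPS.get(state, DEFAULT_COOLDOWN)
        if self.last_step is not None and step - self.last_step < cooldown:
            return False  # still inside the state-dependent cooldown
        self.injected += 1
        self.last_step = step
        self.seen_texts.add(text)
        return True
```

For example, after an injection at step 1 in NORMAL, a second one is refused at step 2 and accepted at step 4; in SLOW, injections at steps 0, 2, 4, 6, and 8 all pass, and the sixth is refused by the cap regardless of spacing.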
6

Gate E1 retrieval

Before E1Injection queries the pattern store, a client-side gate checks the recent monitor history:
allow E1 if:
  current_eval.fired is non-empty
  OR current_eval.composite > 0.15
  OR any monitor fired on either of the previous 2 steps
If no monitor has ever run, E1 is allowed unconditionally for backward compatibility.
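The gate above translates almost line for line into Python. MonitorEval and allow_e1 are hypothetical names, and passing earlier evaluations as an oldest-first list is an assumption about how the history is stored; the three conditions and the 0.15 threshold are taken from the pseudocode.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MonitorEval:
    """Subset of a /monitors/evaluate result relevant to the E1 gate."""
    fired: list[str] = field(default_factory=list)
    composite: float = 0.0


def allow_e1(current: MonitorEval | None, history: list[MonitorEval]) -> bool:
    """history holds evaluations from earlier steps, oldest first (current excluded)."""
    if current is None and not history:
        return True  # no monitor has ever run: allow for backward compatibility
    if current is not None and (current.fired or current.composite > 0.15):
        return True  # something fired now, or the composite is above threshold
    # Any monitor that fired on either of the previous 2 steps keeps the gate open.
    return any(ev.fired for ev in history[-2:])
```

Note the strict inequality: a composite of exactly 0.15 with nothing fired keeps E1 closed unless a monitor fired within the last two steps.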
7

Retrieve E-traces

With the gate decided, the middleware runs the remaining injections in order:
  • E1Injection — instance-level pattern, customer-scoped, top_k=1. Capped at max_calls=1 per run.
  • E2Injection — pattern-level guidance, top_k=2, narrowed by failure_type if the monitor evaluation produced one. Capped at max_calls=1 per run.
  • E3Injection — already fired on step 0; subsequent calls skip it.
In FAST state the entire E-trace pipeline is bypassed (no Qdrant query, no embeddings). Monitors still run because loop detection matters most when the agent is moving quickly and might miss that it’s looping.
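The per-step tier selection, including the E3-only first call from step 2 of this lifecycle, can be sketched as below. ToyETracePipeline and the retrieve callable (standing in for the Qdrant-backed pattern store) are hypothetical; top_k values, max_calls=1 caps, the failure_type narrowing for E2, and the FAST bypass come from this page. Counting a call even when the store returns nothing is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# (tier, top_k, failure_type) -> patterns; stands in for the pattern store.
Retriever = Callable[[str, int, Optional[str]], list[str]]

TOP_K = {"E1": 1, "E2": 2, "E3": 32}     # E3 scrolls up to 32 patterns
MAX_CALLS = {"E1": 1, "E2": 1, "E3": 1}  # E1/E2 max_calls=1; E3 fires once per run


@dataclass
class ToyETracePipeline:
    retrieve: Retriever
    calls: dict[str, int] = field(default_factory=lambda: {t: 0 for t in TOP_K})

    def run(self, step: int, state: str, e1_allowed: bool,
            failure_type: str | None = None) -> list[tuple[str, list[str]]]:
        if step == 0:
            tiers = ["E3"]  # first call: universal standing rules only
        elif state == "FAST":
            return []  # whole E-trace pipeline bypassed: no store query, no embeddings
        else:
            tiers = (["E1"] if e1_allowed else []) + ["E2", "E3"]
        queued: list[tuple[str, list[str]]] = []
        for tier in tiers:
            if self.calls[tier] >= MAX_CALLS[tier]:
                continue  # per-run cap reached (E3 is always capped after step 0)
            # Only E2 is narrowed by the monitor's failure_type.
            patterns = self.retrieve(tier, TOP_K[tier], failure_type if tier == "E2" else None)
            self.calls[tier] += 1  # assumption: an attempt counts even if nothing matched
            if patterns:
                queued.append((tier, patterns))
        return queued
```

Monitor evaluation is deliberately outside this object, which matches the point above: FAST short-circuits retrieval but not loop detection.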
8

Route the model

wrap_model_call checks the current FSM state against model_routing. If a mapping exists, the request is overridden via request.override(model=...) before the handler is called. Routing happens before injection rendering so the format-routing layer sees the final post-routing model id.
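A minimal sketch of state-based routing, assuming model_routing is a plain mapping from FSM state name to a model id. The ModelRequest class below is a hypothetical stand-in whose override() returns a modified copy, matching the request.override(model=...) call named above; in LangChain the model field holds a chat model object rather than a string.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelRequest:
    """Hypothetical stand-in for the request passed to wrap_model_call."""
    model: str
    system_prompt: str = ""

    def override(self, **changes: object) -> ModelRequest:
        return replace(self, **changes)  # new request; the original is untouched


def route(request: ModelRequest, state: str, model_routing: dict[str, str]) -> ModelRequest:
    """Swap the model when the current FSM state has a routing entry."""
    target = model_routing.get(state)
    if target is None or target == request.model:
        return request  # no mapping for this state: keep the caller's model
    return request.override(model=target)
```

Because the routed request is produced first, whatever renders the injections afterwards can read request.model and get the post-routing id.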
9

Render injections into the system message

Pending injections are rendered and joined with blank lines: plain-text injections pass through unchanged, and structured (tier, fields) injections go through render_pattern(tier, model_id, fields). The base system prompt is wrapped in a content block with cache_control: {"type": "ephemeral"} so Anthropic’s prompt cache hits across steps. The rendered guidance rides in a separate, uncached block prefixed with [REASONBLOCKS] so its varying content does not bust the base cache key.

cache_control is ignored on non-Anthropic providers, so this is safe under OpenAI or Gemini models.
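The two-block layout can be sketched with Anthropic-style text content blocks. build_system_content is a hypothetical helper, and the renderer passed in is a toy; the real render_pattern formats output per model, which this sketch does not reproduce.

```python
from __future__ import annotations

from typing import Any, Callable, Union

Injection = Union[str, tuple[str, dict[str, Any]]]  # plain text, or (tier, fields)


def build_system_content(base_prompt: str, pending: list[Injection],
                         render_pattern: Callable[[str, str, dict[str, Any]], str],
                         model_id: str) -> list[dict[str, Any]]:
    rendered: list[str] = []
    for inj in pending:
        if isinstance(inj, str):
            rendered.append(inj)  # plain text passes through unchanged
        else:
            tier, fields = inj
            rendered.append(render_pattern(tier, model_id, fields))
    # Stable base prompt: marked cacheable so the prefix is reused across steps.
    blocks: list[dict[str, Any]] = [
        {"type": "text", "text": base_prompt, "cache_control": {"type": "ephemeral"}},
    ]
    guidance = "\n\n".join(r for r in rendered if r)
    if guidance:
        # Varying guidance goes in its own block with no cache_control marker.
        blocks.append({"type": "text", "text": f"[REASONBLOCKS]\n{guidance}"})
    return blocks
```

Usage with a toy renderer:

```python
def toy_render(tier, model_id, fields):
    return f"<{tier}> {fields['rule']}"

blocks = build_system_content(
    "You are a coding agent.",
    ["Stop repeating the same search.", ("E3", {"rule": "Run tests before committing."})],
    toy_render,
    "example-model",
)
```

Keeping the cache marker only on the first block means the guidance can change every step while the base prompt prefix stays byte-identical.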
10

Call the model and record telemetry

The handler runs. After the response arrives, the middleware extracts the AIMessage, reads usage_metadata for token counts, captures tool-call names, and stamps the StepLogEntry with model_id, tokens, latency_ms, injections, intervention_texts, monitors_fired, and failure_type. If live_streaming_enabled is True, a step event is emitted to rb-api.
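The recording step amounts to timing the handler and copying fields off the returned message. ToyStepLogEntry and call_and_record are hypothetical; the field names mirror those listed above, with "tokens" split into input and output counts as an assumption. The sketch also assumes the handler returns the message directly, and reads usage_metadata (a dict with input_tokens and output_tokens) and tool_calls (dicts with a "name" key) the way LangChain’s AIMessage exposes them.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToyStepLogEntry:
    """Field names follow the docs; types and defaults are assumptions."""
    model_id: str
    input_tokens: int = 0
    output_tokens: int = 0
    latency_ms: float = 0.0
    tool_calls: list[str] = field(default_factory=list)
    injections: list[str] = field(default_factory=list)
    intervention_texts: list[str] = field(default_factory=list)
    monitors_fired: list[str] = field(default_factory=list)
    failure_type: str | None = None


def call_and_record(handler: Callable[[Any], Any], request: Any, model_id: str,
                    **step_context: Any) -> tuple[Any, ToyStepLogEntry]:
    """Time the handler call and copy token usage off the returned message."""
    start = time.perf_counter()
    message = handler(request)
    latency_ms = (time.perf_counter() - start) * 1000.0
    # usage_metadata may be missing or None on some providers; default to zeros.
    usage = getattr(message, "usage_metadata", None) or {}
    entry = ToyStepLogEntry(
        model_id=model_id,
        input_tokens=usage.get("input_tokens", 0),
        output_tokens=usage.get("output_tokens", 0),
        latency_ms=latency_ms,
        tool_calls=[tc["name"] for tc in getattr(message, "tool_calls", None) or []],
        **step_context,  # injections, intervention_texts, monitors_fired, failure_type
    )
    return message, entry
```

Using time.perf_counter rather than wall-clock time keeps latency_ms monotonic even if the system clock is adjusted mid-call.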
11

run_finish

after_agent fires on a clean return and emits run_finish with outcome "success" (or whatever was set via mark_failure). When the middleware is used as a context manager, __exit__ records exception types in the outcome string and calls close() to drain the streaming worker.
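The split between the clean-return path and the exception path can be sketched with a context manager that emits run_finish at most once. RunScope is hypothetical, and the outcome string formats ("failure:<reason>", "error:<ExceptionType>") are assumptions; the page only says that exception types are recorded in the outcome string.

```python
from __future__ import annotations

from types import TracebackType
from typing import Any, Callable


class RunScope:
    """Hypothetical sketch: run_finish fires once, from after_agent or __exit__."""

    def __init__(self, emit: Callable[[dict[str, Any]], None]) -> None:
        self._emit = emit
        self.outcome = "success"
        self.finished = False
        self.closed = False

    def mark_failure(self, reason: str) -> None:
        self.outcome = f"failure:{reason}"  # assumed outcome format

    def after_agent(self) -> None:
        self._finish(self.outcome)  # clean-return path

    def _finish(self, outcome: str) -> None:
        if not self.finished:  # guard against emitting run_finish twice
            self.finished = True
            self._emit({"type": "run_finish", "outcome": outcome})

    def close(self) -> None:
        self.closed = True  # a real implementation would drain the streaming worker

    def __enter__(self) -> RunScope:
        return self

    def __exit__(self, exc_type: type[BaseException] | None,
                 exc: BaseException | None, tb: TracebackType | None) -> bool:
        if exc_type is not None:
            self._finish(f"error:{exc_type.__name__}")  # exception path
        self.close()
        return False  # never swallow the caller's exception
```

Returning False from __exit__ lets the original exception propagate, so telemetry bookkeeping never changes the agent’s error behavior.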

What the agent sees

The agent’s message history is never modified. Steering appears only as an additional content block inside the system message, formatted as [REASONBLOCKS]\n<rendered guidance>. The agent’s conversation turns stay clean.
The [REASONBLOCKS] block is rebuilt fresh on every step and contains only the injections relevant to that step. There is no cumulative accumulation across turns.

Local vs server-side monitors

The SDK ships two monitor surfaces; do not conflate them.
  • Server-side monitor evaluation (POST /monitors/evaluate): drives the steering intervention you see appended to the system message. This is the path described above.
  • Local monitor suite (reasonblocks/monitors/suite.py): six pure-Python heuristics with weighted aggregation. Used by TokenSavingMiddleware for early-exit decisions and by SQLiteMonitorSink for offline trace inspection. It does not drive the steering injection.
See Monitors for the full breakdown of both surfaces.