ReasonBlocks is the main entry point for the SDK. Create one instance per process, passing your API key and any global configuration, then call middleware() or openai_hooks() once per agent run to get a fresh, run-scoped object.
Constructor
Your ReasonBlocks API key. Production keys start with rb_live_; test keys start with rb_test_. When you use a per-customer live key, the server automatically overrides org_id and project_id with the key’s authoritative scope.
Override the API base URL. Set this when pointing the SDK at a self-hosted deployment or a staging environment. When omitted, the SDK connects to the ReasonBlocks hosted API.
Soft cap on tokens tracked by the trace state manager. Stored on TraceState.token_budget and used by downstream consumers that surface budget pressure. Leave unset to run without a cap.
Override the difficulty FSM’s transition thresholds. Forwarded as kwargs to DifficultyFSM; recognized keys include fast_threshold, slow_threshold, skip_threshold, hysteresis_margin, fast_window, slow_window, and skip_window. Leave unset to use the defaults.
Map FSM state names to model identifiers, e.g. {"FAST": "anthropic:claude-haiku-4-5-20251001", "SLOW": "anthropic:claude-sonnet-4-6"}. The middleware swaps the model on the next call when the FSM enters that state.
Whether to retrieve and inject E-trace patterns from the server. Set to False to skip the retrieval round-trip, which is useful in local development or CI environments without rb-api access.
Accepted for backward compatibility but currently a no-op on the middleware path, because the monitor suite runs server-side. Custom local monitors require a custom injection.
Whether to stream run_start, per-step, and run_finish events to rb-api as the agent runs. Disable for fully offline or air-gapped deployments. When False, no MonitorClient or StreamingEmitter is constructed and the dashboard will not see this run.
Selects the monitor weight vector on the server. Built-in profiles are "coding" (the default), "pr_review" (weights semantic_loop highest for read-only review agents), and "qa" (weights claim_contradiction and silent_topic_drift highest for question-answering agents). See Monitor profiles for the per-profile weight tables. The profile is stamped on every run_start event and persisted on the run row.
middleware()
Creates a fresh ReasonBlocksMiddleware instance scoped to a single agent run. Call this once per run, not once per step.
Unique identifier for this run. When omitted, the SDK generates one from the trace state manager. Provide your own when you want to correlate the dashboard row with an external ID (CI job, PR number, etc.).
Human-readable name for the agent. Displayed in the dashboard run list.
Short description of what this run is trying to accomplish. Stored on the run row.
Agent framework identifier. Stored for diagnostic filtering.
Model identifier used by the agent. Stored on the run row.
Associates this run with a specific codebase in the findings store. Use the same value you pass to CodebaseMemory.
Multi-tenant organization scope. When the api_key is a per-customer live key, rb-api overrides this with the key’s authoritative org.
Multi-tenant project scope within org_id. Same override behavior as org_id.
Free-form key–value tags attached to the run row and surfaced as JSON in the dashboard.
Returns
A LangChain AgentMiddleware subclass. It is also a context manager (with rb.middleware(...) as mw:) that emits run_finish and closes the streaming emitter on exit. If an exception propagates through __exit__, the outcome is set to failure: <ExceptionName>; a clean exit defaults to success. Call mw.mark_failure(reason="...") before the context exits to override the default. Call mw.close(timeout=5.0) to drain the emitter explicitly outside a with block.
ab_middleware()
Builds a run-scoped middleware for an A/B evaluation. Flips a deterministic coin (assign_arm) and returns either the full pipeline (on) or a vanilla passthrough control (off). Both arms force live telemetry on and stamp experiment_id / arm / assignment_unit / rb_version onto the run. See the A/B evaluation guide for the end-to-end flow and how to read the report.
Identifier shared by every run in the experiment. Stamped on the run and used as the report key (GET /v1/monitor/experiments/{experiment_id}/report). Immutable on the run once set.
Randomization unit. Hashed with experiment_id to pick the arm, so the same unit always lands in the same arm. Pass a stable task/ticket id if you want retries to stay in their arm; defaults to the run id (each attempt drawn independently).
Probability of the on arm, in [0, 1]. Also passed to the report’s SRM check as the expected split.
The remaining keyword parameters are the same as for middleware() (run_id, agent_name, task, framework, model, codebase_id, org_id, project_id, metadata). Experiment tags are stamped last, so a caller’s metadata cannot override the arm.
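The deterministic coin flip described above can be illustrated by hashing the experiment id together with the randomization unit and mapping the result onto [0, 1). This is a minimal sketch, not the SDK's assign_arm: the function name assign_arm_sketch, the choice of SHA-256, the separator, and the parameter names are all assumptions made for the example.

```python
import hashlib


def assign_arm_sketch(experiment_id: str, unit: str, p_on: float = 0.5) -> str:
    """Hypothetical stand-in for the SDK's coin flip; the real hash may differ."""
    if not 0.0 <= p_on <= 1.0:
        raise ValueError("p_on must be in [0, 1]")
    # Hash the (experiment, unit) pair so the draw is stable across processes.
    digest = hashlib.sha256(f"{experiment_id}:{unit}".encode("utf-8")).digest()
    # Map the first 8 bytes of the digest onto [0, 1).
    draw = int.from_bytes(digest[:8], "big") / 2**64
    return "on" if draw < p_on else "off"


# A retry that reuses the same unit lands in the same arm.
first = assign_arm_sketch("exp-42", "ticket-1001")
retry = assign_arm_sketch("exp-42", "ticket-1001")
```

Because the draw depends only on the two identifiers, passing a stable ticket id keeps every retry of that ticket in one arm, while a fresh run id per attempt produces an independent draw each time.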
Returns
A run-scoped middleware, same type and lifecycle as middleware(). For the off arm it runs in passthrough mode: scoring and telemetry only, with the model request left untouched. The chosen arm is on mw._run_metadata["arm"].
claude_messages_session()
Builds a SteeringSession wired for the Anthropic Messages API loop. Pair with run_messages_agent_loop(..., session=session) to run the full pipeline at every turn.
Accepts the same keyword parameters as middleware() (run_id, agent_name, task, framework, model, codebase_id, org_id, project_id, metadata). framework defaults to "claude-messages".
Returns
SteeringSession
A SteeringSession instance, the framework-agnostic core. See the SteeringSession reference for its full API. It is a sync + async context manager that emits run_start on enter and run_finish on exit (with the same outcome semantics as middleware()).
openai_model()
Wraps an openai-agents Model so the steering pipeline runs before each get_response call. This is the parity path with middleware() for the OpenAI Agents SDK — FSM scoring, server-side monitor evaluation, E1 / E2 / E3 injection, and (with a model_factory) model routing all run.
Any object satisfying the openai-agents Model protocol, typically OpenAIChatCompletionsModel or OpenAIResponsesModel.
Optional callable that builds alternate Model instances on demand when model_routing on the client maps the current FSM state to a different identifier. Without a factory, routing overrides are logged on the step entry but the wrapped default model is still used.
Accepts the same keyword parameters as middleware() (run_id, agent_name, task, framework, model, codebase_id, org_id, project_id, metadata). framework defaults to "openai-agents".
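The interaction between model_routing and the factory can be sketched as a small resolution function. This is an illustration of the behavior described above, not the SDK's code: resolve_model, its parameters, and the model identifiers in the usage lines are hypothetical, and plain strings stand in for Model instances.

```python
from typing import Callable, Dict, Optional


def resolve_model(
    state: str,
    routing: Dict[str, str],
    default_model: object,
    default_id: str,
    model_factory: Optional[Callable[[str], object]] = None,
) -> object:
    """Hypothetical sketch of routing resolution; not the SDK's implementation."""
    target_id = routing.get(state)
    # No override for this state, or the override already matches the wrapped model.
    if target_id is None or target_id == default_id:
        return default_model
    # An override exists but nothing can build it: keep the wrapped default.
    if model_factory is None:
        return default_model
    # Build the alternate model on demand.
    return model_factory(target_id)


# Hypothetical identifiers; strings stand in for Model instances.
routing = {"FAST": "small-model", "SLOW": "large-model"}
with_factory = resolve_model("SLOW", routing, "wrapped", "small-model", lambda mid: f"built:{mid}")
without_factory = resolve_model("SLOW", routing, "wrapped", "small-model")
```

In this sketch the factory is only called when the FSM state maps to an identifier other than the wrapped model's, which keeps the common path free of extra model construction.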
Returns
ReasonBlocksModel
A Model-compatible wrapper that is also a sync + async context manager. Use it as the model field on an Agent. The underlying SteeringSession is exposed via wrapped.session for inspecting step_log or calling mark_failure.
Streaming via stream_response is currently a pass-through to the wrapped model. Non-streaming get_response calls run the full pipeline.
claude_agent_telemetry()
Builds a telemetry-only adapter for the claude-agent-sdk query() stream. The Claude Agent SDK runs the agent loop inside the Claude Code CLI process, so the steering pipeline cannot inject — this adapter parses the message stream for tool_use / tool_result blocks and emits run_start / per-tool step / run_finish events to the dashboard.
Accepts the same keyword parameters as middleware(). framework defaults to "claude-agent-sdk".
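To make the observer role concrete, the sketch below pairs tool_use blocks with their tool_result blocks to produce one record per tool step. It is an assumption-laden illustration rather than the adapter's parser: the blocks are plain dicts shaped like Anthropic Messages API content blocks (type, id, name, input, tool_use_id, is_error), whereas the claude-agent-sdk yields typed message objects, and pair_tool_steps and the record fields are hypothetical names.

```python
from typing import Dict, Iterable, List


def pair_tool_steps(blocks: Iterable[dict]) -> List[dict]:
    """Pair each tool_use block with its tool_result to form one step record."""
    pending: Dict[str, dict] = {}
    steps: List[dict] = []
    for block in blocks:
        kind = block.get("type")
        if kind == "tool_use":
            # Remember the call until its result arrives.
            pending[block["id"]] = block
        elif kind == "tool_result":
            call = pending.pop(block.get("tool_use_id"), None)
            if call is None:
                continue  # Result without a matching call; nothing to report.
            steps.append(
                {
                    "tool": call["name"],
                    "input": call.get("input"),
                    "is_error": bool(block.get("is_error", False)),
                }
            )
    return steps


# Hypothetical stream contents: one text block and one completed tool call.
stream = [
    {"type": "text", "text": "Looking at the file."},
    {"type": "tool_use", "id": "tu_1", "name": "read_file", "input": {"path": "a.py"}},
    {"type": "tool_result", "tool_use_id": "tu_1", "content": "...", "is_error": False},
]
steps = pair_tool_steps(stream)
```

The function only reads the stream and never modifies it, which mirrors the telemetry-only contract: the agent loop itself is left untouched.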
Returns
ClaudeAgentTelemetry
Sync + async context manager exposing wrap(async_iter) to drive an async query stream, plus mark_failure(reason=...) / close(timeout=...). No FSM, no monitor evaluation, no E-trace retrieval: purely an observer.
openai_hooks()
Builds a RunHooks adapter for the openai-agents SDK. Telemetry-only — emits the same run_start / step / run_finish events as the steering integrations, but does not run scoring, monitors, or injection. Use this when you want dashboard rows but don’t want a Model wrapper sitting in front of model calls; use openai_model() for the full pipeline.
If you adopt openai_model, you don’t also need openai_hooks, because the wrapper emits the same telemetry from inside the steering pipeline.
openai_hooks() accepts the same keyword parameters as middleware(). The only behavioral difference is that framework defaults to "openai-agents" instead of "langchain".
Returns
A RunHooks-compatible object that is also a sync + async context manager. Exposes mark_failure(reason=...) and close(timeout=...) with the same semantics as the LangChain middleware. The hooks emit run_start on on_agent_start, a per-tool step event on on_tool_end, and run_finish on on_agent_end (or via __exit__ when an exception escapes).
score_step()
@staticmethod. Heuristic difficulty scorer for a single step’s text. The middleware uses this internally; you can call it directly for debugging or testing.
The agent step text to score. An empty string returns 0.5.
Returns a value between 0 and 1. Higher scores mean the step looks harder: more hedging, longer text, more error language, more entities.
Four signals are combined into a weighted sum, raw, and passed through a sigmoid centered at raw=0.5:
| Signal | Weight | Description |
|---|---|---|
| Hedging density | 0.30 | Frequency of uncertainty words such as "maybe", "not sure", "hmm", "reconsider" |
| Length | 0.25 | Word count, capped at 500 |
| Error language | 0.25 | Frequency of words like "error", "exception", "traceback", "failed" |
| Entity density | 0.20 | Count of file paths, dotted identifiers, and quoted strings (capped at 10) |
The sigmoid is 1 / (1 + exp(-6 * (raw - 0.5))), which keeps the output in a usable range.
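The weighting and squashing steps can be reproduced in a few lines. This sketch covers only the combination stage: it assumes each signal has already been normalized to [0, 1] (the reference does not say how), and the names WEIGHTS, squash, combine, and the dictionary keys are hypothetical. The text analysis that produces the signals is not reproduced here.

```python
import math

# Weights from the table above; the key names are illustrative.
WEIGHTS = {"hedging": 0.30, "length": 0.25, "error": 0.25, "entity": 0.20}


def squash(raw: float) -> float:
    """Sigmoid centered at raw=0.5 with steepness 6, as given in the reference."""
    return 1.0 / (1.0 + math.exp(-6.0 * (raw - 0.5)))


def combine(signals: dict) -> float:
    """Weighted sum of pre-normalized signals, then the sigmoid."""
    raw = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    return squash(raw)


# Example with assumed signal values, each already in [0, 1].
score = combine({"hedging": 0.8, "length": 0.4, "error": 0.6, "entity": 0.2})
```

With this sigmoid, a raw value of 0 maps to roughly 0.047 and a raw value of 1 to roughly 0.953, so the output stays strictly inside (0, 1) and is most sensitive to changes near raw=0.5.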
