ReasonBlocksConfig

ReasonBlocksConfig is a single dataclass that controls every ReasonBlocks capability. Instead of wiring individual middleware classes by hand, you fill in a config object and pass it to build_middleware(), which assembles the correct middleware stack in the right order automatically.

Complete example

from reasonblocks import ReasonBlocksAPI, ReasonBlocksConfig, build_middleware

api = ReasonBlocksAPI(api_key="rb_live_...")

config = ReasonBlocksConfig(
    # Tier toggles
    enable_e1=True,
    enable_e2=True,
    enable_e3=True,
    enable_monitor_steering=True,
    enable_general_monitor=False,
    enable_token_saving=True,
    enable_distillation=True,

    # Token-saving levers
    ts_compress_threshold_chars=1800,
    ts_keep_recent_tool_messages=2,
    ts_enable_early_exit=True,

    # General monitor levers
    gm_max_tool_calls=30,
    gm_cooldown=8,

    # Routing / scoping
    customer_id="acme",
    monitor_task_profile="coding",
)

# score_fn, fsm, and state_manager are application-defined objects
middleware = build_middleware(
    config, api,
    score_fn=score_fn,
    fsm=fsm,
    state_manager=state_manager,
)

# create_agent comes from your agent framework; model and tools are placeholders
agent = create_agent(model=..., tools=..., middleware=middleware)

build_middleware() assembles the list in the correct order: ReasonBlocksMiddleware first, then GeneralMonitorMiddleware (if enabled), then TokenSavingMiddleware last so it compresses what earlier middleware injected before the LLM call goes out.
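The ordering rule can be pictured as a small sketch. This is illustrative only — build_middleware() handles the assembly for you; only the class names come from the description above:

```python
def assemble_order(enable_general_monitor: bool, enable_token_saving: bool) -> list[str]:
    """Illustrative sketch of the stack order build_middleware() produces."""
    stack = ["ReasonBlocksMiddleware"]            # always first: handles injections
    if enable_general_monitor:
        stack.append("GeneralMonitorMiddleware")  # optional rule-firing pack
    if enable_token_saving:
        stack.append("TokenSavingMiddleware")     # last: compresses what came before
    return stack
```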

Tier toggles

These fields turn each major capability on or off.
enable_e1
bool
default:"True"
Enable customer-scoped instance-level pattern injection. E1 retrieves the top-1 pattern from your customer’s pattern scope. Disable to stop E1 injections entirely without affecting E2, E3, or monitors.
enable_e2
bool
default:"True"
Enable pattern-level injection keyed on the monitor state description. E2 retrieves patterns from the shared pattern store that match the current failure family detected by monitors.
enable_e3
bool
default:"True"
Enable universal standing-rule injection. E3 fires on the first agent call of every run (scroll-all) to establish baseline guidance. Subsequent calls skip E3 unless the state changes.
enable_monitor_steering
bool
default:"True"
Enable server-side trajectory monitor evaluation. When fired, monitor results steer which E-trace tier retrieves and what gets injected into the system prompt.
enable_general_monitor
bool
default:"False"
Enable the v1 rule-firing general monitor pack. This opt-in pack runs five rule-based detectors (repeated attempts, error without diagnosis, exploration sprawl, idle response, low-novelty tail) with a per-rule cooldown. When enabled, GeneralMonitorMiddleware is inserted between ReasonBlocksMiddleware and TokenSavingMiddleware.
enable_token_saving
bool
default:"True"
Enable tool-output compression and the early-exit nudge. TokenSavingMiddleware is appended last in the stack so it compresses whatever earlier middleware has injected before the LLM call goes out.
enable_distillation
bool
default:"True"
Enable trace distillation. When True, the SDK submits completed session traces to the ReasonBlocks API at session end so new E1 patterns can be mined for future runs. Set to False to prevent trace data from this customer’s runs from being ingested — useful for staging environments or data-residency requirements.
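For example, a staging deployment might opt out of trace ingestion while keeping every other capability at its default:

```python
from reasonblocks import ReasonBlocksConfig

# Staging: keep injections and monitors active, but do not submit
# completed traces to the API for pattern mining
staging_config = ReasonBlocksConfig(enable_distillation=False)
```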

Token-saving levers

These fields control how TokenSavingMiddleware compresses the message history and when it injects the early-exit nudge.
ts_compress_threshold_chars
int
default:"1800"
Character length above which a ToolMessage body is eligible for head+tail truncation. Tool messages shorter than this threshold are left untouched.
ts_head_keep_chars
int
default:"900"
Number of characters to keep from the beginning of a compressed tool message.
ts_tail_keep_chars
int
default:"700"
Number of characters to keep from the end of a compressed tool message. Combined with ts_head_keep_chars, the total kept content is at most 900 + 700 = 1600 chars plus an omission marker.
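Head+tail truncation can be sketched as follows. This is an illustrative re-implementation using the default thresholds, not the SDK's actual code, and the omission-marker text is a made-up placeholder:

```python
def head_tail_truncate(text: str,
                       threshold: int = 1800,
                       head_keep: int = 900,
                       tail_keep: int = 700) -> str:
    """Keep the first head_keep and last tail_keep chars of a long tool message."""
    if len(text) <= threshold:
        return text  # messages at or under the threshold are left untouched
    omitted = len(text) - head_keep - tail_keep
    marker = f"\n... [{omitted} chars omitted] ...\n"  # hypothetical marker format
    return text[:head_keep] + marker + text[-tail_keep:]
```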
ts_keep_recent_tool_messages
int
default:"2"
Number of the most recent ToolMessage instances to leave uncompressed. The agent keeps full visibility into the step it is actively reasoning about.
ts_enable_compression
bool
default:"True"
Master toggle for tool-output head+tail compression. Set to False to disable compression while keeping the early-exit nudge active.
ts_enable_early_exit
bool
default:"True"
Enable the early-exit nudge. Once the agent has made at least ts_early_exit_min_call_index model calls, the middleware checks for loop-like signals (streak, hedge, diversity) and injects a HumanMessage telling the agent to stop and submit its current best answer.
ts_early_exit_min_call_index
int
default:"40"
Minimum number of model calls before the early-exit nudge can fire. Prevents premature exits on short runs.
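The gating described above can be sketched like this. The real loop-signal detectors are internal to the SDK, so here they are stubbed as booleans; only the threshold semantics come from the field descriptions:

```python
def should_nudge_early_exit(call_index: int,
                            streak: bool, hedge: bool, diversity: bool,
                            min_call_index: int = 40,
                            enabled: bool = True) -> bool:
    """Fire the nudge only after min_call_index model calls AND a loop-like signal."""
    if not enabled or call_index < min_call_index:
        return False  # too early in the run, or the nudge is disabled
    return streak or hedge or diversity
```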

Perplexity compression

Perplexity compression applies word-level keep/drop decisions to stale messages, going further than head+tail truncation. It is off by default and requires you to supply a WordClassifier callable.
ts_enable_perplexity_compression
bool
default:"False"
Enable LLMLingua-2-style word-level compression on stale messages. Requires ts_perplexity_classifier to be set. When the classifier is None, perplexity compression is silently skipped even if this flag is True.
ts_perplexity_classifier
WordClassifier | None
default:"None"
A WordClassifier callable with signature (list[str]) -> list[bool]. Each True in the output means the corresponding word is kept. Use reasonblocks.token_saving.make_anthropic_classifier to create a production classifier backed by Claude Haiku.
import anthropic
from reasonblocks.token_saving import make_anthropic_classifier

classifier = make_anthropic_classifier(
    client=anthropic.Anthropic(),
    model="claude-haiku-4-5-20251001",
    target_keep_ratio=0.5,
)

config = ReasonBlocksConfig(
    ts_enable_perplexity_compression=True,
    ts_perplexity_classifier=classifier,
)
ts_perplexity_recent_cutoff
int
default:"3"
Messages within the last N model calls keep full fidelity — perplexity compression does not touch them. Calls 0 through N-1 back are in the “recent” tier.
ts_perplexity_mid_cutoff
int
default:"10"
Messages between ts_perplexity_recent_cutoff and this cutoff use the mid keep ratio. Messages older than this cutoff use the old keep ratio.
ts_perplexity_keep_ratio_mid
float
default:"0.55"
Target word-keep ratio for messages in the mid tier. A value of 0.55 means roughly 55% of words in mid-tier messages are retained.
ts_perplexity_keep_ratio_old
float
default:"0.30"
Target word-keep ratio for old-tier messages (older than ts_perplexity_mid_cutoff calls). More aggressive than the mid ratio.
ts_perplexity_window_words
int
default:"50"
Number of words per classifier window. Smaller windows produce finer-grained decisions at the cost of more classifier API calls. Larger windows are coarser but cheaper.
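The three fidelity tiers amount to a ratio-selection rule, sketched here with the defaults above. Age is measured in model calls since the message was produced; this is an illustration of the cutoff semantics, not the SDK's code:

```python
def keep_ratio_for_age(age_in_calls: int,
                       recent_cutoff: int = 3,
                       mid_cutoff: int = 10,
                       ratio_mid: float = 0.55,
                       ratio_old: float = 0.30) -> float:
    """Return the target word-keep ratio for a message of the given age."""
    if age_in_calls < recent_cutoff:
        return 1.0         # recent tier: full fidelity, no compression
    if age_in_calls <= mid_cutoff:
        return ratio_mid   # mid tier
    return ratio_old       # old tier: most aggressive compression
```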

General monitor levers

These fields apply only when enable_general_monitor=True.
gm_max_tool_calls
int
default:"30"
Total tool-call budget for the general monitor. Several detectors (exploration sprawl, idle response, low-novelty tail) use this as the reference limit for their proportional thresholds.
gm_cooldown
int
default:"8"
Minimum number of model calls between firings of the same rule. Prevents repeated injections from the same detector on consecutive steps.

Routing and scoping

customer_id
str | None
default:"None"
Customer identifier used by E1 retrieval to narrow pattern search to this customer’s scope. When None, E1 retrieves from the global pattern store. Set this to the same customer identifier you use in your user database — for example, your tenant ID or organization slug.
monitor_task_profile
str
default:"\"coding\""
Server-side monitor weight profile. Built-in values are "coding" (default), "pr_review", and "qa". Each profile shifts which monitors are weighted most heavily. See monitor weight profiles for the exact weight differences between profiles.
monitor_weights
dict[str, float] | None
default:"None"
Optional per-monitor weight override applied on top of the profile. Partial dicts are accepted — monitors not listed fall through to the profile, then to server defaults. Unknown monitor names are dropped server-side, and negative weights are clamped to 0.
# Boost loop-detection sensitivity and reduce hedging weight
config = ReasonBlocksConfig(
    monitor_task_profile="coding",
    monitor_weights={"streak": 0.5, "hedge": 0.05},
)