

TokenSavingMiddleware reduces context-window usage through two independent mechanisms. The first compresses old tool observations with head+tail truncation, keeping the most recent tool messages intact so the agent retains full visibility into its current step. The second injects a single nudge message when the agent appears stuck (high loop or hedging signals after many model calls), telling it to submit its best answer rather than continuing. Both levers are on by default and can be toggled independently. The middleware never breaks the agent loop: any internal failure is logged and swallowed.

Simple setup

Use TokenSavingMiddleware on its own when you don’t need E-trace injection or FSM steering.
```python
from langchain.agents import create_agent
from reasonblocks.token_saving import TokenSavingMiddleware, default_suite_signals

agent = create_agent(
    model="anthropic:claude-sonnet-4-20250514",
    tools=[...],
    system_prompt="...",
    middleware=[
        TokenSavingMiddleware(
            compress_threshold_chars=1800,   # compress tool outputs longer than this
            keep_recent_tool_messages=2,     # leave the last N tool messages uncompressed
            enable_early_exit=True,
            signals_fn=default_suite_signals,
        ),
    ],
)
```

Tool-output compression

Old ToolMessage bodies in the message history are compressed once they exceed compress_threshold_chars. The middleware keeps the first head_keep_chars and last tail_keep_chars characters, replaces the middle with an omission marker, and emits the replaced messages via LangGraph’s add_messages reducer (same-ID replacement, not append). The most recent keep_recent_tool_messages tool messages are always left untouched.
```python
TokenSavingMiddleware(
    compress_threshold_chars=1800,   # default: 1800 chars
    head_keep_chars=900,             # default: 900 chars — keep this much from the start
    tail_keep_chars=700,             # default: 700 chars — keep this much from the end
    keep_recent_tool_messages=2,     # default: 2 — exempt these from compression
    enable_compression=True,         # default: True
)
```
You can also call compress_tool_output as a standalone utility outside the middleware:
```python
from reasonblocks.token_saving import compress_tool_output

raw = some_tool.run(args)  # any tool call that can return a long string
compressed = compress_tool_output(
    raw,
    threshold_chars=1800,
    head_chars=900,
    tail_chars=700,
)
```

Early-exit nudge

When the agent has made at least early_exit_min_call_index model calls (default 40), TokenSavingMiddleware evaluates the trajectory using signals_fn. If the signals indicate the agent is stuck, it injects a HumanMessage telling the agent to submit its current best answer. The built-in default_suite_signals runs the ReasonBlocks 6-monitor suite and returns per-monitor scores. The early-exit fires when:
  • streak > 0.7 (repeated identical tool calls), OR
  • hedge > 0.6 AND diversity > 0.5 (hedging with low action diversity)
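The trigger conditions above can be sketched as a small predicate (the signal names assume the keys returned by the default monitor suite):

```python
def should_early_exit(signals: dict, call_index: int, min_call_index: int = 40) -> bool:
    """Fire the early-exit nudge only after enough model calls,
    and only when the stuck signals cross their thresholds."""
    if call_index < min_call_index:
        return False
    if signals.get("streak", 0.0) > 0.7:        # repeated identical tool calls
        return True
    return signals.get("hedge", 0.0) > 0.6 and signals.get("diversity", 0.0) > 0.5
```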
```python
from reasonblocks.token_saving import TokenSavingMiddleware, default_suite_signals

TokenSavingMiddleware(
    early_exit_min_call_index=40,           # default: wait at least 40 model calls
    enable_early_exit=True,                 # default: True
    signals_fn=default_suite_signals,       # pass your own function to customize detection
)
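A custom signals_fn lets you plug in your own stuck-detectors. As a minimal sketch, assuming it receives the trajectory's messages as dicts and returns per-monitor scores (the exact call signature and expected keys are assumptions — match whatever your installed version passes):

```python
def my_signals(messages: list) -> dict:
    """Crude stuck-detector: score streak as 1.0 when the last three
    tool calls all hit the same tool, otherwise 0.0."""
    tool_names = [m.get("name") for m in messages if m.get("type") == "tool"]
    streak = 1.0 if len(tool_names) >= 3 and len(set(tool_names[-3:])) == 1 else 0.0
    return {"streak": streak, "hedge": 0.0, "diversity": 0.0}
```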
The injected message reads:
```text
You appear to be stuck in a loop. Stop investigating and submit your current best answer now using whatever submission tool your task expects. Do not start another investigation.
```
You can override this text:
```python
TokenSavingMiddleware(
    signals_fn=default_suite_signals,
    early_exit_text="You've been running too long. Submit your answer using the submit_answer tool.",
)
```

Monitor effectiveness with TokenSavingStats

TokenSavingMiddleware exposes a stats attribute with running counters. Read it after a run to see how much compression occurred:
```python
rb = ReasonBlocks(api_key="rb_live_...")   # your existing ReasonBlocks instance
ts = TokenSavingMiddleware(signals_fn=default_suite_signals)
agent = create_agent(..., middleware=[rb.middleware(), ts])
result = agent.invoke(...)

print(ts.stats.compressions)           # tool messages compressed
print(ts.stats.chars_saved)            # total characters removed by head+tail compression
print(ts.stats.early_exits)            # early-exit nudges injected
print(ts.stats.replacements_emitted)   # list of per-step replacement counts
```
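Note that chars_saved counts characters, not tokens. For a rough token figure, the common heuristic of ~4 characters per token for English text works well enough (this conversion is an approximation, not part of the library):

```python
def estimate_tokens_saved(chars_saved: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from a character count; real tokenizers vary."""
    return int(chars_saved / chars_per_token)
```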

Advanced: perplexity-based compression

For long-running agents where head+tail compression isn’t enough, TokenSavingMiddleware supports word-level keep/drop compression on stale messages using an LLM classifier (LLMLingua-2 style, prompt-only). This compresses both ToolMessage and AIMessage content proportionally to how old the message is.
Enable perplexity compression by providing a perplexity_classifier. The easiest option is make_anthropic_classifier, which uses a small Anthropic model (Haiku by default) to decide which words to keep.
```python
import anthropic
from reasonblocks.token_saving import (
    TokenSavingMiddleware,
    make_anthropic_classifier,
    default_suite_signals,
)

client = anthropic.Anthropic()

classifier = make_anthropic_classifier(
    client,
    model="claude-haiku-4-5-20251001",   # small model — runs fast and cheap
    target_keep_ratio=0.5,               # aim to keep 50% of words overall
)

ts = TokenSavingMiddleware(
    signals_fn=default_suite_signals,
    enable_perplexity_compression=True,
    perplexity_classifier=classifier,
)
```
Perplexity compression calls an LLM classifier for each stale message window. On very long trajectories, this adds latency and cost proportional to the number of stale messages. The cache mitigates this on repeat calls, but plan for the extra overhead when first enabling it.
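Once the classifier has labeled each word keep or drop, reconstruction is simple. A sketch of that step, with each dropped run collapsed into a single ellipsis (the marker choice is an assumption, not the library's exact format):

```python
def apply_keep_mask(words: list, keep: list) -> str:
    """Rebuild compressed text from per-word keep/drop decisions,
    collapsing each dropped run into one ellipsis marker."""
    out, in_drop = [], False
    for word, kept in zip(words, keep):
        if kept:
            out.append(word)
            in_drop = False
        elif not in_drop:
            out.append("…")
            in_drop = True
    return " ".join(out)
```

Collapsing runs rather than marking every dropped word keeps the compressed message readable while still signaling where content was removed.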

Use ReasonBlocksConfig for full control

If you want to manage all middleware from one place, use ReasonBlocksConfig and build_middleware. It assembles the full middleware stack in the correct order and exposes all TokenSavingMiddleware parameters as config fields.
```python
from langchain.agents import create_agent
from reasonblocks import ReasonBlocks, ReasonBlocksAPI
from reasonblocks.config import ReasonBlocksConfig, build_middleware

rb = ReasonBlocks(api_key="rb_live_...")
api = ReasonBlocksAPI(api_key="rb_live_...")

config = ReasonBlocksConfig(
    enable_token_saving=True,
    ts_compress_threshold_chars=1800,
    ts_keep_recent_tool_messages=2,
    ts_enable_early_exit=True,
    ts_enable_perplexity_compression=False,  # opt-in separately
)

# build_middleware requires a score_fn, fsm, and state_manager when
# any of E1/E2/E3/monitor_steering are enabled. For the simplest case,
# use rb.middleware() directly — ReasonBlocks assembles these for you.
middleware = [rb.middleware(agent_name="bugfixer")]

agent = create_agent(model=..., tools=..., middleware=middleware)
```