
TokenSavingMiddleware is an optional, domain-agnostic middleware that reduces token consumption in long-running agent trajectories. It provides two independent mechanisms: tool-output compression and early-exit nudging. Both levers are on by default and can be toggled independently. A third, opt-in mechanism — perplexity-based word-level compression — is available when you supply a classifier. Failures inside the middleware hook are logged and swallowed. The middleware never interrupts the agent loop.
TokenSavingMiddleware stacks alongside ReasonBlocksMiddleware rather than being embedded inside it. You can use either independently.
from reasonblocks import ReasonBlocks, TokenSavingMiddleware, default_suite_signals

rb = ReasonBlocks(api_key="rb_live_...")

agent = create_agent(
    model=...,
    tools=...,
    middleware=[
        rb.middleware(agent_name="reviewer", task="Review PR #42"),
        TokenSavingMiddleware(signals_fn=default_suite_signals),  # always last
    ],
)

Constructor

compress_threshold_chars
integer
default:"1800"
Minimum character length a ToolMessage body must reach before it is compressed. Messages shorter than this threshold are left unchanged.
head_keep_chars
integer
default:"900"
Number of characters to keep from the start of a tool output when compressing. The head tends to contain the most actionable content.
tail_keep_chars
integer
default:"700"
Number of characters to keep from the end of a tool output when compressing. The tail often contains closing context, error messages, or final values.
keep_recent_tool_messages
integer
default:"2"
Number of the most recent ToolMessage objects to exempt from compression. These are the messages the agent is actively reasoning about; compressing them would degrade step quality.
early_exit_min_call_index
integer
default:"40"
Minimum number of model calls that must have occurred before an early-exit nudge can be injected. This prevents the nudge from firing on short, healthy runs.
early_exit_text
string
default:"\"You appear to be stuck in a loop...\""
The text injected as a HumanMessage when an early-exit nudge fires. The default message instructs the agent to stop investigating and submit its current best answer. Override this to match your agent’s specific submission instructions.
signals_fn
callable
A function (steps: list[dict]) -> dict[str, float] that evaluates the agent’s trajectory and returns monitor scores keyed by signal name. The middleware checks the "streak", "hedge", and "diversity" keys to decide whether to fire the early-exit nudge. Pass default_suite_signals to use the built-in 6-monitor suite. When None, the early-exit lever is disabled even if enable_early_exit=True.
enable_compression
boolean
default:"True"
Whether to enable head+tail tool-output compression. Set to False to disable compression entirely while keeping the early-exit lever active.
enable_early_exit
boolean
default:"True"
Whether to enable the early-exit nudge. Set to False to disable the nudge entirely while keeping compression active.
enable_perplexity_compression
boolean
default:"False"
Whether to enable word-level perplexity-based compression. Off by default. Requires perplexity_classifier to be set; if perplexity_classifier is None and this is True, no perplexity compression occurs.
perplexity_classifier
callable
A WordClassifier callable — (words: list[str]) -> list[bool] — that returns a keep/drop decision for each word. Use make_anthropic_classifier() to build one backed by a small Anthropic model, or supply your own heuristic. Required when enable_perplexity_compression=True.
perplexity_recent_cutoff
integer
default:"3"
Messages from fewer than this many model calls ago are considered “recent” and are excluded from perplexity compression. Keeps the agent’s most active context at full fidelity.
perplexity_mid_cutoff
integer
default:"10"
Messages from between perplexity_recent_cutoff and this many calls ago are in the “mid” tier and compressed at perplexity_keep_ratio_mid. Messages older than this are in the “old” tier.
perplexity_keep_ratio_mid
number
default:"0.55"
Target fraction of words to keep in “mid” tier messages (3–9 model calls ago). 0.55 means the classifier aims to keep roughly 55% of words.
perplexity_keep_ratio_old
number
default:"0.30"
Target fraction of words to keep in “old” tier messages (10+ model calls ago). More aggressive than the mid tier.
perplexity_window_words
integer
default:"50"
The number of words per window passed to the classifier in a single call. Larger windows give the classifier more context but cost more tokens per call.
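The early-exit lever only reads the "streak", "hedge", and "diversity" keys from whatever signals_fn returns, so a custom signals_fn can be a simple heuristic over the step dicts. A minimal sketch — the scoring rules below are invented for illustration; pass default_suite_signals to get the real 6-monitor suite:

```python
def naive_signals(steps: list[dict]) -> dict[str, float]:
    """Toy signals_fn with the (steps) -> dict[str, float] signature.

    The heuristics here are illustrative only; they are NOT the
    built-in monitor suite.
    """
    if not steps:
        return {"streak": 0.0, "hedge": 0.0, "diversity": 1.0}

    actions = [s.get("action") for s in steps]

    # "streak": fraction of the last 5 steps repeating the latest action.
    recent = actions[-5:]
    streak = recent.count(recent[-1]) / len(recent)

    # "hedge": fraction of steps whose thought contains hedging language.
    hedgy = ("maybe", "not sure", "perhaps")
    hedge = sum(
        any(w in (s.get("thought") or "").lower() for w in hedgy)
        for s in steps
    ) / len(steps)

    # "diversity": distinct actions over total steps.
    diversity = len(set(actions)) / len(actions)

    return {"streak": streak, "hedge": hedge, "diversity": diversity}
```

Pass it as `TokenSavingMiddleware(signals_fn=naive_signals)`; any callable with this signature works.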

Stats attribute

Every TokenSavingMiddleware instance exposes a stats attribute of type TokenSavingStats that accumulates counters across all before_model calls.
mw = TokenSavingMiddleware(signals_fn=default_suite_signals)
# ... run the agent ...
print(mw.stats.compressions)       # number of head+tail compressions applied
print(mw.stats.chars_saved)        # total characters removed by head+tail compression
print(mw.stats.early_exits)        # number of early-exit nudges injected
print(mw.stats.perplexity_compressions)  # word-level compressions applied
print(mw.stats.perplexity_chars_saved)   # characters removed by word-level compression
print(mw.stats.perplexity_cache_hits)    # cached compression decisions reused

TokenSavingStats dataclass

TokenSavingStats is a plain dataclass. All fields default to 0.
compressions
integer
Running count of head+tail compressions applied to ToolMessage objects.
chars_saved
integer
Total characters removed across all head+tail compressions.
early_exits
integer
Number of times the early-exit nudge was injected into the message history.
perplexity_compressions
integer
Number of word-level perplexity compressions applied. Only increments when enable_perplexity_compression=True.
perplexity_chars_saved
integer
Total characters removed by word-level perplexity compression.
perplexity_cache_hits
integer
Number of times a cached compression decision was reused instead of calling the classifier again. Cache keys are (message_id, target_keep_ratio).
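Taken together, the field list above corresponds to a dataclass of roughly this shape (a sketch inferred from the documented fields, not the library source):

```python
from dataclasses import dataclass


@dataclass
class TokenSavingStats:
    # Head+tail compression counters
    compressions: int = 0
    chars_saved: int = 0
    # Early-exit counter
    early_exits: int = 0
    # Perplexity-compression counters (only move when that lever is enabled)
    perplexity_compressions: int = 0
    perplexity_chars_saved: int = 0
    perplexity_cache_hits: int = 0
```

Because it is a plain dataclass, you can read or reset fields directly between runs.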

Standalone utilities

compress_tool_output()

Applies head+tail truncation to a single tool output string when it exceeds a character threshold. Returns the content unchanged if it is within the threshold. Call this directly when you want to compress a string outside of the middleware lifecycle.
from reasonblocks import compress_tool_output

compressed = compress_tool_output(
    long_output,
    threshold_chars=1800,
    head_chars=900,
    tail_chars=700,
)
content
string
required
The tool output string to compress.
threshold_chars
integer
default:"1800"
Character length above which compression is applied. Strings at or below this length are returned unchanged.
head_chars
integer
default:"900"
Characters to keep from the start of the string.
tail_chars
integer
default:"700"
Characters to keep from the end of the string.
return
string
The original string if it’s within the threshold, otherwise a head + omission notice + tail string of the form "{head}\n\n[... N chars truncated ...]\n\n{tail}".
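To make the return format concrete, here is a self-contained reimplementation of the documented head+tail rule (a sketch inferred from the parameter and return descriptions above, not the library source; the real compress_tool_output may differ in edge cases):

```python
def head_tail_compress(
    content: str,
    threshold_chars: int = 1800,
    head_chars: int = 900,
    tail_chars: int = 700,
) -> str:
    """Head+tail truncation per the documented contract."""
    if len(content) <= threshold_chars:
        return content  # at or below the threshold: unchanged
    truncated = len(content) - head_chars - tail_chars
    return (
        content[:head_chars]
        + f"\n\n[... {truncated} chars truncated ...]\n\n"
        + content[-tail_chars:]
    )
```

With the defaults, a 2,000-character string keeps its first 900 and last 700 characters and reports 400 characters truncated in the omission notice.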

make_anthropic_classifier()

Wraps an anthropic.Anthropic-compatible client as a WordClassifier for use with perplexity-based compression. The classifier asks a small Anthropic model to label each word keep or drop (LLMLingua-2 style, prompt-only — not true log-probability perplexity). Falls back to the built-in heuristic classifier on any failure (parse error, timeout, rate limit), so the middleware never breaks because of a classifier error.
import anthropic
from reasonblocks import TokenSavingMiddleware, make_anthropic_classifier

client = anthropic.Anthropic()
classifier = make_anthropic_classifier(
    client,
    model="claude-haiku-4-5-20251001",
    target_keep_ratio=0.5,
)

mw = TokenSavingMiddleware(
    enable_perplexity_compression=True,
    perplexity_classifier=classifier,
)
client
object
required
An anthropic.Anthropic-compatible client instance. Must expose a client.messages.create() method with the standard Anthropic Messages API signature.
model
string
default:"\"claude-haiku-4-5-20251001\""
The model used to classify words. A small, fast model such as Haiku is recommended to keep classification costs low.
target_keep_ratio
number
default:"0.5"
The fraction of words the classifier should aim to keep. This value is included in the system prompt so the model can calibrate its labeling. 0.5 means aim for roughly 50% retention.
return
WordClassifier
A WordClassifier callable with signature (words: list[str]) -> list[bool]. Pass this to TokenSavingMiddleware(perplexity_classifier=...).
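Because a WordClassifier is just (words: list[str]) -> list[bool], you can also supply your own heuristic instead of make_anthropic_classifier(). A toy sketch that drops short stopwords (purely illustrative; the library's built-in fallback heuristic is its own, likely different, implementation):

```python
# Illustrative stopword set; tune for your domain.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "or", "is", "are", "in"}


def heuristic_classifier(words: list[str]) -> list[bool]:
    """Keep every word except short, common stopwords."""
    return [
        w.lower().strip(".,;:") not in STOPWORDS or len(w) > 4
        for w in words
    ]
```

Wire it in with `TokenSavingMiddleware(enable_perplexity_compression=True, perplexity_classifier=heuristic_classifier)`.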

default_suite_signals()

Runs the built-in 6-monitor suite over a list of agent steps and returns per-monitor scores as a dict. This is the default signals_fn for the early-exit lever.
from reasonblocks import default_suite_signals, build_steps_from_messages

steps = build_steps_from_messages(state["messages"])
signals = default_suite_signals(steps)
# {'streak': 0.0, 'hedge': 0.42, 'diversity': 0.15, ...}
steps
list[dict]
required
A list of step dicts in the format produced by build_steps_from_messages(). Each dict has keys: step_index, action, action_input, thought, observation, is_error.
return
dict[str, float]
A dict mapping monitor names to float scores. The early-exit lever checks "streak", "hedge", and "diversity" keys specifically.

build_steps_from_messages()

Converts a LangChain message history into the step dict format expected by default_suite_signals() and the monitor suite. Pairs each AIMessage’s tool calls with their matching ToolMessage objects via tool_call_id.
from reasonblocks import build_steps_from_messages

steps = build_steps_from_messages(state["messages"])
messages
list
required
A list of LangChain messages (AIMessage, ToolMessage, HumanMessage, etc.) representing the agent’s trajectory so far.
return
list[dict]
A list of step dicts, one per AIMessage (or one per tool call when an AIMessage has multiple tool calls). Each dict contains the keys step_index, action, action_input, thought, observation, and is_error.
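A single step dict might look like this (keys from the docs above; the values are invented for illustration):

```python
# Hypothetical step in the build_steps_from_messages() format.
step = {
    "step_index": 0,
    "action": "read_file",
    "action_input": {"path": "src/app.py"},
    "thought": "Start by reading the file under review.",
    "observation": "def main(): ...",
    "is_error": False,
}
```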

Full example

from reasonblocks import (
    ReasonBlocks,
    TokenSavingMiddleware,
    default_suite_signals,
)

rb = ReasonBlocks(api_key="rb_live_...")

mw = TokenSavingMiddleware(
    compress_threshold_chars=1800,
    keep_recent_tool_messages=2,
    early_exit_min_call_index=40,
    signals_fn=default_suite_signals,
)

agent = create_agent(
    model=...,
    tools=...,
    middleware=[rb.middleware(agent_name="reviewer", task="Review PR #5"), mw],
)

result = agent.invoke({"messages": [HumanMessage(content="Review PR #5")]})

print(f"Compressions: {mw.stats.compressions}")
print(f"Chars saved:  {mw.stats.chars_saved}")
print(f"Early exits:  {mw.stats.early_exits}")