TokenSavingMiddleware reference

TokenSavingMiddleware is an optional, domain-agnostic middleware that reduces token consumption in long-running agent trajectories. It provides two independent levers: head+tail tool-output compression and early-exit nudging. Both are enabled by default and can be toggled independently, although the early-exit lever only takes effect when you pass a signals_fn. A third, opt-in mechanism, perplexity-based word-level compression, is available when you supply a classifier. Failures inside the middleware hook are logged and swallowed, so the middleware never interrupts the agent loop.

TokenSavingMiddleware stacks alongside ReasonBlocksMiddleware rather than being embedded inside it. You can use either independently.

from langchain.agents import create_agent  # assumed source of create_agent (LangChain v1)
from reasonblocks import ReasonBlocks, TokenSavingMiddleware, default_suite_signals

rb = ReasonBlocks(api_key="rb_live_...")

agent = create_agent(
    model=...,
    tools=...,
    middleware=[
        rb.middleware(agent_name="reviewer", task="Review PR #42"),
        TokenSavingMiddleware(signals_fn=default_suite_signals),  # always last
    ],
)

Constructor

compress_threshold_chars

integer

default:"1800"

Character length a ToolMessage body must exceed before it is compressed. Messages at or below this threshold are left unchanged.

head_keep_chars

integer

default:"900"

Number of characters to keep from the start of a tool output when compressing. The head tends to contain the most actionable content.

tail_keep_chars

integer

default:"700"

Number of characters to keep from the end of a tool output when compressing. The tail often contains closing context, error messages, or final values.

keep_recent_tool_messages

integer

default:"2"

Number of the most recent ToolMessage objects to exempt from compression. These are the messages the agent is actively reasoning about; compressing them would degrade step quality.
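To make the exemption concrete, here is a minimal sketch of how the most recent tool messages could be excluded from compression. It is not the middleware's implementation: `compressible_indices` is a hypothetical helper, and plain dicts with a `"type"` key stand in for LangChain message objects.

```python
def compressible_indices(messages: list[dict], keep_recent: int = 2) -> list[int]:
    """Return indices of tool messages that are NOT among the `keep_recent` newest."""
    tool_idx = [i for i, m in enumerate(messages) if m.get("type") == "tool"]
    if keep_recent <= 0:
        # Nothing is exempt; note that tool_idx[:-0] would wrongly return [].
        return tool_idx
    return tool_idx[:-keep_recent]


history = [
    {"type": "human", "content": "Review the PR"},
    {"type": "tool", "content": "A" * 3000},
    {"type": "ai", "content": "Let me look at the tests"},
    {"type": "tool", "content": "B" * 3000},
    {"type": "tool", "content": "C" * 3000},
]
print(compressible_indices(history))  # [1]
```

With the default of 2, only the oldest of the three tool outputs is a compression candidate; the two newest stay at full length.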

early_exit_min_call_index

integer

default:"40"

Minimum number of model calls that must have occurred before an early-exit nudge can be injected. This prevents the nudge from firing on short, healthy runs.

early_exit_text

string

default:"You appear to be stuck in a loop..."

The text injected as a HumanMessage when an early-exit nudge fires. The default message instructs the agent to stop investigating and submit its current best answer. Override this to match your agent’s specific submission instructions.

signals_fn

callable

A function (steps: list[dict]) -> dict[str, float] that evaluates the agent’s trajectory and returns monitor scores keyed by signal name. The middleware checks the "streak", "hedge", and "diversity" keys to decide whether to fire the early-exit nudge. Pass default_suite_signals to use the built-in 6-monitor suite. When None, the early-exit lever is disabled even if enable_early_exit=True.
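If you want to supply your own signals_fn, the sketch below shows the expected shape: it takes a list of step dicts and returns floats under the three keys the early-exit lever reads. The scoring logic, the hedge-word list, and the 0-to-1 ranges are all assumptions for illustration; this page does not document the scale or direction the middleware expects, and this is not the built-in monitor suite.

```python
HEDGE_WORDS = ("maybe", "not sure", "perhaps")  # hypothetical word list


def simple_signals(steps: list[dict]) -> dict[str, float]:
    """Toy signals_fn returning the 'streak', 'hedge', and 'diversity' keys."""
    if not steps:
        return {"streak": 0.0, "hedge": 0.0, "diversity": 1.0}

    actions = [s.get("action", "") for s in steps]

    # streak: length of the trailing run of identical actions, as a fraction of all steps
    run = 1
    for i in range(len(actions) - 1, 0, -1):
        if actions[i] != actions[i - 1]:
            break
        run += 1
    streak = run / len(actions)

    # hedge: fraction of steps whose thought contains a hedge word
    hedged = sum(
        1 for s in steps
        if any(w in s.get("thought", "").lower() for w in HEDGE_WORDS)
    )
    hedge = hedged / len(steps)

    # diversity: distinct actions divided by number of steps
    diversity = len(set(actions)) / len(actions)

    return {"streak": streak, "hedge": hedge, "diversity": diversity}


demo_steps = [
    {"action": "grep", "thought": ""},
    {"action": "read", "thought": "Maybe this file"},
    {"action": "read", "thought": ""},
    {"action": "read", "thought": "Not sure what is wrong"},
]
print(simple_signals(demo_steps))
```

A function like this would be passed as `signals_fn=simple_signals`; check your scores against the built-in suite before relying on them to trigger the nudge.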

enable_compression

boolean

default:"True"

Whether to enable head+tail tool-output compression. Set to False to disable compression entirely while keeping the early-exit lever active.

enable_early_exit

boolean

default:"True"

Whether to enable the early-exit nudge. Set to False to disable the nudge entirely while keeping compression active.
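The three preconditions for the nudge (the enable flag, a non-None signals_fn, and the minimum call index) can be summarized in one predicate. This is a sketch of the documented gating only: `early_exit_can_fire` is a hypothetical name, the `>=` comparison is an assumption about how the minimum is applied, and the signal-score check that makes the final decision is not shown because its thresholds are not documented here.

```python
from typing import Callable, Optional

SignalsFn = Callable[[list[dict]], dict[str, float]]


def early_exit_can_fire(
    enable_early_exit: bool,
    signals_fn: Optional[SignalsFn],
    call_index: int,
    min_call_index: int = 40,
) -> bool:
    """True when the nudge is eligible; the signal scores still decide whether it fires."""
    return (
        enable_early_exit
        and signals_fn is not None
        and call_index >= min_call_index  # assumed inclusive comparison
    )


def dummy_signals(steps: list[dict]) -> dict[str, float]:
    return {"streak": 0.0, "hedge": 0.0, "diversity": 1.0}


print(early_exit_can_fire(True, None, 100))          # False: no signals_fn
print(early_exit_can_fire(True, dummy_signals, 10))  # False: run is too short
print(early_exit_can_fire(True, dummy_signals, 40))  # True
```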

enable_perplexity_compression

boolean

default:"False"

Whether to enable word-level perplexity-based compression. Off by default. Requires perplexity_classifier to be set; if perplexity_classifier is None and this is True, no perplexity compression occurs.

perplexity_classifier

callable

A WordClassifier callable — (words: list[str]) -> list[bool] — that returns a keep/drop decision for each word. Use make_anthropic_classifier() to build one backed by a small Anthropic model, or supply your own heuristic. Required when enable_perplexity_compression=True.
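As an illustration of supplying your own heuristic, the sketch below implements the `(words: list[str]) -> list[bool]` contract with a simple stopword filter. `stopword_classifier` and its word list are hypothetical; this is not the SDK's built-in heuristic classifier.

```python
import re

# Hypothetical stopword list; tune it for your domain.
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "to", "and", "is", "are", "was",
     "that", "this", "in", "on", "for", "with"}
)


def stopword_classifier(words: list[str]) -> list[bool]:
    """Return one keep (True) / drop (False) decision per input word."""
    decisions = []
    for word in words:
        # Strip punctuation so "The," and "the" are treated alike.
        bare = re.sub(r"\W+", "", word).lower()
        decisions.append(bare not in STOPWORDS)
    return decisions


words = "The test in tests/test_api.py failed with a KeyError".split()
print(stopword_classifier(words))
# [False, True, False, True, True, False, False, True]
```

The returned list must have exactly one entry per input word. A callable like this would be passed as `perplexity_classifier=stopword_classifier`.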

perplexity_recent_cutoff

integer

default:"3"

Messages from fewer than this many model calls ago are considered “recent” and are excluded from perplexity compression. Keeps the agent’s most active context at full fidelity.

perplexity_mid_cutoff

integer

default:"10"

Messages that are at least perplexity_recent_cutoff but fewer than this many model calls old are in the “mid” tier and are compressed at perplexity_keep_ratio_mid. Messages this many calls old or older are in the “old” tier.

perplexity_keep_ratio_mid

number

default:"0.55"

Target fraction of words to keep in “mid” tier messages (3–9 model calls ago). 0.55 means the classifier aims to keep roughly 55% of words.

perplexity_keep_ratio_old

number

default:"0.30"

Target fraction of words to keep in “old” tier messages (10+ model calls ago). More aggressive than the mid tier.
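The tiering described by the four perplexity_* parameters above can be sketched as a single lookup from a message's age to its target keep ratio. `keep_ratio_for_age` is a hypothetical helper, not an SDK function; the boundaries follow the documented ranges (fewer than 3 calls ago is recent, 3 to 9 is mid, 10 or more is old).

```python
from typing import Optional


def keep_ratio_for_age(
    age_in_calls: int,
    recent_cutoff: int = 3,
    mid_cutoff: int = 10,
    keep_ratio_mid: float = 0.55,
    keep_ratio_old: float = 0.30,
) -> Optional[float]:
    """Return the target keep ratio for a message, or None if it is exempt."""
    if age_in_calls < recent_cutoff:
        return None  # recent tier: left at full fidelity
    if age_in_calls < mid_cutoff:
        return keep_ratio_mid  # mid tier
    return keep_ratio_old  # old tier


for age in (0, 2, 3, 9, 10, 25):
    print(age, keep_ratio_for_age(age))
```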

perplexity_window_words

integer

default:"50"

The number of words per window passed to the classifier in a single call. Larger windows give the classifier more context but cost more tokens per call.
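The windowing can be pictured as follows: split the text into words, hand the classifier one window at a time, and keep the words it marks True. This is a simplified sketch under stated assumptions, not the middleware's code: `compress_words` is hypothetical, the local `WordClassifier` alias only mirrors the documented signature, and whitespace is normalized to single spaces.

```python
from typing import Callable

WordClassifier = Callable[[list[str]], list[bool]]  # local stand-in for the SDK type


def compress_words(text: str, classifier: WordClassifier, window_words: int = 50) -> str:
    """Apply a word classifier window by window and join the kept words."""
    words = text.split()
    kept: list[str] = []
    for start in range(0, len(words), window_words):
        window = words[start:start + window_words]
        decisions = classifier(window)  # one classifier call per window
        if len(decisions) != len(window):
            # Malformed classifier output: keep the window untouched.
            kept.extend(window)
            continue
        kept.extend(w for w, keep in zip(window, decisions) if keep)
    return " ".join(kept)


def keep_every_other(words: list[str]) -> list[bool]:
    """Toy classifier that keeps words at even positions within each window."""
    return [i % 2 == 0 for i in range(len(words))]


text = " ".join(f"w{i}" for i in range(120))
out = compress_words(text, keep_every_other, window_words=50)
print(len(out.split()))  # 60
```

Here 120 words are split into windows of 50, 50, and 20, so the classifier is called three times.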

Stats attribute

Every TokenSavingMiddleware instance exposes a stats attribute of type TokenSavingStats that accumulates counters across all before_model calls.

mw = TokenSavingMiddleware(signals_fn=default_suite_signals)
# ... run the agent ...
print(mw.stats.compressions)       # number of head+tail compressions applied
print(mw.stats.chars_saved)        # total characters removed by head+tail compression
print(mw.stats.early_exits)        # number of early-exit nudges injected
print(mw.stats.perplexity_compressions)  # word-level compressions applied
print(mw.stats.perplexity_chars_saved)   # characters removed by word-level compression
print(mw.stats.perplexity_cache_hits)    # cached compression decisions reused

`TokenSavingStats` dataclass

TokenSavingStats is a plain dataclass. All fields default to 0.

compressions

integer

Running count of head+tail compressions applied to ToolMessage objects.

chars_saved

integer

Total characters removed across all head+tail compressions.

early_exits

integer

Number of times the early-exit nudge was injected into the message history.

perplexity_compressions

integer

Number of word-level perplexity compressions applied. Only increments when enable_perplexity_compression=True.

perplexity_chars_saved

integer

Total characters removed by word-level perplexity compression.

perplexity_cache_hits

integer

Number of times a cached compression decision was reused instead of calling the classifier again. Cache keys are (message_id, target_keep_ratio).
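The cache key matters because the same message is compressed at a different ratio once it ages from the mid tier into the old tier. The sketch below shows a cache keyed by (message_id, target_keep_ratio) with a hit counter; `DecisionCache` is a hypothetical class, and storing the compressed text as the cached value is an assumption.

```python
from typing import Callable


class DecisionCache:
    """Memoizes compression results per (message_id, target_keep_ratio)."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, float], str] = {}
        self.hits = 0

    def get_or_compute(
        self, message_id: str, target_keep_ratio: float, compute: Callable[[], str]
    ) -> str:
        key = (message_id, target_keep_ratio)
        if key in self._store:
            self.hits += 1  # reused a cached decision, no classifier call
            return self._store[key]
        result = compute()
        self._store[key] = result
        return result


classifier_calls = []


def fake_compress() -> str:
    classifier_calls.append(1)  # stands in for an expensive classifier call
    return "compressed text"


cache = DecisionCache()
cache.get_or_compute("msg-1", 0.55, fake_compress)  # miss: mid tier, first time
cache.get_or_compute("msg-1", 0.55, fake_compress)  # hit: same message, same ratio
cache.get_or_compute("msg-1", 0.30, fake_compress)  # miss: message moved to old tier
print(cache.hits, len(classifier_calls))  # 1 2
```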

Standalone utilities

`compress_tool_output()`

Head+tail truncates a single tool output string when it exceeds a character threshold. Returns the content unchanged if it is within the threshold. You can call this directly when you want to compress a string outside of the middleware lifecycle.

from reasonblocks import compress_tool_output

compressed = compress_tool_output(
    long_output,
    threshold_chars=1800,
    head_chars=900,
    tail_chars=700,
)

content

string

required

The tool output string to compress.

threshold_chars

integer

default:"1800"

Character length above which compression is applied. Strings at or below this length are returned unchanged.

head_chars

integer

default:"900"

Characters to keep from the start of the string.

tail_chars

integer

default:"700"

Characters to keep from the end of the string.

return

string

The original string if it’s within the threshold, otherwise a head + omission notice + tail string of the form "{head}\n\n[... N chars truncated ...]\n\n{tail}".
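For readers who want to see the truncation spelled out, here is a minimal re-implementation sketch of the documented behavior. `head_tail_truncate` is a hypothetical stand-in, not the SDK function; it assumes N in the notice is the number of omitted characters, and it returns the input unchanged when head and tail would overlap.

```python
def head_tail_truncate(
    content: str,
    threshold_chars: int = 1800,
    head_chars: int = 900,
    tail_chars: int = 700,
) -> str:
    """Keep the head and tail of an over-threshold string, noting what was cut."""
    if len(content) <= threshold_chars:
        return content
    omitted = len(content) - head_chars - tail_chars
    if omitted <= 0:
        return content  # head and tail would overlap; nothing to drop
    head = content[:head_chars]
    tail = content[-tail_chars:] if tail_chars > 0 else ""
    return f"{head}\n\n[... {omitted} chars truncated ...]\n\n{tail}"


long_output = "H" * 900 + "x" * 400 + "T" * 700  # 2000 characters
out = head_tail_truncate(long_output)
print(out[895:935])
```

With the defaults, a 2000-character string keeps its first 900 and last 700 characters, and the 400 in between are replaced by the notice.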

`make_anthropic_classifier()`

Wraps an anthropic.Anthropic-compatible client as a WordClassifier for use with perplexity-based compression. The classifier asks a small Anthropic model to label each word as keep or drop. This follows the LLMLingua-2 style but is prompt-only; it does not compute true log-probability perplexity. On any failure (parse error, timeout, rate limit) it falls back to the built-in heuristic classifier, so a classifier error never breaks the middleware.

import anthropic
from reasonblocks import TokenSavingMiddleware, make_anthropic_classifier

client = anthropic.Anthropic()
classifier = make_anthropic_classifier(
    client,
    model="claude-haiku-4-5-20251001",
    target_keep_ratio=0.5,
)

mw = TokenSavingMiddleware(
    enable_perplexity_compression=True,
    perplexity_classifier=classifier,
)

client

object

required

An anthropic.Anthropic-compatible client instance. Must expose a client.messages.create() method with the standard Anthropic Messages API signature.

model

string

default:"claude-haiku-4-5-20251001"

The model used to classify words. A small, fast model such as Haiku is recommended to keep classification costs low.

target_keep_ratio

number

default:"0.5"

The fraction of words the classifier should aim to keep. This value is included in the system prompt so the model can calibrate its labeling. 0.5 means aim for roughly 50% retention.

return

WordClassifier

A WordClassifier callable with signature (words: list[str]) -> list[bool]. Pass this to TokenSavingMiddleware(perplexity_classifier=...).

`default_suite_signals()`

Runs the built-in 6-monitor suite over a list of agent steps and returns per-monitor scores as a dict. This is the default signals_fn for the early-exit lever.

from reasonblocks import default_suite_signals, build_steps_from_messages

steps = build_steps_from_messages(state["messages"])
signals = default_suite_signals(steps)
# {'streak': 0.0, 'hedge': 0.42, 'diversity': 0.15, ...}

steps

list[dict]

required

A list of step dicts in the format produced by build_steps_from_messages(). Each dict has keys: step_index, action, action_input, thought, observation, is_error.

return

dict[str, float]

A dict mapping monitor names to float scores. The early-exit lever checks "streak", "hedge", and "diversity" keys specifically.

`build_steps_from_messages()`

Converts a LangChain message history into the step dict format expected by default_suite_signals() and the monitor suite. Pairs each AIMessage’s tool calls with their matching ToolMessage objects via tool_call_id.

from reasonblocks import build_steps_from_messages

steps = build_steps_from_messages(state["messages"])

messages

list

required

A list of LangChain messages (AIMessage, ToolMessage, HumanMessage, etc.) representing the agent’s trajectory so far.

return

list[dict]

A list of step dicts, one per AIMessage (or one per tool call when an AIMessage has multiple tool calls). Each dict contains:

step_index

integer

Zero-based index of the step in the trajectory.

action

string

The tool name called in this step, or an empty string if no tool was called.

action_input

object

The arguments passed to the tool, or an empty dict.

thought

string

The text content of the AIMessage (the model’s reasoning).

observation

string

The content of the matched ToolMessage, or an empty string if no observation was found.

is_error

boolean

True if the ToolMessage has status="error".
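The pairing by tool_call_id can be sketched without LangChain by using plain dicts that mirror the relevant message attributes (`tool_calls` entries with `id`, `name`, and `args`; tool results with `tool_call_id`, `content`, and `status`). `pair_steps` is a hypothetical illustration of the documented output shape, not the SDK's implementation.

```python
def pair_steps(messages: list[dict]) -> list[dict]:
    """Build step dicts by matching each AI tool call to its tool result."""
    # Index tool results by the id of the call they answer.
    results = {m["tool_call_id"]: m for m in messages if m.get("type") == "tool"}
    steps: list[dict] = []
    for msg in messages:
        if msg.get("type") != "ai":
            continue
        # An AI message without tool calls still yields one step.
        calls = msg.get("tool_calls") or [None]
        for call in calls:
            result = results.get(call["id"]) if call else None
            steps.append({
                "step_index": len(steps),
                "action": call["name"] if call else "",
                "action_input": call["args"] if call else {},
                "thought": msg.get("content", ""),
                "observation": result["content"] if result else "",
                "is_error": result is not None and result.get("status") == "error",
            })
    return steps


messages = [
    {"type": "human", "content": "Review PR"},
    {"type": "ai", "content": "Search for the failing test",
     "tool_calls": [{"id": "c1", "name": "grep", "args": {"pattern": "FAIL"}}]},
    {"type": "tool", "tool_call_id": "c1", "content": "tests/test_api.py: FAIL",
     "status": "success"},
    {"type": "ai", "content": "Open the file",
     "tool_calls": [{"id": "c2", "name": "read_file", "args": {"path": "missing.py"}}]},
    {"type": "tool", "tool_call_id": "c2", "content": "No such file", "status": "error"},
    {"type": "ai", "content": "Done"},
]
for step in pair_steps(messages):
    print(step["step_index"], step["action"], step["is_error"])
```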

Full example

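from langchain.agents import create_agent  # assumed source of create_agent (LangChain v1)
from langchain_core.messages import HumanMessage

from reasonblocks import (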
    ReasonBlocks,
    TokenSavingMiddleware,
    default_suite_signals,
)

rb = ReasonBlocks(api_key="rb_live_...")

mw = TokenSavingMiddleware(
    compress_threshold_chars=1800,
    keep_recent_tool_messages=2,
    early_exit_min_call_index=40,
    signals_fn=default_suite_signals,
)

agent = create_agent(
    model=...,
    tools=...,
    middleware=[rb.middleware(agent_name="reviewer", task="Review PR #5"), mw],
)

result = agent.invoke({"messages": [HumanMessage(content="Review PR #5")]})

print(f"Compressions: {mw.stats.compressions}")
print(f"Chars saved:  {mw.stats.chars_saved}")
print(f"Early exits:  {mw.stats.early_exits}")
