Token reduction for coding agents

For SWE-style code-fixing agents, two real middleware components reproduce the configurations from our paired SWE-bench Pro benchmark: TokenSavingMiddleware (compression + early-exit) for maximum cost cut at unchanged accuracy, and the general monitor (enable_general_monitor) for an accuracy lift. This page shows how to assemble each.

Validated headline

Paired n=75 runs on ScaleAI/SWE-bench_Pro, claude-sonnet-4-6, real Docker grading, same task ids across arms.

Arm	Configuration	Pass rate	Mean input tokens	Delta
baseline	no middleware	25.3%	1,257,316	—
token-saving	`TokenSavingMiddleware`	25.4%	606,212	−51.8% tokens, flat accuracy
+ general monitor	`enable_general_monitor=True`	36.0%	1,136,946	+10.7pp accuracy, −9.6% tokens

Use the token-saving stack when the goal is maximum cost cut at unchanged success; add the general monitor when the goal is the accuracy lift.

Token-saving stack (the −51.8% arm)

Head+tail tool-output compression plus an early-exit nudge. Compression works on its own; the early-exit nudge requires a signals_fn you supply (there is no built-in — see token saving).

from langchain.agents import create_agent
from reasonblocks import ReasonBlocks, TokenSavingMiddleware

rb = ReasonBlocks(api_key="rb_live_...")

agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[bash_tool, ...],
    system_prompt="...",
    middleware=[
        rb.middleware(agent_name="swe-agent"),
        TokenSavingMiddleware(
            compress_threshold_chars=1800,    # validated threshold
            keep_recent_tool_messages=2,      # keep the active step fully visible
            early_exit_min_call_index=40,
            # signals_fn=my_signals,          # supply to enable the early-exit nudge
        ),
    ],
)

Add the general monitor (the accuracy-lift arm)

GeneralMonitorMiddleware runs the v1 rule-firing detector pack (semantic loop, verification skip, and the rest — see Monitors) and injects a short corrective hint when a rule fires. Place it before TokenSavingMiddleware so injected hints go out compressed.

from langchain.agents import create_agent
from reasonblocks import ReasonBlocks, GeneralMonitorMiddleware, TokenSavingMiddleware

rb = ReasonBlocks(api_key="rb_live_...")

agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[bash_tool, ...],
    system_prompt="...",
    middleware=[
        rb.middleware(agent_name="swe-agent"),
        GeneralMonitorMiddleware(
            max_tool_calls=50,                # match your agent loop's budget
            pack="v1",
            verify_tools=frozenset({"run_tests", "pytest"}),  # your verify tools
            submit_tools=frozenset({"submit"}),               # your submit tools
        ),
        TokenSavingMiddleware(compress_threshold_chars=1800, keep_recent_tool_messages=2),
    ],
)

Assemble via the unified config

If you compose the stack through ReasonBlocksConfig / build_middleware, the ordering (general monitor before token-saving) is handled for you:

from reasonblocks import ReasonBlocksAPI, ReasonBlocksConfig, build_middleware
from reasonblocks.fsm import DifficultyFSM
from reasonblocks.state import TraceStateManager
from reasonblocks.client import ReasonBlocks

api = ReasonBlocksAPI(api_key="rb_live_...")

cfg = ReasonBlocksConfig(
    enable_token_saving=True,
    ts_compress_threshold_chars=1800,
    ts_keep_recent_tool_messages=2,
    ts_enable_early_exit=True,
    enable_general_monitor=True,   # the accuracy-lift arm; drop for token-saving only
    gm_max_tool_calls=50,
)

middleware = build_middleware(
    cfg, api,
    score_fn=ReasonBlocks.score_step,
    fsm=DifficultyFSM(),
    state_manager=TraceStateManager(),
)

The early-exit nudge fires only when you pass a signals_fn (config field ts_signals_fn). Without one, the token-saving stack still delivers the bulk of the savings through tool-output compression.

A/B test this stack

To prove the cost/accuracy impact on your own tasks, route runs through ab_middleware and attach the code-review middleware only on the on arm — keyed off mw.arm — so the control stays a vanilla agent with telemetry only:

from langchain.agents import create_agent
from reasonblocks import ReasonBlocks, GeneralMonitorMiddleware, TokenSavingMiddleware

rb = ReasonBlocks(api_key="rb_live_...")

mw = rb.ab_middleware(
    experiment_id="cust-acme-q2",
    unit_id=task.id,          # stable id -> retries stay in the same arm
    org_id="acme",
)

stack = [mw]
if mw.arm == "on":            # full code-review stack only on the treatment arm
    stack += [
        GeneralMonitorMiddleware(max_tool_calls=50, pack="v1"),
        TokenSavingMiddleware(compress_threshold_chars=1800, keep_recent_tool_messages=2),
    ]

agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[bash_tool, ...],
    system_prompt="...",
    middleware=stack,
)

mw.arm is "on" / "off" (or "" outside an experiment). Run your eval set, then pull the per-arm report — see Run an A/B evaluation. The on arm now reflects the full code-review stack (steering + general monitor

compression); the off arm is the vanilla baseline. requires reasonblocks>=0.2.0.

Getting Started

Concepts

Using ReasonBlocks

Connectors and sync

Token reduction for coding agents

Validated headline

Token-saving stack (the −51.8% arm)

Add the general monitor (the accuracy-lift arm)

Assemble via the unified config

A/B test this stack

See also

​Validated headline

​Token-saving stack (the −51.8% arm)

​Add the general monitor (the accuracy-lift arm)

​Assemble via the unified config

​A/B test this stack

​See also

Validated headline

Token-saving stack (the −51.8% arm)

Add the general monitor (the accuracy-lift arm)

Assemble via the unified config

A/B test this stack

See also