Strands Integration¶
Headroom integrates with Strands Agents to provide automatic context optimization. There are two integration patterns: wrap the model, or hook into tool calls.
Installation¶
Quick Start¶
from strands import Agent
from strands.models.bedrock import BedrockModel
from headroom.integrations.strands import HeadroomStrandsModel
# Wrap your model
model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
optimized = HeadroomStrandsModel(wrapped_model=model)
# Create agent as usual
agent = Agent(model=optimized)
response = agent("Investigate the production incident")
# Check savings
print(f"Tokens saved: {optimized.total_tokens_saved}")
Every API call the agent makes — including tool result round-trips — gets compressed automatically.
Integration Patterns¶
1. Model Wrapping¶
Wraps the Strands Model interface. Every call to stream() compresses the messages before they hit the provider.
from strands import Agent
from strands.models.bedrock import BedrockModel
from headroom.integrations.strands import HeadroomStrandsModel
model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
optimized = HeadroomStrandsModel(wrapped_model=model)
# Streaming works identically
agent = Agent(model=optimized)
response = agent("Analyze these logs")
With custom config:
from headroom import HeadroomConfig
config = HeadroomConfig()  # adjust fields as needed for your workload
optimized = HeadroomStrandsModel(wrapped_model=model, config=config)
2. Hook Provider (Tool Output Compression)¶
Compresses tool call results via Strands' hook system. Uses SmartCrusher on JSON arrays returned by tools.
from strands import Agent
from strands.models.bedrock import BedrockModel
from headroom.integrations.strands import HeadroomHookProvider
model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
hooks = HeadroomHookProvider(
    compress_tool_outputs=True,
    min_tokens_to_compress=200,
    preserve_errors=True,
)
agent = Agent(model=model, hooks=[hooks])
response = agent("Search the database for recent failures")
# Check tool compression savings
print(f"Tokens saved by hooks: {hooks.total_tokens_saved}")
The hook preserves:
- Error items (error indicators, exceptions)
- Anomalous values (statistical outliers)
- Items matching the user's query context
- First/last items for boundary context
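The exact heuristics are internal to SmartCrusher, but the preservation rules above can be sketched as a filter over a JSON array. This is illustrative only — the function name, the `value` field, and the 2-sigma outlier threshold are assumptions, not Headroom's implementation:

```python
import statistics

def sketch_preserve(items, query_terms=(), keep_edges=1):
    """Illustrative approximation of the preservation rules above;
    not Headroom's actual SmartCrusher logic."""
    keep = set()
    # First/last items for boundary context
    keep.update(range(keep_edges))
    keep.update(range(len(items) - keep_edges, len(items)))
    for i, item in enumerate(items):
        text = str(item).lower()
        # Error items (error indicators, exceptions)
        if "error" in text or "exception" in text:
            keep.add(i)
        # Items matching the user's query context
        if any(term.lower() in text for term in query_terms):
            keep.add(i)
    # Anomalous values: flag numeric outliers beyond 2 standard deviations
    # (assumes a numeric "value" field -- hypothetical field name)
    values = [it.get("value") for it in items
              if isinstance(it.get("value"), (int, float))]
    if len(values) >= 3:
        mean, stdev = statistics.mean(values), statistics.stdev(values)
        for i, item in enumerate(items):
            v = item.get("value")
            if isinstance(v, (int, float)) and stdev and abs(v - mean) > 2 * stdev:
                keep.add(i)
    return [items[i] for i in sorted(keep)]
```

Everything not caught by one of these rules is a candidate for compression.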
3. Both Together¶
Model wrapping compresses conversation history. Hooks compress individual tool results. Use both for maximum savings.
from strands import Agent
from strands.models.bedrock import BedrockModel
from headroom.integrations.strands import HeadroomStrandsModel, HeadroomHookProvider
model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
optimized = HeadroomStrandsModel(wrapped_model=model)
hooks = HeadroomHookProvider(compress_tool_outputs=True)
agent = Agent(model=optimized, hooks=[hooks])
Structured Output¶
HeadroomStrandsModel supports Strands' structured output feature:
from pydantic import BaseModel
class Analysis(BaseModel):
    severity: str
    root_cause: str
    recommendation: str
result = optimized.structured_output(Analysis, messages)
Metrics¶
# Per-request metrics
for m in optimized.metrics_history:
    print(f"  {m.tokens_before} → {m.tokens_after} ({m.tokens_saved} saved)")
# Running total
print(f"Total saved: {optimized.total_tokens_saved}")
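Aggregate figures can be derived from the per-request history. A minimal sketch using a stand-in metrics record (Headroom's actual metrics objects may carry additional fields):

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    """Stand-in for a per-request metrics record (hypothetical)."""
    tokens_before: int
    tokens_after: int

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after

def percent_saved(history) -> float:
    """Overall savings rate across a metrics history."""
    before = sum(m.tokens_before for m in history)
    saved = sum(m.tokens_saved for m in history)
    return 100 * saved / before if before else 0.0
```

With the real library you would pass `optimized.metrics_history` instead of constructing records yourself.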
How It Works¶
Agent decides to call tool
│
▼
Tool executes, returns result
│
▼
HeadroomHookProvider (optional)
compresses tool result JSON
│
▼
Agent builds next API request
│
▼
HeadroomStrandsModel.stream()
compresses full message list
│
▼
Provider API (Bedrock, etc.)
The model wrapper uses Headroom's full pipeline (CacheAligner → ContentRouter → IntelligentContext). The hook provider uses SmartCrusher directly for fast JSON compression of individual tool results.
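The staged flow can be sketched as function composition. The stage names mirror the pipeline above, but the bodies here are illustrative placeholders, not Headroom's implementation:

```python
from functools import reduce

# Placeholder stages standing in for CacheAligner, ContentRouter, and
# IntelligentContext -- the real stages are configurable objects,
# not bare functions.
def cache_aligner(messages):
    # e.g. keep stable prefixes byte-identical so prompt caching hits
    return messages

def content_router(messages):
    # e.g. route each message to an appropriate compression strategy
    return messages

def intelligent_context(messages):
    # e.g. drop low-relevance history (here: empty-content messages)
    return [m for m in messages if m.get("content")]

PIPELINE = [cache_aligner, content_router, intelligent_context]

def compress(messages):
    # Thread the message list through each stage in order
    return reduce(lambda msgs, stage: stage(msgs), PIPELINE, messages)
```

The hook provider skips this pipeline entirely and applies SmartCrusher to one tool result at a time, which keeps per-tool-call overhead low.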
Supported Providers¶
HeadroomStrandsModel auto-detects the provider from the wrapped model:
| Strands Model | Provider Detected |
|---|---|
| `BedrockModel` | Anthropic (via Bedrock) |
| `OllamaModel` | OpenAI-compatible |
| Custom `Model` | Falls back to estimation |
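One plausible way to implement the detection in the table is to key off the wrapped model's class name. This is a hypothetical sketch with stand-in classes — the doc does not show Headroom's actual detection logic:

```python
def detect_provider(model) -> str:
    """Hypothetical provider detection by class name (illustrative)."""
    name = type(model).__name__
    if "Bedrock" in name:
        return "anthropic"           # Anthropic models served via Bedrock
    if "Ollama" in name:
        return "openai-compatible"   # Ollama exposes an OpenAI-style API
    return "estimate"                # unknown models: token estimation

# Stand-ins for the Strands model classes referenced in the table
class BedrockModel: ...
class OllamaModel: ...
class MyCustomModel: ...
```

Detection only affects how tokens are counted; compression itself is provider-agnostic.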