Text Compression Utilities¶
For coding tasks, Headroom provides standalone text compression utilities that applications can use explicitly. These are opt-in — they're not applied automatically, giving you full control over when and how to compress text content.
Design Philosophy: SmartCrusher compresses JSON automatically because it's structure-preserving and safe. Text compression is lossy and context-dependent, so applications should decide when to use it.
Available Utilities¶
| Utility | Input Type | Use Case |
|---|---|---|
SearchCompressor |
grep/ripgrep output | Search results with file:line:content format |
LogCompressor |
Build/test logs | pytest, npm, cargo, make output |
TextCompressor |
Generic text | Any plain text with anchor preservation |
detect_content_type |
Any content | Detect content type for routing decisions |
SearchCompressor¶
Compresses search results (grep, ripgrep, ag) while preserving relevant matches.
from headroom.transforms import SearchCompressor
# Your grep/ripgrep output (could be 1000s of lines)
search_results = """
src/utils.py:42:def process_data(items):
src/utils.py:43: \"\"\"Process items.\"\"\"
src/models.py:15:class DataProcessor:
src/models.py:89: def process(self, items):
... hundreds more matches ...
"""
# Explicitly compress when you decide it's appropriate
compressor = SearchCompressor()
result = compressor.compress(search_results, context="find process")
print(f"Compressed {result.original_match_count} matches to {result.compressed_match_count}")
print(result.compressed)
What Gets Preserved¶
- Exact query matches: Lines containing the search term
- High-relevance matches: Scored by BM25 similarity to context
- File diversity: Ensures results from different files are kept
- First/last matches: Context from start and end of results
LogCompressor¶
Compresses build and test output while preserving errors, warnings, and summaries.
from headroom.transforms import LogCompressor
# pytest output with 1000s of lines
build_output = """
===== test session starts =====
collected 500 items
tests/test_foo.py::test_1 PASSED
... hundreds of passed tests ...
tests/test_bar.py::test_fail FAILED
AssertionError: expected 5, got 3
===== 1 failed, 499 passed =====
"""
# Compress logs, preserving errors and stack traces
compressor = LogCompressor()
result = compressor.compress(build_output)
# Errors, stack traces, and summary are preserved
print(result.compressed)
print(f"Compression ratio: {result.compression_ratio:.1%}")
What Gets Preserved¶
- Errors and failures: Any line with ERROR, FAILED, Exception, etc.
- Warnings: Warning messages that might be important
- Stack traces: Full tracebacks for debugging
- Summaries: Test/build summary lines
- Section headers: Structural markers like
=====
TextCompressor¶
General-purpose text compression with anchor preservation.
from headroom.transforms import TextCompressor
long_text = """
... thousands of lines of documentation ...
"""
compressor = TextCompressor()
result = compressor.compress(long_text, context="authentication")
print(result.compressed)
What Gets Preserved¶
- Relevant paragraphs: Scored by similarity to context
- Anchors: Headers, section markers, important keywords
- Structure: Document organization is maintained
Content Type Detection¶
Automatically detect content type to route to the right compressor.
from headroom.transforms import detect_content_type, ContentType
content = "src/main.py:42:def process():"
detection = detect_content_type(content)
if detection.content_type == ContentType.SEARCH_RESULTS:
# Route to SearchCompressor
pass
elif detection.content_type == ContentType.BUILD_OUTPUT:
# Route to LogCompressor
pass
elif detection.content_type == ContentType.PLAIN_TEXT:
# Route to TextCompressor
pass
Content Types¶
| Type | Detection Pattern |
|---|---|
SEARCH_RESULTS |
file:line:content format |
BUILD_OUTPUT |
pytest, npm, cargo markers |
JSON |
Valid JSON structure |
PLAIN_TEXT |
Default fallback |
Integration Pattern¶
from headroom.transforms import (
detect_content_type, ContentType,
SearchCompressor, LogCompressor, TextCompressor
)
def compress_tool_output(content: str, context: str = "") -> str:
"""Application-level compression with explicit control."""
detection = detect_content_type(content)
if detection.content_type == ContentType.SEARCH_RESULTS:
result = SearchCompressor().compress(content, context)
return result.compressed
elif detection.content_type == ContentType.BUILD_OUTPUT:
result = LogCompressor().compress(content)
return result.compressed
elif detection.content_type == ContentType.PLAIN_TEXT:
result = TextCompressor().compress(content, context)
return result.compressed
else:
# JSON or other - let SmartCrusher handle it automatically
return content
Configuration¶
Each compressor accepts configuration options:
from headroom.transforms import SearchCompressor, SearchCompressorConfig
config = SearchCompressorConfig(
max_results=50, # Keep up to 50 matches
preserve_file_diversity=True, # Ensure different files represented
relevance_threshold=0.3, # Minimum relevance score to keep
)
compressor = SearchCompressor(config)
Performance¶
| Compressor | Typical Input | Output | Speed |
|---|---|---|---|
| SearchCompressor | 1000 matches | 30-50 matches | ~2ms |
| LogCompressor | 5000 lines | 100-200 lines | ~3ms |
| TextCompressor | 10000 chars | 2000 chars | ~2ms |
When to Use¶
| Scenario | Recommendation |
|---|---|
| JSON tool output | Let SmartCrusher handle automatically |
| grep/ripgrep results | Use SearchCompressor |
| pytest/npm/cargo output | Use LogCompressor |
| Documentation/README | Use TextCompressor |
| Unknown content | Use detect_content_type to route |