Text Compression Utilities

For coding tasks, Headroom provides standalone text compression utilities that applications can call explicitly. They are opt-in: nothing is applied automatically, so you keep full control over when and how text content is compressed.

Design Philosophy: SmartCrusher compresses JSON automatically because it's structure-preserving and safe. Text compression is lossy and context-dependent, so applications should decide when to use it.

Available Utilities

| Utility | Input type | Use case |
|---|---|---|
| `SearchCompressor` | grep/ripgrep output | Search results in `file:line:content` format |
| `LogCompressor` | Build/test logs | pytest, npm, cargo, make output |
| `TextCompressor` | Generic text | Any plain text, with anchor preservation |
| `detect_content_type` | Any content | Detect the content type for routing decisions |

SearchCompressor

Compresses search results (grep, ripgrep, ag) while preserving relevant matches.

```python
from headroom.transforms import SearchCompressor

# Your grep/ripgrep output (could be 1000s of lines)
search_results = """
src/utils.py:42:def process_data(items):
src/utils.py:43:    \"\"\"Process items.\"\"\"
src/models.py:15:class DataProcessor:
src/models.py:89:    def process(self, items):
... hundreds more matches ...
"""

# Explicitly compress when you decide it's appropriate
compressor = SearchCompressor()
result = compressor.compress(search_results, context="find process")

print(f"Compressed {result.original_match_count} matches to {result.compressed_match_count}")
print(result.compressed)
```

What Gets Preserved

- Exact query matches: lines containing the search term
- High-relevance matches: scored by BM25 similarity to the context
- File diversity: ensures results from different files are kept
- First/last matches: context from the start and end of the results
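To make the relevance rule above concrete, here is a minimal, self-contained sketch that parses `file:line:content` lines and scores each match against the context string with the standard Okapi BM25 formula. It is an illustration under stated assumptions, not SearchCompressor's implementation: `parse_matches`, `bm25_scores`, the tokenizer, and the defaults `k1=1.5`, `b=0.75` are all hypothetical choices, and the parser assumes paths contain no `:`.

```python
import math
import re
from collections import Counter

# Hypothetical parser for "path:line:content" lines (assumes paths contain no ":").
MATCH_RE = re.compile(r"^(?P<path>[^:]+):(?P<line>\d+):(?P<content>.*)$")


def parse_matches(text):
    """Return (path, line_number, content) tuples; non-matching lines are skipped."""
    matches = []
    for raw in text.splitlines():
        m = MATCH_RE.match(raw)
        if m:
            matches.append((m["path"], int(m["line"]), m["content"]))
    return matches


def tokenize(text):
    # Split on anything that is not a letter or digit, so "process_data"
    # yields "process" and "data".
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(docs, query, k1=1.5, b=0.75):
    """Okapi BM25 score of each document string against the query string."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        # Only terms present in both the query and this document contribute.
        for term in query_terms.intersection(tf):
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


results = """src/utils.py:42:def process_data(items):
src/models.py:15:class DataProcessor:
src/models.py:89:    def process(self, items):
src/cli.py:7:import sys"""

matches = parse_matches(results)
scores = bm25_scores([content for _, _, content in matches], "find process")
for score, (path, line, _) in sorted(zip(scores, matches), reverse=True):
    print(f"{score:.3f}  {path}:{line}")
```

Note that with this tokenizer `DataProcessor` becomes the single token `dataprocessor` and scores zero for "process"; a real implementation may split camelCase identifiers as well.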

LogCompressor

Compresses build and test output while preserving errors, warnings, and summaries.

```python
from headroom.transforms import LogCompressor

# pytest output with 1000s of lines
build_output = """
===== test session starts =====
collected 500 items
tests/test_foo.py::test_1 PASSED
... hundreds of passed tests ...
tests/test_bar.py::test_fail FAILED
AssertionError: expected 5, got 3
===== 1 failed, 499 passed =====
"""

# Compress logs, preserving errors and stack traces
compressor = LogCompressor()
result = compressor.compress(build_output)

# Errors, stack traces, and summary are preserved
print(result.compressed)
print(f"Compression ratio: {result.compression_ratio:.1%}")
```

What Gets Preserved

- Errors and failures: any line with ERROR, FAILED, Exception, etc.
- Warnings: warning messages that might be important
- Stack traces: full tracebacks for debugging
- Summaries: test/build summary lines
- Section headers: structural markers such as =====
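The keep rules above can be approximated with a handful of regular expressions. The following is a minimal sketch assuming simple line-level heuristics: `compress_log`, the patterns, the traceback handling, and the `[... N line(s) omitted ...]` marker are hypothetical and are not LogCompressor's actual rules.

```python
import re

# Hypothetical keep-rules mirroring the categories listed above.
ERROR_RE = re.compile(r"ERROR|FAILED|Exception|\w*Error\b")
WARNING_RE = re.compile(r"warn", re.IGNORECASE)
SUMMARY_RE = re.compile(r"\b\d+ (passed|failed|errors?|skipped)\b")
HEADER_RE = re.compile(r"^\s*(={3,}|-{3,})")
KEEP_RULES = (ERROR_RE, WARNING_RE, SUMMARY_RE, HEADER_RE)


def compress_log(text):
    kept, omitted = [], 0
    in_traceback = False

    def flush_omitted():
        # Replace each run of dropped lines with a single marker line.
        nonlocal omitted
        if omitted:
            kept.append(f"[... {omitted} line(s) omitted ...]")
            omitted = 0

    for line in text.splitlines():
        if line.startswith("Traceback"):
            in_traceback = True
            keep = True
        elif in_traceback:
            keep = True
            # The first unindented line (the exception message) ends the traceback.
            if not line[:1].isspace():
                in_traceback = False
        else:
            keep = any(rule.search(line) for rule in KEEP_RULES)
        if keep:
            flush_omitted()
            kept.append(line)
        else:
            omitted += 1
    flush_omitted()
    return "\n".join(kept)


sample = """===== test session starts =====
collected 500 items
tests/test_foo.py::test_1 PASSED
tests/test_foo.py::test_2 PASSED
tests/test_bar.py::test_fail FAILED
AssertionError: expected 5, got 3
===== 1 failed, 499 passed ====="""

print(compress_log(sample))
```

Collapsing each dropped run into one marker keeps the line count small while still telling the reader (or the model) that something was removed at that position.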

TextCompressor

General-purpose text compression with anchor preservation.

```python
from headroom.transforms import TextCompressor

long_text = """
... thousands of lines of documentation ...
"""

compressor = TextCompressor()
result = compressor.compress(long_text, context="authentication")

print(result.compressed)
```

What Gets Preserved

- Relevant paragraphs: scored by similarity to the context
- Anchors: headers, section markers, important keywords
- Structure: document organization is maintained
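A minimal sketch of anchor-preserving selection, under loud assumptions: paragraphs are separated by blank lines, Markdown-style `#` headers count as anchors, and relevance is plain term overlap with the context (a simpler stand-in for whatever similarity TextCompressor actually uses). `compress_text` and `max_paragraphs` are hypothetical names.

```python
import re


def _terms(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def compress_text(text, context, max_paragraphs=3):
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    query = _terms(context)
    # Anchors: header paragraphs are always kept.
    anchors = {i for i, p in enumerate(paragraphs) if p.startswith("#")}
    # Score the other paragraphs by how many context terms they share;
    # ties go to the earlier paragraph.
    scored = sorted(
        ((len(query & _terms(p)), i) for i, p in enumerate(paragraphs) if i not in anchors),
        key=lambda pair: (-pair[0], pair[1]),
    )
    chosen = {i for score, i in scored[:max_paragraphs] if score > 0}
    # Emit in original order so the document's structure is maintained.
    return "\n\n".join(paragraphs[i] for i in sorted(anchors | chosen))


doc = """# Setup

Install the package with pip.

# Authentication

Set the API key before authentication.

Tokens expire after one hour."""

print(compress_text(doc, context="authentication"))
```

The example also shows the limit of pure term overlap: "Tokens expire after one hour." is relevant to authentication but shares no term with the context, so it is dropped.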

Content Type Detection

Detect the content type automatically so you can route each input to the right compressor.

```python
from headroom.transforms import detect_content_type, ContentType

content = "src/main.py:42:def process():"

detection = detect_content_type(content)
if detection.content_type == ContentType.SEARCH_RESULTS:
    # Route to SearchCompressor
    pass
elif detection.content_type == ContentType.BUILD_OUTPUT:
    # Route to LogCompressor
    pass
elif detection.content_type == ContentType.PLAIN_TEXT:
    # Route to TextCompressor
    pass
```

Content Types

| Type | Detection pattern |
|---|---|
| `SEARCH_RESULTS` | `file:line:content` format |
| `BUILD_OUTPUT` | pytest, npm, cargo markers |
| `JSON` | Valid JSON structure |
| `PLAIN_TEXT` | Default fallback |
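The patterns in this table can be approximated with a few heuristics. This sketch is an assumption about how such routing might work, not `detect_content_type`'s actual logic: `detect_kind`, the 50% line threshold, and the marker list are made up for illustration, and it returns plain strings rather than `ContentType` members.

```python
import json
import re

# Hypothetical heuristics for each row of the table above.
SEARCH_LINE = re.compile(r"^[^:\s][^:]*:\d+:")
BUILD_MARKERS = ("test session starts", "PASSED", "FAILED", "npm ERR!", "Compiling ", "error[E", "make: ***")


def detect_kind(content):
    stripped = content.strip()
    # JSON: the whole payload must parse as an object or array.
    if stripped[:1] in ("{", "["):
        try:
            json.loads(stripped)
            return "JSON"
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            pass
    lines = [ln for ln in stripped.splitlines() if ln.strip()]
    if not lines:
        return "PLAIN_TEXT"
    # Search results: at least half of the non-empty lines look like "file:line:content".
    if sum(bool(SEARCH_LINE.match(ln)) for ln in lines) / len(lines) >= 0.5:
        return "SEARCH_RESULTS"
    if any(marker in stripped for marker in BUILD_MARKERS):
        return "BUILD_OUTPUT"
    return "PLAIN_TEXT"  # default fallback


print(detect_kind("src/main.py:42:def process():"))
```

Checking JSON first matters: a JSON payload can contain strings that happen to look like `file:line:` prefixes, and per the routing guidance JSON should be left to SmartCrusher.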

Integration Pattern

```python
from headroom.transforms import (
    detect_content_type, ContentType,
    SearchCompressor, LogCompressor, TextCompressor
)

def compress_tool_output(content: str, context: str = "") -> str:
    """Application-level compression with explicit control."""
    detection = detect_content_type(content)

    if detection.content_type == ContentType.SEARCH_RESULTS:
        result = SearchCompressor().compress(content, context=context)
        return result.compressed
    elif detection.content_type == ContentType.BUILD_OUTPUT:
        result = LogCompressor().compress(content)
        return result.compressed
    elif detection.content_type == ContentType.PLAIN_TEXT:
        result = TextCompressor().compress(content, context=context)
        return result.compressed
    else:
        # JSON or other: leave it unchanged and let SmartCrusher handle it automatically
        return content
```

Configuration

Each compressor accepts configuration options:

```python
from headroom.transforms import SearchCompressor, SearchCompressorConfig

config = SearchCompressorConfig(
    max_results=50,                # Keep up to 50 matches
    preserve_file_diversity=True,  # Ensure different files are represented
    relevance_threshold=0.3,       # Minimum relevance score to keep
)

compressor = SearchCompressor(config)
```
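One plausible reading of how these three options could combine, sketched with a hypothetical `SelectionConfig` dataclass that mirrors the `SearchCompressorConfig` fields shown above. The order of operations (threshold first, then the best match per file, then the remaining slots filled by score) is an assumption for illustration, not documented SearchCompressor behavior.

```python
from dataclasses import dataclass


@dataclass
class SelectionConfig:  # hypothetical mirror of SearchCompressorConfig's fields
    max_results: int = 50
    preserve_file_diversity: bool = True
    relevance_threshold: float = 0.3


def select(scored, config):
    """scored: list of (score, path, line_no, content) tuples."""
    # 1. Drop matches below the relevance threshold, then rank by score.
    ranked = sorted(
        (m for m in scored if m[0] >= config.relevance_threshold),
        key=lambda m: -m[0],
    )
    # 2. If diversity is on, the best match from each file goes first.
    first, rest = [], []
    seen_files = set()
    for m in ranked:
        if config.preserve_file_diversity and m[1] not in seen_files:
            seen_files.add(m[1])
            first.append(m)
        else:
            rest.append(m)
    # 3. Fill the remaining slots by score, up to max_results.
    chosen = (first + rest)[: config.max_results]
    # Return in file/line order so the output reads like the original search.
    return sorted(chosen, key=lambda m: (m[1], m[2]))


matches = [
    (0.9, "a.py", 1, "def process(x):"),
    (0.8, "a.py", 2, "    return process_all(x)"),
    (0.5, "b.py", 3, "process(items)"),
    (0.1, "c.py", 4, "# unrelated"),
]
print(select(matches, SelectionConfig(max_results=2)))
```

With `max_results=2`, diversity keeps one match from `a.py` and one from `b.py` instead of the two highest-scoring matches, which both come from `a.py`.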

Performance

| Compressor | Typical input | Typical output | Typical time |
|---|---|---|---|
| `SearchCompressor` | 1000 matches | 30-50 matches | ~2 ms |
| `LogCompressor` | 5000 lines | 100-200 lines | ~3 ms |
| `TextCompressor` | 10,000 chars | 2,000 chars | ~2 ms |
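To check figures like these on your own data, a small harness built on the standard library's `time.perf_counter` is enough. `benchmark` is a hypothetical helper: pass it any callable that maps a string to a compressed string and it reports the median wall-clock time over several runs together with the input and output line counts.

```python
import statistics
import time


def benchmark(compress, text, runs=20):
    """Return (median milliseconds, input lines, output lines) for compress(text)."""
    timings = []
    output = ""
    for _ in range(runs):
        start = time.perf_counter()
        output = compress(text)
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings), len(text.splitlines()), len(output.splitlines())


# Stand-in compressor for illustration: keep only the first two lines.
def keep_two(text):
    return "\n".join(text.splitlines()[:2])


ms, lines_in, lines_out = benchmark(keep_two, "a\nb\nc\nd", runs=5)
print(f"{ms:.3f} ms, {lines_in} -> {lines_out} lines")
```

With Headroom installed, the same harness can wrap one of the compressors shown earlier, for example `benchmark(lambda t: LogCompressor().compress(t).compressed, build_output)`. The median is used instead of the mean so that a single slow first run does not skew the result.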

When to Use

| Scenario | Recommendation |
|---|---|
| JSON tool output | Let SmartCrusher handle it automatically |
| grep/ripgrep results | Use `SearchCompressor` |
| pytest/npm/cargo output | Use `LogCompressor` |
| Documentation/README | Use `TextCompressor` |
| Unknown content | Use `detect_content_type` to route |