Proposal: Kompress-Ultra for Headroom

Interactive Kompress-Ultra Playground

See how the 4-role pipeline compresses chat history while preserving the critical-syntactic safety floor ($T_{\text{crit}}$).

Select a Preset or Type Below

Original: 0 tokens

Compressed: 0 tokens

0% Saved

1. Pruner & Safety Floor

2. Rewriter Output (Ultra Mode)

The Voting Ensemble Paradox

A multi-checkpoint voting ensemble is meant to be conservative, but under asymmetric training floors, the intuition inverts. Weak checkpoints veto correct keeps on their weakest strata, causing a stratum-wise Pareto collapse.

Interactive Paradox Simulator

Voter 1 (v3 - Noisy Floor) Weak: Paths

Recall (Identifiers)92%

Recall (File Paths)68%

Voter 2 (v5 - Domain-specific) Weak: Identifiers

Recall (Identifiers)70%

Recall (File Paths)95%

Ensemble Result Paradox Collapse

Ensemble Recall (Identifiers)70%

Ensemble Recall (File Paths)68%

Under AND voting, the ensemble's recall collapses to the weakest voter on each stratum (70% for Identifiers, 68% for File Paths), Pareto-dominated by any single strong model.

Theoretical Core

Learned context pruning improves long-context agent efficiency but introduces the Voting Ensemble Paradox. Under unanimity-to-keep (AND) voting ($k=1$ drop-if-any), the ensemble eviction indicator equals the pointwise maximum of the individual voter indicators:

I_ens(x) = ⋁_i=1..N I_i(x) = I_{i^*_k}(x)

This yields a stratum-wise Pareto collapse where the ensemble's recall equals that of the weakest voter on each stratum. As a corrective, `kompress-ultra` employs three core mechanisms:

Mechanism A (Asymmetric Loss Modulation): Adds a $3.0\times$ weighted cross-entropy penalty on critical-syntactic tokens ($T_{\text{crit}}$) during fine-tuning, concentrating gradients on the weakest strata.
Mechanism B (Post-Inference Regex Override): A surgical safety net applied after model scoring to force-keep critical tokens (paths, hex addresses, identifiers).
Mechanism C (Self-Labeling Loop): Closes the training loop by using $A+B$ as an oracle to relabel the training data, internalizing the safety net directly into the model weights.

Model Architecture

Dual-Head ModernBERT

`kompress-v8` uses a 149M-parameter ModernBERT backbone with LoRA fine-tuning applied to the last 4 attention layers. Two task heads share the encoder:

Token Classifier Head: Produces per-token eviction logits.
Span-CNN Head: Scores span-level coherence to prevent evictions from fragmenting syntactic units.

An Asymmetric Modulation Gate scales the token logits to suppress eviction in high-coherence spans:

Ï_i(x) = σ(logits_tok(x) - γ g(logits_span(x)))

Empirical Benchmarks

Evaluated on the Heretic adversarial benchmark, `kompress-v8` dominates traditional prompt compression models on exact-keep rates of critical syntactic tokens.

Method	Exact Keep % ($T_{\text{crit}}$)	Keep Rate (Tokens)	Avg. Latency
kompress-v8 (Ours, Production)	0.993	0.936	97.0 ms
kompress-v8 (Ours, `v4` SSL)	0.967	0.823	—
Random Eviction (Floor)	0.910	0.835	0.0 ms
LLMLingua-2	0.867	1.550	238.9 ms
TextRank (Extractive)	0.599	0.543	23.1 ms

Headroom Integration Proposal

We propose integrating `kompress-ultra` directly into Headroom (referencing Headroom PR #1419) as a core context-management middleware:

1. Middleware Chain Integration

Intercept outgoing LLM payload payloads in Headroom and run token-level classification via a local ONNX runtime of `kompress-v8`.

2. Configurable Safety Floors

Provide pre-configured regex patterns matching $T_{\text{crit}}$ class tokens to ensure 100% survival rates on critical system outputs.

3. Passive Memory Offloading

Seamlessly write evicted tokens to Headroom's memory spine (e.g. SQLite/Milvus) for semantic recall in future turns.

Asymmetric Loss Modulation for Context Compression