Interactive Kompress-Ultra Playground
See how the 4-role pipeline compresses chat history while preserving the critical-syntactic safety floor ($T_{\text{crit}}$).
The Voting Ensemble Paradox
A multi-checkpoint voting ensemble is meant to be conservative, but under asymmetric training floors, the intuition inverts. Weak checkpoints veto correct keeps on their weakest strata, causing a stratum-wise Pareto collapse.
Theoretical Core
Learned context pruning improves long-context agent efficiency but introduces the Voting Ensemble Paradox. Under unanimity-to-keep (AND) voting ($k=1$ drop-if-any), the ensemble eviction indicator equals the pointwise maximum of the individual voter indicators:
This yields a stratum-wise Pareto collapse where the ensemble's recall equals that of the weakest voter on each stratum. As a corrective, `kompress-ultra` employs three core mechanisms:
- Mechanism A (Asymmetric Loss Modulation): Adds a $3.0\times$ weighted cross-entropy penalty on critical-syntactic tokens ($T_{\text{crit}}$) during fine-tuning, concentrating gradients on the weakest strata.
- Mechanism B (Post-Inference Regex Override): A surgical safety net applied after model scoring to force-keep critical tokens (paths, hex addresses, identifiers).
- Mechanism C (Self-Labeling Loop): Closes the training loop by using $A+B$ as an oracle to relabel the training data, internalizing the safety net directly into the model weights.
Model Architecture
Dual-Head ModernBERT
`kompress-v8` uses a 149M-parameter ModernBERT backbone with LoRA fine-tuning applied to the last 4 attention layers. Two task heads share the encoder:
- Token Classifier Head: Produces per-token eviction logits.
- Span-CNN Head: Scores span-level coherence to prevent evictions from fragmenting syntactic units.
An Asymmetric Modulation Gate scales the token logits to suppress eviction in high-coherence spans:
Empirical Benchmarks
Evaluated on the Heretic adversarial benchmark, `kompress-v8` dominates traditional prompt compression models on exact-keep rates of critical syntactic tokens.
| Method | Exact Keep % ($T_{\text{crit}}$) | Keep Rate (Tokens) | Avg. Latency |
|---|---|---|---|
| kompress-v8 (Ours, Production) | 0.993 | 0.936 | 97.0 ms |
| kompress-v8 (Ours, `v4` SSL) | 0.967 | 0.823 | — |
| Random Eviction (Floor) | 0.910 | 0.835 | 0.0 ms |
| LLMLingua-2 | 0.867 | 1.550 | 238.9 ms |
| TextRank (Extractive) | 0.599 | 0.543 | 23.1 ms |
Headroom Integration Proposal
We propose integrating `kompress-ultra` directly into Headroom (referencing Headroom PR #1419) as a core context-management middleware:
1. Middleware Chain Integration
Intercept outgoing LLM payload payloads in Headroom and run token-level classification via a local ONNX runtime of `kompress-v8`.
2. Configurable Safety Floors
Provide pre-configured regex patterns matching $T_{\text{crit}}$ class tokens to ensure 100% survival rates on critical system outputs.
3. Passive Memory Offloading
Seamlessly write evicted tokens to Headroom's memory spine (e.g. SQLite/Milvus) for semantic recall in future turns.