The Voting Ensemble Paradox
Imagine a committee of three experts deciding which words to keep in a document to save space. To be extremely conservative, the rule is: "If even one expert votes to delete a word, we delete it."
Each expert is smart but has one blind spot: a category of words they always vote to delete. Under the veto rule, any critical item that falls into some expert's blind spot is deleted, because that one expert vetoes it. When the blind spots differ, the committee loses the union of all three and ends up worse than any single expert on their own.
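The committee story can be reproduced in a few lines. The following is a minimal sketch, not the project's code: the three item categories, the `make_expert` helper, and the sample items are all illustrative assumptions.

```python
from typing import Callable, List, Tuple

Voter = Callable[[str], bool]  # returns True to vote "delete"


def make_expert(blind_spot: str) -> Voter:
    """Hypothetical expert that votes to delete only items in its blind-spot category."""
    def vote_delete(category: str) -> bool:
        return category == blind_spot
    return vote_delete


def drop_if_any(voters: List[Voter], category: str) -> bool:
    """Veto rule: a single delete vote is enough to delete the item."""
    return any(vote(category) for vote in voters)


def kept(voters: List[Voter], items: List[Tuple[str, str]]) -> List[str]:
    """Return the text of every item that survives the veto rule."""
    return [text for text, category in items if not drop_if_any(voters, category)]


# Illustrative critical items, one per category (assumed for this sketch).
items = [
    ("src/main.py", "path"),
    ("TypeError: x is None", "error"),
    ("def f(): ...", "code"),
]
experts = [make_expert("path"), make_expert("error"), make_expert("code")]

solo_kept = [len(kept([expert], items)) for expert in experts]
ensemble_kept = len(kept(experts, items))

print("kept by each expert alone:", solo_kept)
print("kept by the committee:", ensemble_kept)
```

Each expert alone loses only the one item in its own blind spot, while the committee loses all three.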
Theoretical Core
Learned context pruning improves long-context agent efficiency but introduces the **Voting Ensemble Paradox**. Under unanimity-to-keep (AND) voting ($k=1$, drop-if-any), a token is evicted as soon as any single voter marks it for eviction, so the ensemble eviction indicator equals the pointwise maximum of the individual voter indicators. [Paper p.6]
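In symbols, using notation assumed here rather than taken from the paper: let $e_i(t) \in \{0, 1\}$ be voter $i$'s eviction indicator for token $t$ (1 means evict), with $N$ voters. Drop-if-any voting then gives

$$
e_{\text{ens}}(t) = \max_{1 \le i \le N} e_i(t) = 1 - \prod_{i=1}^{N} \bigl(1 - e_i(t)\bigr).
$$

Equivalently, a token survives only if every voter keeps it, so the ensemble's kept set is the intersection of the voters' kept sets. For any stratum $S$ with per-voter recall $R_i(S)$, this implies $R_{\text{ens}}(S) \le \min_i R_i(S)$.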
This yields a stratum-wise Pareto collapse where the ensemble's recall equals that of the weakest voter on each stratum. As a corrective, `kompress-ultra` employs three core mechanisms:
Vaked Capability & Context (vakedc)
A decentralized routing and verification matrix: vaked-base defines node capacities, vaked orchestrates active routing, and vakedc signs context proofs.
Model Architecture
Dual-Head ModernBERT
`kompress-v8` uses a 149M-parameter ModernBERT backbone with LoRA fine-tuning applied to the last 4 attention layers. Two task heads share the encoder:
- Token Classifier Head: Produces per-token eviction logits.
- Span-CNN Head: Scores span-level coherence to prevent evictions from fragmenting syntactic units.
An Asymmetric Modulation Gate scales the token logits to suppress eviction in high-coherence spans:
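The following is a minimal sketch of how such a gate could behave. The functional form is an assumption made for illustration and is not taken from the paper: evict-leaning (positive) logits are multiplied by $1 - c$, where $c \in [0, 1]$ is the span coherence score, while keep-leaning (negative) logits pass through unchanged. The name `asymmetric_gate` is hypothetical.

```python
from typing import List, Sequence


def asymmetric_gate(token_logits: Sequence[float],
                    span_coherence: Sequence[float]) -> List[float]:
    """Dampen evict-leaning logits in coherent spans (assumed form, for illustration).

    token_logits:   per-token eviction logits; positive means "lean evict".
    span_coherence: per-token coherence of the enclosing span, expected in [0, 1].
    """
    if len(token_logits) != len(span_coherence):
        raise ValueError("token_logits and span_coherence must have equal length")

    gated = []
    for logit, coherence in zip(token_logits, span_coherence):
        c = min(max(coherence, 0.0), 1.0)  # clamp to [0, 1]
        if logit > 0.0:
            # Asymmetric part: only evict-leaning logits are scaled down.
            gated.append(logit * (1.0 - c))
        else:
            # Keep-leaning logits are left untouched.
            gated.append(logit)
    return gated


gated = asymmetric_gate([2.0, -1.0, 3.0], [0.5, 0.9, 0.0])
print(gated)
```

Under this assumed form the gate can only reduce eviction pressure, never add to it, which is the sense in which it is asymmetric.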
Empirical Benchmarks
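Evaluated on the Heretic adversarial benchmark, the production `kompress-v8` model preserves the largest share of critical tokens of any method tested: 0.993 exact keep at a keep rate of 0.936, versus 0.910 for random eviction, 0.867 for LLMLingua-2, and 0.599 for TextRank.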
Column definitions:

- Exact Keep % ($T_{\text{crit}}$): Percentage of critical syntactic tokens (paths, errors, code) successfully preserved after pruning, reported here as a fraction of 1.
- Keep Rate (Tokens): The ratio of output tokens to input tokens. Lower means more compression.
- Avg. Latency: Average processing time in milliseconds for the context pruner to run.

| Method | Exact Keep % ($T_{\text{crit}}$) | Keep Rate (Tokens) | Avg. Latency |
|---|---|---|---|
| kompress-v8 (Ours, Production) | 0.993[Paper p.16] | 0.936 | 97.0 ms |
| kompress-v8 (Ours, `v4` SSL) | 0.967[Paper p.16] | 0.823 | n/a (offline checkpoint, not evaluated for active inference latency) |
| Random Eviction (Floor) | 0.910[Paper p.16] | 0.835 | 0.0 ms |
| LLMLingua-2 | 0.867[Paper p.16] | 1.550 (context expansion: kept 155% of original tokens, causing context bloat) | 238.9 ms |
| TextRank (Extractive) | 0.599[Paper p.16] | 0.543 | 23.1 ms |
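The two ratio metrics in the table can be computed as follows. This is a sketch under stated assumptions: the function names are hypothetical, and "preserved" is taken to mean that a critical token appears in the output at least as many times as it is required, which may differ from the paper's exact matching rule.

```python
from collections import Counter
from typing import Sequence


def keep_rate(input_token_count: int, output_token_count: int) -> float:
    """Output tokens divided by input tokens; values above 1.0 mean the context grew."""
    if input_token_count <= 0:
        raise ValueError("input_token_count must be positive")
    return output_token_count / input_token_count


def exact_keep(critical_tokens: Sequence[str], output_tokens: Sequence[str]) -> float:
    """Fraction of critical tokens still present in the output (multiset matching)."""
    if not critical_tokens:
        return 1.0  # nothing critical to lose
    available = Counter(output_tokens)
    preserved = 0
    for token in critical_tokens:
        if available[token] > 0:
            available[token] -= 1
            preserved += 1
    return preserved / len(critical_tokens)


# Illustrative data, not benchmark data.
input_tokens = ["open", "src/app.py", "then", "fix", "TypeError", "please"]
output_tokens = ["src/app.py", "fix", "TypeError"]
critical = ["src/app.py", "TypeError"]

rate = keep_rate(len(input_tokens), len(output_tokens))
recall = exact_keep(critical, output_tokens)
print(rate, recall)
```

A keep rate above 1.0, as in the LLMLingua-2 row, means the method emitted more tokens than it received.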
Headroom Integration Proposal
We propose integrating `kompress-ultra` directly into Headroom (referencing Headroom PR #1419) as a core context-management middleware:
1. Middleware Chain Integration
Intercept outgoing LLM payloads in Headroom and run token-level classification via a local ONNX runtime of `kompress-v8`.
2. Configurable Safety Floors
Provide pre-configured regex patterns matching $T_{\text{crit}}$ class tokens to ensure 100% survival rates on critical system outputs (originally reviewed in Headroom PR #1400).
3. Passive Memory Offloading
Seamlessly write evicted tokens to Headroom's memory spine (e.g. SQLite/Milvus) for semantic recall in future turns.
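The three steps above can be sketched together as a single pruning function. Everything named here is an assumption made for illustration: Headroom's real middleware interface is not shown, `classify` is a stand-in callable for the ONNX-hosted `kompress-v8` model, the two regex patterns are example safety floors, and the `evicted` table schema is hypothetical.

```python
import re
import sqlite3
from typing import Callable, List, Sequence

# Hypothetical safety-floor patterns for critical tokens.
SAFETY_FLOORS = [
    re.compile(r"^(?:\.{0,2}/)?[\w.-]+(?:/[\w.-]+)+$"),  # file paths such as src/app.py
    re.compile(r"^\w*(?:Error|Exception)\b"),            # error names such as TypeError
]

# Stand-in for the token classifier: one flag per token, True means "evict".
Classifier = Callable[[Sequence[str]], Sequence[bool]]


def is_protected(token: str) -> bool:
    """A token matching any safety floor is never evicted."""
    return any(pattern.search(token) for pattern in SAFETY_FLOORS)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical store for evicted tokens."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS evicted "
        "(turn_id INTEGER, position INTEGER, token TEXT)"
    )
    return db


def prune_payload(tokens: Sequence[str], classify: Classifier,
                  db: sqlite3.Connection, turn_id: int) -> List[str]:
    """Classify tokens, apply safety floors, and offload evicted tokens to SQLite."""
    kept, evicted = [], []
    for position, (token, evict) in enumerate(zip(tokens, classify(tokens))):
        if evict and not is_protected(token):
            evicted.append((turn_id, position, token))
        else:
            kept.append(token)
    db.executemany(
        "INSERT INTO evicted (turn_id, position, token) VALUES (?, ?, ?)", evicted
    )
    db.commit()
    return kept


def evict_everything(tokens: Sequence[str]) -> List[bool]:
    """Worst-case stand-in classifier used to exercise the safety floors."""
    return [True] * len(tokens)


db = open_store()
tokens = ["please", "open", "src/app.py", "and", "fix", "TypeError:"]
kept = prune_payload(tokens, evict_everything, db, turn_id=1)
stored = [row[0] for row in db.execute("SELECT token FROM evicted ORDER BY position")]
print(kept, stored)
```

Even with a classifier that votes to evict every token, the path and the error name survive, and the remaining tokens are recoverable from the store by turn and position.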
Reviews & Feedback
Submit a review of this proposal. Reviews are cryptographically signed by your browser and submitted via a **GitHub Pull Request**, guaranteeing they are **provably immutable** (the author cannot modify them without breaking the signature).
Academic Telemetry & Verification
This site is dedicated strictly to academic research. There are no tracking scripts, Google Ads, or third-party cookies. The connection is proxied and secured solely through Cloudflare.
Ecosystem & Related Work
This research is part of a broader ecosystem. All source code, dataset distributions, and experiment logs are open-source and publicly available for replication:
Read the Full Paper
Detailed mathematical proof of the paradox and fine-tuning methodology.
LoopKit GitHub
Four-phase autonomous state machine orchestrating the training runs.
pocoo.vaked.dev
Chronological registry of every training run, evaluation, and telemetry.
ultrawhale-dogfood
Dataset generated by the coding-agent dogfood loop, hosted on Hugging Face.
Hugging Face Profile
Host profile containing the kompress-v8 model weights and research datasets.
UltrameshAI Project
The big-picture decentralized agent routing and lifecycle substrate.