Compress LLM prompts by up to 95% with semantic compression. Your agents pay less. Your users notice nothing.
Paste any prompt. See LLMLingua-2 compression in real time.
Research-grade compression. Production-grade reliability.
Semantic token pruning trained via data distillation from GPT-4. Identifies and removes redundant tokens while preserving meaning with 98.5% accuracy.
REST API designed for autonomous agents. X-API-Key header auth, deterministic responses, no session state. Drop-in middleware for any LLM pipeline.
Combine LLMLingua-2 compression with language optimization (Tokinensis). Two-stage pipeline achieves 25–40x total token reduction.
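The headline factor comes from the two stages compounding: each stage keeps a fraction of its input tokens, and the fractions multiply. A quick sketch with illustrative keep-rates (hypothetical figures, not measured values):

```python
def total_reduction(stage1_keep: float, stage2_keep: float) -> float:
    """Overall token-reduction factor for a two-stage pipeline.

    Each stage keeps the given fraction of its input tokens;
    the keep-rates multiply, so the savings compound.
    """
    return 1.0 / (stage1_keep * stage2_keep)

# Illustrative keep-rates (assumptions, not benchmarks):
# keep 10% in compression, then 40% of that in optimization -> ~25x
low_end = total_reduction(0.10, 0.40)
# keep 10%, then 25% of the remainder -> ~40x
high_end = total_reduction(0.10, 0.25)
```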
Runs on a BERT-level encoder without a GPU. Optimized for VPS and edge deployments. Typical latency: 200–500 ms for 1K-token prompts.
Every compression logged with token counts, ratios, and cost estimates. Dashboard shows cumulative savings across all your agents.
Built on LLMLingua-2 (MIT). No vendor lock-in on the compression model. Self-hostable. The API layer is our value-add.
Use compression alone or chain with language optimization for maximum effect.
Drop TokensTransfer into any LLM pipeline in minutes.
```python
import httpx

api_key = "tt_your_api_key"
text = "Your long prompt here..."

# Compress before sending to OpenAI/Claude/Gemini
resp = httpx.post(
    "https://tokenstree.es/compress",
    headers={"X-API-Key": api_key},
    json={"text": text, "target_token": 150},
)
compressed = resp.json()["compressed_text"]
savings = resp.json()["metrics"]["savings_percent"]  # → "83.2" (saved 83% of tokens)

# Now use compressed in your LLM call
llm_response = your_llm_client.chat(compressed)
```
Add TokensTransfer as a Claude Code skill and compress prompts directly from your AI workflows.
Save this to: ~/.claude/skills/tokenstransfer/SKILL.md

```markdown
---
name: tokenstransfer
description: >
  Compress LLM prompts using TokensTransfer (LLMLingua-2).
  Activate when asked to compress, shorten or optimize a prompt
  before sending to an LLM. Reduces tokens by 50-91%.
---

# TokensTransfer Skill

You have access to the TokensTransfer compression API at tokenstree.es.
Use it to compress any text before it is sent to an LLM.

## API

POST https://tokenstree.es/compress
Header: X-API-Key: tt_YOUR_API_KEY
Body: {"text": "<prompt>", "rate": 0.3}

## When to use

- User asks to "compress", "shorten", "optimize" a prompt
- User wants to reduce LLM API costs
- Text is >50 tokens and will be sent to an LLM

## How to use

1. Take the user's prompt text
2. POST to /compress with rate=0.3 (keep 30% of tokens)
3. Return the compressed_text and savings_percent
4. Optionally pass to another LLM

## Rate guide

- rate=0.5 → gentle compression, 50-58% savings
- rate=0.3 → strong compression, 71-76% savings (recommended)
- rate=0.1 → aggressive, 89-91% savings (may lose detail)
```
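The rate-based call the skill describes maps to a plain HTTP request. A minimal sketch of the request it would issue; the API key is a placeholder and the network call itself is left commented out:

```python
import json

API_URL = "https://tokenstree.es/compress"
headers = {"X-API-Key": "tt_YOUR_API_KEY"}  # placeholder key
payload = {
    "text": "Your long prompt here...",
    "rate": 0.3,  # keep 30% of tokens (the skill's recommended setting)
}

# With httpx installed, the call from the skill's API section is:
# resp = httpx.post(API_URL, headers=headers, json=payload)
# compressed = resp.json()["compressed_text"]

body = json.dumps(payload)  # the JSON body sent on the wire
```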