Powered by LLMLingua-2 · MIT Licensed

Stop paying for tokens you don't need

Compress LLM prompts by up to 95% with semantic compression. Your agents pay less. Your users notice nothing.

See it live →

Watch tokens disappear

Paste any prompt. See LLMLingua-2 compression in real time.

💡 Note: Demo uses client-side estimation. Full LLMLingua-2 semantic compression requires an API key.

Why TokensTransfer

Research-grade compression. Production-grade reliability.

🔬

LLMLingua-2 Engine

Semantic token pruning trained via data distillation from GPT-4. Identifies and removes redundant tokens while preserving meaning with 98.5% accuracy.

ACL 2024 · Microsoft Research · MIT License
🔑

Agent-First API

REST API designed for autonomous agents. X-API-Key header auth, deterministic responses, no session state. Drop-in middleware for any LLM pipeline.

POST /compress · POST /compress/pipeline
🔗

Pipeline Mode

Combine LLMLingua-2 compression with language optimization (Tokinensis). Two-stage pipeline achieves 25–40x total token reduction.

compress → translate → tokinensis

CPU Native

Runs on BERT-level encoder without GPU. Optimized for VPS and edge deployments. Typical latency: 200–500ms for 1K token prompts.

4 cores · 8GB RAM · No GPU needed
📊

Full Observability

Every compression logged with token counts, ratios, and cost estimates. Dashboard shows cumulative savings across all your agents.

GET /stats · GET /stats/global
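As an illustration of what "cumulative savings across all your agents" could look like, here is a minimal sketch of aggregating a stats payload. The JSON field names (`agents`, `tokens_original`, `tokens_saved`) are assumptions for the example, not the documented `/stats` schema.

```python
# Hypothetical payload shape from GET /stats (field names assumed)
stats = {
    "agents": [
        {"name": "support-bot", "tokens_original": 120_000, "tokens_saved": 95_000},
        {"name": "triage-bot",  "tokens_original": 40_000,  "tokens_saved": 29_000},
    ]
}

# Sum per-agent counts into a fleet-wide savings rate
total_original = sum(a["tokens_original"] for a in stats["agents"])
total_saved = sum(a["tokens_saved"] for a in stats["agents"])
print(f"cumulative savings: {100 * total_saved / total_original:.1f}%")  # → 77.5%
```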
🛡️

Open Source Core

Built on LLMLingua-2 (MIT). No vendor lock-in on the compression model. Self-hostable. The API layer is our value-add.

github.com/microsoft/LLMLingua

Stack the savings

Use compression alone or chain with language optimization for maximum effect.

📝
Your Prompt
1,000 tokens
🔬
LLMLingua-2
~50–100 tokens
🌐
Tokinensis
~35–72 tokens
🤖
LLM API
93–96% saved
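The stacked numbers above can be sanity-checked with a few lines of arithmetic. The token counts are the illustrative ranges from this page, not guaranteed results:

```python
def savings_percent(tokens_in: int, tokens_out: int) -> float:
    """Percent of tokens saved between pipeline input and output."""
    return round(100 * (1 - tokens_out / tokens_in), 1)

# 1,000 tokens in; 35-72 tokens out after LLMLingua-2 + Tokinensis
print(savings_percent(1000, 72))  # → 92.8  (worst case, ~93%)
print(savings_percent(1000, 35))  # → 96.5  (best case, ~96%)
```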

Choosing between the full pipeline and compression-only:

✓ Use pipeline when:

  • Your agents send repetitive structured prompts
  • Language quality in LLM response is not critical
  • You need maximum cost reduction (enterprise scale)

⚠ Use compression-only when:

  • Language style/tone of LLM response matters
  • Prompts contain domain-specific vocabulary
  • You need deterministic, predictable output format
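The checklist above can be encoded as simple routing logic. The endpoint paths come from this page; the flag names and the function itself are illustrative, not part of the API:

```python
def choose_endpoint(response_tone_matters: bool,
                    domain_specific_vocab: bool,
                    need_deterministic_output: bool) -> str:
    """Route to the full pipeline only when none of the
    compression-only criteria apply."""
    if response_tone_matters or domain_specific_vocab or need_deterministic_output:
        return "/compress"            # compression-only
    return "/compress/pipeline"       # maximum cost reduction

print(choose_endpoint(False, False, False))  # → /compress/pipeline
print(choose_endpoint(True, False, False))   # → /compress
```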

3 lines to slash your costs

Drop TokensTransfer into any LLM pipeline in minutes.

import httpx

api_key = "tt_your_api_key"
text = "Your long prompt here..."

# Compress before sending to OpenAI/Claude/Gemini
resp = httpx.post(
    "https://tokenstree.es/compress",
    headers={"X-API-Key": api_key},
    json={"text": text, "target_token": 150},
)
data = resp.json()
compressed = data["compressed_text"]
savings = data["metrics"]["savings_percent"]
# → "83.2"  (saved 83% of tokens)

# Now use compressed in your LLM call
llm_response = your_llm_client.chat(compressed)

Use TokensTransfer inside Claude

Add TokensTransfer as a Claude Code skill and compress prompts directly from your AI workflows.

# Save this to: ~/.claude/skills/tokenstransfer/SKILL.md

---
name: tokenstransfer
description: >
  Compress LLM prompts using TokensTransfer (LLMLingua-2).
  Activate when asked to compress, shorten or optimize a prompt
  before sending to an LLM. Reduces tokens by 50-91%.
---

# TokensTransfer Skill

You have access to the TokensTransfer compression API at tokenstree.es.
Use it to compress any text before it is sent to an LLM.

## API
POST https://tokenstree.es/compress
Header: X-API-Key: tt_YOUR_API_KEY
Body: {"text": "<prompt>", "rate": 0.3}

## When to use
- User asks to "compress", "shorten", "optimize" a prompt
- User wants to reduce LLM API costs
- Text is >50 tokens and will be sent to an LLM

## How to use
1. Take the user's prompt text
2. POST to /compress with rate=0.3 (keep 30% of tokens)
3. Return the compressed_text and savings_percent
4. Optionally pass to another LLM

## Rate guide
- rate=0.5 → gentle compression, 50-58% savings
- rate=0.3 → strong compression, 71-76% savings (recommended)
- rate=0.1 → aggressive, 89-91% savings (may lose detail)
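As a sketch, the rate guide above can be wrapped in a small helper that maps a desired savings target to a keep-rate. The thresholds mirror this guide; the function is illustrative, not part of the skill or API:

```python
def pick_rate(target_savings_percent: float) -> float:
    """Map a desired savings target to an LLMLingua-2 keep-rate."""
    if target_savings_percent >= 85:
        return 0.1   # aggressive: 89-91% savings, may lose detail
    if target_savings_percent >= 65:
        return 0.3   # strong: 71-76% savings (recommended)
    return 0.5       # gentle: 50-58% savings

print(pick_rate(75))  # → 0.3
```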