Documentation Index
Fetch the complete documentation index at: https://www.edgee.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
How much can I save with Edgee?
The honest answer: it depends on the workload. The receipts we publish:
- Claude Code endurance: +26.2% more instructions completed on the same Claude Pro plan, 20.8% more efficient per instruction, 5.1% cheaper per task on a cost-adjusted basis. Source: edgee-ai/claude-compression-lab writeup.
- Codex re-read context: −49.5% fresh input tokens (1.14M → 574K per session), −35.6% total session cost (2.58), cache hit rate 76% → 85%. Source: edgee-ai/compression-lab writeup.
- Customer aggregate: across active customers (rolling 30 days), token bills are reduced by approximately 20%, with zero measurable drift on SWE-Bench Verified samples.
Every response carries a compression block (saved_tokens, cost_savings, reduction, time_ms) so you can track savings per request, in real time.
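As a minimal sketch of tracking savings per request, assuming the gateway exposes an OpenAI-compatible HTTP endpoint and returns the compression block inside the JSON body: the URL, key name, and response shape below are placeholders, not documented values; only the field names come from the list above.

```python
import requests

# Hypothetical gateway endpoint and payload; adjust to your deployment.
resp = requests.post(
    "https://your-edgee-gateway.example/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_EDGEE_KEY"},
    json={
        "model": "claude-sonnet-4-5",
        "messages": [{"role": "user", "content": "Summarize this repo's README."}],
    },
    timeout=60,
)
body = resp.json()

# Assumption: the per-request compression block rides along in the response body.
compression = body.get("compression", {})
print(
    f"saved_tokens={compression.get('saved_tokens')} "
    f"cost_savings={compression.get('cost_savings')} "
    f"reduction={compression.get('reduction')} "
    f"time_ms={compression.get('time_ms')}"
)
```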
How does token compression work?
Token compression is the surgical removal of redundancy — not summarization. Edgee treats it in two distinct layers:
- Input compression (~99% of total token volume): what enters the context window — system prompts, tool results, codebase context, conversation history, MCP tool definitions.
- Output compression (~1% of volume but high ROI): what the model generates — filler, repetitive scaffolding, polite preambles, over-explanation, markdown overhead.
- Tool Result Trimming: trims CLI and tool results before they reach the model (sketched below).
- Tool Surface Reduction: strips out tools and skills irrelevant to the task before the request hits the model.
- Output Brevity (by Caveman): reduces verbosity in model responses.
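To make the tool-result-trimming idea concrete, here is an illustrative sketch of the general technique, not Edgee's actual implementation; the thresholds and the `trim_tool_result` helper are assumptions for illustration.

```python
# Illustrative sketch: keep the head and tail of an oversized tool result and
# drop the redundant middle before it enters the context window.

def trim_tool_result(text: str, max_chars: int = 4_000, keep_tail: int = 1_000) -> str:
    """Return text unchanged if it is small; otherwise keep the start and the
    end (where errors and summaries usually live) and elide the middle."""
    if len(text) <= max_chars:
        return text
    head = text[: max_chars - keep_tail]
    tail = text[-keep_tail:]
    dropped = len(text) - len(head) - len(tail)
    return f"{head}\n[... {dropped} characters trimmed ...]\n{tail}"


# Example: a verbose test-runner or `ls -R` dump shrinks before the model sees it.
raw_output = "line\n" * 10_000
print(len(trim_tool_result(raw_output)))  # well under the original ~50,000 chars
```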
Every response carries a compression field with metrics so you can track savings in real time.
How is Edgee different from using LLM provider APIs directly?
When you call provider APIs directly, you get one provider, one billing surface, no fallback, and no measurement of where the tokens went. Edgee is an Agent Gateway: it sits between your agent or app and the LLM provider APIs and applies three things on every request:
- Compress — input and output token compression, two layers, three named strategies. Customer aggregate ~20% bill reduction.
- Route — per-request fallback on provider 5xx/timeouts; plan-cap continuity for Claude Pro/Max users when quota is hit; configurable provider chain.
- Observe — session-level metering in the OSS gateway, team-level metering in the managed console.
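As a minimal sketch of what "sits between" means in practice, assuming the gateway exposes an OpenAI-compatible endpoint: the base_url and key below are placeholders rather than documented values, and an existing app would only change where it points.

```python
from openai import OpenAI

# Minimal sketch, assuming an OpenAI-compatible gateway endpoint.
# The base_url and api_key are placeholders; see the Edgee docs for the
# real connection details.
client = OpenAI(
    base_url="https://your-edgee-gateway.example/v1",
    api_key="YOUR_EDGEE_OR_PROVIDER_KEY",
)

# The agent's request is unchanged; compression, routing, and metering
# happen in the gateway before the call reaches the provider.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Refactor this function for clarity."}],
)
print(response.choices[0].message.content)
```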
Which LLM providers does Edgee support?
Edgee works with all major LLM providers:
- OpenAI
- Anthropic
- Mistral
- DeepSeek
- xAI (Grok)
- zAI
- AWS Bedrock
- Azure OpenAI
How much latency does Edgee add?
Edge processing runs on Fastly compute at the point of presence closest to the calling application. For typical AI workloads — where LLM inference dominates the wall-clock time — gateway overhead is a small fraction of the total request.
What happens when a provider goes down?
Two routing techniques, both native to the gateway:
- Per-request fallback and retry — transient errors are retried with backoff; persistent provider failures route to a configured backup model. Zero downtime from the agent’s perspective.
- Plan-cap continuity — when you hit a Claude Pro/Max plan cap, Edgee falls back from the plan-based provider to an API-key-based provider so the session keeps going.
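For illustration, here is a generic sketch of the per-request fallback pattern, not Edgee's internal routing code; the provider chain, retry counts, and the `call_provider` helper are hypothetical.

```python
import time

class ProviderError(Exception):
    """Raised for 5xx responses or timeouts from a provider."""

def call_provider(provider: str, prompt: str) -> str:
    """Placeholder for a real provider call; raises ProviderError on 5xx/timeout."""
    raise ProviderError(f"{provider} unavailable")

def complete_with_fallback(prompt: str,
                           chain=("anthropic", "openai", "mistral"),
                           retries: int = 2) -> str:
    for provider in chain:
        for attempt in range(retries):
            try:
                return call_provider(provider, prompt)
            except ProviderError:
                time.sleep(2 ** attempt)  # exponential backoff on transient errors
        # Persistent failure: move on to the next provider in the chain.
    raise RuntimeError("all providers in the chain failed")
```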
How does cost tracking work?
Every response carries a compression block (saved_tokens, cost_savings, reduction, time_ms) and a per-request cost figure.
Beyond per-request data:
- Session-level metering: a local SQLite log of every request, every compression event, every cost delta. Available in the OSS gateway.
- Team-level metering and dashboard: cross-developer, cross-project aggregation. Budget alerts, webhook notifications, usage exports. Hosted-only.
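Because the session-level log is a local SQLite file, it can be queried directly. The sketch below is hypothetical: the database path, table name, and column names are assumptions for illustration only; check the OSS gateway for the actual schema.

```python
import sqlite3

# Hypothetical aggregation of a local session-metering log.
# "edgee-sessions.db", the "requests" table, and its columns are assumed.
conn = sqlite3.connect("edgee-sessions.db")
row = conn.execute(
    """
    SELECT COUNT(*)          AS requests,
           SUM(saved_tokens) AS saved_tokens,
           SUM(cost_savings) AS cost_savings
    FROM requests
    WHERE created_at >= datetime('now', '-7 days')
    """
).fetchone()
print(
    f"last 7 days: {row[0]} requests, "
    f"{row[1] or 0} tokens saved, ${(row[2] or 0):.2f} saved"
)
conn.close()
```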
Can I use my own API keys for LLM providers?
Yes. With Bring Your Own Key (BYOK) you keep paying providers directly and use Edgee for compression, routing, and observability. Details in the BYOK docs.
Is Edgee compliant with GDPR and SOC 2?
For specifics on certifications, regional routing, and data-handling commitments for the managed product, please contact us.
How can I contact support?
- Email: support@edgee.ai
- Discord: Join our community
- GitHub: Open an issue