Documentation Index
Fetch the complete documentation index at: https://www.edgee.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
How much can I save with Edgee?
The honest answer: it depends on the workload. The receipts we publish:
- Claude Code endurance: +26.2% more instructions completed on the same Claude Pro plan, 20.8% more efficient per instruction, 5.1% cheaper per task on a cost-adjusted basis. Source: edgee-ai/claude-compression-lab writeup.
- Codex re-read context: −49.5% fresh input tokens (1.14M → 574K per session), −35.6% total session cost (2.58), cache hit rate 76% → 85%. Source: edgee-ai/compression-lab writeup.
- Customer aggregate: across active customers (rolling 30 days), token bills are reduced by approximately 20%, with zero measurable drift on SWE-Bench Verified samples.
Every response carries a compression block (saved_tokens, cost_savings, reduction, time_ms) so you can track savings per request, in real time.
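As a minimal sketch of tracking savings per request, assuming the gateway exposes an OpenAI-compatible HTTP endpoint and returns the compression block inside the JSON body: the URL, key name, and response shape below are placeholders, not documented values; only the field names come from the list above.

```python
import requests

# Hypothetical gateway endpoint and payload; adjust to your deployment.
resp = requests.post(
    "https://your-edgee-gateway.example/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_EDGEE_KEY"},
    json={
        "model": "claude-sonnet-4-5",
        "messages": [{"role": "user", "content": "Summarize this repo's README."}],
    },
    timeout=60,
)
body = resp.json()

# Assumption: the per-request compression block rides along in the response body.
compression = body.get("compression", {})
print(
    f"saved_tokens={compression.get('saved_tokens')} "
    f"cost_savings={compression.get('cost_savings')} "
    f"reduction={compression.get('reduction')} "
    f"time_ms={compression.get('time_ms')}"
)
```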
How does token compression work?
Token compression is the surgical removal of redundancy — not summarization. Edgee treats it in two distinct layers:
- Input compression (~99% of total token volume): what enters the context window — system prompts, tool results, codebase context, conversation history, MCP tool definitions.
- Output compression (~1% of volume but high ROI): what the model generates — filler, repetitive scaffolding, polite preambles, over-explanation, markdown overhead.
- Tool Result Trimming: trims CLI and tool results before they reach the model (sketched below).
- Tool Surface Reduction: strips out tools and skills irrelevant to the task before the request hits the model.
- Output Brevity (by Caveman): reduces verbosity in model responses.
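To make the tool-result-trimming idea concrete, here is an illustrative sketch of the general technique, not Edgee's actual implementation; the thresholds and the `trim_tool_result` helper are assumptions for illustration.

```python
# Illustrative sketch: keep the head and tail of an oversized tool result and
# drop the redundant middle before it enters the context window.

def trim_tool_result(text: str, max_chars: int = 4_000, keep_tail: int = 1_000) -> str:
    """Return text unchanged if it is small; otherwise keep the start and the
    end (where errors and summaries usually live) and elide the middle."""
    if len(text) <= max_chars:
        return text
    head = text[: max_chars - keep_tail]
    tail = text[-keep_tail:]
    dropped = len(text) - len(head) - len(tail)
    return f"{head}\n[... {dropped} characters trimmed ...]\n{tail}"


# Example: a verbose test-runner or `ls -R` dump shrinks before the model sees it.
raw_output = "line\n" * 10_000
print(len(trim_tool_result(raw_output)))  # well under the original ~50,000 chars
```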
Every response carries a compression field with metrics so you can track savings in real time.
How is Edgee different from using LLM provider APIs directly?
When you call provider APIs directly, you get one provider, one billing surface, no fallback, and no measurement of where the tokens went. Edgee is an Agent Gateway: it sits between your agent or app and the LLM provider APIs and applies three things on every request:
- Compress — input and output token compression, two layers, three named strategies. Customer aggregate ~20% bill reduction.
- Route — per-request fallback on provider 5xx/timeouts; plan-cap continuity for Claude Pro/Max users when quota is hit; configurable provider chain.
- Observe — session-level metering in the OSS gateway, team-level metering in the managed console.
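As a minimal sketch of what "sits between" means in practice, assuming the gateway exposes an OpenAI-compatible endpoint: the base_url and key below are placeholders rather than documented values, and an existing app would only change where it points.

```python
from openai import OpenAI

# Minimal sketch, assuming an OpenAI-compatible gateway endpoint.
# The base_url and api_key are placeholders; see the Edgee docs for the
# real connection details.
client = OpenAI(
    base_url="https://your-edgee-gateway.example/v1",
    api_key="YOUR_EDGEE_OR_PROVIDER_KEY",
)

# The agent's request is unchanged; compression, routing, and metering
# happen in the gateway before the call reaches the provider.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Refactor this function for clarity."}],
)
print(response.choices[0].message.content)
```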
Which LLM providers does Edgee support?
Edgee works with all major LLM providers:
- OpenAI
- Anthropic
- Mistral
- DeepSeek
- xAI (Grok)
- zAI
- AWS Bedrock
- Azure OpenAI
How much latency does Edgee add?
Edge processing runs on Fastly compute at the point of presence closest to the calling application. For typical AI workloads — where LLM inference dominates the wall-clock time — gateway overhead is a small fraction of the total request.
What happens when a provider goes down?
Two routing techniques, both native to the gateway:
- Per-request fallback and retry — transient errors are retried with backoff; persistent provider failures route to a configured backup model. Zero downtime from the agent’s perspective.
- Plan-cap continuity — when you hit a Claude Pro/Max plan cap, Edgee falls back from the plan-based provider to an API-key-based provider so the session keeps going.
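For illustration, here is a generic sketch of the per-request fallback pattern, not Edgee's internal routing code; the provider chain, retry counts, and the `call_provider` helper are hypothetical.

```python
import time

class ProviderError(Exception):
    """Raised for 5xx responses or timeouts from a provider."""

def call_provider(provider: str, prompt: str) -> str:
    """Placeholder for a real provider call; raises ProviderError on 5xx/timeout."""
    raise ProviderError(f"{provider} unavailable")

def complete_with_fallback(prompt: str,
                           chain=("anthropic", "openai", "mistral"),
                           retries: int = 2) -> str:
    for provider in chain:
        for attempt in range(retries):
            try:
                return call_provider(provider, prompt)
            except ProviderError:
                time.sleep(2 ** attempt)  # exponential backoff on transient errors
        # Persistent failure: move on to the next provider in the chain.
    raise RuntimeError("all providers in the chain failed")
```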
How does cost tracking work?
Every response carries a compression block (saved_tokens, cost_savings, reduction, time_ms) and a per-request cost figure.
Beyond per-request data:
- Session-level metering: a local SQLite log of every request, every compression event, every cost delta. Available in the OSS gateway.
- Team-level metering and dashboard: cross-developer, cross-project aggregation. Budget alerts, webhook notifications, usage exports. Hosted-only.
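Because the session-level log is a local SQLite file, it can be queried directly. The sketch below is hypothetical: the database path, table name, and column names are assumptions for illustration only; check the OSS gateway for the actual schema.

```python
import sqlite3

# Hypothetical aggregation of a local session-metering log.
# "edgee-sessions.db", the "requests" table, and its columns are assumed.
conn = sqlite3.connect("edgee-sessions.db")
row = conn.execute(
    """
    SELECT COUNT(*)          AS requests,
           SUM(saved_tokens) AS saved_tokens,
           SUM(cost_savings) AS cost_savings
    FROM requests
    WHERE created_at >= datetime('now', '-7 days')
    """
).fetchone()
print(
    f"last 7 days: {row[0]} requests, "
    f"{row[1] or 0} tokens saved, ${(row[2] or 0):.2f} saved"
)
conn.close()
```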
Can I use my own API keys for LLM providers?
Yes. With Bring Your Own Key (BYOK) you keep paying providers directly and use Edgee for compression, routing, and observability. Details in the BYOK docs.
Is Edgee compliant with GDPR and SOC 2?
For specifics on certifications, regional routing, and data-handling commitments for the managed product, please contact us.
How can I contact support?
- Email: support@edgee.ai
- Discord: Join our community
- GitHub: Open an issue