← All work
Infrastructure · 2026

Edge LLM gateway, multi-provider routing, budget tokens and cost metering

Overview

An edge-deployed LLM gateway that sits in front of Anthropic, OpenAI and OpenRouter. It accepts a budget-scoped billing token, looks up the issuer’s real provider keys, routes to the right upstream by model, and returns per-request cost and token usage in response headers.

Why It Exists

Running agents and apps against several LLM providers raises three recurring problems: keeping real API keys out of clients, attributing and capping spend per user, and not rewriting call sites for each provider’s request/response format. This Worker centralizes all three behind one endpoint.

What We Built

A single Cloudflare Worker (src/index.ts) that receives a billing token in the x-api-key header and validates it against a KV-stored issuer record, checking expiry and budget (with budgets expressed in microdollars over an hour/day/week/month window). It detects the target provider from the model name, gpt-/o1/o3/o4/chatgpt- prefixes route to OpenAI, known claude-* models to Anthropic, and the remainder to OpenRouter (including custom/self-hosted endpoints), and substitutes the issuer’s own provider key. Requests and responses are translated between Anthropic and OpenAI formats so callers can use one shape. Every response carries cost-accounting headers (X-Request-Cost-USD, X-Input-Tokens, X-Output-Tokens, X-Model), computed from per-model pricing with support for issuer-level custom pricing overrides. The KV issuer namespace is shared with a sibling agent-cloud service, so token issuance and spend tracking stay consistent across the platform.

Technologies & Approach

A stateless Cloudflare Worker keeps the gateway globally close to callers with minimal ops. KV holds issuer secrets, provider keys and pricing; budget tokens carry a subject, issuer, expiry and a windowed budget so enforcement is self-describing. A fallback ANTHROPIC_API_KEY secret covers issuer records without their own key.

Outcome / Impact

A reusable internal building block that decouples applications from LLM providers while enforcing per-user budgets and producing exact cost telemetry on every call, the metering and key-isolation layer behind the studio’s agent/automation platform.

Capabilities Demonstrated

  • Multi-provider LLM routing (Anthropic, OpenAI, OpenRouter) by model detection
  • Anthropic↔OpenAI request/response format translation
  • Budget-scoped billing tokens with windowed spend enforcement
  • Per-request cost and token metering via response headers
  • Provider-key isolation behind an edge gateway, with custom pricing overrides
More work See all →