Product · 2026

LLM Inference Platform

Overview

An LLM inference platform that gives each project a managed, OpenAI-compatible endpoint for running models, with per-project usage metering and quota enforcement. A web app and dashboard sit in front of a self-hosted gateway.

The Challenge

Serving LLM inference reliably means more than proxying requests: you need managed per-project keys, metered usage against enforceable quotas, model routing across providers, and a real-time dashboard, all in a serverless architecture that stays low-ops as usage scales.

What We Built

A Next.js 16 (App Router) application deployed to Cloudflare via OpenNext and Wrangler. Convex provides the data layer, server actions, cron jobs, and webhook handling, with Convex Auth for sign-in and the Convex Agent component for agentic flows. Inference is routed through a self-hosted LiteLLM gateway to OpenRouter via the OpenAI-compatible AI SDK provider, so every project gets a standard, swappable inference endpoint. Per-project usage is metered and capped by the budget on each managed key. The UI is built with shadcn/ui and Tailwind, with Recharts for usage charts and react-markdown for rendering Markdown. The repo also ships an MCP server, agent skills/config, and a documented setup flow.
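
The metering-and-budget idea can be illustrated with a minimal sketch. This is not the platform's code: the `ProjectKey` and `UsageEvent` shapes and the `hasBudget` and `recordUsage` helpers are hypothetical names chosen for illustration, and in a real deployment the gateway and data layer would hold this state.

```typescript
// Hypothetical shapes for illustration only; not taken from the project.
interface ProjectKey {
  projectId: string;
  maxBudgetUsd: number; // spend ceiling assigned to the managed key
  spentUsd: number; // metered spend recorded so far
}

interface UsageEvent {
  projectId: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  costUsd: number; // cost attributed to this single request
}

// True while the key's metered spend is still below its budget.
function hasBudget(key: ProjectKey): boolean {
  return key.spentUsd < key.maxBudgetUsd;
}

// Returns a new key with the event's cost added; the input is not mutated.
function recordUsage(key: ProjectKey, event: UsageEvent): ProjectKey {
  if (event.projectId !== key.projectId) {
    throw new Error("usage event belongs to a different project");
  }
  return { ...key, spentUsd: key.spentUsd + event.costUsd };
}
```

In this sketch a request would be admitted only while `hasBudget` returns true, and each completed request would feed a `UsageEvent` into `recordUsage`. The same events could drive a usage chart.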

Technologies & Approach

Convex + Cloudflare/OpenNext give a fully serverless, low-ops backend with built-in scheduling and webhooks. LiteLLM as a gateway in front of OpenRouter cleanly separates usage/quota management from model selection, while the OpenAI-compatible AI SDK keeps provider integration swappable and standards-based.
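
To show what "OpenAI-compatible and swappable" can mean in practice, here is a minimal sketch that builds a Chat Completions request against a configurable base URL. The `GatewayConfig` type, the `buildChatRequest` helper, and the example URL, key, and model name are assumptions for illustration, not the project's actual code or configuration. Only the request shape (a `POST` to `/chat/completions` with a bearer token and a `model` plus `messages` body) follows the standard OpenAI-style API.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical config: pointing baseUrl at a different OpenAI-compatible
// server is the only change needed to swap providers in this sketch.
interface GatewayConfig {
  baseUrl: string;
  apiKey: string;
}

interface BuiltRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) an OpenAI-style chat completion request.
function buildChatRequest(
  cfg: GatewayConfig,
  model: string,
  messages: ChatMessage[],
): BuiltRequest {
  const base = cfg.baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${cfg.apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  };
}

// Example with placeholder values (assumed, not real endpoints or keys).
const exampleRequest = buildChatRequest(
  { baseUrl: "https://gateway.example.com/v1/", apiKey: "sk-project-key" },
  "example-model",
  [{ role: "user", content: "Hello" }],
);
```

The result can be passed to `fetch(exampleRequest.url, exampleRequest)`. Because the per-project key travels in the `Authorization` header, a gateway in front of the provider can attribute usage to a project without the caller knowing which upstream model provider serves the request.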

Outcome / Impact

A working LLM inference platform that delivers managed, metered, quota-enforced inference through a standard OpenAI-compatible interface, demonstrating an end-to-end modern AI-product stack.

Capabilities Demonstrated

  • Managed LLM inference gateways with usage metering and quotas
  • Multi-provider model routing (LiteLLM + OpenRouter)
  • OpenAI-compatible, swappable provider integration
  • Serverless full-stack on Convex + Cloudflare (OpenNext)
  • MCP server and agentic (Convex Agent) integration