AI Screenshot-Parsing Parcel Tracker (Telegram + Cloudflare Workers)
Overview
A Telegram bot that tracks parcel deliveries from any courier with a public tracking page, without a single site-specific scraper. The user pastes a tracking URL; the bot screenshots the page in a headless browser, uses a vision LLM to read the delivery status, location and ETA, and notifies the user whenever the status changes.
Why It Exists
Every courier exposes a different tracking page, and traditional scrapers break the moment a site changes its markup. This build validated a “screenshot + vision model” approach that sidesteps per-site DOM parsing entirely, turning any tracking page into structured data.
What We Built
A Cloudflare Workers application written in TypeScript, split into modules:
- A Telegram layer (api.ts, bot.ts, commands.ts) that handles the webhook and the /track, /list, /status and /stop commands
- A tracking layer that integrates headless-browser capture (Browserbase) with Firecrawl and Scrappey fallbacks
- A vision parser (parser.ts) that sends full-page screenshots to OpenAI GPT-4 Vision and extracts status, location, ETA and a delivered flag
- A D1 (SQLite) persistence layer with a schema and typed queries
- A scheduled cron handler that re-checks all active packages three times daily and pushes Telegram notifications on change
The bot also dismisses cookie/consent popups before capture, and includes early-stage payments and email modules.
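To make the vision-parser step concrete, here is a minimal sketch of turning the model's text reply into the four extracted fields and deciding whether a notification is due. It assumes the model is prompted to answer with a JSON object. The names TrackingStatus, parseModelReply and hasChanged are hypothetical, not taken from parser.ts.

```typescript
// Hypothetical result shape; the field names are assumptions based on the
// four values the parser is described as extracting.
interface TrackingStatus {
  status: string;
  location: string | null;
  eta: string | null;
  delivered: boolean;
}

// Return a trimmed, non-empty string or null.
function cleanString(value: unknown): string | null {
  if (typeof value !== "string") return null;
  const trimmed = value.trim();
  return trimmed === "" ? null : trimmed;
}

// Parse the model's reply, tolerating extra text around the JSON object
// (models sometimes wrap JSON in prose or a code fence).
function parseModelReply(reply: string): TrackingStatus | null {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let raw: unknown;
  try {
    raw = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return null; // not valid JSON: treat as "could not read the page"
  }
  if (typeof raw !== "object" || raw === null) return null;

  const fields = raw as Record<string, unknown>;
  const status = cleanString(fields.status);
  if (status === null) return null; // a status is the minimum useful result

  return {
    status,
    location: cleanString(fields.location),
    eta: cleanString(fields.eta),
    delivered: fields.delivered === true,
  };
}

// Decide whether the user should be notified. The status comparison ignores
// case; ETA changes alone are deliberately ignored in this sketch.
function hasChanged(prev: TrackingStatus | null, next: TrackingStatus): boolean {
  if (prev === null) return true;
  return (
    prev.status.toLowerCase() !== next.status.toLowerCase() ||
    prev.location !== next.location ||
    prev.delivered !== next.delivered
  );
}
```

Returning null for unreadable replies, rather than throwing, lets the caller keep the previously stored status and simply try again on the next scheduled run.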
Technologies & Approach
Cloudflare Workers + D1 + Wrangler give a fully serverless, edge-deployed runtime with built-in cron triggers and a SELF service binding for distributed processing. Browser automation is delegated to Browserbase so the worker stays lightweight, and the vision model removes the need for brittle, per-courier HTML parsing.
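The scheduled re-check can be sketched as a plain function that a Worker's scheduled handler would call on each cron trigger. This is an illustration under assumptions, not the project's code: ActivePackage, RecheckDeps and recheckAll are hypothetical names, and the D1 queries, screenshot pipeline and Telegram calls are stood in for by injected functions.

```typescript
// Hypothetical row shape for a tracked parcel (the real D1 schema is not shown).
interface ActivePackage {
  id: number;
  chatId: number;
  url: string;
  lastStatus: string | null;
}

// Injected dependencies, so the loop can run without D1, a browser or Telegram.
interface RecheckDeps {
  listActive(): Promise<ActivePackage[]>;                 // e.g. a D1 query
  checkStatus(url: string): Promise<string>;              // screenshot + vision parse
  saveStatus(id: number, status: string): Promise<void>;  // e.g. a D1 update
  notify(chatId: number, text: string): Promise<void>;    // e.g. Telegram sendMessage
}

// Re-check every active package and notify only when the status differs from
// the stored one. Returns the number of notifications sent.
async function recheckAll(deps: RecheckDeps): Promise<number> {
  const packages = await deps.listActive();
  let notified = 0;

  for (const pkg of packages) {
    try {
      const status = await deps.checkStatus(pkg.url);
      if (status !== pkg.lastStatus) {
        await deps.saveStatus(pkg.id, status);
        await deps.notify(pkg.chatId, `Update for ${pkg.url}: ${status}`);
        notified++;
      }
    } catch (err) {
      // One unreadable tracking page must not abort the whole run.
      console.error(`re-check failed for package ${pkg.id}`, err);
    }
  }
  return notified;
}
```

Processing packages one at a time keeps the sketch simple. A real deployment with many packages would probably need batching or fan-out to stay within Worker execution limits.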
Outcome / Impact
The build proved that a single, site-agnostic vision pipeline can track parcels across arbitrary courier websites and run unattended on a serverless schedule, with conversational delivery updates over Telegram. It is a practical demonstration of combining OCR/vision LLMs with edge infrastructure for real-world automation.
Capabilities Demonstrated
- Vision-LLM (GPT-4 Vision) extraction of structured data from screenshots, no per-site scrapers
- Site-agnostic headless-browser capture with multiple provider fallbacks
- Fully serverless edge architecture on Cloudflare Workers + D1
- Scheduled background jobs and change-detection notifications
- Conversational bot UX over the Telegram Bot API
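The provider-fallback capability listed above can be illustrated with a short sketch that tries each capture provider in order and returns the first usable screenshot. The Provider interface and captureWithFallback are hypothetical names. The real Browserbase, Firecrawl and Scrappey integrations are not shown and would each sit behind a capture function like the one assumed here.

```typescript
// Hypothetical provider abstraction: each provider turns a URL into image bytes.
interface Provider {
  name: string;
  capture(url: string): Promise<Uint8Array>;
}

interface CaptureResult {
  provider: string;
  image: Uint8Array;
}

// Try providers in the given order. A thrown error or an empty screenshot
// moves on to the next provider; if all fail, the collected reasons are
// reported in a single error.
async function captureWithFallback(
  url: string,
  providers: Provider[],
): Promise<CaptureResult> {
  const failures: string[] = [];

  for (const provider of providers) {
    try {
      const image = await provider.capture(url);
      if (image.length > 0) {
        return { provider: provider.name, image };
      }
      failures.push(`${provider.name}: empty screenshot`);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${reason}`);
    }
  }

  throw new Error(`all providers failed for ${url}: ${failures.join("; ")}`);
}
```

Recording which provider succeeded makes it possible to see later how often the fallbacks are actually used.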