Capability · 2025 Flagship

In-Browser ID/Document OCR with PaddleOCR ONNX Models

A leading Romanian retail bank

Why It Exists

ID and document OCR in banking ideally runs on-device: keeping sensitive imagery in the browser avoids shipping personal documents to a server and cuts latency. This project explores a fully client-side OCR pipeline as an alternative/complement to server-side document reading in the bank’s onboarding flows.

What We Built

A browser OCR engine built on PaddleOCR’s PP-OCRv3 multilingual models. Shell scripts (download_models.sh, convert.sh) fetch the Paddle inference models (Latin recognition, multilingual detection and the mobile angle classifier) and convert each to ONNX via paddle2onnx. The client/ directory holds a TypeScript app (okapi-ocr.ts, build scripts, bun.lockb) that runs the three-stage detect → classify → recognize pipeline entirely in the browser using onnxruntime-web, with OpenCV.js (@techstark/opencv-js) and js-clipper for image pre/post-processing and pdf.js for PDF input. It ships with a browser-test harness and Netlify build/deploy configuration.
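To illustrate the last step of the recognize stage, here is a minimal sketch of greedy CTC decoding, which turns a recognition model's per-timestep class probabilities into text. It is not taken from okapi-ocr.ts: the function name, the flat [timeSteps × numClasses] layout, and the convention that class 0 is the CTC blank with the character dictionary starting at class 1 are all assumptions made for illustration.

```typescript
// Assumption: class 0 is the CTC blank, so charset[i] corresponds to class i + 1.
const BLANK = 0;

interface Decoded {
  text: string;
  score: number; // mean probability of the characters that were kept
}

// Greedy CTC decode over a flat, row-major [timeSteps x numClasses] array.
function ctcGreedyDecode(
  probs: Float32Array,
  timeSteps: number,
  numClasses: number,
  charset: string[],
): Decoded {
  if (probs.length !== timeSteps * numClasses) {
    throw new Error("probs length does not match timeSteps * numClasses");
  }
  let text = "";
  let scoreSum = 0;
  let kept = 0;
  let prev = -1;
  for (let t = 0; t < timeSteps; t++) {
    // Pick the most probable class at this timestep.
    let best = 0;
    let bestP = -Infinity;
    for (let c = 0; c < numClasses; c++) {
      const p = probs[t * numClasses + c];
      if (p > bestP) {
        bestP = p;
        best = c;
      }
    }
    // Skip blanks and collapse consecutive repeats of the same class.
    if (best !== BLANK && best !== prev) {
      text += charset[best - 1] ?? "";
      scoreSum += bestP;
      kept++;
    }
    prev = best;
  }
  return { text, score: kept > 0 ? scoreSum / kept : 0 };
}
```

A blank between two identical classes keeps both characters, which is how a doubled letter survives the collapse step. The returned score could be used to drop low-confidence lines before they reach an onboarding form.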

Technologies & Approach

PaddleOCR models exported to ONNX so they run via ONNX Runtime Web (WASM/WebGL) with no backend; OpenCV.js for box detection and perspective handling; pdf.js to OCR document pages; Bun for fast TS builds. Packaging the detection, classification and recognition models together reproduces a full OCR stack on the client.
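Before a model can run in ONNX Runtime Web, browser pixels have to be converted into the tensor layout it expects. The sketch below shows one plausible version of that step for the detection model: choosing an input size that is a multiple of 32 and converting RGBA canvas data to a normalized, channel-first Float32Array. The helper names, the 960-pixel side limit, the ImageNet mean/std values and the RGB channel order are assumptions for illustration, not details confirmed by the project.

```typescript
// Assumed normalization constants (ImageNet mean/std); the project may use others.
const MEAN = [0.485, 0.456, 0.406];
const STD = [0.229, 0.224, 0.225];

// Scale the longer side down to maxSide if needed, then snap both sides
// to a multiple of 32 (with a floor of 32).
function detInputSize(
  width: number,
  height: number,
  maxSide = 960,
): { width: number; height: number } {
  const scale = Math.min(1, maxSide / Math.max(width, height));
  const snap = (v: number) => Math.max(32, Math.round((v * scale) / 32) * 32);
  return { width: snap(width), height: snap(height) };
}

// Convert interleaved RGBA bytes (as in ImageData.data) into a planar
// [3, height, width] float array, dropping alpha and normalizing each channel.
function rgbaToChw(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
): Float32Array {
  if (rgba.length !== width * height * 4) {
    throw new Error("rgba length does not match width * height * 4");
  }
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      out[c * plane + i] = (rgba[i * 4 + c] / 255 - MEAN[c]) / STD[c];
    }
  }
  return out;
}
```

With onnxruntime-web, the result could then be wrapped as `new ort.Tensor("float32", data, [1, 3, height, width])` and passed to an inference session. The resize itself would happen beforehand, for example on a canvas or with OpenCV.js.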

Outcome / Impact

Proved that a PaddleOCR-grade pipeline can run client-side in the browser for document/ID text extraction, validating a privacy-preserving, low-latency OCR option for onboarding without sending images to a server.

Capabilities Demonstrated

  • On-device (in-browser) OCR with no server round-trip
  • Converting PaddleOCR/Paddle models to ONNX (paddle2onnx)
  • Running ML inference in the browser via ONNX Runtime Web
  • Computer-vision pre/post-processing with OpenCV.js
  • Document and ID text extraction, including from PDFs