Guten OCR, Cross-Platform JS OCR Library (Evaluation/Extension)
Overview
Guten OCR is an open-source JavaScript OCR library that runs on Node.js, the browser, React Native, and C++, built on PaddleOCR’s PP-OCRv4 models and ONNX Runtime. This repository is the studio’s working copy of the gutenye/ocr project, evaluated and extended across a long-lived branch (~230 commits) as a candidate engine for OCR features.
Why It Exists
Many OCR options require a cloud service or Python runtime. Guten OCR offered on-device, cross-platform text detection and recognition in pure JS/TS, making it worth evaluating, integrating, and extending for embedding OCR directly into Node and browser products.
What We Built
We worked within the Bun-based monorepo, whose packages/ directory contains common (the shared detection/recognition pipeline), node (@gutenye/ocr-node, using onnxruntime-node for inference and sharp for image handling), browser (@gutenye/ocr-browser, which loads the ONNX detection and recognition models and the ppocr dictionary), react-native, and models. Tooling includes Biome for linting and formatting and Lefthook for git hooks. The API is small: create an instance with Ocr.create(), then call ocr.detect(image).
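To illustrate the create-then-detect flow described above, here is a minimal TypeScript sketch. It does not import the real packages. Instead it declares the smallest interfaces the flow needs and runs against a stand-in engine. The names TextLine, OcrEngine, OcrFactory, recognizeText, and fakeOcr are hypothetical, and the result fields (text, score) are assumptions that may not match the library's actual return type.

```typescript
// Hypothetical result shape: the field names are assumptions for illustration.
interface TextLine {
  text: string;
  score: number; // recognition confidence, assumed to lie in [0, 1]
}

// Minimal view of an engine: anything with an async detect(image) method.
interface OcrEngine {
  detect(image: string): Promise<TextLine[]>;
}

// Minimal view of the entry point: anything with an async create() factory.
interface OcrFactory {
  create(): Promise<OcrEngine>;
}

// Create an engine, run detection on one image, and join the confident lines.
async function recognizeText(
  factory: OcrFactory,
  image: string,
  minScore = 0.5,
): Promise<string> {
  const ocr = await factory.create();
  const lines = await ocr.detect(image);
  return lines
    .filter((line) => line.score >= minScore)
    .map((line) => line.text)
    .join("\n");
}

// Stand-in engine so the sketch runs without models or native dependencies.
const fakeOcr: OcrFactory = {
  async create() {
    return {
      async detect(_image: string) {
        return [
          { text: "TOTAL 12.50", score: 0.98 },
          { text: "~~~", score: 0.21 },
          { text: "Thank you", score: 0.91 },
        ];
      },
    };
  },
};

recognizeText(fakeOcr, "receipt.png").then((text) => console.log(text));
```

In a real integration, the stand-in would be replaced by the Ocr object from @gutenye/ocr-node or @gutenye/ocr-browser, provided its return type is compatible with these interfaces. Keeping the calling code behind a narrow interface like this is one way to let application code stay the same across runtimes.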
Technologies & Approach
TypeScript across a Bun workspace; ONNX Runtime executes the PP-OCRv4 detection and recognition models; sharp handles Node-side image decoding. The package split lets the same core pipeline target Node, browser, and React Native runtimes.
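A recognition model's raw output still has to be mapped back to characters through the dictionary. CTC-style recognizers such as PaddleOCR's are commonly decoded greedily, and the sketch below shows that technique in isolation. It is not taken from the Guten OCR source: the function name ctcGreedyDecode is hypothetical, and the layout (class 0 is the blank, class k maps to dictionary entry k - 1) is an assumption.

```typescript
// Assumption: class 0 is the CTC blank and class k maps to dictionary[k - 1].
const BLANK_INDEX = 0;

// Greedy CTC decoding: take the best class at each time step,
// collapse consecutive repeats, and drop blanks.
function ctcGreedyDecode(scores: number[][], dictionary: string[]): string {
  let previous = -1;
  let text = "";
  for (const step of scores) {
    // Argmax over the class scores for this time step.
    let best = 0;
    let bestScore = step[0] ?? -Infinity;
    for (let i = 1; i < step.length; i++) {
      const value = step[i] ?? -Infinity;
      if (value > bestScore) {
        best = i;
        bestScore = value;
      }
    }
    // Emit a character only for a non-blank class that differs from the previous step.
    if (best !== BLANK_INDEX && best !== previous) {
      text += dictionary[best - 1] ?? "";
    }
    previous = best;
  }
  return text;
}

// Toy input: one-hot rows over 4 classes (blank, "a", "b", "c").
const oneHot = (k: number): number[] => [0, 1, 2, 3].map((i) => (i === k ? 1 : 0));
const steps = [1, 1, 0, 1, 2, 2, 0, 3].map(oneHot);
console.log(ctcGreedyDecode(steps, ["a", "b", "c"])); // prints "aabc"
```

The blank between the two runs of class 1 is what allows a doubled letter to survive the repeat-collapsing step.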
Outcome / Impact
The work was a hands-on evaluation and extension of a production-grade, on-device OCR engine. It showed that the studio can integrate ONNX-based vision models into JavaScript products without a cloud dependency and can operate confidently inside a multi-package OSS codebase. The project is a fork, evaluation, and extension of an existing open-source project.
Capabilities Demonstrated
- On-device OCR with ONNX Runtime and PaddleOCR PP-OCRv4
- Cross-platform JS/TS library design (Node, Browser, React Native)
- Image preprocessing pipelines with sharp
- Working within and extending a Bun-based OSS monorepo