Client engagement · 2023

OCR Document-Processing & Bookkeeping Automation Engine

An SMB accounting/fintech platform

Overview

The automation backbone of the accounting platform: a collection of n8n workflows that ingest uploaded financial documents, run OCR and classification through Nanonets, reconcile the predictions against business and supplier data, and write results back across Hasura (GraphQL/Postgres) and DynamoDB. Maintained as version-controlled JSON exports spanning dev, staging and production environments.

The Challenge

A bookkeeping platform must turn raw scanned invoices and receipts into structured, ledger-ready records, extracting vendor, amounts, VAT codes and document type, then matching them to the right company account. Doing this reliably across many clients requires orchestrating several specialised services, with environment isolation and clear webhook entry points.

What We Built

Fourteen workflows covering the document lifecycle, clearly separated by environment (_DEV__, _PROD__, _OLD_PROD__):

Ingestion & triggers: Hasura document webhooks and a Nanonets classification incoming webhook kick off processing when a document is created.
OCR & classification: the “Kerrigan” flows and “Run OCR on existing documents” re-run and back-fill Nanonets predictions, including invoice-line-level OCR.
Reconciliation: “Get business company_account and vat_code” and “Compare predictions with agent data” map OCR output to the correct company account, supplier and VAT code (querying Postgres via the production database node).
State & export: “DynamoDB Update Pending Documents”, the Nanonets prediction-update flow, and a “DynamoDB Export CSV to Dropbox” job for downstream reporting.
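
As an illustration of the reconciliation step, here is a minimal sketch of how an OCR prediction might be matched to a known supplier and VAT code. Everything specific in it is an assumption made for the example: the field names (`supplier_name`, `vat_rate`, `company_account`), the in-memory lookup tables and the 0.85 similarity threshold are hypothetical, and the actual matching rules used in the engagement are not reproduced here.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so cosmetic differences do not block a match."""
    return " ".join(name.lower().split())


def best_supplier(predicted_name, suppliers, threshold=0.85):
    """Return the known supplier most similar to the OCR-predicted name, or None.

    The threshold is a hypothetical value chosen for this sketch.
    """
    target = normalise(predicted_name)
    best, best_score = None, 0.0
    for supplier in suppliers:
        score = SequenceMatcher(None, target, normalise(supplier["name"])).ratio()
        if score > best_score:
            best, best_score = supplier, score
    return best if best_score >= threshold else None


def reconcile(prediction, suppliers, vat_codes):
    """Map one OCR prediction to supplier, company account and VAT code.

    Anything that cannot be matched is flagged for human review
    instead of being guessed.
    """
    supplier = best_supplier(prediction["supplier_name"], suppliers)
    vat_code = vat_codes.get(prediction["vat_rate"])
    return {
        "supplier_id": supplier["id"] if supplier else None,
        "company_account": supplier["company_account"] if supplier else None,
        "vat_code": vat_code,
        "needs_review": supplier is None or vat_code is None,
    }
```

Flagging unmatched records rather than forcing a nearest match keeps the "minimal human touch" goal honest: only ambiguous documents reach a person.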

Technologies & Approach

n8n provides the visual, node-based orchestration; Nanonets supplies OCR and document classification; Hasura exposes the Postgres data layer over GraphQL and emits event webhooks; DynamoDB holds processing state; Dropbox receives exported CSVs. Custom Code nodes handle the matching and transformation logic between systems. Keeping workflows as committed JSON gave the team reviewable, environment-aware automation.
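
To make the event-driven entry point concrete, the sketch below shows how a webhook handler might pull the newly created row out of a Hasura event payload. The nesting (`event.op`, `event.data.new`, `table.name`) follows Hasura's event-trigger payload format; the table name `documents` and the function name are hypothetical, and this is not the workflow logic used in the engagement.

```python
def extract_new_document(payload: dict, expected_table: str = "documents"):
    """Return the inserted row from a Hasura event-trigger payload.

    Returns None when the event is not an INSERT on the expected table,
    so the caller can ignore it. `expected_table` is a hypothetical name.
    """
    event = payload.get("event", {})
    table = payload.get("table", {}).get("name")
    if table != expected_table or event.get("op") != "INSERT":
        return None
    return event.get("data", {}).get("new")


# Example payload, trimmed to the fields this sketch reads.
sample = {
    "event": {"op": "INSERT", "data": {"old": None, "new": {"id": 42, "status": "uploaded"}}},
    "table": {"schema": "public", "name": "documents"},
}
row = extract_new_document(sample)
```

Filtering on operation and table at the entry point means a single webhook URL can be shared safely by several triggers without starting OCR for unrelated events.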

Outcome / Impact

Delivered the production pipeline that converts unstructured documents into classified, reconciled accounting records with minimal human touch. It is the engine that the rest of the platform (tax-portal RPA, VAT coding, supplier matching) plugs into. The project demonstrates orchestration of an AI/OCR pipeline across GraphQL, SQL and NoSQL stores in a real fintech setting.

Capabilities Demonstrated

Designing production document-ingestion and OCR pipelines end to end
Orchestrating heterogeneous systems (Nanonets, Hasura, Postgres, DynamoDB, Dropbox) in n8n
Event-driven architecture with webhooks and database triggers
Environment-isolated, version-controlled low-code automation
Data reconciliation and matching logic embedded in workflow code nodes
