OCR Document-Processing & Bookkeeping Automation Engine
An SMB accounting/fintech platform
Overview
The automation backbone of the accounting platform: a collection of n8n workflows that ingest uploaded financial documents, run OCR and classification through Nanonets, reconcile the predictions against business and supplier data, and write results back across Hasura (GraphQL/Postgres) and DynamoDB. Maintained as version-controlled JSON exports spanning dev, staging and production environments.
The Challenge
A bookkeeping platform must turn raw scanned invoices and receipts into structured, ledger-ready records, extracting vendor, amounts, VAT codes and document type, then matching them to the right company account. Doing this reliably across many clients requires orchestrating several specialised services, with environment isolation and clear webhook entry points.
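A "structured, ledger-ready record" can be modelled as a small Python dataclass, with a basic arithmetic check (net + VAT = gross) that a pipeline like this could run before posting. This is only a sketch. The `LedgerRecord` class, its field names, the example VAT code and the one-cent tolerance are illustrative assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class LedgerRecord:
    """Hypothetical ledger-ready record extracted from one invoice or receipt."""
    document_type: str   # e.g. "invoice" or "receipt"
    vendor: str          # supplier name as predicted by OCR
    net_amount: Decimal
    vat_amount: Decimal
    gross_amount: Decimal
    vat_code: str        # illustrative code such as "S20" (hypothetical)
    company_account_id: Optional[str] = None  # filled in later by reconciliation

    def totals_consistent(self, tolerance: Decimal = Decimal("0.01")) -> bool:
        """True if net + VAT equals gross within a rounding tolerance."""
        return abs(self.net_amount + self.vat_amount - self.gross_amount) <= tolerance


record = LedgerRecord(
    document_type="invoice",
    vendor="Acme Ltd",
    net_amount=Decimal("100.00"),
    vat_amount=Decimal("20.00"),
    gross_amount=Decimal("120.00"),
    vat_code="S20",
)
print(record.totals_consistent())  # True
```

Using `Decimal` rather than `float` keeps currency arithmetic exact. A record whose gross amount disagrees with net + VAT (say 125.00 instead of 120.00) fails the check and could be routed for human review instead of being posted.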
What We Built
Fourteen workflows covering the document lifecycle, clearly separated by environment (_DEV__, _PROD__, _OLD_PROD__):
- Ingestion & triggers: Hasura document webhooks and a Nanonets classification incoming webhook kick off processing when a document is created.
- OCR & classification: the “Kerrigan” flows and “Run OCR on existing documents” re-run and back-fill Nanonets predictions, including invoice-line-level OCR.
- Reconciliation: “Get business company_account and vat_code” and “Compare predictions with agent data” map OCR output to the correct company account, supplier and VAT code, querying Postgres via the production database node.
- State & export: “DynamoDB Update Pending Documents” and the Nanonets prediction-update flow keep processing state current, and a “DynamoDB Export CSV to Dropbox” job exports it for downstream reporting.
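The reconciliation step can be sketched as two small functions of the kind a Code node might hold: fuzzy-match the OCR-predicted vendor name against the business's known suppliers, and derive a VAT code from the predicted net and VAT amounts. The source does not say which matching technique the platform uses, so `difflib.SequenceMatcher`, the 0.85 threshold, the suffix list, the supplier ids and the rate-to-code table are all hypothetical stand-ins.

```python
import re
from decimal import Decimal
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical list of legal-form suffixes ignored when comparing names.
SUFFIXES = {"ltd", "limited", "plc", "inc", "llc"}


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces and drop company suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def match_supplier(
    predicted: str, suppliers: dict[str, str], threshold: float = 0.85
) -> Optional[str]:
    """Return the id of the best-matching known supplier, or None if no
    candidate reaches the (assumed) threshold, leaving it for human review."""
    target = normalise(predicted)
    best_id, best_score = None, 0.0
    for supplier_id, name in suppliers.items():
        score = SequenceMatcher(None, target, normalise(name)).ratio()
        if score > best_score:
            best_id, best_score = supplier_id, score
    return best_id if best_score >= threshold else None


def vat_code_for(net: Decimal, vat: Decimal, codes: dict[int, str]) -> Optional[str]:
    """Derive the VAT rate (whole percent) from predicted amounts and look up
    its code in a hypothetical rate-to-code table."""
    if net == 0:
        return None
    rate = int((vat / net * 100).quantize(Decimal("1")))
    return codes.get(rate)


suppliers = {"sup-1": "ACME Limited", "sup-2": "Globex Corporation"}
print(match_supplier("Acme Ltd.", suppliers))  # sup-1
print(vat_code_for(Decimal("100.00"), Decimal("20.00"), {20: "S20", 5: "R5", 0: "Z0"}))  # S20
```

Returning `None` below the threshold, rather than always picking the closest name, is the design choice that keeps low-confidence matches out of the ledger and in front of a person.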
Technologies & Approach
n8n provides the visual, node-based orchestration; Nanonets supplies OCR and document classification; Hasura exposes the Postgres data layer over GraphQL and emits event webhooks; DynamoDB holds processing state; Dropbox receives exported CSVs. Custom Code nodes handle the matching and transformation logic between systems. Keeping workflows as committed JSON gave the team reviewable, environment-aware automation.
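To illustrate the Hasura-to-n8n entry point, here is a sketch that parses a Hasura event-trigger payload (which carries top-level `id` and `table` fields plus `event.op` and `event.data.new`) and turns a newly inserted document row into an OCR job. The `documents` table name, the `id`/`file_url` columns and the `PENDING_OCR` status are hypothetical; the real workflows may route events differently.

```python
import json
from typing import Any, Optional


def route_hasura_event(body: str, table_name: str = "documents") -> Optional[dict[str, Any]]:
    """Turn a Hasura event-trigger payload into an OCR job, or return None to skip it.

    Only INSERTs on the (hypothetical) documents table start processing;
    UPDATE, DELETE and MANUAL events are ignored in this sketch.
    """
    payload = json.loads(body)
    if payload.get("table", {}).get("name") != table_name:
        return None
    event = payload.get("event", {})
    if event.get("op") != "INSERT":
        return None
    row = event.get("data", {}).get("new") or {}
    if "id" not in row or "file_url" not in row:  # hypothetical column names
        return None
    return {
        "document_id": row["id"],
        "file_url": row["file_url"],
        "status": "PENDING_OCR",        # hypothetical processing-state value
        "event_id": payload.get("id"),  # Hasura's event id, usable to de-duplicate retries
    }


sample = {
    "id": "evt-1",
    "table": {"schema": "public", "name": "documents"},
    "trigger": {"name": "on_document_created"},
    "event": {
        "op": "INSERT",
        "data": {"old": None, "new": {"id": 42, "file_url": "https://example.com/inv.pdf"}},
    },
}
print(route_hasura_event(json.dumps(sample)))
```

Carrying the event id into the job matters because Hasura can redeliver an event when retries are configured; a downstream state store can ignore an event id it has already seen.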
Outcome / Impact
Delivered the production pipeline that converts unstructured documents into classified, reconciled accounting records with minimal manual intervention; it is the engine that the rest of the platform (tax-portal RPA, VAT coding, supplier matching) plugs into. Demonstrates orchestration of an AI/OCR pipeline across GraphQL, SQL and NoSQL stores in a real fintech setting.
Capabilities Demonstrated
- Designing production document-ingestion and OCR pipelines end to end
- Orchestrating heterogeneous systems (Nanonets, Hasura, Postgres, DynamoDB, Dropbox) in n8n
- Event-driven architecture with webhooks and database triggers
- Environment-isolated, version-controlled low-code automation
- Data reconciliation and matching logic embedded in workflow code nodes