Product · 2023

Fuzzy Supplier-to-Company Entity Resolution

An SMB accounting/fintech platform

Overview

A Jupyter-based build that reconciles OCR-extracted invoice seller names against a canonical list of known suppliers using fuzzy name matching. It addresses the core data-quality problem in automated bookkeeping: the same vendor appears under dozens of spelling, casing and legal-suffix variants.

Why It Exists

Invoices arrive as scanned documents whose seller names are extracted by OCR and never match the clean supplier registry exactly: “MUSGRAVE” versus “Musgrave Ltd”, trailing whitespace, multi-line addresses bleeding into the name field. Without reliable matching, every document needs manual vendor assignment, which defeats the point of automation.
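The variants listed above suggest a cleanup pass before any comparison. The following is a minimal sketch of that idea using only the standard library; it is not the notebook's code. The function name normalise_seller_name and the LEGAL_SUFFIXES list are hypothetical, and the suffix list is deliberately short.

```python
import re

# Assumed, non-exhaustive list of legal suffixes to drop before comparison.
LEGAL_SUFFIXES = ("ltd", "limited", "plc", "llc", "inc")


def normalise_seller_name(raw: str) -> str:
    """Reduce an OCR-extracted seller name to a comparable form."""
    # Keep only the first non-empty line, since address lines can
    # bleed into the name field.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return ""
    name = lines[0].casefold()
    # Replace punctuation with spaces; split() below collapses whitespace.
    name = re.sub(r"[^\w\s]", " ", name)
    tokens = name.split()
    # Drop trailing legal suffixes, but never empty the name entirely.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


for sample in ("MUSGRAVE", "Musgrave Ltd  ", "Musgrave Ltd.\n12 High Street"):
    print(repr(normalise_seller_name(sample)))
```

All three samples reduce to the same string, so an exact lookup would already catch them. Fuzzy matching is still needed for OCR character errors, which this pass does not address.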

What We Built

A notebook (Name Matching.ipynb) that loads a 209-row canonical company list and a larger feed of predicted invoice data with pandas, then applies the name_matching library’s NameMatcher to link extracted seller names to supplier IDs. The working CSVs (209-companies.csv, data.csv) capture the real fields in play: supplier_id, name, OCR-predicted seller and buyer names, VAT codes and Nanonets confidence scores.
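To make the linking step concrete, here is a small sketch of matching one extracted seller name against a supplier list. It swaps in the standard library's difflib.SequenceMatcher as a stand-in for NameMatcher, so it illustrates the shape of the step and not the library's scoring. The supplier rows, the link_seller helper and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical rows standing in for 209-companies.csv (supplier_id, name).
SUPPLIERS = [
    {"supplier_id": 1, "name": "Musgrave Ltd"},
    {"supplier_id": 2, "name": "Example Foods Limited"},
    {"supplier_id": 3, "name": "Sample Energy plc"},
]


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def link_seller(seller_name, suppliers, threshold=0.6):
    """Return (supplier_id, score) for the best candidate.

    supplier_id is None when the best score falls below the threshold,
    which leaves the document for manual vendor assignment.
    """
    best = max(suppliers, key=lambda s: similarity(seller_name, s["name"]))
    score = similarity(seller_name, best["name"])
    return (best["supplier_id"] if score >= threshold else None, score)


print(link_seller("MUSGRAVE ", SUPPLIERS))  # links to supplier 1
print(link_seller("Qwzx 999", SUPPLIERS))   # no confident match
```

Returning the score alongside the ID keeps the decision auditable: borderline matches can be routed to review instead of being silently accepted.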

Technologies & Approach

Python with pandas for data wrangling and the name_matching package for distance-based string similarity and candidate ranking. The exploratory notebook format let us tune matching thresholds against actual platform data before considering productionisation.
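Threshold tuning of this kind can be sketched as a sweep over a hand-labelled sample, reporting precision, recall and the number of documents left for manual review at each cut-off. The scores and labels below are made up for illustration, and the sweep function is a hypothetical helper, not part of name_matching.

```python
# Hypothetical hand-labelled sample: (match score, top candidate was correct).
LABELLED = [
    (0.95, True), (0.88, True), (0.81, True), (0.74, False),
    (0.70, True), (0.55, False), (0.40, False),
]


def sweep(labelled, thresholds):
    """Return (threshold, precision, recall, manual_count) per threshold."""
    total_correct = sum(1 for _, ok in labelled if ok)
    rows = []
    for t in thresholds:
        # Matches at or above the threshold are accepted automatically.
        accepted = [ok for score, ok in labelled if score >= t]
        true_pos = sum(accepted)
        precision = true_pos / len(accepted) if accepted else 1.0
        recall = true_pos / total_correct if total_correct else 0.0
        manual = len(labelled) - len(accepted)
        rows.append((t, precision, recall, manual))
    return rows


for t, p, r, manual in sweep(LABELLED, [0.5, 0.6, 0.7, 0.8, 0.9]):
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f} manual={manual}")
```

Raising the threshold trades recall for precision. In bookkeeping a wrong vendor assignment is usually costlier than a manual one, which argues for choosing the cut-off from labelled platform data instead of a default.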

Outcome / Impact

Validated that automated fuzzy matching could collapse noisy OCR vendor names onto canonical supplier records, reducing manual reconciliation. As an R&D notebook, it proved the approach and informed the supplier-matching logic carried into the platform’s n8n workflows.

Capabilities Demonstrated

Entity resolution and record linkage on real, messy financial data
Practical fuzzy-matching pipeline design in Python/pandas
Rapid data-science prototyping to de-risk a production feature
