Tooling · 2024

CAEN Business-Registry Code Dataset & Normalization Utility

A leading Romanian retail bank

Why It Exists

Onboarding and advising business/SME customers requires classifying each one by its CAEN code, the code assigned under Romania’s official business-activity nomenclature. The bank’s lending and advisory tools need a clean, lookup-ready CAEN dataset rather than the raw, inconsistently formatted source list.

What We Built

A small data-preparation utility. process.py reads the source dataset (Coduri_CAEN.json, ~96 KB), iterates over its entries, strips noise such as the literal "Cod CAEN " prefix, and emits a normalized { titlu, cod } structure as filtered_coduri.json (~51 KB). The result is a compact title/code lookup ready to drop into product flows for business-activity selection and classification.
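The normalization step can be sketched roughly as follows. This is an illustration, not the actual process.py: the layout of the source file is not described here, so the sketch assumes it is a JSON list of objects that already carry "titlu" and "cod" keys, with the prefix sitting on the code value. The function names (normalize_entry, normalize, regenerate) and the rule of dropping entries left without a code or title are likewise assumptions.

```python
import json

PREFIX = "Cod CAEN "  # the literal prefix mentioned in the write-up


def normalize_entry(entry):
    """Return a {titlu, cod} record with the prefix and stray whitespace removed.

    Assumes (hypothetically) that the raw entry has "titlu" and "cod" keys.
    """
    cod = str(entry.get("cod", "")).strip()
    if cod.startswith(PREFIX):
        cod = cod[len(PREFIX):].strip()
    titlu = str(entry.get("titlu", "")).strip()
    return {"titlu": titlu, "cod": cod}


def normalize(entries):
    """Normalize every entry, dropping any that end up without a code or title."""
    cleaned = (normalize_entry(e) for e in entries)
    return [r for r in cleaned if r["cod"] and r["titlu"]]


def regenerate(src="Coduri_CAEN.json", dst="filtered_coduri.json"):
    """Read the raw list and write the cleaned lookup file."""
    with open(src, encoding="utf-8") as f:
        entries = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Romanian diacritics readable in the output.
        json.dump(normalize(entries), f, ensure_ascii=False, indent=2)
```

Calling regenerate() from the folder that holds the source file would rewrite the output in one step, which is what makes the dataset repeatable rather than hand-edited.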

Technologies & Approach

Plain Python with the standard-library json module, deliberately minimal. The value is the cleaned, structured reference dataset and a repeatable script to regenerate it, feeding business-registry lookups in the SME credit and financial-advisory experiences.
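On the consuming side, a product flow might load the cleaned file into an in-memory index. The sketch below is one possible shape, assuming filtered_coduri.json is a list of { titlu, cod } records; build_index, load_index and search_titles are hypothetical helper names, not part of the delivered utility.

```python
import json


def build_index(records):
    """Map each code to its title; a later duplicate code overwrites an earlier one."""
    return {r["cod"]: r["titlu"] for r in records}


def load_index(path="filtered_coduri.json"):
    """Load the cleaned dataset and index it by code."""
    with open(path, encoding="utf-8") as f:
        return build_index(json.load(f))


def search_titles(index, term):
    """Case-insensitive substring match on titles, e.g. for an activity picker."""
    needle = term.casefold()
    return [(cod, titlu) for cod, titlu in index.items()
            if needle in titlu.casefold()]
```

A plain dict is enough here because the whole dataset is only tens of kilobytes, so lookups by code and a linear scan over titles both stay cheap without adding a dependency.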

Outcome / Impact

Turned a messy public CAEN list into a clean, ready-to-use code/title dataset that supports business-activity classification in the bank’s SME-facing lending and advisory tools.

Capabilities Demonstrated

  • Normalizing public reference data into product-ready lookups
  • Business-registry (CAEN) classification support for SME banking
  • Lightweight, repeatable ETL with zero dependencies