← All work
Tooling · 2025

OpenAI Fine-Tuning Pipeline for Workflow Data

Overview

A small Node.js + Python pipeline that turns dumped n8n workflow data into a fine-tuned OpenAI model. It covers the full path from raw dumps to a ready-to-train JSONL dataset and a launched fine-tuning job.

Why It Exists

To explore whether a model fine-tuned on real workflow examples could assist with generating or reasoning about n8n automations. The repo packages the data wrangling and training-job orchestration needed to validate that idea quickly.

What We Built

A staged set of scripts: dump.js and process_dumps.py ingest and normalize raw exports from a dump/ directory, enrich.js augments records, prepareFinetune.js assembles them into the finetune.jsonl training file, and finetune.js uploads the file and creates an OpenAI fine-tuning job targeting gpt-4o-2024-08-06. Configuration is handled via dotenv and an .env.example.

Technologies & Approach

Node.js (ESM) with the official openai SDK for upload and job creation, plus a Python preprocessing step for the heavier dump parsing. Training data is shaped into OpenAI’s JSONL chat format. The two-language split keeps data wrangling in Python while job orchestration stays in JS.

Outcome / Impact

A working end-to-end fine-tuning proof: from raw workflow dumps to a submitted training job. Validates the studio’s ability to stand up custom-model pipelines and prepare domain-specific training data.

Capabilities Demonstrated

  • End-to-end LLM fine-tuning pipelines (prep, upload, job creation)
  • Training-data extraction, enrichment, and JSONL formatting
  • OpenAI API automation across Node.js and Python
More work See all →