Text-to-SQL Bring-Your-Own-Data, Adapted & Self-Hosted
Overview
A self-hosted, “bring-your-own-data” text-to-SQL service adapted from the open-source caesarHQ textSQL project. The studio’s fork carries custom commits to wire it to a private database and tune the model behavior for an internal use case.
Why It Exists
To let non-technical users query a real database in plain English, reducing how often engineers must write ad-hoc queries for them, and to evaluate how reliably an LLM can translate natural language into correct SQL against the studio’s own schema.
What We Built
Starting from the open-source textSQL (Flask API + React client), the team self-hosted the service and made targeted modifications across ~18 commits: connecting custom data, adjusting contract/table naming, and reverting to GPT-4 for higher-quality SQL generation. Docker Compose packages the API and client for local/standalone deployment.
Technologies & Approach
Python/Flask backend with a React frontend; OpenAI GPT-4 for the NL→SQL translation, with table selection and prompt tuning to keep generated queries valid. The “BYOD” design lets the same engine target arbitrary user datasets.
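Table selection and prompt tuning work together. Only the tables relevant to a question go into the prompt, which keeps the context small and steers the model away from columns that do not exist. The SQL that comes back is then checked before it runs. The following is a minimal sketch under stated assumptions: a SQLite database, a hypothetical three-table `SCHEMA`, and naive keyword-overlap scoring used as a stand-in, because the fork’s actual selection logic is not described here. The GPT-4 call itself is omitted. The hypothetical `build_prompt` only produces the text that would be sent, and `is_valid_select` is a hypothetical guard for the model’s reply.

```python
import re
import sqlite3

# Hypothetical schema: table name -> column names.
SCHEMA = {
    "customers": ["id", "name", "country"],
    "orders": ["id", "customer_id", "total", "created_at"],
    "products": ["id", "title", "price"],
}


def _tokens(text: str) -> set[str]:
    """Lower-case alphanumeric tokens (customer_id -> customer, id) with crude plural stripping."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words}


def select_tables(question: str, schema: dict[str, list[str]], k: int = 2) -> list[str]:
    """Rank tables by how many question tokens match table/column names; keep the top k."""
    q = _tokens(question)
    scored = []
    for table, columns in schema.items():
        names = _tokens(table) | _tokens(" ".join(columns))
        scored.append((len(q & names), table))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, then by name
    return [table for score, table in scored[:k] if score > 0]


def build_prompt(question: str, schema: dict[str, list[str]], tables: list[str]) -> str:
    """Assemble the text that would be sent to the model, exposing only the selected tables."""
    schema_lines = "\n".join(f"{t}({', '.join(schema[t])})" for t in tables)
    return (
        "Translate the question into one SQLite SELECT statement.\n"
        "Use only these tables and columns:\n"
        f"{schema_lines}\n"
        f"Question: {question}\n"
        "SQL:"
    )


def is_valid_select(conn: sqlite3.Connection, sql: str) -> bool:
    """Accept a single SELECT that SQLite can compile; EXPLAIN prepares it without running it."""
    stmt = sql.strip().rstrip(";").strip()
    # Conservative: reject any inner ';' (even inside string literals) and non-SELECTs.
    if ";" in stmt or not stmt.lower().startswith("select"):
        return False
    try:
        conn.execute(f"EXPLAIN {stmt}")
    except sqlite3.Error:  # unknown table or column, syntax error, ...
        return False
    return True


# Usage against an empty in-memory copy of the hypothetical schema.
conn = sqlite3.connect(":memory:")
for table, cols in SCHEMA.items():
    conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")

question = "Total order value per customer country"
tables = select_tables(question, SCHEMA)  # ['orders', 'customers']
prompt = build_prompt(question, SCHEMA, tables)
```

Keyword overlap is crude: it misses synonyms such as “revenue” for `total`. Embedding-based ranking or asking the model to pick tables are common alternatives. Compiling the reply with `EXPLAIN` catches unknown tables and columns without executing anything against the data.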
Outcome / Impact
An internal adaptation that validated LLM-driven text-to-SQL against real data and gave the studio hands-on experience tuning model choice and prompts for query correctness. It is an adaptation and evaluation of an open-source base, not a from-scratch build.
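A common way to measure whether generated SQL is “correct” is execution accuracy. You run the generated query and a hand-written reference query against the same data, then compare their results. The sketch below assumes SQLite and a hypothetical list of (generated, reference) pairs. It is not the studio’s documented evaluation method. The helpers `result_multiset` and `execution_accuracy` are hypothetical. Results are compared as unordered multisets, so the check ignores `ORDER BY` and compares floats exactly.

```python
import sqlite3
from collections import Counter
from typing import Optional


def result_multiset(conn: sqlite3.Connection, sql: str) -> Optional[Counter]:
    """Run sql and return its rows as an order-insensitive multiset, or None if it fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn: sqlite3.Connection, cases: list[tuple[str, str]]) -> float:
    """Fraction of (generated, reference) pairs whose results match on the same data."""
    if not cases:
        return 0.0
    hits = 0
    for generated, reference in cases:
        got = result_multiset(conn, generated)
        # A generated query that fails to run always counts as a miss.
        if got is not None and got == result_multiset(conn, reference):
            hits += 1
    return hits / len(cases)


# Usage with a tiny hypothetical table and two hypothetical test cases.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Ada", 12.5), (2, "Bob", 7.0), (3, "Ada", 3.0)],
)
cases = [
    # Different SQL text, same result rows: counts as correct.
    ("SELECT customer, SUM(total) FROM orders GROUP BY customer",
     "SELECT customer, SUM(total) AS spend FROM orders GROUP BY 1"),
    # "order" is a reserved word, so this fails to parse: counts as a miss.
    ("SELECT COUNT(*) FROM order", "SELECT COUNT(*) FROM orders"),
]
print(execution_accuracy(conn, cases))  # 0.5
```

Comparing results rather than SQL text avoids penalizing equivalent rewrites such as the `GROUP BY 1` form above. The trade-off is that two different queries can coincidentally agree on a small test table, so the reference data should include rows that separate plausible wrong answers.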
Capabilities Demonstrated
- Natural-language-to-SQL over custom databases
- Adapting and self-hosting an open-source data tool
- Prompt/model tuning for query correctness (table selection, GPT-4)
- Building conversational interfaces to structured data