Conversational AI & Topic-Modeling R&D Service
A social storytelling / lead-gen platform
Overview
An AI research-and-development workbench for a social storytelling platform, exploring conversational agents, entity-aware search, and topic modeling over user-generated story data. It evaluated both hosted (OpenAI) and locally run (GPT4All / LLaMA) models behind a lightweight Flask service.
Why It Exists
The platform wanted to understand how LLMs could power conversational discovery and automatic organization of user-generated stories. This codebase served as the sandbox to test approaches before productizing any of them.
What We Built
A Dockerized Python service with a Flask layer, organized into several parallel tracks:
- LangChain- and LlamaIndex-based retrieval agents
- A custom agents/ package with named-entity, fuzzy, and entity-search modules
- BERTopic notebooks for topic clustering
- Autolabel-driven dataset labeling
- Side-by-side trials of OpenAI APIs versus locally hosted GPT4All/LLaMA models
Working CSV/JSONL datasets and notebooks document the exploration.
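The internals of the agents/ package are not shown here, so the following is only a minimal sketch of what a fuzzy entity-search module could look like. It uses the standard library's difflib rather than whatever matcher the project actually used, and the entity catalogue, function names, and threshold are all hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical entity catalogue for illustration; the real data lived in
# the project's CSV/JSONL files.
ENTITIES = ["Golden Gate Bridge", "Grand Canyon", "Lake Tahoe", "Yosemite Valley"]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_entity_search(
    query: str,
    entities: list[str],
    threshold: float = 0.6,  # assumed cut-off, not a tuned value
    limit: int = 3,
) -> list[tuple[str, float]]:
    """Return up to `limit` entities scoring at least `threshold`, best first."""
    scored = [(name, similarity(query, name)) for name in entities]
    matches = [(name, score) for name, score in scored if score >= threshold]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches[:limit]


if __name__ == "__main__":
    # A misspelled query should still resolve to the intended entity.
    print(fuzzy_entity_search("golden gate brige", ENTITIES))
```

With this catalogue, the misspelled query "golden gate brige" ranks "Golden Gate Bridge" first, while a query with no overlap returns an empty list. A production module would likely pair this with named-entity extraction so that only entity spans, not whole user messages, are matched.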
Technologies & Approach
LangChain 0.0.2x and LlamaIndex for orchestration and retrieval; OpenAI plus GPT4All/LLaMA for the hosted-versus-local comparison; BERTopic for unsupervised topic discovery; Autolabel for LLM-assisted data labeling; Flask and Docker Compose for packaging the service.
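A hosted-versus-local comparison of this kind can be sketched as a small harness that treats every model as a plain prompt-to-text callable. This is an illustrative sketch, not the project's code: compare_backends, hosted_stub, and local_stub are hypothetical names, and the stubs stand in for real wrappers around an OpenAI client or a GPT4All/LLaMA model so the example runs offline.

```python
import time
from typing import Callable, Dict, List

# A backend is any callable that maps a prompt to a completion string.
Backend = Callable[[str], str]


def compare_backends(backends: Dict[str, Backend], prompts: List[str]) -> List[dict]:
    """Run each prompt through each backend, recording output and latency."""
    rows = []
    for prompt in prompts:
        for name, generate in backends.items():
            start = time.perf_counter()
            try:
                output, error = generate(prompt), None
            except Exception as exc:  # keep the run going if one backend fails
                output, error = None, str(exc)
            elapsed = time.perf_counter() - start
            rows.append({
                "backend": name,
                "prompt": prompt,
                "output": output,
                "error": error,
                "latency_s": elapsed,
            })
    return rows


# Stub backends (assumptions): replace with real hosted/local model wrappers.
def hosted_stub(prompt: str) -> str:
    return f"[hosted] {prompt.upper()}"


def local_stub(prompt: str) -> str:
    return f"[local] {prompt.lower()}"


if __name__ == "__main__":
    results = compare_backends(
        {"hosted": hosted_stub, "local": local_stub},
        ["Find stories about hiking"],
    )
    for row in results:
        print(f'{row["backend"]:>6}  {row["latency_s"]:.6f}s  {row["output"]}')
```

Keeping the backend interface this narrow means the same prompt set can be replayed against any model, and the resulting rows can be written to CSV or JSONL for side-by-side review.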
Outcome / Impact
The work validated which LLM patterns and model-hosting trade-offs were viable for conversational discovery and content organization, and produced reusable building blocks (entity search, topic modeling, auto-labeling) for downstream product work. The project is positioned as applied AI/ML R&D.
Capabilities Demonstrated
- Rapid LLM application building with LangChain and LlamaIndex
- Hosted vs. local model evaluation (OpenAI, GPT4All, LLaMA)
- Topic modeling and LLM-assisted data labeling at dataset scale
- Entity-aware and fuzzy retrieval over unstructured content
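As one illustration of labeling at dataset scale, the sketch below streams JSONL records through a pluggable labeler and caches results so that duplicate texts trigger only one call. It does not use Autolabel's API; label_records and keyword_labeler are hypothetical names, the "text" field is an assumed schema, and the keyword rule is a stand-in for an LLM call.

```python
import json
from typing import Callable, Dict, Iterable, Iterator


def label_records(
    lines: Iterable[str],
    labeler: Callable[[str], str],
    text_field: str = "text",  # assumed field name in each JSONL record
) -> Iterator[dict]:
    """Yield each record with a `label` added, labeling each distinct text once."""
    cache: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        text = record[text_field]
        if text not in cache:
            cache[text] = labeler(text)  # the only place the labeler is called
        yield {**record, "label": cache[text]}


def keyword_labeler(text: str) -> str:
    """Stand-in for an LLM call: a trivial keyword rule."""
    return "travel" if "trip" in text.lower() else "other"


if __name__ == "__main__":
    sample = [
        '{"id": 1, "text": "Our road trip"}',
        '{"id": 2, "text": "Dinner recipe"}',
        '{"id": 3, "text": "Our road trip"}',
    ]
    for labeled in label_records(sample, keyword_labeler):
        print(json.dumps(labeled))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, which keeps memory use flat regardless of dataset size.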