AI Web-Scraping & Mention-Monitoring Spike (Firecrawl)
Overview
This is a scraping-and-extraction spike built on Firecrawl. It does two things: it scans a list of regional news article URLs to detect whether a given public figure is mentioned, and it extracts structured profile data (person and company details) from business-directory member pages.
Why It Exists
Media monitoring and lead enrichment both reduce to “crawl a page, then pull out a precise, typed answer.” This build evaluated Firecrawl’s combination of crawling and LLM extraction as a fast way to get schema-constrained data from messy web pages without writing bespoke parsers.
What We Built
The spike consists of two parallel implementations of the same idea. A Node script (allen.js) uses the Firecrawl JS SDK with a Zod schema ({ mention: boolean }) to classify, for each URL in a batch of news articles, whether a named individual appears. A Python script (scrap.py) uses the Firecrawl Python SDK with a Pydantic ExtractSchema (person and company names, descriptions, activity, website) to extract directory member profiles. Additional helper scripts parse images and transform logs into CSV.
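The two schemas can be mirrored in a small, stdlib-only sketch. The originals use Zod and Pydantic; here plain dataclasses with explicit type checks stand in for them, and the field names (person_name, company_name, person_description, company_description, activity, website) as well as the helpers parse_mention and parse_profile are hypothetical, not taken from allen.js or scrap.py. What it illustrates is the core idea: the raw JSON returned by the LLM is rejected unless it matches the expected shape exactly.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class MentionResult:
    """Per-URL classification, mirroring the Zod shape { mention: boolean }."""
    url: str
    mention: bool


@dataclass(frozen=True)
class Profile:
    """Hypothetical flattening of the ExtractSchema fields described above."""
    person_name: Optional[str]
    company_name: Optional[str]
    person_description: Optional[str]
    company_description: Optional[str]
    activity: Optional[str]
    website: Optional[str]


def parse_mention(url: str, raw: str) -> MentionResult:
    """Accept only a JSON object whose single key 'mention' is a real boolean."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != {"mention"}:
        raise ValueError(f"{url}: expected exactly one key 'mention', got {data!r}")
    if not isinstance(data["mention"], bool):
        raise ValueError(f"{url}: 'mention' must be a boolean")
    return MentionResult(url=url, mention=data["mention"])


def parse_profile(raw: str) -> Profile:
    """Reject unknown keys and non-string values; missing keys become None."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("profile payload must be a JSON object")
    allowed = {f.name for f in fields(Profile)}
    unknown = set(data) - allowed
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    for key, value in data.items():
        if value is not None and not isinstance(value, str):
            raise ValueError(f"{key} must be a string or null")
    return Profile(**{name: data.get(name) for name in allowed})
```

For example, parse_mention(url, '{"mention": "yes"}') raises ValueError instead of silently treating a non-empty string as true, which is the kind of drift a strict schema is meant to catch.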
Technologies & Approach
Firecrawl handles fetching and rendering; the value-add is constraining the LLM output with strict schemas (Zod in JS, Pydantic in Python) so results are immediately usable as structured data rather than free text. CSV outputs make the results trivially consumable.
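A minimal sketch of the CSV step, assuming the results have already been validated into flat records: rows are written with the standard library's csv.DictWriter under a fixed header, so every file has the same columns regardless of which fields a given page yielded. The write_csv helper and the column names are hypothetical; the original scripts' CSV layout is not described here.

```python
import csv
import io
from typing import Iterable, List, Mapping, TextIO


def write_csv(rows: Iterable[Mapping[str, object]], columns: List[str], out: TextIO) -> int:
    """Write validated records under a fixed header and return the row count.

    Missing keys become empty cells (DictWriter's default restval=""), while
    unexpected keys raise ValueError (extrasaction="raise"), so schema drift
    fails loudly instead of silently shifting columns.
    """
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="raise")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical per-URL mention results, as a classifier step might produce.
    results = [
        {"url": "https://example.com/news/1", "mention": True},
        {"url": "https://example.com/news/2", "mention": False},
    ]
    buf = io.StringIO()
    written = write_csv(results, ["url", "mention"], buf)
    print(written)
    print(buf.getvalue())
```

Pinning the header up front is a deliberate choice: downstream consumers can rely on a stable column order, and a record carrying an unexpected field is treated as an error rather than quietly dropped.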
Outcome / Impact
Validated a low-effort pattern for both mention monitoring and profile enrichment using AI-assisted crawling with typed extraction, reusable for media-intelligence and lead-generation workflows.
Capabilities Demonstrated
- AI-assisted web scraping with Firecrawl across JS and Python
- Schema-constrained LLM extraction (Zod / Pydantic) for reliable typed output
- Brand/person mention detection across news sources
- Business-directory profile and company enrichment to CSV