Product · 2025

AI Web-Scraping & Mention-Monitoring Spike (Firecrawl)

Overview

A scraping-and-extraction spike built on Firecrawl that does two things: scans a list of regional news article URLs to detect whether a given public figure is mentioned, and extracts structured profile data (person and company details) from business-directory member pages.

Why It Exists

Media monitoring and lead enrichment both reduce to “crawl a page, then pull out a precise, typed answer.” This build evaluated Firecrawl’s combination of crawling and LLM extraction as a fast way to get schema-constrained data from messy web pages without writing a bespoke parser for each site.

What We Built

Two parallel implementations of the same idea. A Node script (allen.js) uses the Firecrawl JS SDK with a Zod schema ({ mention: boolean }) to classify, per URL, whether a named individual appears in a batch of news articles. A Python script (scrap.py) uses the Firecrawl Python SDK with a Pydantic ExtractSchema (person/company name, descriptions, activity, website) to extract directory member profiles. Additional helpers parse images and transform logs into CSV.
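The per-URL mention check can be sketched in a few lines. This is a minimal illustration, not the code in allen.js: it is written in Python with only the standard library, it makes no Firecrawl call, and it assumes the extraction step has already returned a raw JSON string for each URL. The names MentionResult and parse_mention are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MentionResult:
    """Mirrors the { mention: boolean } shape; the class name is hypothetical."""
    url: str
    mention: bool


def parse_mention(url: str, raw_json: str) -> MentionResult:
    """Validate one raw extraction payload (assumed to arrive as a JSON string)."""
    payload = json.loads(raw_json)
    # Accept exactly one key, "mention", and nothing else.
    if not isinstance(payload, dict) or set(payload) != {"mention"}:
        raise ValueError(f"unexpected payload shape for {url}: {payload!r}")
    value = payload["mention"]
    # Reject "yes", 1, null and similar: only a real boolean counts.
    if not isinstance(value, bool):
        raise ValueError(f"'mention' must be a boolean for {url}, got {value!r}")
    return MentionResult(url=url, mention=value)


if __name__ == "__main__":
    result = parse_mention("https://example.com/article", '{"mention": true}')
    print(result)
```

Rejecting anything other than a real boolean is the point of the schema: a reply such as "yes" or "probably" fails loudly instead of being silently coerced.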

Technologies & Approach

Firecrawl handles fetching and rendering; the value-add is constraining the LLM output with strict schemas (Zod in JS, Pydantic in Python) so results are immediately usable as structured data rather than free text. Writing the results to CSV means they can be opened in a spreadsheet or loaded into other tools without further parsing.
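The step from typed record to CSV might look like the sketch below. It is an assumption-laden illustration rather than the code in scrap.py: a plain dataclass stands in for the Pydantic ExtractSchema so the example runs on the standard library alone, the field names are guesses based on the description above (person/company name, descriptions, activity, website), and the choice of which fields are required is also an assumption.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional


@dataclass
class MemberProfile:
    """Stand-in for the Pydantic schema; field names are hypothetical."""
    person_name: str
    company_name: str
    person_description: Optional[str] = None
    company_description: Optional[str] = None
    activity: Optional[str] = None
    website: Optional[str] = None


def validate_profile(raw: dict) -> MemberProfile:
    """Check a raw extraction dict against the schema before accepting it."""
    allowed = {f.name for f in fields(MemberProfile)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Assumption: the two name fields are required and must be non-empty.
    for name in ("person_name", "company_name"):
        value = raw.get(name)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{name}' is required and must be a non-empty string")
    # Every remaining field is either absent, None, or a string.
    for key, value in raw.items():
        if value is not None and not isinstance(value, str):
            raise ValueError(f"'{key}' must be a string or null, got {value!r}")
    return MemberProfile(**raw)


def profiles_to_csv(profiles: Iterable[MemberProfile]) -> str:
    """Serialise validated profiles to CSV text, one row per profile."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(MemberProfile)])
    writer.writeheader()
    for profile in profiles:
        writer.writerow(asdict(profile))  # None is written as an empty cell
    return buffer.getvalue()


if __name__ == "__main__":
    raw = {"person_name": "Jane Doe", "company_name": "Acme Ltd", "activity": "Consulting"}
    print(profiles_to_csv([validate_profile(raw)]))
```

Validating before writing keeps malformed records out of the CSV, so each column holds either a string or an empty cell.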

Outcome / Impact

Validated a low-effort pattern for both mention monitoring and profile enrichment using AI-assisted crawling with typed extraction, reusable for media-intelligence and lead-generation workflows.

Capabilities Demonstrated

  • AI-assisted web scraping with Firecrawl across JS and Python
  • Schema-constrained LLM extraction (Zod / Pydantic) for reliable typed output
  • Brand/person mention detection across news sources
  • Business-directory profile and company enrichment to CSV