Product · 2025

GraphRAG Knowledge-Graph Indexing over Media Corpus

An influencer-marketing media-intelligence platform

Overview

An evaluation of Microsoft’s GraphRAG for building a knowledge graph from the platform’s media corpus: an LLM extracts entities and relationships, enabling graph-based retrieval that goes beyond flat vector search.

Why It Exists

Influencer-marketing intelligence benefits from understanding entities (people, brands, topics) and how they connect across articles. Standard RAG retrieves individual passages by similarity to the query; GraphRAG instead builds a structured graph of entities and relationships, which supports community summaries and multi-hop reasoning across documents. This repo evaluated that approach.
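The difference shows up in how retrieval walks the data. Below is a minimal Python sketch, not GraphRAG’s own local-search algorithm, assuming a handful of hypothetical (source, relation, target) triples of the kind an LLM extractor might produce. A breadth-first walk collects every entity within k hops of a starting entity, something a passage-level similarity search cannot express directly. The triples and the helpers `build_adjacency` and `multi_hop` are illustrative names, not part of the repo.

```python
from collections import defaultdict, deque

# Hypothetical (source, relation, target) triples, as an LLM extractor might emit them.
TRIPLES = [
    ("CREATOR_A", "partnered_with", "BRAND_X"),
    ("BRAND_X", "owned_by", "HOLDING_CO"),
    ("CREATOR_B", "mentioned", "BRAND_X"),
    ("HOLDING_CO", "sponsors", "EVENT_Y"),
]

def build_adjacency(triples):
    """Undirected adjacency list: entity -> list of (neighbour, relation)."""
    adj = defaultdict(list)
    for src, rel, dst in triples:
        adj[src].append((dst, rel))
        adj[dst].append((src, rel))
    return adj

def multi_hop(adj, start, max_hops):
    """Breadth-first walk returning {entity: hop distance} within max_hops of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour, _rel in adj.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    print(multi_hop(adj, "CREATOR_A", 2))
```

Here `CREATOR_A` reaches `HOLDING_CO` in two hops through `BRAND_X`, even though no single triple, and by analogy no single passage, links the two directly.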

What We Built

A GraphRAG 0.2.1 pipeline with input/, output/, cache/ and prompts/ directories and a lancedb/ vector store. The dependency set (graphrag, LanceDB, dask/fastparquet, Azure Search/Identity/Blob) is the standard GraphRAG indexing stack, adapted to the platform’s content. The work is framed as integration and evaluation rather than a ground-up build.
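With its default settings, GraphRAG 0.2.x writes each indexing run into a timestamped folder under output/; that layout is an assumption worth checking against the project’s settings.yaml. A small Python helper for locating the newest run’s Parquet tables might look like the sketch below. It assumes the default `storage.base_dir` of `output/${timestamp}/artifacts` and run folder names that sort chronologically. `latest_artifacts_dir` and `list_parquet_tables` are hypothetical helpers, not GraphRAG APIs.

```python
from pathlib import Path

def latest_artifacts_dir(root):
    """Return <root>/output/<newest run>/artifacts, or None if there is no run yet.

    Assumes GraphRAG 0.2.x's default timestamped layout
    (storage.base_dir: "output/${timestamp}/artifacts") and run folder
    names that sort chronologically, e.g. 20240801-153000.
    """
    output = Path(root) / "output"
    if not output.is_dir():
        return None
    # Only folders that actually contain an artifacts/ subfolder count as runs.
    runs = sorted(p for p in output.iterdir() if (p / "artifacts").is_dir())
    return runs[-1] / "artifacts" if runs else None

def list_parquet_tables(artifacts_dir):
    """Stem names of the Parquet tables written by one indexing run."""
    return sorted(p.stem for p in Path(artifacts_dir).glob("*.parquet"))

if __name__ == "__main__":
    latest = latest_artifacts_dir(".")
    print(latest, list_parquet_tables(latest) if latest else [])
```

Sorting by folder name rather than modification time keeps the choice stable if an older run’s files are touched later.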

Technologies & Approach

Microsoft GraphRAG drives entity/relationship extraction and graph construction; LanceDB stores embeddings; Parquet and Dask handle the intermediate data; Azure provides the LLM and search backends. The prompts were customised for the media domain.
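Extraction is where LLM output becomes graph structure. The sketch below parses one extraction response into entities and weighted relationships. It assumes the record format of GraphRAG’s default entity-extraction prompt: `("entity"<|>NAME<|>TYPE<|>DESCRIPTION)` and `("relationship"<|>SOURCE<|>TARGET<|>DESCRIPTION<|>STRENGTH)` records separated by `##` and terminated by `<|COMPLETE|>`. A customised prompt may change these delimiters, so check the files in prompts/. The sample response and `parse_extraction` are hypothetical.

```python
from collections import defaultdict

# Delimiters assumed from GraphRAG's default extraction prompt; verify against prompts/.
TUPLE_DELIM = "<|>"
RECORD_DELIM = "##"
COMPLETION_DELIM = "<|COMPLETE|>"

# Hypothetical LLM response for one chunk of a media article.
RAW = (
    '("entity"<|>Creator A<|>person<|>A lifestyle creator)##'
    '("entity"<|>Brand X<|>organization<|>A beverage brand)##'
    '("relationship"<|>Creator A<|>Brand X<|>Creator A ran a paid campaign for Brand X<|>8)'
    "<|COMPLETE|>"
)

def parse_extraction(raw):
    """Parse one extraction response into (entities, relationships).

    entities: {NAME: {"type": TYPE, "descriptions": [...]}}
    relationships: {(SOURCE, TARGET): summed strength}
    """
    entities = {}
    relationships = defaultdict(float)
    body = raw.replace(COMPLETION_DELIM, "")
    for record in body.split(RECORD_DELIM):
        record = record.strip()
        if not (record.startswith("(") and record.endswith(")")):
            continue  # skip blank or malformed records
        fields = [f.strip().strip('"') for f in record[1:-1].split(TUPLE_DELIM)]
        kind = fields[0].lower()
        if kind == "entity" and len(fields) >= 4:
            name = fields[1].upper()  # upper-case so repeated mentions merge
            entry = entities.setdefault(name, {"type": fields[2].upper(), "descriptions": []})
            entry["descriptions"].append(fields[3])
        elif kind == "relationship" and len(fields) >= 5:
            key = (fields[1].upper(), fields[2].upper())
            try:
                relationships[key] += float(fields[4])
            except ValueError:
                relationships[key] += 1.0  # fall back when strength is not numeric
    return entities, dict(relationships)

if __name__ == "__main__":
    ents, rels = parse_extraction(RAW)
    print(sorted(ents), rels)
```

Names are upper-cased so repeated mentions of the same entity collapse into one node. GraphRAG additionally summarises the collected descriptions with a further LLM call, which this sketch leaves out.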

Outcome / Impact

Demonstrated how knowledge-graph RAG could enrich the platform’s search and analytics with entity relationships and topic communities, validating the technique against the real corpus.
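Topic communities come from clustering the entity graph. GraphRAG does this with hierarchical Leiden clustering and then has the LLM write a report for each community. The sketch below swaps in weighted label propagation, a simpler and different algorithm, purely to illustrate grouping densely connected entities. The edge list is hypothetical, relationship strengths serve as weights, and no summaries are generated.

```python
from collections import Counter, defaultdict

# Hypothetical weighted edges (source, target, relationship strength).
EDGES = [
    ("BEAUTY_CREATOR", "LIPSTICK_BRAND", 5.0),
    ("BEAUTY_CREATOR", "MAKEUP_TOPIC", 5.0),
    ("LIPSTICK_BRAND", "MAKEUP_TOPIC", 5.0),
    ("GAMING_CREATOR", "HEADSET_BRAND", 5.0),
    ("GAMING_CREATOR", "ESPORTS_TOPIC", 5.0),
    ("HEADSET_BRAND", "ESPORTS_TOPIC", 5.0),
    ("MAKEUP_TOPIC", "GAMING_CREATOR", 1.0),  # weak bridge between the clusters
]

def label_propagation(edges, max_iter=20):
    """Weighted asynchronous label propagation (a stand-in for Leiden).

    Each node repeatedly adopts the label with the largest total edge weight
    among its neighbours; ties go to the smallest label.
    """
    adj = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = adj[a].get(b, 0.0) + w
        adj[b][a] = adj[b].get(a, 0.0) + w
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):  # fixed visiting order keeps results deterministic
            scores = Counter()
            for neighbour, w in adj[node].items():
                scores[labels[neighbour]] += w
            best = max(scores.values())
            new_label = min(lbl for lbl, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for node, lbl in labels.items():
        groups[lbl].append(node)
    return sorted(sorted(g) for g in groups.values())

if __name__ == "__main__":
    for community in label_propagation(EDGES):
        print(community)
```

The weak bridge edge between `MAKEUP_TOPIC` and `GAMING_CREATOR` does not pull the two clusters together, so the graph splits into a beauty community and a gaming community. Visiting nodes in sorted order with a smallest-label tie-break makes the output reproducible, whereas typical label-propagation implementations randomise the visiting order.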

Capabilities Demonstrated

Integrating and evaluating Microsoft GraphRAG
Building knowledge graphs from unstructured media text
Operating LanceDB and Parquet-based AI data pipelines
