GraphRAG Knowledge-Graph Indexing over Media Corpus
An influencer-marketing media-intelligence platform
Overview
This project evaluated Microsoft’s GraphRAG as a way to build a knowledge graph from the platform’s media corpus. An LLM extracts entities and relationships from the text, and the resulting graph enables retrieval that goes beyond flat vector search.
Why It Exists
Influencer-marketing intelligence benefits from understanding entities (people, brands, topics) and how they connect across articles. Standard RAG retrieves passages; GraphRAG builds a structured graph that supports community summaries and multi-hop reasoning. This repo evaluated that approach.
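To make the multi-hop idea concrete, the following is a minimal sketch in plain Python and does not use GraphRAG itself. The entity names, relation labels, and helper functions are all hypothetical. The triples stand in for what an LLM extraction pass might produce, and a breadth-first search answers a question that no single passage would answer on its own.

```python
from collections import defaultdict, deque

# Hypothetical (source, relation, target) triples, standing in for what an
# LLM extraction pass might produce; none of these come from the real corpus.
TRIPLES = [
    ("Creator A", "partnered_with", "Brand X"),
    ("Brand X", "sponsors", "Event Y"),
    ("Creator B", "appeared_at", "Event Y"),
    ("Creator C", "mentioned", "Topic Z"),
]


def build_adjacency(triples):
    """Build an undirected adjacency map: entity -> set of neighbouring entities."""
    adjacency = defaultdict(set)
    for source, _relation, target in triples:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def shortest_path(adjacency, start, goal, max_hops=3):
    """Breadth-first search; returns the entity path, or None if the goal
    cannot be reached within max_hops edges."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # hop budget used up; do not expand this path further
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_adjacency(TRIPLES)
path = shortest_path(graph, "Creator A", "Creator B")
print(path)
```

In this toy graph the two creators never appear in the same triple, yet the traversal links them through a shared brand and event. That is the kind of connection passage-level retrieval tends to miss.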
What We Built
A GraphRAG 0.2.1 pipeline configured with input/, output/, cache/ and prompts/ directories plus a lancedb/ store. The dependency set (graphrag, LanceDB, dask/fastparquet, Azure Search/Identity/Blob) reflects the standard GraphRAG indexing stack, adapted to the platform’s content. The work is best read as integration and evaluation rather than ground-up development.
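As a small illustration of that layout, the sketch below checks a workspace for the directories named above. The directory names come from this description. The helper name and the workspace root are hypothetical, and whether every directory exists before the first indexing run is an assumption, since some may only be created by the run itself.

```python
import tempfile
from pathlib import Path

# Directory names taken from the description above; the workspace root is a
# hypothetical path supplied by the caller.
EXPECTED_DIRS = ("input", "output", "cache", "prompts", "lancedb")


def missing_workspace_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Usage: a throwaway workspace in which only input/ and prompts/ exist.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("input", "prompts"):
        (Path(tmp) / name).mkdir()
    missing = missing_workspace_dirs(tmp)

print(missing)
```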
Technologies & Approach
Microsoft GraphRAG drives entity/relationship extraction and graph construction; LanceDB stores embeddings; Parquet and Dask handle the intermediate data; Azure provides the LLM and search backends. The prompts were customised for the media domain.
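The community idea behind graph construction can be illustrated with a deliberately simplified sketch. GraphRAG itself uses hierarchical Leiden clustering. The code below swaps in plain connected components as a crude stand-in, so it only shows how entities fall into groups that could each be summarised. The entity pairs and the function name are hypothetical.

```python
from collections import defaultdict


def connected_components(edges):
    """Group entities into connected components.

    A crude stand-in for community detection: it is NOT the Leiden
    algorithm, only an illustration of partitioning an entity graph.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # iterative depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical entity pairs, not taken from the real corpus.
EDGES = [
    ("Brand X", "Creator A"),
    ("Creator A", "Topic Skincare"),
    ("Brand Y", "Creator B"),
]
communities = connected_components(EDGES)
print(communities)
```

In a real pipeline each group would be passed to an LLM to produce a community summary. Connected components cannot split a densely linked graph into sub-communities, which is why a modularity-based method such as Leiden is used in practice.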
Outcome / Impact
The evaluation demonstrated how knowledge-graph RAG could enrich the platform’s search and analytics with entity relationships and topic communities, and it validated the technique against the real corpus.
Capabilities Demonstrated
- Integrating and evaluating Microsoft GraphRAG
- Building knowledge graphs from unstructured media text
- Operating LanceDB and Parquet-based AI data pipelines