Tooling · 2024

AWS Data Ingestion & Face-Recognition Enrichment Scripts (Python)

Overview

A collection of Python scripts for an event data pipeline. They ingest attendee and user data, enrich profiles and avatars, and index faces into AWS Rekognition collections for search, combining CSV-based data processing, Firebase access, and computer-vision enrichment.

Why It Exists

Event and community data needs to be cleaned, enriched, and made searchable, including by face. These scripts handle the unglamorous middle of that work: pulling data, decrypting/transforming payloads, building Rekognition collections, enriching avatars and overall records, and exposing a simple search API.

What We Built

A toolbox of single-purpose Python scripts:

  • Ingestion: ingest.py, ingest2.py, ingesthidden.py
  • Rekognition collection management: create_collection.py, face.py, and aws_client.py, which initializes the Rekognition client
  • Enrichment: enrichavatars.py, enrichoveralldata.py, process_avatars.py, process_image.py
  • Data cleanup: remove_empty.py, count_users.py
  • Export: export.py, producing output.csv and output_updated.csv
  • Search: search_api.py
  • Firebase integration: firebase.py
  • Messaging: sendmessage.py
  • AES payload decryption helpers

Data flows through CSVs and Firebase. It’s an operational scripting suite rather than a packaged application.
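
To give a feel for the cleanup stage, here is a minimal sketch of what a script like remove_empty.py might do: copy a CSV while dropping rows that are missing required values. It is an illustration, not the actual script. The function names and the column names (user_id, name, avatar_url) are assumptions made for the example.

```python
import csv
import io


def drop_empty_rows(rows, required_fields):
    """Yield only rows where every required field has a non-blank value."""
    for row in rows:
        if all((row.get(field) or "").strip() for field in required_fields):
            yield row


def clean_csv(src, dst, required_fields):
    """Copy CSV rows from src to dst, skipping rows with blank required fields.

    src and dst are open text file objects. Returns the number of rows kept.
    """
    reader = csv.DictReader(src)
    # extrasaction="ignore" tolerates ragged rows that have more cells than headers.
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames or [], extrasaction="ignore"
    )
    writer.writeheader()
    kept = 0
    for row in drop_empty_rows(reader, required_fields):
        writer.writerow(row)
        kept += 1
    return kept


if __name__ == "__main__":
    # Hypothetical sample data; the real column names are not documented here.
    sample = io.StringIO(
        "user_id,name,avatar_url\n"
        "1,Ada,http://example.com/a.png\n"
        "2,,http://example.com/b.png\n"
    )
    out = io.StringIO()
    print(clean_csv(sample, out, ["user_id", "name"]), "rows kept")
```

Because the functions take file objects rather than paths, the same code can run against files on disk or in-memory buffers, which suits single-purpose scripts that get chained together ad hoc.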

Technologies & Approach

The scripts are written in Python. boto3 drives AWS Rekognition for face detection and collection-based search, Firebase provides data storage and auth, PyCryptodome handles AES payloads, and CSV files serve as the working data format. Each script does one job, and scripts are composed into ad-hoc pipelines as needed.
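
The collection-based search flow can be sketched as follows: index one face per avatar under the user's ID, then search the collection with a query image. This is a sketch under assumptions, not the code in face.py or search_api.py. The helper names, the default region, and the 90.0 similarity threshold are hypothetical. The Rekognition calls used (index_faces, search_faces_by_image) are standard boto3 client methods.

```python
def make_client(region_name="us-east-1"):
    """Create a Rekognition client. The default region is an assumption."""
    import boto3  # imported lazily so the helpers below work with any client object

    return boto3.client("rekognition", region_name=region_name)


def index_avatar(client, collection_id, user_id, image_bytes):
    """Index the most prominent face in an avatar and tag it with the user ID.

    user_id is stored as ExternalImageId, which Rekognition restricts to
    letters, digits, and the characters _ . - :
    Returns the list of face IDs that were indexed (empty if no face was found).
    """
    response = client.index_faces(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        ExternalImageId=user_id,
        MaxFaces=1,
        QualityFilter="AUTO",
    )
    return [record["Face"]["FaceId"] for record in response.get("FaceRecords", [])]


def search_by_face(client, collection_id, image_bytes, threshold=90.0, max_faces=5):
    """Return (user_id, similarity) pairs for faces matching the query image."""
    response = client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        FaceMatchThreshold=threshold,
        MaxFaces=max_faces,
    )
    return [
        (match["Face"].get("ExternalImageId"), match["Similarity"])
        for match in response.get("FaceMatches", [])
    ]
```

Passing the client in as an argument keeps the helpers easy to exercise with a stub, and it mirrors the split described above, where one script initializes the Rekognition client and others use it.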

Outcome / Impact

The suite delivered a working face-search and data-enrichment capability for an event dataset, proving out Rekognition collections plus Firebase as a fast path to searchable, enriched profiles. It showcases practical CV + data-engineering glue work under real operational constraints.

Capabilities Demonstrated

  • Building face-recognition search with AWS Rekognition collections
  • Multi-stage data ingestion, enrichment, and export pipelines in Python
  • Integrating cloud CV, Firebase, and CSV-based data flows
  • Operational scripting and payload decryption/transformation