Real Time Face Blur

A simple tool that provides real-time blurring of selected faces in video input.

Ani Aggarwal, Vance Degen, Varun Unnithan, Monish Napa, Rohit Kommuru

Live demonstration of the real-time face blurring capabilities.

Installation

To run the project locally, install Conda or Mamba and follow these steps:

# Create conda env for this project and install dependencies
conda env create --name face-blur-rt --file=face-blur-rt.yml

# Activate the env
conda activate face-blur-rt

# Run the demo
python main.py

Introduction & Goals

We utilize computer vision and object detection to perform effective real-time blurring of faces from both live video input and pre-recorded videos. This project has significant implications for:

  • Streaming Platforms (e.g., Twitch): Protecting user privacy in real-time.
  • Law Enforcement: Redacting private video footage to protect multiple suspects' identities.

While face blurring is an existing practice, manual processes are time-consuming and error-prone. Our solution provides a highly customizable architecture that combines efficient techniques with existing face detection and recognition libraries.

Key Features

  • Recognition System: Specified faces can remain unblurred.
  • Recurrent Model: Increases robustness against occlusion.
  • Customizable Settings: Choose between a higher frame rate with lower accuracy, or higher accuracy at a lower frame rate.
  • Blur Options: Whole bounding box blur or facial segmentation blur.

Implementation & Architecture

The system uses a modular pipeline where classes like FaceDetector, FaceRecognizer, and Blurrer are subclassed for specific algorithm implementations. This allows options to be hot-swappable and maintainable.
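
The interfaces below are an illustrative sketch of this kind of hot-swappable design rather than a verbatim excerpt of the project's classes:

# Illustrative sketch of the pluggable component interfaces (names are hypothetical)
from abc import ABC, abstractmethod

import numpy as np


class FaceDetector(ABC):
    @abstractmethod
    def detect(self, frame: np.ndarray) -> list:
        """Return face bounding boxes in the frame as (x, y, w, h) tuples."""


class FaceRecognizer(ABC):
    @abstractmethod
    def is_known(self, frame: np.ndarray, box: tuple) -> bool:
        """Return True if the face inside `box` matches a registered reference face."""


class Blurrer(ABC):
    @abstractmethod
    def blur(self, frame: np.ndarray, box: tuple) -> np.ndarray:
        """Return the frame with the region inside `box` blurred."""


# Concrete implementations (e.g. a YuNet-backed detector or a Gaussian blurrer)
# subclass these interfaces, so components can be swapped without touching the main loop.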

Figure A: Flow chart of the face blurring pipeline and architecture.

Pipeline:

  1. Input Frame: Fed into the FaceDetector (e.g., YuNet or SCRFD).
  2. Tracking (Optional): If detection fails, the tracker (SORT) extrapolates bounding boxes.
  3. Recognition: Bounding boxes are checked against known faces (SFace).
  4. Blurring: Unknown faces are blurred; known faces are labeled.
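
Putting the steps together, a simplified frame loop could look like the sketch below; the component methods (detect, update, predict, is_known, blur) follow the illustrative interfaces above rather than the exact code in main.py.

import cv2


def run(detector, tracker, recognizer, blurrer, source=0):
    """Simplified real-time loop over a webcam or video file."""
    cap = cv2.VideoCapture(source)
    while True:
        ok, frame = cap.read()
        if not ok:
            break

        boxes = detector.detect(frame)
        if boxes:
            tracker.update(boxes)
        else:
            # Detection failed: fall back to the tracker's extrapolated boxes.
            boxes = tracker.predict()

        for box in boxes:
            if recognizer.is_known(frame, box):
                continue  # known faces stay unblurred (and could be labeled here)
            frame = blurrer.blur(frame, box)

        cv2.imshow("face-blur-rt", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()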

Optimizations

An unexpected finding was that deep pretrained face detection models (like YuNet) often run faster than trackers (like SORT). As a result, running the FaceDetector on every frame, rather than relying heavily on the tracker, is usually more efficient and yields a higher frame rate.
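
This trade-off is easy to re-check on a given machine by timing each component per frame; the helper below is a generic sketch, with placeholder component names.

import time


def seconds_per_frame(step, frames):
    """Average wall-clock seconds to run `step(frame)` over a list of frames."""
    start = time.perf_counter()
    for frame in frames:
        step(frame)
    return (time.perf_counter() - start) / len(frames)


# Example (placeholder components):
# det_spf = seconds_per_frame(detector.detect, sample_frames)
# trk_spf = seconds_per_frame(lambda _frame: tracker.predict(), sample_frames)
# print(f"detector: {det_spf * 1000:.1f} ms/frame, tracker: {trk_spf * 1000:.1f} ms/frame")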

Literature Review & Libraries

YuNet

An innovative face detection architecture using depthwise separable convolutions. Selected for its excellent balance between speed and accuracy, suitable for edge devices.
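
OpenCV (4.5.4+) ships a YuNet wrapper, cv2.FaceDetectorYN. A minimal standalone usage sketch follows; the model path, input size, and thresholds are illustrative.

import cv2

# Assumes the YuNet ONNX model was downloaded locally (path is illustrative).
detector = cv2.FaceDetectorYN.create(
    "face_detection_yunet_2023mar.onnx", "", (320, 320), 0.6, 0.3)

img = cv2.imread("frame.jpg")
detector.setInputSize((img.shape[1], img.shape[0]))
_, faces = detector.detect(img)  # each row: x, y, w, h, 10 landmark coords, score

for x, y, w, h in (faces[:, :4].astype(int) if faces is not None else []):
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)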

SCRFD

Sample and Computation Redistribution for Efficient Face Detection. Redistributes training samples and model computation to produce high-accuracy, low-cost detectors. Used as an alternative to YuNet.
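
An SCRFD detector can also be run standalone through the insightface package, whose default "buffalo_l" model pack bundles SCRFD weights; the sketch below is illustrative rather than a verbatim excerpt of our loading code.

import cv2
from insightface.app import FaceAnalysis

# insightface's default "buffalo_l" pack uses an SCRFD detector under the hood.
app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=0, det_size=(640, 640))  # ctx_id=0 uses GPU 0 if available, else CPU

img = cv2.imread("frame.jpg")
for face in app.get(img):
    x1, y1, x2, y2 = face.bbox.astype(int)
    cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)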

SORT

Simple Online and Realtime Tracking. A lightweight tracking algorithm using Kalman filters. Used to enhance recall by extrapolating face positions when the detector fails.
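
The core of SORT is a constant-velocity Kalman filter per track. The numpy sketch below shows only the predict step used to extrapolate a box when the detector misses a frame; the state layout follows SORT's center/scale/aspect-ratio convention, with simplified matrices for illustration.

import numpy as np

# State: [cx, cy, s, r, vcx, vcy, vs] = box center, scale (area), aspect ratio, velocities.
F = np.eye(7)
F[0, 4] = F[1, 5] = F[2, 6] = 1.0  # position/scale advance by their velocities each frame


def predict(x, P, Q):
    """Extrapolate the track state x and covariance P by one frame."""
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P


def state_to_box(x):
    """Convert [cx, cy, s, r, ...] back to an (x1, y1, x2, y2) box."""
    w = np.sqrt(x[2] * x[3])
    h = x[2] / w
    return x[0] - w / 2, x[1] - h / 2, x[0] + w / 2, x[1] + h / 2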

SFace

Privacy-friendly and accurate face recognition using synthetic data. Allows the program to be deployed with very few reference photos per person (from a single photo to tens of photos).
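
OpenCV also wraps SFace as cv2.FaceRecognizerSF. The minimal matching sketch below assumes the SFace ONNX model is available locally and that face rows come from a detector such as cv2.FaceDetectorYN.

import cv2

# Assumes the SFace ONNX model was downloaded locally (path is illustrative).
recognizer = cv2.FaceRecognizerSF.create("face_recognition_sface_2021dec.onnx", "")


def embed(img, face_row):
    """Align a detected face (a FaceDetectorYN output row) and return its SFace feature."""
    aligned = recognizer.alignCrop(img, face_row)
    return recognizer.feature(aligned)


def same_person(feat_a, feat_b, threshold=0.363):
    """Cosine match; 0.363 is the threshold used in OpenCV's face recognition sample."""
    score = recognizer.match(feat_a, feat_b, cv2.FaceRecognizerSF_FR_COSINE)
    return score >= threshold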

Benchmarking & Testing

We benchmarked our model against TinaFace on a 6-minute clip. The video was downsampled from 60 fps to 30 fps.

Figure D: Average IoUs recorded by our model vs the TinaFace baseline.
Figure E: Frame rate of our model vs the TinaFace baseline.

Our model runs faster than real time for all tasks except Gaussian blur. The drop in IoU between 10 and 25 seconds in the benchmarks is due to a large number of faces appearing in the frame.
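
For reference, the per-box IoU metric behind Figure D can be computed with a short helper like the one below (a generic implementation, not necessarily our exact benchmarking code).

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0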

Figure F: Excess bounding boxes drawn per frame. The tracker tends to draw more excess boxes to maintain recall.

Discussion & Contributions

Results

Our initial goals were met: we built an effective face blurring pipeline that detects and blurs faces in both live input and recorded video, and the recognition model identifies known faces with relatively high accuracy.

Limitations

We were unable to create an advanced user interface for manual face selection due to time constraints. Additionally, performance may degrade in situations with rapid movement or severe occlusion.