Driver Drowsiness Detection
AI/ML · Computer Vision
YOLOv5-based detection of yawning and closed eyes with a deployed video-processing web demo.
- AI/ML
- Computer Vision
- Python
- PyTorch
- YOLOv5
- OpenCV
- Flask
- Docker
Summary
Built a driver drowsiness detection demo that localizes eye and mouth cues in video, then triggers a warning on yawning and an alert after roughly one second of closed eyes. We iterated on dataset quality (re-annotation plus augmentation), compared Faster R-CNN (Detectron2) against YOLOv5, and shipped the best-performing pipeline behind a simple web UI. Key highlights:
- Re-annotated face images to make eye detection work in full-face frames
- Chose YOLOv5 after weighing recall/precision tradeoffs against Faster R-CNN
- Deployed a Dockerized Flask app behind NGINX on GCP
Quick facts
- Role: ML Engineer (Computer Vision) + Backend/Deployment (demo web app)
- Timeframe: Not specified
- Platform: Web demo (video upload → processed output) + CV model inference
- Status: Completed (academic project + demo)
- Team: Team project
Problem
- Original Kaggle data was not annotated for detection and didn’t transfer well from “eyes-only” images to full-face frames.
- The system needed high recall to avoid missing drowsiness events, while keeping false alerts manageable.
- The demo had to process full videos, overlay detections, and return a clear output artifact.
Solution
We rebuilt the dataset around full-face frames and annotated eyes + yawning states, then used augmentation to simulate real driving conditions (brightness, blur, noise). After benchmarking Detectron2 Faster R-CNN and tuning thresholds, we moved to YOLOv5 for better practical accuracy. The final demo is a Flask web app that accepts a video upload, runs PyTorch inference frame-by-frame, overlays detections, and returns a processed video with warning/alert events. The event logic triggers a warning on each yawn and an alert once eyes stay closed for more than one second.
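The warning/alert rules described above can be sketched as a small, framework-free event engine. This is a minimal sketch, not the project's actual code; the class label names (`eyes_closed`, `yawn`) are assumptions about what the detector emits.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Assumed detector class labels (hypothetical names)
EYES_CLOSED = "eyes_closed"
YAWN = "yawn"

@dataclass
class DrowsinessMonitor:
    """Turns per-frame detections into warning/alert events.

    A yawn triggers an immediate warning; an alert fires once the eyes
    have been closed continuously for `closed_eyes_threshold` seconds
    (~1 s, matching the demo's rule).
    """
    closed_eyes_threshold: float = 1.0
    _closed_since: Optional[float] = field(default=None, init=False)

    def update(self, timestamp: float, labels: Set[str]) -> Optional[str]:
        # Track how long the eyes have been continuously closed.
        if EYES_CLOSED in labels:
            if self._closed_since is None:
                self._closed_since = timestamp
            if timestamp - self._closed_since >= self.closed_eyes_threshold:
                return "ALERT"
        else:
            self._closed_since = None  # eyes reopened: reset the timer
        if YAWN in labels:
            return "WARNING"
        return None
```

Feeding the monitor each frame's timestamp (derived from the frame index and the video's FPS) keeps the logic independent of any single-frame prediction noise.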
Architecture
- Data: Kaggle base dataset → CVAT re-annotation (full-face) → Roboflow split + augmentation
- Model training: transfer learning with Detectron2 (Faster R-CNN) and YOLOv5
- Inference: PyTorch YOLOv5 on video frames → bounding boxes + class labels + confidence
- Event engine: timers/thresholds to trigger yawning warnings and closed-eye alerts
- Demo app: Flask (templates + Jinja2) for upload/result flow
- Deployment: Docker Compose on GCP VM + NGINX reverse proxy
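The inference stage above follows a decode → infer → annotate flow. A hedged sketch of that flow, with the decoder, detector, and box-drawing step injected as callables so the structure is visible without the actual OpenCV/PyTorch dependencies (in the real pipeline, `frames` would come from `cv2.VideoCapture`, `detect` from the YOLOv5 model, and the result would be re-encoded with `cv2.VideoWriter`):

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Frame = object       # stand-in for an image array (e.g. a numpy ndarray)
Detection = Tuple    # assumed shape: (label, confidence, box)

def run_video_pipeline(
    frames: Iterable[Frame],
    detect: Callable[[Frame], List[Detection]],
    annotate: Callable[[Frame, List[Detection]], Frame],
) -> Iterator[Tuple[Frame, List[Detection]]]:
    """Process a video one frame at a time: infer, then overlay results.

    Yielding (annotated_frame, detections) pairs lets the caller both
    re-encode the output video and feed detections to the event engine.
    """
    for frame in frames:
        detections = detect(frame)
        yield annotate(frame, detections), detections
```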
Tech stack
- Architecture: Object detection + temporal event rules (yawn / eyes-closed)
- Backend/Infra: Flask, Docker Compose, NGINX, Google Cloud VM
- Tooling: PyTorch, YOLOv5, Detectron2, Roboflow, CVAT
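The Docker Compose + NGINX setup might look roughly like the following. This is a hypothetical sketch, not the project's actual file; service names, ports, and paths are assumptions.

```yaml
# docker-compose.yml (illustrative; names and ports are assumptions)
services:
  app:
    build: .
    expose:
      - "5000"        # Flask app, reachable only inside the compose network
  nginx:
    image: nginx:stable
    ports:
      - "80:80"       # public entrypoint on the GCP VM
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
    depends_on:
      - app
```

NGINX proxies requests to the Flask container, which keeps the app server off the public interface.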
Hard problems solved
- Fixed a core dataset mismatch: “eyes-only” training data failed on full-face inference, so we re-annotated for the real input distribution
- Designed augmentations to reflect driving noise (lighting shifts, blur, in-cabin noise) without breaking labels
- Tuned for safety-oriented recall: explored confidence thresholds and accepted precision tradeoffs to reduce missed detections
- Switched model families when Faster R-CNN thresholding still missed classes at higher confidence cutoffs
- Implemented temporal logic (closed-eyes duration) instead of relying on single-frame predictions
- Built a robust video pipeline: decode → infer → annotate → re-encode → return output reliably in a web workflow
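The recall-oriented threshold tuning mentioned above amounts to sweeping confidence cutoffs and inspecting the precision/recall at each. A minimal sketch, assuming detections have already been matched against ground truth:

```python
from typing import Dict, Iterable, List, Tuple

def precision_recall_at(
    threshold: float,
    detections: List[Tuple[float, bool]],
    num_positives: int,
) -> Tuple[float, float]:
    """Precision/recall for one confidence cutoff.

    `detections` is a list of (confidence, is_true_positive) pairs;
    `num_positives` is the number of ground-truth events.
    """
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    if not kept:
        return 0.0, 0.0
    tp = sum(kept)
    return tp / len(kept), tp / num_positives

def sweep(
    detections: List[Tuple[float, bool]],
    num_positives: int,
    thresholds: Iterable[float],
) -> Dict[float, Tuple[float, float]]:
    """Safety-oriented tuning: inspect every cutoff, then pick the highest
    threshold whose recall stays above a floor, accepting lower precision."""
    return {t: precision_recall_at(t, detections, num_positives)
            for t in thresholds}
```

Lowering the cutoff trades false alarms for fewer missed drowsiness events, which is the direction a safety system usually prefers.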
Impact / Results
- Delivered a working demo that flags yawns and sustained eye-closure events on uploaded driving videos
- Produced a repeatable dataset + training pipeline with clear learnings on model selection and threshold tradeoffs
- Deployed an end-to-end system (model + web app + infra) suitable for showcasing to non-technical stakeholders