Daniyar Kurmanbayev

Driver Drowsiness Detection

AI/ML · Computer Vision

YOLOv5-based detection of yawning and closed eyes with a deployed video-processing web demo.

  • AI/ML
  • Computer Vision
  • Python
  • PyTorch
  • YOLOv5
  • OpenCV
  • Flask
  • Docker

Summary

Built a driver drowsiness detection demo that localizes eye and mouth cues in video, triggering a warning on yawning and an alert after roughly one second of closed eyes. We iterated on dataset quality (re-annotation and augmentation), compared Faster R-CNN (Detectron2) against YOLOv5, and shipped the better-performing pipeline behind a simple web UI. Key highlights:

  • Re-annotated face images to make eye detection work in full-face frames
  • Chose YOLOv5 after weighing recall/precision tradeoffs against Faster R-CNN
  • Deployed a Dockerized Flask app behind NGINX on GCP

Quick facts

  • Role: ML Engineer (Computer Vision) + Backend/Deployment (demo web app)
  • Timeframe: Not specified
  • Platform: Web demo (video upload → processed output) + CV model inference
  • Status: Completed (academic project + demo)
  • Team: Team project

Problem

  • The original Kaggle dataset was not annotated for object detection, and models trained on its “eyes-only” crops did not transfer to full-face frames.
  • The system needed high recall to avoid missing drowsiness events, while keeping false alerts manageable.
  • The demo had to process full videos, overlay detections, and return a clear output artifact.

Solution

We rebuilt the dataset around full-face frames, annotated eyes and yawning states, and used augmentation to simulate real driving conditions (brightness shifts, blur, noise). After benchmarking Detectron2’s Faster R-CNN and tuning its confidence thresholds, we moved to YOLOv5, which detected the relevant classes more reliably in practice. The final demo is a Flask web app that accepts a video upload, runs PyTorch inference frame by frame, overlays detections, and returns a processed video with warning/alert events.

  • Implemented event logic: “yawn warning” and “eyes closed > 1s” alert
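The event logic above can be sketched as a small state machine fed per-frame detections. This is an illustrative sketch, not the project's actual code: the class names (`yawn`, `closed_eye`), the 1.0 s alert threshold, and the re-arm behavior are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, List, Tuple

# Assumed threshold: alert after ~1 second of continuously closed eyes.
CLOSED_EYES_ALERT_SEC = 1.0

@dataclass
class DrowsinessEventEngine:
    """Turns per-frame detections into temporal warning/alert events."""
    closed_since: Optional[float] = None            # timestamp eyes first seen closed
    events: List[Tuple[float, str]] = field(default_factory=list)

    def update(self, t: float, labels: Set[str]) -> None:
        """Feed one frame's detected class labels (t = timestamp in seconds)."""
        if "yawn" in labels:
            # Yawning triggers a warning immediately, per-frame.
            self.events.append((t, "yawn_warning"))
        if "closed_eye" in labels:
            if self.closed_since is None:
                self.closed_since = t               # start the closed-eye timer
            elif t - self.closed_since >= CLOSED_EYES_ALERT_SEC:
                self.events.append((t, "eyes_closed_alert"))
                self.closed_since = t               # re-arm so the alert can repeat
        else:
            self.closed_since = None                # any open-eye frame resets the timer
```

Keying the alert on elapsed time rather than a frame count keeps the rule independent of the video's frame rate.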

Architecture

  • Data: Kaggle base dataset → CVAT re-annotation (full-face) → Roboflow split + augmentation
  • Model training: transfer learning with Detectron2 (Faster R-CNN) and YOLOv5
  • Inference: PyTorch YOLOv5 on video frames → bounding boxes + class labels + confidence
  • Event engine: timers/thresholds to trigger yawning warnings and closed-eye alerts
  • Demo app: Flask (templates + Jinja2) for upload/result flow
  • Deployment: Docker Compose on GCP VM + NGINX reverse proxy
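The deployment layer might look roughly like the following Compose file. This is a sketch only: service names, ports, and image details are assumptions, not the project's actual configuration.

```yaml
# docker-compose.yml — illustrative sketch of the Flask + NGINX setup on the GCP VM
services:
  web:
    build: .                 # Flask app image (Dockerfile assumed in repo root)
    expose:
      - "5000"               # Flask served internally only, never on the host
  nginx:
    image: nginx:stable
    ports:
      - "80:80"              # public entrypoint on the VM
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
    depends_on:
      - web
```

The NGINX config would then `proxy_pass` to `http://web:5000`; for this workload it would also need `client_max_body_size` raised, since the default 1 MB limit would reject video uploads.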

Tech stack

  • Architecture: Object detection + temporal event rules (yawn / eyes-closed)
  • Backend/Infra: Flask, Docker Compose, NGINX, Google Cloud VM
  • Tooling: PyTorch, YOLOv5, Detectron2, Roboflow, CVAT

Hard problems solved

  • Fixed a core dataset mismatch: “eyes-only” training data failed on full-face inference, so we re-annotated for the real input distribution
  • Designed augmentations to reflect driving noise (lighting shifts, blur, in-cabin noise) without breaking labels
  • Tuned for safety-oriented recall: explored confidence thresholds and accepted precision tradeoffs to reduce missed detections
  • Switched model families when Faster R-CNN thresholding still missed classes at higher confidence cutoffs
  • Implemented temporal logic (closed-eyes duration) instead of relying on single-frame predictions
  • Built a robust video pipeline: decode → infer → annotate → re-encode → return output reliably in a web workflow
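The decode → infer → annotate → re-encode loop in the last bullet can be sketched as a pure-Python skeleton. The decoder, detector, annotator, and encoder are injected callables here; in the real demo these would be OpenCV's `VideoCapture`/`VideoWriter` and the YOLOv5 model, so all names and the 0.4 confidence cutoff are illustrative assumptions.

```python
from typing import Callable, Iterable, List, Tuple

# A detection as (class label, confidence); boxes omitted for brevity.
Detection = Tuple[str, float]

def process_video(
    frames: Iterable,                                       # decoded frames, in order
    infer: Callable[[object], List[Detection]],             # per-frame detector
    annotate: Callable[[object, List[Detection]], object],  # draw boxes/labels
    write: Callable[[object], None],                        # re-encode / output sink
    conf_threshold: float = 0.4,                            # assumed confidence cutoff
) -> List[List[Detection]]:
    """Run the decode → infer → annotate → re-encode loop.

    Returns the kept detections per frame so a temporal event engine
    can apply the yawn-warning and closed-eye-duration rules afterward.
    """
    kept_per_frame = []
    for frame in frames:
        # Drop low-confidence detections before drawing or event logic.
        detections = [d for d in infer(frame) if d[1] >= conf_threshold]
        write(annotate(frame, detections))
        kept_per_frame.append(detections)
    return kept_per_frame
```

In the actual app, `frames` would come from `cv2.VideoCapture`, `annotate` would use `cv2.rectangle`/`cv2.putText`, and `write` would feed a `cv2.VideoWriter` whose output file is returned to the browser.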

Impact / Results

  • Delivered a working demo that flags yawns and sustained eye-closure events on uploaded driving videos
  • Produced a repeatable dataset + training pipeline with clear learnings on model selection and threshold tradeoffs
  • Deployed an end-to-end system (model + web app + infra) suitable for showcasing to non-technical stakeholders