ASL Recognition

AI/ML · Computer Vision

Object detection pipeline for ASL hand sign localization and classification, benchmarked across SSD/YOLO/Faster R-CNN.

ML Engineer (Computer Vision) + Backend Developer (demo API) Completed (academic project)

AI/ML
Computer Vision

Python
YOLOv5
Detectron2
TensorFlow OD
Roboflow

Summary

American Sign Language alphabet recognition from images. The goal was to localize the hand and classify the sign into one of 26 letters. We evaluated multiple object-detection approaches (SSD, YOLOv5, Faster R-CNN), used transfer learning, and built a simple Django API for inference demos. Key highlights:

Compared SSD-ResNet50, YOLOv5, and Faster R-CNN pipelines on the same labeled dataset
Used Roboflow augmentation to scale the dataset and standardize labels
Shipped a prediction endpoint returning letter, bounding box, and confidence

Quick facts

Role: ML Engineer (Computer Vision) + Backend Developer (demo API)
Timeframe: Not specified
Platform: Model training + REST API demo
Status: Completed (academic project)
Team: Team project

Problem

Needed both localization (bounding box) and classification (26 classes) with limited labeled data.
Training compute was constrained early (no GPU), forcing careful iteration choices.
Offline validation metrics did not translate to real-world video performance.

Solution

We framed the task as object detection and used transfer learning to accelerate progress. Data was augmented via Roboflow (rotations and flips) to increase per-class coverage. We then benchmarked SSD-ResNet50 vs YOLOv5 vs Faster R-CNN, focusing on the gap between validation metrics and live inference. For the demo, we wrapped inference behind a Django REST endpoint that accepts an image and returns the predicted letter, bounding box, and score.

Prioritized a reproducible pipeline over a single “best” model

Architecture

Dataset: Kaggle ASL alphabet images (26 classes) with existing annotations
Preprocessing: Roboflow augmentation + train/val/test split + COCO/YOLO export formats
Models: transfer learning on SSD-ResNet50 (TF Object Detection Zoo), YOLOv5, Faster R-CNN (Detectron2)
Training: GPU-enabled runs for longer training schedules; tracked loss and detection metrics
Evaluation: compared validation metrics vs real-world tests (video inference)
Demo: Django API endpoint → model inference → JSON response (class, box, score)

Tech stack

Architecture: Transfer learning, object detection (SSD / YOLOv5 / Faster R-CNN)
Backend/Infra: Django, REST API, Detectron2, TensorFlow Object Detection Zoo
Tooling: PyTorch, Roboflow, Kaggle dataset, Google Colab/local GPU training

Hard problems solved

Managed compute constraints: early SSD training was too slow without GPU, forcing a model strategy change
Diagnosed metric vs reality mismatch: high validation scores but poor live video accuracy
Built a consistent labeling pipeline across frameworks (TF OD API, YOLO format, COCO JSON)
Tuned augmentation to increase data volume while monitoring degradation in generalization
Investigated failure modes separately for box regression vs class prediction (correct box, wrong class)
Packaged inference into an API that returns structured outputs suitable for a front-end demo

Impact / Results

Produced an end-to-end ASL alphabet detection pipeline with multiple model baselines
Demonstrated that offline metrics can be misleading without real inference testing
Delivered a working inference API for demo usage (image in → prediction out)