A Performance Study of Different Deep Learning Architectures For Detecting Construction Equipment in Sites

Sabek, Mohamed Mamdooh

There are relatively few available construction equipment detection models that use deep learning architectures, and many of these rely on older object detection architectures such as CNN (Convolutional Neural Networks), RCNN (Region-Based Convolutional Neural Network), and early versions of You Only Look Once (YOLO) such as v1. These models can be challenging to deploy in practice for tracking construction equipment while it is working on site. This thesis aims to provide a clear guide on how to train and evaluate the performance of different deep learning architectures for detecting construction equipment on site. It uses two YOLO architectures, YOLOv5s and YOLOR, to detect three classes of construction equipment: excavators, dump trucks, and loaders. The thesis also provides a simple solution for deploying the trained models. Additionally, this thesis describes a specialized, high-quality dataset of three thousand pictures, created to train these models on real data by capturing typical worksite scenes, various motions, and varying perspectives and angles of construction equipment on site. The results presented herein show that after 150 epochs of training, YOLOR-P6 has the best mean average precision (mAP) at 0.981, while YOLOv5s reaches an mAP of 0.936. However, YOLOv5s had the shortest training time on a Tesla P100 GPU in a Google Colab notebook: YOLOv5s needed 4 hours and 52 minutes, whereas YOLOR-P6 needed 14 hours and 35 minutes to finish training. The final findings of this study show that YOLOv5s is the most efficient model to use when building an artificial intelligence model to detect construction equipment because of the size of its weights file relative to other YOLO models: 14.4 MB for YOLOv5s vs. 288 MB for YOLOR-P6. The weights file size strongly affects the performance of the processing unit used to detect construction equipment on site.
In addition, the constructed dataset is published as a public dataset on the Roboflow platform, where it can serve as a foundation for future research and for improvements using newer deep learning architectures.
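The trade-off reported above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the thesis itself: the `RunSummary` class and the `compare` helper are hypothetical names introduced only for illustration, and the numbers are the ones stated in the abstract (mAP, training time on the Tesla P100, and weights file size).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Figures reported in the abstract for one trained model (hypothetical container)."""
    name: str
    map_score: float     # mAP after 150 epochs, as reported
    train_minutes: int   # total training time in minutes
    weights_mb: float    # size of the weights file in MB


def to_minutes(hours: int, minutes: int) -> int:
    """Convert an hours-and-minutes duration to minutes."""
    return hours * 60 + minutes


# Values taken from the abstract; nothing here is measured by this script.
yolov5s = RunSummary("YOLOv5s", 0.936, to_minutes(4, 52), 14.4)
yolor_p6 = RunSummary("YOLOR-P6", 0.981, to_minutes(14, 35), 288.0)


def compare(small: RunSummary, large: RunSummary) -> dict:
    """Return the accuracy gain and cost ratios of `large` relative to `small`."""
    return {
        "map_gain": large.map_score - small.map_score,
        "train_time_ratio": large.train_minutes / small.train_minutes,
        "weights_size_ratio": large.weights_mb / small.weights_mb,
    }


summary = compare(yolov5s, yolor_p6)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

Under these figures, YOLOR-P6 gains 0.045 mAP over YOLOv5s at roughly three times the training time and twenty times the weights file size, which is the basis for the efficiency conclusion above.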

Copyright Statement