Jana Köhler: Self-Supervised Learning in Industrial Visual Inspection of Freight Trains
BCCN Berlin / Technische Universität Berlin
Abstract
With an anticipated increase in freight traffic, the demand for visual train inspections is set to rise. However, this process is currently manual and time-consuming. Automating these inspections can prevent bottlenecks caused by a shortage of skilled workers. Unfortunately, training deep neural networks for damage detection from scratch requires large labeled datasets, which are costly and time-consuming to produce. This thesis examines whether self-supervised pre-training on available unlabeled data can improve performance on selected downstream tasks.

To this end, VICRegL was chosen as the self-supervised learning method, and multiple ResNet-50 backbones were pre-trained on different domain-specific datasets. The effectiveness of the pre-training was evaluated across three object detection tasks and two classification tasks, collectively referred to as downstream tasks. A RetinaNet and a ResNet-50 classifier were employed for the downstream object detection and classification tasks, respectively. The model backbones were initialized with weights obtained from either supervised or self-supervised pre-training, and the models were then fine-tuned on a specific downstream task.

The results showed that self-supervised pre-training outperformed supervised pre-training on the classification tasks, with a particularly notable 20% increase in F-score on the fine-grained classification task. All object detection tasks, however, benefited most from supervised object detection pre-training. The study also found that self-supervised pre-training on domain-specific datasets alone did not match the performance of self-supervised pre-training on ImageNet. Dataset size is likely a major factor in this disparity: the largest domain-specific dataset contains 100,000 images, far fewer than ImageNet's 1.2 million.

Overall, this thesis highlights the importance of pre-training and challenges the notion that supervised ImageNet pre-training is always the best option. Several factors affecting pre-training effectiveness are assessed and discussed, including dataset size and specificity to the downstream task. Scalability is a key advantage of self-supervised learning, and further performance gains are expected when scaling up datasets and model complexity. However, exploring self-supervised pre-training on larger datasets and with deeper state-of-the-art model architectures requires more computational resources.
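For readers unfamiliar with the transfer step described in the abstract, the following minimal PyTorch sketch shows how a self-supervised backbone checkpoint could initialize a ResNet-50 classifier before fine-tuning on a labeled downstream task. The checkpoint path, class count, and learning rate are illustrative placeholders, not values taken from the thesis.

    import torch
    import torchvision

    # Placeholder path to a ResNet-50 backbone pre-trained with a
    # self-supervised method (e.g. VICRegL) on unlabeled domain data.
    CHECKPOINT = "ssl_resnet50_backbone.pth"
    NUM_CLASSES = 5  # assumed number of downstream classes

    # Build a ResNet-50 without any pre-loaded weights, then load the
    # self-supervised backbone. strict=False because an SSL checkpoint
    # typically has no classification head.
    model = torchvision.models.resnet50(weights=None)
    state_dict = torch.load(CHECKPOINT, map_location="cpu")
    model.load_state_dict(state_dict, strict=False)

    # Replace the classification head for the downstream task and
    # fine-tune the whole network end to end on the labeled data.
    model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

The same initialization idea applies to the detection tasks, where the pre-trained ResNet-50 would serve as the backbone of a RetinaNet instead of a standalone classifier.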
Additional Information
Master Thesis Defense
Organized by
Prof. Klaus Obermayer & Prof. Henning Sprekeler / Lisa Velenosi
Location: online via Zoom - please send an email to graduateprograms@bccn-berlin.de for access