Karolina Stosio, BCCN Berlin / TU Berlin

What do Deep Neural Networks see?

State-of-the-art deep convolutional neural networks (DNNs) rival human performance on complex natural object recognition tasks and display rough architectural and representational similarities to the primate ventral stream. Yet the processing underlying the classification of image content is still not well understood. In particular, it is not clear whether DNNs utilise the same visual cues as humans. This thesis aims to provide a better understanding of what deep networks actually 'see' by presenting the results of a series of experiments testing the robustness of DNNs on stimuli altered in ways that complicate the task.
Firstly, we find that DNNs are not robust to occlusion. Conversely, they perform well on cropped stimuli and can correctly recognise patches as small as 3.4x3.4 px. Moreover, DNNs exhibit a human-like recognition gap: a small loss of information at the level of minimal images renders the stimuli unrecognisable. In addition, restricting DNNs to small receptive fields makes them underperform on silhouettes. It appears that DNNs rely heavily on local, textural cues when recognising natural images. Consequently, integrating global information makes DNNs robust in shape recognition.
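The two stimulus manipulations mentioned above, occlusion and cropping, can be sketched in a few lines of NumPy. This is only an illustration of the kind of image alteration involved; the helper names (`occlude`, `crop`) and the image size are assumptions, not the thesis's actual experimental code.

```python
import numpy as np

def occlude(image, top, left, size, fill=0.0):
    """Return a copy of `image` with a size x size square patch
    replaced by a constant fill value, simulating an occluder."""
    out = image.copy()
    out[top:top + size, left:left + size] = fill
    return out

def crop(image, top, left, size):
    """Return a small size x size patch of the image, as in
    minimal-image style tests."""
    return image[top:top + size, left:left + size].copy()

# Example: a 224 x 224 grayscale stimulus (size chosen for illustration)
img = np.random.rand(224, 224)
occluded = occlude(img, top=80, left=80, size=64)   # occlusion test
patch = crop(img, top=100, left=100, size=4)        # near the minimal-image scale
```

In an experiment, both the occluded image and the small patch would then be fed to a pretrained classifier and the accuracy compared against that on the unaltered stimuli.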

Additional Information

Thesis defence in the international master program "Computational Neuroscience"

Organized by

Klaus Obermayer / Robert Martin
