Messina N., Amato G., Carrara F., Gennaro C., Falchi F.
Deep neural networks have demonstrated remarkable abilities in solving many kinds of real-world problems in computer vision. However, they are still strained by simple reasoning tasks that humans find easy to solve. In this work, we probe current state-of-the-art convolutional neural networks on a challenging set of tasks known as the same-different problems, all of which share the same prerequisite: understanding whether two random shapes inside the same image are the same or not. With the experiments carried out in this work, we demonstrate that residual connections, and more generally skip connections, seem to have only a marginal impact on the learning of the proposed problems. In particular, we experiment with DenseNets, and we examine the contribution of residual and recurrent connections in already-tested architectures, ResNet-18 and CorNet-S respectively. Our experiments show that older feed-forward networks, AlexNet and VGG, are almost unable to learn the proposed problems, except in some specific scenarios. We show that recently introduced architectures can converge even when key parts of their architecture are removed. Finally, we carry out zero-shot generalization tests and discover that in these scenarios residual and recurrent connections can have a stronger impact on overall test accuracy. On four difficult problems from the SVRT dataset, we reach state-of-the-art results with respect to previous approaches, obtaining super-human performance on three of the four problems.
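As a minimal illustration of the same-different setup described above, the sketch below generates a toy stimulus: one image containing two randomly placed binary patches, labeled 1 when the patches are identical and 0 otherwise. This is an assumption-laden stand-in for intuition only, not the official SVRT generator (patch size, placement, and shape encoding are all hypothetical choices).

```python
import numpy as np

def make_same_different(size=64, patch=8, same=True, rng=None):
    """Toy same-different stimulus (hypothetical, not the SVRT generator):
    two random binary patches in one image; label 1 iff they are identical."""
    rng = rng if rng is not None else np.random.default_rng()
    img = np.zeros((size, size), dtype=np.uint8)
    # first random binary shape
    a = rng.integers(0, 2, (patch, patch), dtype=np.uint8)
    if same:
        b = a.copy()
    else:
        # resample until the second shape actually differs from the first
        b = rng.integers(0, 2, (patch, patch), dtype=np.uint8)
        while np.array_equal(a, b):
            b = rng.integers(0, 2, (patch, patch), dtype=np.uint8)
    # place the two patches in disjoint halves so they never overlap
    y1, x1 = rng.integers(0, size // 2 - patch, 2)
    y2, x2 = rng.integers(size // 2, size - patch, 2)
    img[y1:y1 + patch, x1:x1 + patch] = a
    img[y2:y2 + patch, x2:x2 + patch] = b
    return img, int(same)
```

A classifier solving the task must compare the two patch regions rather than memorize local textures, which is exactly the relational step that strains the feed-forward networks discussed in the paper.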
Fleuret, F., Li, T., Dubout, C., Wampler, E.K., Yantis, S., Geman, D., 2011. Comparing machines and humans on a visual categorization test. Proceedings of the National Academy of Sciences 108, 17621-17625.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: Proceedings of IEEE CVPR, pp. 770-778.
Johnson, J., Hariharan, B., van der Maaten, L., Fei-Fei, L., Lawrence Zitnick, C., Girshick, R., 2017a. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning, in: Proceedings of IEEE CVPR, pp. 2901-2910.
Johnson, J., Hariharan, B., Van Der Maaten, L., Hoffman, J., Fei-Fei, L., Lawrence Zitnick, C., Girshick, R., 2017b. Inferring and executing programs for visual reasoning, in: Proceedings of IEEE CVPR, pp. 2989-2998.
Kar, K., Kubilius, J., Schmidt, K., Issa, E.B., DiCarlo, J.J., 2019. Evidence that recurrent circuits are critical to the ventral stream's execution of core object recognition behavior. Nature neuroscience 22, 974-983.
Kim, J., Ricci, M., Serre, T., 2018. Not-so-CLEVR: Visual relations strain feedforward neural networks, in: International Conference on Learning Representations (ICLR).
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks, in: Advances in neural information processing systems, pp. 1097-1105.
Kubilius, J., Schrimpf, M., Nayebi, A., Bear, D., Yamins, D.L., DiCarlo, J.J., 2018. CORnet: Modeling the neural mechanisms of core object recognition. bioRxiv, 408385.
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 2278-2324.
Liu, S., Deng, W., 2015. Very deep convolutional neural network based image classification using small training sample size, in: 2015 3rd IAPR Asian conference on pattern recognition (ACPR), IEEE. pp. 730-734.
Mascharka, D., Tran, P., Soklaski, R., Majumdar, A., 2018. Transparency by design: Closing the gap between performance and interpretability in visual reasoning, in: Proceedings of IEEE CVPR, pp. 4942-4950.
Messina, N., Amato, G., Carrara, F., Falchi, F., Gennaro, C., 2019a. Learning relationship-aware visual features, in: Proceedings of the European Conference on Computer Vision (ECCV), pp. 486-501.
Messina, N., Amato, G., Carrara, F., Falchi, F., Gennaro, C., 2019b. Learning visual features for relational cbir. International Journal of Multimedia Information Retrieval , 1-12.
Messina, N., Amato, G., Carrara, F., Falchi, F., Gennaro, C., 2019c. Testing deep neural networks on the same-different task, in: 2019 International Conference on Content-Based Multimedia Indexing (CBMI), IEEE. pp. 1-6.
Santoro, A., Hill, F., Barrett, D., Morcos, A., Lillicrap, T., 2018. Measuring abstract reasoning in neural networks, in: International Conference on Machine Learning, pp. 4477-4486.
Santoro, A., Raposo, D., Barrett, D.G., Malinowski, M., Pascanu, R., Battaglia, P., Lillicrap, T., 2017. A simple neural network module for relational reasoning, in: Advances in neural information processing systems, pp. 4967-4976.
Stabinger, S., Rodríguez-Sánchez, A., Piater, J., 2016. 25 years of CNNs: Can we compare to human abstraction capabilities?, in: International Conference on Artificial Neural Networks, Springer. pp. 380-387.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2015. Going deeper with convolutions, in: Proceedings of IEEE CVPR, pp. 1-9.
Yang, G.R., Ganichev, I., Wang, X.J., Shlens, J., Sussillo, D., 2018. A dataset and architecture for visual reasoning with a working memory, in: European Conference on Computer Vision, Springer. pp. 729-745.
Zhang, C., Gao, F., Jia, B., Zhu, Y., Zhu, S.C., 2019. RAVEN: A dataset for relational and analogical visual reasoning, in: Proceedings of IEEE CVPR, pp. 5317-5327.