2020
Journal article  Open Access

Virtual to real adaptation of pedestrian detectors

Ciampi L., Messina N., Falchi F., Gennaro C., Amato G.

Keywords: pedestrian detection; deep learning; convolutional neural networks; domain adaptation; synthetic datasets; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Pedestrian detection through computer vision is a building block for a multitude of applications. Recently, there has been increasing interest in convolutional neural network-based architectures for this task. A critical goal of these supervised networks is to generalize the knowledge learned during training to new scenarios with different characteristics, and a suitably labeled dataset is essential to achieve it. The main problem is that manually annotating a dataset usually requires substantial human effort and is costly. To this end, we introduce ViPeD (Virtual Pedestrian Dataset), a new synthetically generated set of images collected with the highly photo-realistic graphical engine of the video game GTA V (Grand Theft Auto V), where annotations are acquired automatically. However, when training solely on the synthetic dataset, the model experiences a Synthetic2Real domain shift, leading to a performance drop when applied to real-world images. To mitigate this gap, we propose two domain adaptation techniques suitable for the pedestrian detection task, but possibly applicable to general object detection. Experiments show that the network trained with ViPeD, exploiting the variety of our synthetic dataset, generalizes over unseen real-world scenarios better than a detector trained on real-world data. Furthermore, we demonstrate that our domain adaptation techniques reduce the Synthetic2Real domain shift, bringing the two domains closer and improving performance when testing the network on real-world images.
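The abstract does not spell out the two adaptation techniques. As a purely illustrative sketch of one common Synthetic2Real strategy — fine-tuning on batches that blend abundant synthetic images with a smaller fraction of real ones, so the detector adapts without discarding the synthetic variety — the `mixed_batches` sampler below is our assumption for illustration, not the paper's method:

```python
import random

def mixed_batches(synthetic, real, batch_size=8, real_fraction=0.25, seed=0):
    """Yield training batches mixing synthetic and real samples.

    After pretraining on synthetic data, fine-tuning on such blended
    batches is one way to reduce a Synthetic2Real domain shift;
    `real_fraction` controls how much real data enters each batch.
    """
    rng = random.Random(seed)
    n_real = max(1, int(batch_size * real_fraction))  # real slots per batch
    n_syn = batch_size - n_real                       # synthetic slots
    while True:
        batch = rng.sample(real, n_real) + rng.sample(synthetic, n_syn)
        rng.shuffle(batch)  # avoid a fixed real/synthetic ordering
        yield batch

# Example: 100 synthetic samples, 20 real samples, 8-image batches.
synthetic = [("syn", i) for i in range(100)]
real = [("real", i) for i in range(20)]
batch = next(mixed_batches(synthetic, real))
```

With `batch_size=8` and `real_fraction=0.25`, each batch carries two real images alongside six synthetic ones; the batch-level blend keeps gradients informed by both domains throughout fine-tuning.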

Source: Sensors (Basel) 20 (2020). doi:10.3390/s20185250

Publisher: Molecular Diversity Preservation International (MDPI), Basel



BibTeX entry
@article{oai:it.cnr:prodotti:431470,
	title = {Virtual to real adaptation of pedestrian detectors},
	author = {Ciampi L. and Messina N. and Falchi F. and Gennaro C. and Amato G.},
	publisher = {Molecular Diversity Preservation International (MDPI), Basel},
	doi = {10.3390/s20185250 and 10.48550/arxiv.2001.03032},
	journal = {Sensors (Basel)},
	volume = {20},
	year = {2020}
}
