Lagani G., Falchi F., Gennaro C., Amato G.
Computer Vision and Pattern Recognition (cs.CV); FOS: Computer and information sciences; Convolution; Attention; Neural Networks; Video Activity Recognition; Computer Vision; Deep Learning
In this paper, we introduce a deep learning solution for video activity recognition that leverages an innovative combination of convolutional layers with a linear-complexity attention mechanism. Moreover, we introduce a novel quantization mechanism to further improve the efficiency of our model during both training and inference. Our model maintains a reduced computational cost, while preserving robust learning and generalization capabilities. Our approach addresses the issues related to the high computing requirements of current models, with the goal of achieving competitive accuracy on consumer and edge devices, enabling smart home and smart healthcare applications where efficiency and privacy issues are of concern. We experimentally validate our model on different established and publicly available video activity recognition benchmarks, improving accuracy over alternative models at a competitive computing cost.
Source: Lecture Notes in Computer Science, vol. 15633, pp. 235-251. Milan, Italy, 29/09/2024
Publisher: Springer Science and Business Media Deutschland GmbH
@inproceedings{oai:iris.cnr.it:20.500.14243/552088,
title = {CA3D: Convolutional-Attentional 3D nets for efficient video activity recognition on the edge},
author = {Lagani G. and Falchi F. and Gennaro C. and Amato G.},
publisher = {Springer Science and Business Media Deutschland GmbH},
doi = {10.1007/978-3-031-91979-4_18},
eprint = {2505.19928},
archiveprefix = {arXiv},
booktitle = {Lecture Notes in Computer Science},
volume = {15633},
pages = {235--251},
note = {Milan, Italy, 29/09/2024},
year = {2025}
}