Koutroumanis N., Doulkeridis C., Renso C., Nanni M., Perego R.
Trajectories, Data formats, Parquet, Data lakes
Columnar data formats, such as Apache Parquet, are increasingly popular for scalable storage and querying in data lakes, thanks to compressed storage and efficient data access via data skipping. However, spatial and spatio-temporal data require more advanced solutions that go beyond pruning on single attributes and support multidimensional pruning. Although solutions exist for geospatial data, such as GeoParquet and SpatialParquet, they fall short when applied to trajectory data (sequences of spatio-temporal positions). In this paper, we propose TrajParquet, a highly efficient and scalable format for columnar storage of trajectory data. We also present a query processing algorithm that supports spatio-temporal range queries over TrajParquet. We evaluate TrajParquet on real-world data sets, comparing it with extensions of GeoParquet and SpatialParquet that are suitable for handling spatio-temporal data.
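To make the idea of multidimensional data skipping concrete, the following is a minimal Python sketch, not the TrajParquet format or its query algorithm. It assumes that each row group keeps min/max statistics for the x, y, and t columns, and that a spatio-temporal range query skips any row group whose bounding box does not intersect the query box. The names `RowGroup`, `may_intersect`, and `range_query` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t), an assumed layout


@dataclass
class RowGroup:
    """Hypothetical row group: points plus per-column min/max statistics."""
    points: List[Point]

    def __post_init__(self):
        xs, ys, ts = zip(*self.points)
        self.mins = (min(xs), min(ys), min(ts))
        self.maxs = (max(xs), max(ys), max(ts))

    def may_intersect(self, lo: Point, hi: Point) -> bool:
        # The group can be skipped as soon as one dimension rules it out.
        return all(self.mins[d] <= hi[d] and self.maxs[d] >= lo[d]
                   for d in range(3))


def range_query(groups: List[RowGroup], lo: Point, hi: Point):
    """Return matching points and the number of row groups actually scanned."""
    hits, scanned = [], 0
    for group in groups:
        if not group.may_intersect(lo, hi):
            continue  # data skipping: statistics prove there is no match
        scanned += 1
        hits.extend(p for p in group.points
                    if all(lo[d] <= p[d] <= hi[d] for d in range(3)))
    return hits, scanned


groups = [
    RowGroup([(0.0, 0.0, 0.0), (1.0, 1.0, 10.0)]),
    RowGroup([(5.0, 5.0, 20.0), (6.0, 6.0, 30.0)]),      # outside spatially
    RowGroup([(0.5, 0.5, 100.0), (1.5, 1.5, 110.0)]),    # outside temporally
]
hits, scanned = range_query(groups, lo=(0.0, 0.0, 0.0), hi=(2.0, 2.0, 50.0))
print(hits, scanned)
```

In this toy data, the third row group overlaps the query spatially and is skipped only because of its time range, which is the kind of pruning that statistics on a single attribute would miss.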
Source: SIGSPATIAL '23 - 31st ACM International Conference on Advances in Geographic Information Systems, pp. 73:1–73:4, 13-16/11/2023
@inproceedings{oai:it.cnr:prodotti:491195,
  title = {TrajParquet: a trajectory-oriented column file format for mobility data lakes},
  author = {Koutroumanis N. and Doulkeridis C. and Renso C. and Nanni M. and Perego R.},
  doi = {10.1145/3589132.3625623},
  booktitle = {SIGSPATIAL '23 - 31st ACM International Conference on Advances in Geographic Information Systems, pp. 73:1–73:4, 13-16/11/2023},
  year = {2023}
}
SoBigData-PlusPlus
SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics