115 result(s)
2021 Conference article Open Access

TSXor: a simple time series compression algorithm
Bruno A., Nardini F. M., Pibiri G. E., Trani R., Venturini R.
Time series are ubiquitous in computing as a key ingredient of many machine learning analytics, ranging from classification to forecasting. Typically, training such machine learning algorithms on time series requires accessing the data in temporal order several times. Therefore, a compression algorithm providing good compression ratios and fast decompression speed is desirable. In this paper, we present TSXor, a simple yet effective lossless compressor for time series. The main idea is to exploit the redundancy/similarity between close-in-time values through a window that acts as a cache, so as to improve the compression ratio and decompression speed. We show that TSXor achieves up to 3× better compression and up to 2× faster decompression than the state of the art on real-world datasets.
Source: SPIRE 2021 - International Symposium on String Processing and Information Retrieval, pp. 217–223, Lille, France (Virtual Event), 04/10/2021-06/10/2021
DOI: 10.1007/978-3-030-86692-1_18
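The window-as-cache idea can be sketched in a few lines. This is a simplified illustration, not the actual TSXor encoder: the real format also records which cached reference was used and byte-packs the residuals; all names here are ours.

```python
import struct

def bits(x: float) -> int:
    """IEEE-754 bit pattern of a double, as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def xor_residuals(series, window=32):
    """XOR each value's bit pattern with the most similar value among
    the previous `window` values (the cache); close-in-time values
    yield residuals with many zero bits, which compress well."""
    residuals = []
    for i, v in enumerate(series):
        b = bits(v)
        cache = [bits(p) for p in series[max(0, i - window):i]]
        # pick the cached reference minimizing the number of differing bits
        ref = min(cache, key=lambda r: bin(b ^ r).count("1"), default=0)
        residuals.append(b ^ ref)
    return residuals
```

A repeated value XORs to an all-zero residual, the best case for the downstream entropy coder.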

2021 Journal article Restricted

Efficient traversal of decision tree ensembles with FPGAs
Molina R., Loor F., Gil-costa V., Nardini F. M., Perego R., Trani S.
System-on-Chip (SoC) based Field Programmable Gate Arrays (FPGAs) provide a hardware acceleration technology that can be rapidly deployed and tuned, thus providing a flexible solution adaptable to specific design requirements and to changing demands. In this paper, we present three SoC architecture designs for speeding up inference tasks based on machine-learned ensembles of decision trees. We focus on QuickScorer, the state-of-the-art algorithm for the efficient traversal of tree ensembles, and present the issues and advantages related to its deployment on two SoC devices with different capacities. The results of the experiments conducted using publicly available datasets show that the proposed solution is very efficient and scalable. More importantly, it provides almost constant inference times, independently of the number of trees in the model and the number of instances to score. This allows the deployed SoC solution to be fine-tuned on the basis of the accuracy and latency constraints of the application scenario considered.
Source: Journal of parallel and distributed computing (Print) 155 (2021): 38–49. doi:10.1016/j.jpdc.2021.04.008
DOI: 10.1016/j.jpdc.2021.04.008
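QuickScorer's core trick — finding a tree's exit leaf with bitwise ANDs instead of a root-to-leaf walk — can be sketched as follows (our simplified reconstruction for a single tree; the FPGA designs in the paper map this AND-heavy loop onto hardware logic):

```python
def quickscorer_tree(doc, nodes, leaf_values, n_leaves):
    """Each internal node carries a bitvector with 0s on the leaves of
    its left subtree. Nodes whose test `doc[feature] <= threshold` is
    False AND their bitvector into the mask; the exit leaf is the
    leftmost surviving bit."""
    mask = (1 << n_leaves) - 1            # leaf i <-> bit (n_leaves - 1 - i)
    for feature, threshold, bitvector in nodes:
        if doc[feature] > threshold:      # test failed: prune left subtree
            mask &= bitvector
    exit_leaf = n_leaves - mask.bit_length()
    return leaf_values[exit_leaf]
```

For a stump with leaves `[l0, l1]` and one node zeroing the left leaf (`0b01`), a document failing the test lands on `l1`, one passing it on `l0`.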

2021 Journal article Restricted

Adaptive utterance rewriting for conversational search
Mele I., Muntean C. I., Nardini F. M., Perego R., Tonellotto N., Frieder O.
In a conversational context, a user converses with a system through a sequence of natural-language questions, i.e., utterances. Starting from a given subject, the conversation evolves through sequences of user utterances and system replies. The retrieval of documents relevant to an utterance is difficult due to the informal use of natural language in speech and the complexity of understanding the semantic context coming from previous utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing, in order: (i) automatic utterance understanding and rewriting, (ii) first-stage retrieval of candidate passages for the rewritten utterances, and (iii) neural re-ranking of candidate passages. By understanding the conversational context, we propose adaptive utterance rewriting strategies based on the current utterance and the dialogue evolution of the user with the system. A classifier identifies utterances lacking context information as well as the dependencies on previous utterances. Experimentally, we evaluate the proposed architecture in terms of traditional information retrieval metrics at small cutoffs. Results demonstrate the effectiveness of our techniques, achieving improvements of up to 0.6512 in P@1 and 0.4484 in nDCG@3 w.r.t. the CAsT baseline.
Source: Information processing & management 58 (2021). doi:10.1016/j.ipm.2021.102682
DOI: 10.1016/j.ipm.2021.102682
Project(s): BigDataGrapes

2021 Journal article Restricted

Neural network quantization in federated learning at the edge
Tonellotto N., Gotta A., Nardini F. M., Gadler D., Silvestri F.
The massive amount of data collected in the Internet of Things (IoT) calls for effective, intelligent analytics. A recent trend supporting the use of Artificial Intelligence (AI) solutions in IoT domains is to move the computation closer to the data, i.e., from cloud-based services to edge devices. Federated learning (FL) is the primary approach adopted in this scenario to train AI-based solutions. In this work, we investigate the introduction of quantization techniques in FL to improve the efficiency of data exchange between edge servers and a cloud node. We focus on learning recurrent neural network models fed by edge data producers, using the most widely adopted neural networks for time-series prediction. Experiments on public datasets show that the proposed quantization techniques in FL reduce the volume of data exchanged between each edge server and a cloud node by up to 19×, with a minimal impact of around 5% on the test loss of the final model.
Source: Information sciences 575 (2021): 417–436. doi:10.1016/j.ins.2021.06.039
DOI: 10.1016/j.ins.2021.06.039
Project(s): BigDataGrapes
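As an illustration of the kind of technique involved, here is a minimal uniform quantizer for a weight tensor (our sketch, not the paper's exact scheme, bit width, or error analysis):

```python
import numpy as np

def quantize(w, bits=8):
    """Map float weights to `bits`-bit integers plus (offset, scale),
    shrinking the payload an edge server sends to the cloud node."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (2 ** bits - 1) or 1.0   # guard against constant tensors
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize(q, lo, scale):
    """Cloud-side reconstruction of the approximate weights."""
    return q.astype(np.float32) * scale + lo
```

The round trip loses at most half a quantization step per weight, which is the kind of controlled degradation the abstract's ~5% test-loss impact refers to.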

2021 Journal article Open Access

Fast filtering of search results sorted by attribute
Nardini F. M., Trani R., Venturini R.
Modern search services often provide multiple options to rank the search results, e.g., sort "by relevance", "by price" or "by discount" in e-commerce. While the traditional rank by relevance effectively places the relevant results in the top positions of the results list, the rank by attribute could place many marginally relevant results at the head of the results list, leading to a poor user experience. In the past, this issue has been addressed by investigating the relevance-aware filtering problem, which asks to select the subset of results maximizing the relevance of the attribute-sorted list. Recently, an exact algorithm has been proposed to solve this problem optimally. However, the high computational cost of the algorithm makes it impractical for the Web search scenario, which is characterized by huge lists of results and strict time constraints. For this reason, the problem is often solved using efficient yet inaccurate heuristic algorithms. In this paper, we first prove the performance bounds of the existing heuristics. We then propose two efficient and effective algorithms to solve the relevance-aware filtering problem. First, we propose OPT-Filtering, a novel exact algorithm that is faster than the existing state-of-the-art optimal algorithm. Second, we propose an approximate and even more efficient algorithm, ε-Filtering, which, given an allowed approximation error ε, finds a (1-ε)-optimal filtering, i.e., the relevance of its solution is at least (1-ε) times the optimum. We conduct a comprehensive evaluation of the two proposed algorithms against state-of-the-art competitors on two real-world public datasets. Experimental results show that OPT-Filtering achieves a significant speedup of up to two orders of magnitude with respect to the existing optimal solution, while ε-Filtering further improves this result by trading effectiveness for efficiency. In particular, experiments show that ε-Filtering can achieve quasi-optimal solutions while being faster than all state-of-the-art competitors in most of the tested configurations.
Source: ACM transactions on information systems 40 (2021). doi:10.1145/3477982
DOI: 10.1145/3477982


2021 Conference article Open Access

Learning early exit strategies for additive ranking ensembles
Busolin F., Lucchese C., Nardini F. M., Orlando S., Perego R., Trani S.
Modern search engine ranking pipelines are commonly based on large machine-learned ensembles of regression trees. We propose LEAR, a novel learned technique aimed at reducing the average number of trees traversed by documents to accumulate the scores, thus reducing the overall query response time. LEAR exploits a classifier that predicts whether a document can early exit the ensemble because it is unlikely to be ranked among the final top-k results. The early exit decision occurs at a sentinel point, i.e., after having evaluated a limited number of trees, and the partial scores are exploited to filter out non-promising documents. We evaluate LEAR by deploying it in a production-like setting, adopting a state-of-the-art algorithm for ensemble traversal. We provide a comprehensive experimental evaluation on two public datasets. The experiments show that LEAR has a significant impact on the efficiency of query processing without hindering its ranking quality. In detail, on the first dataset LEAR achieves a speedup of 3× without any loss in NDCG@10, while on the second dataset the speedup is larger than 5× with a negligible NDCG@10 loss (< 0.05%).
Source: SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2217–2221, Online conference, 11-15/07/2021
DOI: 10.1145/3404835.3463088
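The sentinel mechanism can be sketched as follows. This is a simplification: LEAR decides per document with a trained classifier on the partial scores, which we replace here with a plain top-`keep` cutoff; all names are ours.

```python
def score_with_early_exit(docs, trees, sentinel, keep):
    """Accumulate tree scores document by document; at the sentinel
    tree, drop every document not among the `keep` best partial
    scores, so only promising documents traverse the rest of the
    ensemble."""
    scores = {d: 0.0 for d in docs}
    for t, tree in enumerate(trees):
        if t == sentinel:
            survivors = sorted(scores, key=scores.get, reverse=True)[:keep]
            scores = {d: scores[d] for d in survivors}
        for d in scores:
            scores[d] += tree(d)
    return scores
```

The earlier the sentinel and the smaller `keep`, the larger the speedup, at the risk of exiting a document that would have climbed into the top-k later.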

2020 Journal article Restricted

Leveraging feature selection to detect potential tax fraudsters
Matos T., Macedo J. A., Lettich F., Monteiro J. M., Renso C., Perego R., Nardini F. M.
Tax evasion is any act that, knowingly or unknowingly, legally or unlawfully, leads to non-payment or underpayment of tax due. Enforcing the correct payment of taxes by taxpayers is fundamental to maintaining the investments that are necessary for, and benefit, society as a whole. Indeed, without taxes it is not possible to guarantee basic services such as health care, education, sanitation, transportation, and infrastructure, among other services essential to the population. This issue is especially relevant in developing countries such as Brazil. In this work we consider a real-world case study involving the Treasury Office of the State of Ceará (SEFAZ-CE, Brazil), the agency in charge of supervising more than 300,000 active taxpayer companies. SEFAZ-CE maintains a very large database containing vast amounts of information concerning such companies. Its enforcement team struggles to perform thorough inspections of taxpayers' accounts, as the underlying traditional human-based inspection processes involve the evaluation of countless fraud indicators (i.e., binary features), thus requiring burdensome amounts of time and being potentially prone to human error. On the other hand, the vast amount of taxpayer information collected by fiscal agencies opens up the possibility of devising novel techniques able to tackle fiscal evasion much more effectively than traditional approaches. In this work we address the problem of using feature selection to select the most relevant binary features to improve the classification of potential tax fraudsters. Finding possible fraudsters from taxpayer data with binary features presents several challenges. First, taxpayer data typically have features with low linear correlation between themselves. Also, tax frauds may originate from intricate illicit tactics, which in turn requires uncovering non-linear relationships between multiple features. Finally, few features may be correlated with the targeted class.
In this work we propose Alicia, a new feature selection method based on association rules and propositional logic, with a carefully crafted graph centrality measure, that attempts to tackle the above challenges while, at the same time, being agnostic to specific classification techniques. Alicia is structured in three phases: first, it generates a set of relevant association rules from a set of fraud indicators (features). Subsequently, from such association rules Alicia builds a graph, whose structure is then used to determine the most relevant features. To achieve this, Alicia applies a novel centrality measure we call Feature Topological Importance. We perform an extensive experimental evaluation to assess the validity of our proposal on four different real-world datasets, where we compare our solution with eight other feature selection methods. The results show that Alicia achieves F-measure scores of up to 76.88%, and consistently outperforms its competitors.
Source: Expert systems with applications 145 (2020). doi:10.1016/j.eswa.2019.113128
DOI: 10.1016/j.eswa.2019.113128

2020 Conference article Open Access

Dynamic Wi-Fi RSSI normalization in unmapped locations
Kavalionak H., Tosato M., Barsocchi P., Nardini F. M.
With the growing availability of open-access WLAN networks, we have witnessed an increase in marketing services based on the data collected from WLAN access points. Identifying the visitors of a commercial venue from WLAN data is one of the key issues in building successful marketing products. One way to distinguish visitors is to analyse the RSSI of the mobile device signals received by the various access points at the venue. Nevertheless, indoor signal distortion makes RSSI-based methods unreliable. In this work we propose an algorithm for WLAN-based RSSI normalization in uncontrolled environments. Our approach consists of two steps: first, from the collected data we detect the devices whose RSSI can be taken as a baseline; second, the algorithm normalizes the signal received from mobile devices with respect to the previously detected baseline RSSI. We provide an analysis of a real dataset of WLAN probes collected in several commercial venues in Italy.
Source: EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark, 30th March - 2nd April, 2020
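A bare-bones version of the two steps might look like this (an illustrative sketch only, not the paper's algorithm: the stability criterion and the use of the median are our assumptions):

```python
from statistics import median, pstdev

def pick_baseline_device(readings):
    """Step 1: choose the device with the most stable RSSI as baseline.
    `readings` maps device id -> list of RSSI samples (dBm)."""
    stable = {d: r for d, r in readings.items() if len(r) > 1}
    return min(stable, key=lambda d: pstdev(stable[d]))

def normalize(readings):
    """Step 2: shift every device's RSSI so the baseline reads 0 dB,
    partially cancelling venue-specific signal distortion."""
    base = median(readings[pick_baseline_device(readings)])
    return {d: [x - base for x in r] for d, r in readings.items()}
```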


2020 Report Open Access

Dynamic hard pruning of neural networks at the edge of the internet
Valerio L., Nardini F. M., Passarella A., Perego R.
Neural Networks (NN), although successfully applied to several Artificial Intelligence tasks, are often unnecessarily over-parametrized. In fog/edge computing, this might make their training prohibitive on resource-constrained devices, contrasting with the current trend of decentralising intelligence from remote data centres to local constrained devices. Therefore, we investigate the problem of training effective NN models on constrained devices having a fixed, potentially small, memory budget. We target techniques that are both resource-efficient and performance-effective while enabling significant network compression. Our technique, called Dynamic Hard Pruning (DynHP), incrementally prunes the network during training, identifying neurons that marginally contribute to the model accuracy. DynHP enables a tunable size reduction of the final neural network and reduces the NN memory occupancy during training. Freed memory is reused by a dynamic batch sizing approach to counterbalance the accuracy degradation caused by the hard pruning strategy, improving its convergence and effectiveness. We assess the performance of DynHP through reproducible experiments on two public datasets, comparing it against reference competitors. Results show that DynHP substantially compresses a NN without significant performance drops with respect to competitors, while also reducing the training memory occupancy.
Source: IIT TR-21/2020 and ISTI Technical Reports 2020/016, 2020
DOI: 10.32079/isti-tr-2020/016
Project(s): BigDataGrapes
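Magnitude-based hard pruning, the basic operation DynHP applies incrementally during training, can be sketched as follows (our simplification: DynHP's neuron-selection criterion and the dynamic batch sizing are not shown):

```python
import numpy as np

def hard_prune(weights, keep_frac):
    """Zero the smallest-magnitude weights, keeping a `keep_frac`
    fraction of them; the freed memory budget can then be spent on
    larger batches."""
    flat = np.abs(weights).ravel()
    k = max(1, int(round(len(flat) * keep_frac)))
    threshold = np.partition(flat, -k)[-k]      # k-th largest magnitude
    return np.where(np.abs(weights) >= threshold, weights, 0.0)
```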


2020 Journal article Restricted

Weighting passages enhances accuracy
Muntean C. I., Nardini F. M., Perego R., Tonellotto N., Frieder O.
We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness, by up to 17.44% in NDCG@5, 85% in NDCG@1, and up to 21% in MRR.
Source: ACM transactions on information systems 39 (2020). doi:10.1145/3428687
DOI: 10.1145/3428687
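In BM25 terms, the change amounts to replacing a term's raw frequency with a weighted sum over passage frequencies (our sketch; the paper's exact passage partitioning, weight estimation, and tuned hyperparameters may differ):

```python
import math

def bm25p_term(tf_per_passage, passage_weights, df, N, dlen, avgdlen,
               k1=1.2, b=0.75):
    """BM25 contribution of one query term, with tf replaced by a
    weighted combination of the term's per-passage frequencies."""
    wtf = sum(w * tf for w, tf in zip(passage_weights, tf_per_passage))
    idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
    return idf * wtf * (k1 + 1) / (wtf + k1 * (1 - b + b * dlen / avgdlen))
```

With all passage weights equal to 1 the score reduces to plain BM25 on the whole-document frequency; boosting the weight of the head passage rewards documents whose salient terms appear early, matching the observation above.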

2020 Conference article Open Access

Efficient document re-ranking for transformers by precomputing term representations
Macavaney S., Nardini F. M., Perego R., Tonellotto N., Goharian N., Frieder O.
Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their computational cost makes them prohibitive to use in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42× speedup on web document ranking), making these networks more practical to use in a real-time ranking scenario. Specifically, we precompute part of the document term representations at indexing time (without a query), and merge them with the query representation at query time to compute the final ranking score. Due to the large size of the token representations, we also propose an effective approach to reduce the storage requirement by training a compression layer to match attention scores. Our compression technique reduces the storage required by up to 95%, and it can be applied without a substantial degradation in ranking performance.
Source: 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 49–58, online, 25-30 July, 2020
DOI: 10.1145/3397271.3401093
Project(s): BigDataGrapes

2020 Conference article Open Access

Training curricula for open domain answer re-ranking
Macavaney S., Nardini F. M., Perego R., Tonellotto N., Goharian N., Frieder O.
DOI: 10.1145/3397271.3401094
Project(s): BigDataGrapes

2020 Conference article Open Access

Expansion via prediction of importance with contextualization
Macavaney S., Nardini F. M., Perego R., Tonellotto N., Goharian N., Frieder O.
The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction of Importance with Contextualization). We show that EPIC significantly outperforms prior importance-modeling and document expansion approaches. We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches. Specifically, EPIC achieves an MRR@10 of 0.304 on the MS-MARCO passage ranking dataset with 78ms average query latency on commodity hardware. We also find that the latency is further reduced to 68ms by pruning document representations, with virtually no difference in effectiveness.
Source: 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1573–1576, online, 25-30 July, 2020
DOI: 10.1145/3397271.3401262
Project(s): BigDataGrapes
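Because both query and passage representations live in the lexicon, query-time scoring reduces to a sparse dot product over precomputed (and optionally pruned) document vectors. A minimal sketch with hypothetical toy vectors:

```python
def epic_score(query_vec, doc_vec):
    """Dot product of term-importance vectors; doc_vec is precomputed
    at index time, so only this sum runs at query time."""
    return sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())

def prune(doc_vec, top_k):
    """Keep only the top_k most important terms to cut storage and
    query latency, at a small effectiveness cost."""
    kept = sorted(doc_vec.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return dict(kept)
```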

2020 Conference article Open Access

Query-level early exit for additive learning-to-rank ensembles
Lucchese C., Nardini F. M., Orlando S., Perego R., Trani S.
Search engine ranking pipelines are commonly based on large ensembles of machine-learned decision trees. The tight constraints on query response time recently motivated researchers to investigate algorithms that speed up the traversal of the additive ensemble or early-terminate the evaluation of documents that are unlikely to be ranked among the top-k. In this paper, we investigate the novel problem of query-level early exiting, aimed at deciding the profitability of early stopping the traversal of the ranking ensemble for all the candidate documents to be scored for a query, by simply returning a ranking based on the additive scores computed by a limited portion of the ensemble. Besides the obvious advantage in query latency and throughput, we address the possible positive impact on ranking effectiveness. To this end, we study the actual contribution of incremental portions of the tree ensemble to the ranking of the top-k documents scored for a given query. Our main finding is that queries exhibit different behaviors as scores are accumulated during the traversal of the ensemble and that query-level early stopping can remarkably improve ranking quality. We present a reproducible and comprehensive experimental evaluation, conducted on two public datasets, showing that query-level early exiting achieves an overall gain of up to 7.5% in terms of NDCG@10 with a speedup of the scoring process of up to 2.2×.
Source: 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2033–2036, Online Conference, 25-30 July, 2020
DOI: 10.1145/3397271.3401256
Project(s): BigDataGrapes

2020 Conference article Open Access

Predicting and explaining privacy risk exposure in mobility data
Naretto F., Pellungrini R., Monreale A., Nardini F. M., Musolesi M.
Mobility data is a proxy of different social dynamics and its analysis enables a wide range of user services. Unfortunately, mobility data are very sensitive, because sharing people's whereabouts may raise serious privacy concerns. Existing frameworks for privacy risk assessment provide tools to identify and measure privacy risks, but they often (i) have high computational complexity; and (ii) are not able to provide users with a justification of the reported risks. In this paper, we propose expert, a new framework for the prediction and explanation of privacy risk on mobility data. We empirically evaluate privacy risk on real data, simulating a privacy attack with a state-of-the-art privacy risk assessment framework. We then extract individual mobility profiles from the data for predicting their risk. We compare the performance of several machine learning algorithms in order to identify the best approach for our task. Finally, we show how it is possible to explain privacy risk prediction on real data using two algorithms: Shap, a feature importance-based method, and Lore, a rule-based method. Overall, expert is able to provide a user with the privacy risk and an explanation of the risk itself. The experiments show excellent performance for the prediction task.
Source: DS 2020 - International Conference on Discovery Science, pp. 403–418, Thessaloniki, Greece, October 19-21, 2020
DOI: 10.1007/978-3-030-61527-7_27
Project(s): XAI , SoBigData-PlusPlus

2020 Journal article Restricted

A novel approach to define the local region of dynamic selection techniques in imbalanced credit scoring problems
Melo Junior L., Nardini F. M., Renso C., Trani R., Macedo J. A.
Lenders, such as banks and credit card companies, use credit scoring models to evaluate the potential risk posed by lending money to customers, and therefore to mitigate losses due to bad credit. The profitability of banks thus highly depends on the models used to decide on customers' loans. State-of-the-art credit scoring models are based on machine learning and statistical methods. One of the major problems in this field is that lenders often deal with imbalanced datasets that usually contain many paid loans but very few unpaid ones (called defaults). Recently, dynamic selection methods combined with ensemble methods and preprocessing techniques have been evaluated to improve classification models on imbalanced datasets, presenting advantages over static machine learning methods. In a dynamic selection technique, samples in the neighborhood of each query sample are used to compute the local competence of each base classifier. Then, the technique selects only competent classifiers to predict the query sample. In this paper, we evaluate the suitability of dynamic selection techniques for the credit scoring problem, and we present Reduced Minority k-Nearest Neighbors (RMkNN), an approach that enhances the state of the art in defining the local region of dynamic selection techniques for imbalanced credit scoring datasets. The proposed technique has superior prediction performance on imbalanced credit scoring datasets compared to the state of the art. Furthermore, RMkNN does not need any preprocessing or sampling method to generate the dynamic selection dataset (called DSEL). Additionally, we observe an equivalence between dynamic selection and static selection classification. We conduct a comprehensive evaluation of the proposed technique against state-of-the-art competitors on six real-world public datasets and one private one. Experiments show that RMkNN improves the classification performance on the evaluated datasets in terms of AUC, balanced accuracy, H-measure, G-mean, F-measure, and Recall.
Source: Expert systems with applications 152 (2020). doi:10.1016/j.eswa.2020.113351
DOI: 10.1016/j.eswa.2020.113351
Project(s): MC2020 , BigDataGrapes , MASTER
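The local-region mechanism common to dynamic selection techniques can be sketched as follows (a generic DS sketch for illustration; RMkNN's specific way of building the region from minority-class neighbors is not reproduced here, and all names are ours):

```python
import numpy as np

def dynamic_select(x, dsel_X, dsel_y, classifiers, k=7, min_acc=0.5):
    """For query x, measure each base classifier's accuracy on the k
    nearest samples of the DSEL set (its local competence) and keep
    only the competent classifiers."""
    nn = np.argsort(np.linalg.norm(dsel_X - x, axis=1))[:k]
    competent = [clf for clf in classifiers
                 if np.mean([clf(xi) == yi
                             for xi, yi in zip(dsel_X[nn], dsel_y[nn])]) >= min_acc]
    return competent or list(classifiers)   # fall back to the full ensemble
```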

2020 Conference article Open Access

Topic propagation in conversational search
Mele I., Muntean C. I., Nardini F. M., Perego R., Tonellotto N., Frieder O.
In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions, i.e., utterances. Starting from a given topic, the conversation evolves through user utterances and system replies. The retrieval of documents relevant to a given utterance in a conversation is challenging due to ambiguity of natural language and to the difficulty of detecting possible topic shifts and semantic relationships among utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing: (i) topic-aware utterance rewriting, (ii) retrieval of candidate passages for the rewritten utterances, and (iii) neural-based re-ranking of candidate passages. We present a comprehensive experimental evaluation of the architecture assessed in terms of traditional IR metrics at small cutoffs. Experimental results show the effectiveness of our techniques that achieve an improvement of up to 0.28 (+93%) for P@1 and 0.19 (+89.9%) for nDCG@3 w.r.t. the CAsT baseline.
Source: SIGIR 2020 - 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2057–2060, Online Conference, July 25-30, 2020
DOI: 10.1145/3397271.3401268
Project(s): BigDataGrapes

2020 Journal article Open Access

RankEval: Evaluation and investigation of ranking models
Lucchese C., Muntean C. I., Nardini F. M., Perego R., Trani S.
RankEval is a Python open-source tool for the analysis and evaluation of ranking models based on ensembles of decision trees. Learning-to-Rank (LtR) approaches that generate tree ensembles are considered the most effective solution for difficult ranking tasks, and several impactful LtR libraries have been developed aimed at improving ranking quality and training efficiency. However, these libraries are not very helpful in terms of hyper-parameter tuning and in-depth analysis of the learned models, and even the implementations of the most popular Information Retrieval (IR) metrics differ among them, thus making it difficult to compare different models. RankEval overcomes these limitations by providing a unified environment in which to perform an easy, comprehensive inspection and assessment of ranking models trained using different machine learning libraries. The tool focuses on ensuring efficiency, flexibility and extensibility, and is fully interoperable with the most popular LtR libraries.
Source: Softwarex (Amsterdam) 12 (2020). doi:10.1016/j.softx.2020.100614
DOI: 10.1016/j.softx.2020.100614
Project(s): BigDataGrapes

2020 Conference article Embargo

High-quality prediction of tourist movements using temporal trajectories in graphs
Moghtasedi S., Muntean C. I., Nardini F. M., Grossi R., Marino A.
In this paper, we study the problem of predicting the next position of a tourist given his history. In particular, we propose a model to identify the next point of interest that a tourist will visit, by making use of similarity between trajectories on a graph and taking into account the spatial-temporal aspect of trajectories. We compare our method with a well-known machine learning-based technique, as well as with a popularity baseline, using three public real-world datasets. Our experimental results show that our technique clearly outperforms state-of-the-art machine learning-based methods, providing results that are at least twice as accurate.
Source: ASONAM 2020 - The 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 348–352, Online conference, 7-10/12/2020
DOI: 10.1109/asonam49781.2020.9381450

2020 Conference article Restricted

Prediction and explanation of privacy risk on mobility data with neural networks
Naretto F., Pellungrini R., Nardini F. M., Giannotti F.
The analysis of privacy risk for mobility data is a fundamental part of any privacy-aware process based on such data. Mobility data are highly sensitive. Therefore, the correct identification of the privacy risk before releasing the data to the public is of utmost importance. However, existing privacy risk assessment frameworks have high computational complexity. To tackle this issue, some recent work proposed a solution based on classification approaches to predict privacy risk using mobility features extracted from the data. In this paper, we propose an improvement of this approach by applying long short-term memory (LSTM) neural networks to predict the privacy risk directly from the original mobility data. We empirically evaluate privacy risk on real data by applying our LSTM-based approach. Results show that our proposed method based on an LSTM network is effective in predicting the privacy risk, with results in terms of F1 of up to 0.91. Moreover, to explain the predictions of our model, we employ a state-of-the-art explanation algorithm, Shap. We explore the resulting explanation, showing how it is possible to provide effective predictions while explaining them to the end user.
Source: ECML PKDD 2020 Workshops, pp. 501–516, Ghent, Belgium, 14-18/10/2020
DOI: 10.1007/978-3-030-65965-3_34
Project(s): HumanE-AI-Net , XAI , SoBigData-PlusPlus