Page 1 of 1

2020 Journal article Open Access

Crime and its fear in social media
Prieto Curiel R., Cresci S., Muntean C. I., Bishop S. R.
Social media posts incorporate real-time information that has, elsewhere, been exploited to predict social trends. This paper considers whether such information can be useful in relation to crime and fear of crime. A large number of tweets were collected from the 18 largest Spanish-speaking countries in Latin America, over a period of 70 days. These tweets are then classified as being crime-related or not and additional information is extracted, including the type of crime and where possible, any geo-location at a city level. From the analysis of collected data, it is established that around 15 out of every 1000 tweets have text related to a crime, or fear of crime. The frequency of tweets related to crime is then compared against the number of murders, the murder rate, or the level of fear of crime as recorded in surveys. Results show that, like mass media, such as newspapers, social media suffer from a strong bias towards violent or sexual crimes. Furthermore, social media messages are not highly correlated with crime. Thus, social media is shown not to be highly useful for detecting trends in crime itself, but what they do demonstrate is rather a reflection of the level of the fear of crime.Source: Palgrave communications 6 (2020). doi:10.1057/s41599-020-0430-7
DOI: 10.1057/s41599-020-0430-7
Project(s): CIMPLEX via OpenAIRE

, SoBigData via OpenAIRE

Metrics:

2020 Journal article Open Access

Human migration: the big data perspective
Sîrbu A., Andrienko G., Andrienko N., Boldrini C., Conti M., Giannotti F., Guidotti R., Bertoli S., Kim J., Muntean C. I., Pappalardo L., Passarella A., Pedreschi D., Pollacci L., Pratesi F., Sharma R.
How can big data help to understand the migration phenomenon? In this paper, we try to answer this question through an analysis of various phases of migration, comparing traditional and novel data sources and models at each phase. We concentrate on three phases of migration, at each phase describing the state of the art and recent developments and ideas. The first phase includes the journey, and we study migration flows and stocks, providing examples where big data can have an impact. The second phase discusses the stay, i.e. migrant integration in the destination country. We explore various data sets and models that can be used to quantify and understand migrant integration, with the final aim of providing the basis for the construction of a novel multi-level integration index. The last phase is related to the effects of migration on the source countries and the return of migrants.Source: International Journal of Data Science and Analytics (Online) 11 (2020): 341–360. doi:10.1007/s41060-020-00213-5
DOI: 10.1007/s41060-020-00213-5
Project(s): SoBigData via OpenAIRE

Metrics:

2020 Conference article Open Access

Topic propagation in conversational search
Mele I., Muntean C. I., Nardini F. M., Perego R., Tonellotto N., Frieder O.
In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions, i.e., utterances. Starting from a given topic, the conversation evolves through user utterances and system replies. The retrieval of documents relevant to a given utterance in a conversation is challenging due to ambiguity of natural language and to the difficulty of detecting possible topic shifts and semantic relationships among utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing: (i) topic-aware utterance rewriting, (ii) retrieval of candidate passages for the rewritten utterances, and (iii) neural-based re-ranking of candidate passages. We present a comprehensive experimental evaluation of the architecture assessed in terms of traditional IR metrics at small cutoffs. Experimental results show the effectiveness of our techniques that achieve an improvement of up to $0.28$ (+93%) for P@1 and $0.19$ (+89.9%) for nDCG@3 w.r.t. the CAsT baseline.Source: SIGIR 2020 - 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2057–2060, Online Conference, July 25-30, 2020
DOI: 10.1145/3397271.3401268
DOI: 10.48550/arxiv.2004.14054
Project(s): BigDataGrapes via OpenAIRE

Metrics:

2020 Journal article Restricted

Weighting passages enhances accuracy
Muntean C. I., Nardini F. M., Perego R., Tonellotto N., Frieder O.
We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR.Source: ACM transactions on information systems 39 (2020). doi:10.1145/3428687
DOI: 10.1145/3428687
Metrics:

See at: ACM Transactions on Information Systems Restricted | CNR ExploRA

2020 Journal article Open Access

RankEval: Evaluation and investigation of ranking models
Lucchese C., Muntean C. I., Nardini F. M., Perego R., Trani S.
RankEval is a Python open-source tool for the analysis and evaluation of ranking models based on ensembles of decision trees. Learning-to-Rank (LtR) approaches that generate tree-ensembles are considered the most effective solution for difficult ranking tasks and several impactful LtR libraries have been developed aimed at improving ranking quality and training efficiency. However, these libraries are not very helpful in terms of hyper-parameters tuning and in-depth analysis of the learned models, and even the implementation of most popular Information Retrieval (IR) metrics differ among them, thus making difficult to compare different models. RankEval overcomes these limitations by providing a unified environment where to perform an easy, comprehensive inspection and assessment of ranking models trained using different machine learning libraries. The tool focuses on ensuring efficiency, flexibility and extensibility and is fully interoperable with most popular LtR libraries.Source: Softwarex (Amsterdam) 12 (2020). doi:10.1016/j.softx.2020.100614
DOI: 10.1016/j.softx.2020.100614
Project(s): BigDataGrapes via OpenAIRE

Metrics:

See at: SoftwareX Open Access | ISTI Repository | SoftwareX | www.sciencedirect.com | CNR ExploRA

2020 Conference article Embargo

High-quality prediction of tourist movements using temporal trajectories in graphs
Moghtasedi S., Muntean C. I., Nardini F. M., Grossi R., Marino A.
In this paper, we study the problem of predicting the next position of a tourist given his history. In particular, we propose a model to identify the next point of interest that a tourist will visit in the future, by making use of similarity between trajectories on a graph and taking into account the spatial-temporal aspect of trajectories. We compare our method with a well-known machine learning-based technique, as well as with a popularity baseline, using three public real-world datasets. Our experimental results show that our technique outperforms state-of-the-art machine learning-based methods effectively, by providing at least twice more accurate results.Source: ASONAM 2020 - The 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 348–352, Online conference, 7-10/12/2020
DOI: 10.1109/asonam49781.2020.9381450
Metrics:

See at: ieeexplore.ieee.org Restricted | xplorestaging.ieee.org | CNR ExploRA

2020 Journal article Open Access

(So) Big Data and the transformation of the city
Andrienko G., Andrienko N., Boldrini C., Caldarelli G., Cintia P., Cresci S., Facchini A., Giannotti F., Gionis A., Guidotti R., Mathioudakis M., Muntean C. I., Pappalardo L., Pedreschi D., Pournaras E., Pratesi F., Tesconi M., Trasarti R.
The exponential increase in the availability of large-scale mobility data has fueled the vision of smart cities that will transform our lives. The truth is that we have just scratched the surface of the research challenges that should be tackled in order to make this vision a reality. Consequently, there is an increasing interest among different research communities (ranging from civil engineering to computer science) and industrial stakeholders in building knowledge discovery pipelines over such data sources. At the same time, this widespread data availability also raises privacy issues that must be considered by both industrial and academic stakeholders. In this paper, we provide a wide perspective on the role that big data have in reshaping cities. The paper covers the main aspects of urban data analytics, focusing on privacy issues, algorithms, applications and services, and georeferenced data from social media. In discussing these aspects, we leverage, as concrete examples and case studies of urban data science tools, the results obtained in the "City of Citizens" thematic area of the Horizon 2020 SoBigData initiative, which includes a virtual research environment with mobility datasets and urban analytics methods developed by several institutions around Europe. We conclude the paper outlining the main research challenges that urban data science has yet to address in order to help make the smart city vision a reality.Source: International Journal of Data Science and Analytics (Print) 1 (2020). doi:10.1007/s41060-020-00207-3
DOI: 10.1007/s41060-020-00207-3
Project(s): SoBigData via OpenAIRE

Metrics: