2026 Journal article Open Access
Projection-displacement-based query performance prediction for embedded space of dense retrievers
Datta Suchana, Faggioli Guglielmo, Ferro Nicola, Ganguly Debasis, Muntean Cristina Ioana, Perego Raffaele, Tonellotto Nicola
Recent advances in representation learning have enabled neural Information Retrieval (IR) systems to use learned dense representations for queries and documents to effectively handle semantics, language nuances, and vocabulary mismatch problems. In contrast to traditional IR systems that rely on word matching, dense IR models exploit query/document similarity in dense latent spaces to account for semantics. This requires substantial training data and comes with increased computational demands. Thus, it would be beneficial to predict how a system will perform for a given query to decide whether a dense IR model is the best option or alternatives should be used. Traditional Query Performance Prediction (QPP) models are designed for lexical IR approaches and perform sub-optimally when applied to dense neural IR systems. Therefore, there has been a renewed interest in QPP methods to improve their effectiveness for dense neural IR models. While the results of the new QPP methods are generally encouraging, there is ample room for improvement in absolute performance and stability. We argue that by using features more aligned with the underlying rationale of dense IR models, we can enhance the performance of QPP. In this respect, we propose the Projection-Displacement-Based QPP (PDQPP), which exploits the geometric properties of dense IR models, projects queries and retrieved documents onto subspaces defined by pseudo-relevant documents, and considers changes in retrieval scores within them as a proxy for retrieval coherence. Minor score changes suggest robust and coherent retrieval, while significant alterations indicate semantic divergence and potentially poor performance. Results over a wide range of experimental settings on both traditional (TREC Robust) and neural-oriented (TREC Deep Learning) test collections show that PDQPP mostly outperforms the state-of-the-art QPP baselines.
Source: ACM TRANSACTIONS ON INFORMATION SYSTEMS, vol. 44 (issue 1), pp. 1-30
DOI: 10.1145/3765617

See at: dl.acm.org Open Access | CNR IRIS Open Access | ACM Transactions on Information Systems Restricted | CNR IRIS Restricted


2025 Journal article Open Access
ChatGPT versus modest large language models: an extensive study on benefits and drawbacks for conversational search
Rocchietti G., Rulli C., Nardini F. M., Muntean Cristina Ioana, Perego R., Frieder O.
Large Language Models (LLMs) are effective in modeling text syntactic and semantic content, making them a strong choice to perform conversational query rewriting. While previous approaches proposed NLP-based custom models, requiring significant engineering effort, our approach is straightforward and conceptually simpler. Not only do we improve effectiveness over the current state-of-the-art, but we also curate the cost and efficiency aspects. We explore the use of pre-trained LLMs fine-tuned to generate quality user query rewrites, aiming to reduce computational costs while maintaining or improving retrieval effectiveness. As a first contribution, we study various prompting approaches - including zero, one, and few-shot methods - with ChatGPT (e.g., gpt-3.5-turbo). We observe an increase in the quality of rewrites leading to improved retrieval. We then fine-tuned smaller open LLMs on the query rewriting task. Our results demonstrate that our fine-tuned models, including the smallest with 780 million parameters, achieve better performance during the retrieval phase than gpt-3.5-turbo. To fine-tune the selected models, we used the QReCC dataset, which is specifically designed for query rewriting tasks. For evaluation, we used the TREC CAsT datasets to assess the retrieval effectiveness of the rewrites of both gpt-3.5-turbo and our fine-tuned models. Our findings show that fine-tuning LLMs on conversational query rewriting datasets can be more effective than relying on generic instruction-tuned models or traditional query reformulation techniques.
Source: IEEE ACCESS, vol. 13, pp. 15253-15271
DOI: 10.1109/access.2025.3529741

See at: IEEE Access Open Access | IEEE Access Open Access | CNR IRIS Open Access | ieeexplore.ieee.org Open Access | CNR IRIS Restricted


2025 Other Open Access
ISTI-day 2025 Proceedings
Del Corso G., Pedrotti A., Federico G., Gennaro C., Carrara F., Amato G., Di Benedetto M., Gabrielli E., Belli D., Matrullo Z., Miori V., Tolomei G., Waheed T., Marchetti E., Calabrò A., Rossetti G., Stella M., Cazabet R., Abramski K., Cau E., Citraro S., Failla A., Mesina V., Morini V., Pansanella V., Colantonio S., Germanese D., Pascali M. A., Bianchi L., Messina N., Falchi F., Barsellotti L., Pacini G., Cassese M., Puccetti G., Esuli A., Volpi L., Moreo A., Sebastiani F., Sperduti G., Nguyen D., Broccia G., Ter Beek M. H., Ferrari A., Massink M., Belmonte G., Ciancia V., Papini O., Canapa G., Catricalà B., Manca M., Paternò F., Santoro C., Zedda E., Gallo S., Maenza S., Mattioli A., Simeoli L., Rucci D., Carlini E., Dazzi P., Kavalionak H., Mordacchini M., Rulli C., Muntean Cristina Ioana, Nardini F. M., Perego R., Rocchietti G., Lettich F., Renso C., Pugliese C., Casini G., Haldimann J., Meyer T., Assante M., Candela L., Dell'Amico A., Frosini L., Mangiacrapa F., Oliviero A., Pagano P., Panichi G., Peccerillo B., Procaccini M., Mannocci A., Manghi P., Lonetti F., Kang D., Di Giandomenico F., Jee E., Lazzini G., Conti F., Scopigno R., D'Acunto M., Moroni D., Cafiso M., Paradisi P., Callieri M., Pavoni G., Corsini M., De Falco A., Sala F., Saraceni Q., Gattiglia G.
ISTI-Day is an annual information and networking event organized by the Institute of Information Science and Technologies "A. Faedo" (ISTI) of the Italian National Research Council (CNR). This event features an opening talk of the Director of the Dept. DIITET (Emilio F. Campana) as well as an overview of the Institute's activities presented by the ISTI Director (Roberto Scopigno). Those institutional segments are complemented by dedicated presentations and round tables featuring former staff members, as well as internal and external collaborators. To foster a network of knowledge and collaboration among newcomers, the 2025 ISTI Day edition also includes a large poster session that provides a comprehensive overview of current research activities. Each of the 13 laboratories contributes 1–3 posters, highlighting the most innovative work and offering early-career researchers a platform for discussion. Thus these proceedings include the posters selected for ISTI-Day 2025, reflecting the diverse and innovative nature of the Institute's research.

See at: CNR IRIS Open Access | www.isti.cnr.it Open Access | CNR IRIS Restricted


2025 Conference article Open Access
CoSRec: a joint conversational search and recommendation dataset
Alessio M., Merlo S., Di Noia T., Faggioli G., Ferrante M., Ferro N., Muntean Cristina Ioana, Nardini F. M., Narducci F., Perego R., Santucci G., Viterbo N.
Conversational Information Access systems have experienced wide-spread diffusion thanks to the natural and effortless interactions they enable with the user. In particular, they represent an effective interaction interface for conversational search (CS) and conversational recommendation (CR) scenarios. Despite their commonalities, CR and CS systems are often devised, developed, and evaluated as isolated components. Integrating these two elements would allow for handling complex information access scenarios, such as exploring unfamiliar recommended product aspects, enabling richer dialogues, and improving user satisfaction. As of today, the scarce availability of integrated datasets, with existing ones focused exclusively on either of the tasks, limits the possibilities for evaluating by-design integrated CS and CR systems. To address this gap, we propose CoSRec, the first dataset for joint Conversational Search and Recommendation (CSR) evaluation. The CoSRec test set includes 20 high-quality conversations, with human-made annotations for the quality of conversations, and manually crafted relevance judgments for products and documents. Additionally, we provide supplementary training data comprising partially annotated dialogues and raw conversations to support diverse learning paradigms. CoSRec is the first resource to model CR and CS tasks in a unified framework, enabling the training and evaluation of systems that must shift between answering queries and making suggestions dynamically.
DOI: 10.1145/3726302.3730319

See at: dl.acm.org Open Access | CNR IRIS Open Access | Padua research Archive (Archivio istituzionale della ricerca - Università di Padova) Restricted | Padua research Archive (Archivio istituzionale della ricerca - Università di Padova) Restricted | CNR IRIS Restricted


2025 Conference article Open Access
Efficient conversational search via topical locality in dense retrieval
Muntean Cristina Ioana, Nardini F. M., Perego R., Rocchietti G., Rulli C.
Pre-trained language models have been widely exploited to learn dense representations of documents and queries for information retrieval. While previous efforts have primarily focused on improving effectiveness and user satisfaction, response time remains a critical bottleneck of conversational search systems. To address this, we exploit the topical locality inherent in conversational queries, i.e., the tendency of queries within a conversation to focus on related topics. By leveraging query embedding similarities, we dynamically restrict the search space to semantically relevant document clusters, reducing computational complexity without compromising retrieval quality. We evaluate our approach on the TREC CAsT 2019 and 2020 datasets using multiple embedding models and vector indexes, achieving improvements in processing speed of up to 10.3X with little loss in performance (4.3X without any loss). Our results show that the proposed system effectively handles complex, multi-turn queries with high precision and efficiency, offering a practical solution for real-time conversational search.
DOI: 10.1145/3726302.3730186
DOI: 10.48550/arxiv.2504.21507
Project(s): EFRA via OpenAIRE, "Future Artificial Intelligence Research" - Spoke 1 "Human-centered AI"

See at: arXiv.org e-Print Archive Open Access | dl.acm.org Open Access | CNR IRIS Open Access | doi.org Restricted | doi.org Restricted | Archivio della Ricerca - Università di Pisa Restricted | CNR IRIS Restricted


2024 Patent Restricted
Caching historical embeddings in conversational search
Frieder O., Mele I., Muntean C., Nardini F. M., Perego R., Tonellotto N.
A method and system are described for improving the speed and efficiency of obtaining conversational search results. A user may speak a phrase to perform a conversational search or a series of phrases to perform a series of searches. These spoken phrases may be enriched by context and then converted into a query embedding. A similarity between the query embedding and document embeddings is used to determine the search results including a query cutoff number of documents and a cache cutoff number of documents. A second search phrase may use the cache of documents along with comparisons of the returned documents and the first query embedding to determine the quality of the cache for responding to the second search query. If the results are high-quality then the search may proceed much more rapidly by applying the second query only to the cached documents rather than to the server.

See at: CNR IRIS Restricted | CNR IRIS Restricted


2023 Conference article Restricted
A geometric framework for query performance prediction in conversational search
Faggioli G., Ferro N., Muntean C. I., Perego R., Tonellotto N.
Thanks to recent advances in IR and NLP, the way users interact with search engines is evolving rapidly, with multi-turn conversations replacing traditional one-shot textual queries. Given its interactive nature, Conversational Search (CS) is one of the scenarios that can benefit the most from Query Performance Prediction (QPP) techniques. QPP for the CS domain is a relatively new field and lacks proper framing. In this study, we address this gap by proposing a framework for the application of QPP in the CS domain and use it to evaluate the performance of predictors. We characterize what it means to predict the performance in the CS scenario, where information needs are not independent queries but a series of closely related utterances. We identify three main ways to use QPP models in the CS domain: as a diagnostic tool, as a way to adjust the system's behaviour during a conversation, or as a way to predict the system's performance on the next utterance. Due to the lack of established evaluation procedures for QPP in the CS domain, we propose a protocol to evaluate QPPs for each of the use cases. Additionally, we introduce a set of spatial-based QPP models designed to work best in the conversational search domain, where dense neural retrieval models are the most common approaches and query cutoffs are typically small. We show how the proposed QPP approaches significantly improve the predictive performance over the state-of-the-art in different scenarios and collections.
Source: SIGIR '23 - 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1355–1365, Taipei, Taiwan, 23-27/07/2023
DOI: 10.1145/3539618.3591625
Project(s): SoBigData-PlusPlus via OpenAIRE

See at: dl.acm.org Restricted | CNR ExploRA


2023 Conference article Open Access
Commonsense injection in conversational systems: an adaptable framework for query expansion
Rocchietti G., Frieder O., Muntean Cristina, Nardini F. M., Perego R.
Recent advancements in conversational agents are leading a paradigm shift in how people search for their information needs, from text queries to entire spoken conversations. This paradigm shift poses a new challenge: a single question may lack the context driven by the entire conversation. We propose and evaluate a framework to deal with multi-turn conversations with the injection of commonsense knowledge. Specifically, we propose a novel approach for conversational search that uses pre-trained large language models and commonsense knowledge bases to enrich queries with relevant concepts. Our framework comprises a generator of candidate concepts related to the context of the conversation and a selector for deciding which candidate concept to add to the current utterance to improve retrieval effectiveness. We use the TREC CAsT datasets and ConceptNet to show that our framework improves retrieval performance by up to 82% in terms of Recall@200 and up to 154% in terms of NDCG@3 as compared to the performance achieved by the original utterances in the conversations.
DOI: 10.1109/wi-iat59888.2023.00013
Project(s): EFRA via OpenAIRE

See at: CNR IRIS Open Access | ieeexplore.ieee.org Open Access | ISTI Repository Open Access | CNR IRIS Restricted | CNR IRIS Restricted


2023 Conference article Open Access
Rewriting conversational utterances with instructed large language models
Galimzhanova E, Muntean Ci, Nardini Fm, Perego R, Rocchietti G
Many recent studies have shown the ability of large language models (LLMs) to achieve state-of-the-art performance on many NLP tasks, such as question answering, text summarization, coding, and translation. In some cases, the results provided by LLMs are on par with those of human experts. These models' most disruptive innovation is their ability to perform tasks via zero-shot or few-shot prompting. This capability has been successfully exploited to train instructed LLMs, where reinforcement learning with human feedback is used to guide the model to follow the user's requests directly. In this paper, we investigate the ability of instructed LLMs to improve conversational search effectiveness by rewriting user questions in a conversational setting. We study which prompts provide the most informative rewritten utterances that lead to the best retrieval performance. Reproducible experiments are conducted on publicly available TREC CAsT datasets. The results show that rewriting conversational utterances with instructed LLMs achieves significant improvements of up to 25.2% in MRR, 31.7% in Precision@1, 27% in NDCG@3, and 11.5% in Recall@500 over state-of-the-art techniques.
DOI: 10.1109/wi-iat59888.2023.00014
Project(s): EFRA via OpenAIRE

See at: CNR IRIS Open Access | ieeexplore.ieee.org Open Access | ISTI Repository Open Access | CNR IRIS Restricted | CNR IRIS Restricted


2023 Conference article Open Access
A spatial approach to predict performance of conversational search systems
Faggioli G, Ferro N, Muntean C, Perego R, Tonellotto N
Recent advancements in Information Retrieval and Natural Language Processing have led to significant developments in the way users interact with search engines, with traditional one-shot textual queries being replaced by multi-turn conversations. As a highly interactive search scenario, Conversational Search (CS) can significantly benefit from Query Performance Prediction (QPP) techniques. However, the application of QPP in the CS domain is a relatively new field and requires proper framing. This study proposes a set of spatial-based QPP models, designed to work effectively in the conversational search domain, where dense neural retrieval models are the most common approach and query cutoffs are small. The proposed QPP approaches are shown to improve the predictive performance over the state-of-the-art in different scenarios and collections, highlighting the utility of QPP in the CS domain.
Source: CEUR WORKSHOP PROCEEDINGS, pp. 41-46. Pisa, Italy, 8-9/06/2023

See at: ceur-ws.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted | CNR IRIS Restricted


2022 Conference article Open Access
The 2nd workshop on Mixed-Initiative ConveRsatiOnal Systems (MICROS)
Mele I, Muntean Ci, Aliannejadi M, Voskarides N
The Mixed-Initiative ConveRsatiOnal Systems workshop (MICROS) aims at bringing novel ideas and investigating new solutions on conversational assistant systems. The increasing popularity of personal assistant systems, as well as smartphones, has changed the way users access online information, posing new challenges for information seeking and filtering. MICROS has a particular focus on mixed-initiative conversational systems, namely, systems that can provide answers in a proactive way (e.g., asking for clarification or proposing possible interpretations for ambiguous and vague requests). We invite people working on conversational systems or interested in the workshop topics to send us their position and research manuscripts.
DOI: 10.1145/3511808.3557938

See at: dl.acm.org Open Access | CNR IRIS Open Access | ISTI Repository Open Access | doi.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2022 Journal article Open Access
Caching historical embeddings in conversational search
Frieder O., Mele I., Muntean C., Nardini F. M., Perego R., Tonellotto N.
Rapid response, namely low latency, is fundamental in search applications; it is particularly so in interactive search sessions, such as those encountered in conversational settings. An observation with a potential to reduce latency asserts that conversational queries exhibit a temporal locality in the lists of documents retrieved. Motivated by this observation, we propose and evaluate a client-side document embedding cache, improving the responsiveness of conversational search systems. By leveraging state-of-the-art dense retrieval models to abstract document and query semantics, we cache the embeddings of documents retrieved for a topic introduced in the conversation, as they are likely relevant to successive queries. Our document embedding cache implements an efficient metric index, answering nearest-neighbor similarity queries by estimating the approximate result sets returned. We demonstrate the efficiency achieved using our cache via reproducible experiments based on TREC CAsT datasets, achieving a hit rate of up to 75% without degrading answer quality. Our achieved high cache hit rates significantly improve the responsiveness of conversational systems while likewise reducing the number of queries managed on the search back-end.
Source: ACM TRANSACTIONS ON THE WEB, vol. 18 (issue 4)
DOI: 10.1145/3578519
DOI: 10.48550/arxiv.2211.14155

See at: arXiv.org e-Print Archive Open Access | IRIS Cnr Open Access | IRIS Cnr Open Access | IRIS Cnr Open Access | ACM Transactions on the Web Restricted | doi.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2021 Conference article Open Access
MICROS: Mixed-Initiative ConveRsatiOnal Systems Workshop
Mele I, Muntean Ci, Aliannejadi M, Voskarides N
The 1st edition of the workshop on Mixed-Initiative ConveRsatiOnal Systems (MICROS@ECIR2021) aims at investigating and collecting novel ideas and contributions in the field of conversational systems. Oftentimes, the users fulfill their information need using smartphones and home assistants. This has revolutionized the way users access online information, thus posing new challenges compared to traditional search and recommendation. The first edition of MICROS will have a particular focus on mixed-initiative conversational systems. Indeed, conversational systems need to be proactive, proposing not only answers but also possible interpretations for ambiguous or vague requests.
DOI: 10.1007/978-3-030-72240-1_86
DOI: 10.48550/arxiv.2101.10219

See at: arXiv.org e-Print Archive Open Access | arxiv.org Open Access | CNR IRIS Open Access | link.springer.com Open Access | ISTI Repository Open Access | doi.org Restricted | doi.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2021 Journal article Restricted
Adaptive utterance rewriting for conversational search
Mele I, Muntean Ci, Nardini Fm, Perego R, Tonellotto N, Frieder O
In a conversational context, a user converses with a system through a sequence of natural-language questions, i.e., utterances. Starting from a given subject, the conversation evolves through sequences of user utterances and system replies. The retrieval of documents relevant to an utterance is difficult due to informal use of natural language in speech and the complexity of understanding the semantic context coming from previous utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing in order: (i) automatic utterance understanding and rewriting, (ii) first-stage retrieval of candidate passages for the rewritten utterances, and (iii) neural re-ranking of candidate passages. By understanding the conversational context, we propose adaptive utterance rewriting strategies based on the current utterance and the dialogue evolution of the user with the system. A classifier identifies those utterances lacking context information as well as the dependencies on the previous utterances. Experimentally, we evaluate the proposed architecture in terms of traditional information retrieval metrics at small cutoffs. Results demonstrate the effectiveness of our techniques, achieving an improvement up to 0.6512 for P@1 and 0.4484 for nDCG@3 w.r.t. the CAsT baseline.
Source: INFORMATION PROCESSING & MANAGEMENT, vol. 58 (issue 6)
DOI: 10.1016/j.ipm.2021.102682
Project(s): BigDataGrapes via OpenAIRE

See at: Information Processing & Management Restricted | Information Processing & Management Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2020 Journal article Open Access
Crime and its fear in social media
Prieto Curiel R., Cresci S., Muntean C., Bishop S. R.
Social media posts incorporate real-time information that has, elsewhere, been exploited to predict social trends. This paper considers whether such information can be useful in relation to crime and fear of crime. A large number of tweets were collected from the 18 largest Spanish-speaking countries in Latin America, over a period of 70 days. These tweets are then classified as being crime-related or not and additional information is extracted, including the type of crime and where possible, any geo-location at a city level. From the analysis of collected data, it is established that around 15 out of every 1000 tweets have text related to a crime, or fear of crime. The frequency of tweets related to crime is then compared against the number of murders, the murder rate, or the level of fear of crime as recorded in surveys. Results show that, like mass media, such as newspapers, social media suffer from a strong bias towards violent or sexual crimes. Furthermore, social media messages are not highly correlated with crime. Thus, social media is shown not to be highly useful for detecting trends in crime itself, but what they do demonstrate is rather a reflection of the level of the fear of crime.
Source: PALGRAVE COMMUNICATIONS, vol. 6 (issue 1)
DOI: 10.1057/s41599-020-0430-7
Project(s): CIMPLEX via OpenAIRE, SoBigData via OpenAIRE

See at: Palgrave Communications Open Access | CNR IRIS Open Access | Palgrave Communications Open Access | ISTI Repository Open Access | www.nature.com Open Access | Palgrave Communications Open Access | CNR IRIS Restricted


2020 Journal article Open Access
(So) Big Data and the transformation of the city
Andrienko G., Andrienko N., Boldrini C., Caldarelli G., Cintia P., Cresci S., Facchini A., Giannotti F., Gionis A., Guidotti R., Mathioudakis M., Muntean C. I., Pappalardo L., Pedreschi D., Pournaras E., Pratesi F., Tesconi M., Trasarti R.
The exponential increase in the availability of large-scale mobility data has fueled the vision of smart cities that will transform our lives. The truth is that we have just scratched the surface of the research challenges that should be tackled in order to make this vision a reality. Consequently, there is an increasing interest among different research communities (ranging from civil engineering to computer science) and industrial stakeholders in building knowledge discovery pipelines over such data sources. At the same time, this widespread data availability also raises privacy issues that must be considered by both industrial and academic stakeholders. In this paper, we provide a wide perspective on the role that big data have in reshaping cities. The paper covers the main aspects of urban data analytics, focusing on privacy issues, algorithms, applications and services, and georeferenced data from social media. In discussing these aspects, we leverage, as concrete examples and case studies of urban data science tools, the results obtained in the "City of Citizens" thematic area of the Horizon 2020 SoBigData initiative, which includes a virtual research environment with mobility datasets and urban analytics methods developed by several institutions around Europe. We conclude the paper outlining the main research challenges that urban data science has yet to address in order to help make the smart city vision a reality.
Source: International Journal of Data Science and Analytics (Print) 1 (2020)
DOI: 10.1007/s41060-020-00207-3
Project(s): SoBigData via OpenAIRE

See at: Aaltodoc Publication Archive Open Access | International Journal of Data Science and Analytics Open Access | White Rose Research Online Open Access | HELDA - Digital Repository of the University of Helsinki Open Access | Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari Open Access | link.springer.com Open Access | International Journal of Data Science and Analytics Open Access | City Research Online Open Access | ISTI Repository Open Access | Fraunhofer-ePrints Restricted | CNR ExploRA


2020 Journal article Open Access
Human migration: the big data perspective
Sîrbu A, Andrienko G, Andrienko N, Boldrini C, Conti M, Giannotti F, Guidotti R, Bertoli S, Kim J, Muntean Ci, Pappalardo L, Passarella A, Pedreschi D, Pollacci L, Pratesi F, Sharma R
How can big data help to understand the migration phenomenon? In this paper, we try to answer this question through an analysis of various phases of migration, comparing traditional and novel data sources and models at each phase. We concentrate on three phases of migration, at each phase describing the state of the art and recent developments and ideas. The first phase includes the journey, and we study migration flows and stocks, providing examples where big data can have an impact. The second phase discusses the stay, i.e. migrant integration in the destination country. We explore various data sets and models that can be used to quantify and understand migrant integration, with the final aim of providing the basis for the construction of a novel multi-level integration index. The last phase is related to the effects of migration on the source countries and the return of migrants.
Source: INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, vol. 11, pp. 341-360
DOI: 10.1007/s41060-020-00213-5
Project(s): SoBigData via OpenAIRE

See at: International Journal of Data Science and Analytics Open Access | CNR IRIS Open Access | link.springer.com Open Access | ISTI Repository Open Access | HAL Clermont Université Restricted | CNR IRIS Restricted | Fraunhofer-ePrints Restricted


2020 Conference article Open Access
Topic propagation in conversational search
Mele I, Muntean Ci, Nardini Fm, Perego R, Tonellotto N, Frieder O
In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions, i.e., utterances. Starting from a given topic, the conversation evolves through user utterances and system replies. The retrieval of documents relevant to a given utterance in a conversation is challenging due to ambiguity of natural language and to the difficulty of detecting possible topic shifts and semantic relationships among utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing: (i) topic-aware utterance rewriting, (ii) retrieval of candidate passages for the rewritten utterances, and (iii) neural-based re-ranking of candidate passages. We present a comprehensive experimental evaluation of the architecture assessed in terms of traditional IR metrics at small cutoffs. Experimental results show the effectiveness of our techniques that achieve an improvement of up to 0.28 (+93%) for P@1 and 0.19 (+89.9%) for nDCG@3 w.r.t. the CAsT baseline.
DOI: 10.1145/3397271.3401268
DOI: 10.48550/arxiv.2004.14054
Project(s): BigDataGrapes via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | arxiv.org Open Access | dl.acm.org Restricted | doi.org Restricted | doi.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2020 Journal article Restricted
Weighting passages enhances accuracy
Muntean C., Nardini F. M., Perego R., Tonellotto N., Frieder O.
We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR.
Source: ACM TRANSACTIONS ON INFORMATION SYSTEMS, vol. 39 (issue 2)
DOI: 10.1145/3428687

See at: ACM Transactions on Information Systems Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2020 Journal article Open Access
RankEval: Evaluation and investigation of ranking models
Lucchese C., Muntean C., Nardini F. M., Perego R., Trani S.
RankEval is an open-source Python tool for the analysis and evaluation of ranking models based on ensembles of decision trees. Learning-to-Rank (LtR) approaches that generate tree ensembles are considered the most effective solution for difficult ranking tasks, and several impactful LtR libraries have been developed with the aim of improving ranking quality and training efficiency. However, these libraries are not very helpful in terms of hyper-parameter tuning and in-depth analysis of the learned models, and even the implementations of the most popular Information Retrieval (IR) metrics differ among them, thus making it difficult to compare different models. RankEval overcomes these limitations by providing a unified environment in which to perform an easy, comprehensive inspection and assessment of ranking models trained using different machine learning libraries. The tool focuses on ensuring efficiency, flexibility and extensibility and is fully interoperable with the most popular LtR libraries.
Source: SOFTWAREX, vol. 12
DOI: 10.1016/j.softx.2020.100614
Project(s): BigDataGrapes via OpenAIRE

See at: SoftwareX Open Access | CNR IRIS Open Access | ISTI Repository Open Access | SoftwareX Open Access | www.sciencedirect.com Open Access | CNR IRIS Restricted