2023 Conference article Restricted

A geometric framework for query performance prediction in conversational search
Faggioli G., Ferro N., Muntean C. I., Perego R., Tonellotto N.
Thanks to recent advances in IR and NLP, the way users interact with search engines is evolving rapidly, with multi-turn conversations replacing traditional one-shot textual queries. Given its interactive nature, Conversational Search (CS) is one of the scenarios that can benefit the most from Query Performance Prediction (QPP) techniques. QPP for the CS domain is a relatively new field and lacks proper framing. In this study, we address this gap by proposing a framework for the application of QPP in the CS domain and use it to evaluate the performance of predictors. We characterize what it means to predict performance in the CS scenario, where information needs are expressed not as independent queries but as a series of closely related utterances. We identify three main ways to use QPP models in the CS domain: as a diagnostic tool, as a way to adjust the system's behaviour during a conversation, or as a way to predict the system's performance on the next utterance. Due to the lack of established evaluation procedures for QPP in the CS domain, we propose a protocol to evaluate QPP models for each of these use cases. Additionally, we introduce a set of spatial-based QPP models designed to work best in the conversational search domain, where dense neural retrieval models are the most common approach and query cutoffs are typically small. We show that the proposed QPP approaches significantly improve predictive performance over the state of the art across different scenarios and collections.
Source: SIGIR '23 - 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1355–1365, Taipei, Taiwan, 23-27/07/2023
DOI: 10.1145/3539618.3591625
Project(s): SoBigData-PlusPlus via OpenAIRE

See at: dl.acm.org Restricted | CNR ExploRA

2023 Conference article Open Access

Commonsense injection in conversational systems: an adaptable framework for query expansion
Rocchietti G., Frieder O., Muntean C. I., Nardini F. M., Perego R.
Recent advancements in conversational agents are leading a paradigm shift in how people search for their information needs, from text queries to entire spoken conversations. This paradigm shift poses a new challenge: a single question may lack the context provided by the entire conversation. We propose and evaluate a framework to deal with multi-turn conversations through the injection of commonsense knowledge. Specifically, we propose a novel approach for conversational search that uses pre-trained large language models and commonsense knowledge bases to enrich queries with relevant concepts. Our framework comprises a generator of candidate concepts related to the context of the conversation and a selector that decides which candidate concept to add to the current utterance to improve retrieval effectiveness. We use the TREC CAsT datasets and ConceptNet to show that our framework improves retrieval performance by up to 82% in terms of Recall@200 and up to 154% in terms of NDCG@3 compared to the performance achieved by the original utterances in the conversations.
Source: IEEE/WAT - 22nd International Conference on Web Intelligence and Intelligent Agent Technology, pp. 48–55, Venezia, Italy, 26-29/10/2023
DOI: 10.1109/wi-iat59888.2023.00013

See at: ISTI Repository Open Access | ieeexplore.ieee.org Restricted | CNR ExploRA

2023 Conference article Open Access

Rewriting conversational utterances with instructed large language models
Galimzhanova E., Muntean C. I., Nardini F. M., Perego R., Rocchietti G.
Many recent studies have shown the ability of large language models (LLMs) to achieve state-of-the-art performance on many NLP tasks, such as question answering, text summarization, coding, and translation. In some cases, the results provided by LLMs are on par with those of human experts. These models' most disruptive innovation is their ability to perform tasks via zero-shot or few-shot prompting. This capability has been successfully exploited to train instructed LLMs, where reinforcement learning with human feedback is used to guide the model to follow the user's requests directly. In this paper, we investigate the ability of instructed LLMs to improve conversational search effectiveness by rewriting user questions in a conversational setting. We study which prompts provide the most informative rewritten utterances, i.e., those that lead to the best retrieval performance. Reproducible experiments are conducted on the publicly available TREC CAsT datasets. The results show that rewriting conversational utterances with instructed LLMs achieves significant improvements of up to 25.2% in MRR, 31.7% in Precision@1, 27% in NDCG@3, and 11.5% in Recall@500 over state-of-the-art techniques.
Source: IEEE/WAT - 22nd International Conference on Web Intelligence and Intelligent Agent Technology, pp. 56–63, Venezia, Italy, 26-29/10/2023
DOI: 10.1109/wi-iat59888.2023.00014

See at: ISTI Repository Open Access | ieeexplore.ieee.org Restricted | CNR ExploRA

2023 Contribution to conference Restricted

A spatial approach to predict performance of conversational search systems
Faggioli G., Ferro N., Muntean C., Perego R., Tonellotto N.
Recent advancements in Information Retrieval and Natural Language Processing have led to significant developments in the way users interact with search engines, with traditional one-shot textual queries being replaced by multi-turn conversations. As a highly interactive search scenario, Conversational Search (CS) can significantly benefit from Query Performance Prediction (QPP) techniques. However, the application of QPP in the CS domain is a relatively new field and requires proper framing. This study proposes a set of spatial-based QPP models, designed to work effectively in the conversational search domain, where dense neural retrieval models are the most common approach and query cutoffs are small. The proposed QPP approaches are shown to improve the predictive performance over the state-of-the-art in different scenarios and collections, highlighting the utility of QPP in the CS domain.
Source: IIR2023 - 13th Italian Information Retrieval Workshop, pp. 41–46, Pisa, Italy, 8-9/06/2023

See at: ceur-ws.org Restricted | CNR ExploRA

2022 Conference article Open Access

The 2nd workshop on Mixed-Initiative ConveRsatiOnal Systems (MICROS)
Mele I., Muntean C. I., Aliannejadi M., Voskarides N.
The Mixed-Initiative ConveRsatiOnal Systems workshop (MICROS) aims to bring together novel ideas and investigate new solutions for conversational assistant systems. The increasing popularity of personal assistant systems, as well as smartphones, has changed the way users access online information, posing new challenges for information seeking and filtering. MICROS has a particular focus on mixed-initiative conversational systems, namely, systems that can provide answers proactively (e.g., asking for clarification or proposing possible interpretations for ambiguous and vague requests). We invite people working on conversational systems or interested in the workshop topics to send us their position and research manuscripts.
Source: CIKM '22 - 31st ACM International Conference on Information & Knowledge Management, pp. 5173–5174, Atlanta, USA, 17-21/10/2022
DOI: 10.1145/3511808.3557938

See at: ISTI Repository Open Access | dl.acm.org Restricted | doi.org | CNR ExploRA

2021 Conference article Open Access

MICROS: Mixed-Initiative ConveRsatiOnal Systems Workshop
Mele I., Muntean C. I., Aliannejadi M., Voskarides N.
The 1st edition of the workshop on Mixed-Initiative ConveRsatiOnal Systems (MICROS@ECIR2021) aims at investigating and collecting novel ideas and contributions in the field of conversational systems. Users often fulfill their information needs using smartphones and home assistants. This has revolutionized the way users access online information, posing new challenges compared to traditional search and recommendation. The first edition of MICROS will have a particular focus on mixed-initiative conversational systems. Indeed, conversational systems need to be proactive, proposing not only answers but also possible interpretations for ambiguous or vague requests.
Source: ECIR 2021 - 43rd European Conference on IR Research, pp. 710–713, Online Conference, March 28 - April 1, 2021
DOI: 10.1007/978-3-030-72240-1_86
DOI: 10.48550/arxiv.2101.10219

2021 Journal article Closed Access

Adaptive utterance rewriting for conversational search
Mele I., Muntean C. I., Nardini F. M., Perego R., Tonellotto N., Frieder O.
In a conversational context, a user converses with a system through a sequence of natural-language questions, i.e., utterances. Starting from a given subject, the conversation evolves through sequences of user utterances and system replies. Retrieving documents relevant to an utterance is difficult due to the informal use of natural language in speech and the complexity of understanding the semantic context coming from previous utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing, in order: (i) automatic utterance understanding and rewriting, (ii) first-stage retrieval of candidate passages for the rewritten utterances, and (iii) neural re-ranking of candidate passages. Building on an understanding of the conversational context, we propose adaptive utterance rewriting strategies based on the current utterance and the evolution of the user's dialogue with the system. A classifier identifies the utterances lacking context information as well as their dependencies on previous utterances. We evaluate the proposed architecture in terms of traditional information retrieval metrics at small cutoffs. Results demonstrate the effectiveness of our techniques, achieving an improvement of up to 0.6512 for P@1 and 0.4484 for nDCG@3 w.r.t. the CAsT baseline.
Source: Information Processing & Management 58 (2021). doi:10.1016/j.ipm.2021.102682
DOI: 10.1016/j.ipm.2021.102682
Project(s): BigDataGrapes via OpenAIRE

See at: Information Processing & Management Restricted | Information Processing & Management | CNR ExploRA

2020 Journal article Open Access

Crime and its fear in social media
Prieto Curiel R., Cresci S., Muntean C. I., Bishop S. R.
Social media posts incorporate real-time information that has, elsewhere, been exploited to predict social trends. This paper considers whether such information can be useful in relation to crime and fear of crime. A large number of tweets were collected from the 18 largest Spanish-speaking countries in Latin America over a period of 70 days. These tweets are then classified as crime-related or not, and additional information is extracted, including the type of crime and, where possible, any geo-location at city level. From the analysis of the collected data, it is established that around 15 out of every 1000 tweets contain text related to crime or fear of crime. The frequency of crime-related tweets is then compared against the number of murders, the murder rate, and the level of fear of crime as recorded in surveys. Results show that, like mass media such as newspapers, social media suffer from a strong bias towards violent or sexual crimes. Furthermore, social media messages are not highly correlated with crime. Thus, social media are shown not to be highly useful for detecting trends in crime itself; rather, they reflect the level of fear of crime.
Source: Palgrave Communications 6 (2020). doi:10.1057/s41599-020-0430-7
DOI: 10.1057/s41599-020-0430-7
Project(s): CIMPLEX via OpenAIRE, SoBigData via OpenAIRE

2020 Journal article Open Access

(So) Big Data and the transformation of the city
Andrienko G., Andrienko N., Boldrini C., Caldarelli G., Cintia P., Cresci S., Facchini A., Giannotti F., Gionis A., Guidotti R., Mathioudakis M., Muntean C. I., Pappalardo L., Pedreschi D., Pournaras E., Pratesi F., Tesconi M., Trasarti R.
The exponential increase in the availability of large-scale mobility data has fueled the vision of smart cities that will transform our lives. The truth is that we have just scratched the surface of the research challenges that must be tackled to make this vision a reality. Consequently, there is increasing interest among different research communities (ranging from civil engineering to computer science) and industrial stakeholders in building knowledge discovery pipelines over such data sources. At the same time, this widespread data availability also raises privacy issues that must be considered by both industrial and academic stakeholders. In this paper, we provide a wide perspective on the role that big data play in reshaping cities. The paper covers the main aspects of urban data analytics, focusing on privacy issues, algorithms, applications and services, and georeferenced data from social media. In discussing these aspects, we leverage, as concrete examples and case studies of urban data science tools, the results obtained in the "City of Citizens" thematic area of the Horizon 2020 SoBigData initiative, which includes a virtual research environment with mobility datasets and urban analytics methods developed by several institutions around Europe. We conclude the paper by outlining the main research challenges that urban data science has yet to address in order to help make the smart city vision a reality.
Source: International Journal of Data Science and Analytics (Print) 1 (2020). doi:10.1007/s41060-020-00207-3
DOI: 10.1007/s41060-020-00207-3
Project(s): SoBigData via OpenAIRE

2020 Journal article Open Access

Human migration: the big data perspective
Sîrbu A., Andrienko G., Andrienko N., Boldrini C., Conti M., Giannotti F., Guidotti R., Bertoli S., Kim J., Muntean C. I., Pappalardo L., Passarella A., Pedreschi D., Pollacci L., Pratesi F., Sharma R.
How can big data help to understand the migration phenomenon? In this paper, we try to answer this question through an analysis of various phases of migration, comparing traditional and novel data sources and models at each phase. We concentrate on three phases of migration, at each phase describing the state of the art and recent developments and ideas. The first phase includes the journey, and we study migration flows and stocks, providing examples where big data can have an impact. The second phase discusses the stay, i.e. migrant integration in the destination country. We explore various data sets and models that can be used to quantify and understand migrant integration, with the final aim of providing the basis for the construction of a novel multi-level integration index. The last phase is related to the effects of migration on the source countries and the return of migrants.
Source: International Journal of Data Science and Analytics (Online) 11 (2020): 341–360. doi:10.1007/s41060-020-00213-5
DOI: 10.1007/s41060-020-00213-5
Project(s): SoBigData via OpenAIRE

2020 Conference article Open Access

Topic propagation in conversational search
Mele I., Muntean C. I., Nardini F. M., Perego R., Tonellotto N., Frieder O.
In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions, i.e., utterances. Starting from a given topic, the conversation evolves through user utterances and system replies. The retrieval of documents relevant to a given utterance in a conversation is challenging due to the ambiguity of natural language and the difficulty of detecting possible topic shifts and semantic relationships among utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing: (i) topic-aware utterance rewriting, (ii) retrieval of candidate passages for the rewritten utterances, and (iii) neural-based re-ranking of candidate passages. We present a comprehensive experimental evaluation of the architecture assessed in terms of traditional IR metrics at small cutoffs. Experimental results show the effectiveness of our techniques, which achieve an improvement of up to 0.28 (+93%) for P@1 and 0.19 (+89.9%) for nDCG@3 w.r.t. the CAsT baseline.
Source: SIGIR 2020 - 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2057–2060, Online Conference, July 25-30, 2020
DOI: 10.1145/3397271.3401268
DOI: 10.48550/arxiv.2004.14054
Project(s): BigDataGrapes via OpenAIRE

2020 Journal article Restricted

Weighting passages enhances accuracy
Muntean C. I., Nardini F. M., Perego R., Tonellotto N., Frieder O.
We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR.
Source: ACM Transactions on Information Systems 39 (2020). doi:10.1145/3428687
DOI: 10.1145/3428687

See at: ACM Transactions on Information Systems Restricted | CNR ExploRA
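
The BM25P idea in the entry above (a linear combination of term statistics computed over different portions of a document) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the weighted sum of per-passage term frequencies simply replaces the raw term frequency inside the standard BM25 saturation formula, that documents are split into equal-length contiguous passages, that IDF uses the Lucene-style BM25 variant, and that passage weights are supplied by hand; in the paper the weights are collection-dependent and derived from the distribution of salient terms. The function names `split_into_passages` and `bm25p_score` and all parameter values are hypothetical.

```python
import math
from collections import Counter


def split_into_passages(tokens, n_passages):
    """Split a token list into n_passages contiguous chunks of (roughly) equal size.

    Hypothetical equal-length partitioning; the paper studies several schemes.
    """
    size = math.ceil(len(tokens) / n_passages) if tokens else 1
    return [tokens[i * size:(i + 1) * size] for i in range(n_passages)]


def bm25p_score(query_terms, doc_tokens, passage_weights, df, n_docs,
                avg_doc_len, k1=1.2, b=0.75):
    """Score a document with a BM25P-style weighted term frequency (sketch).

    passage_weights: one weight per passage (assumed given, not learned here).
    df: mapping term -> document frequency in the collection.
    """
    passages = split_into_passages(doc_tokens, len(passage_weights))
    passage_tfs = [Counter(p) for p in passages]
    doc_len = len(doc_tokens)
    # Standard BM25 length normalisation, computed once per document.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    score = 0.0
    for term in query_terms:
        # Linear combination of the term's frequency in each passage.
        wtf = sum(w * tf[term] for w, tf in zip(passage_weights, passage_tfs))
        if wtf == 0 or term not in df:
            continue
        # Lucene-style BM25 IDF (an assumption; other IDF variants exist).
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        # Usual BM25 saturation, applied to the weighted frequency.
        score += idf * wtf * (k1 + 1) / (wtf + norm)
    return score
```

With every passage weight set to 1.0 the weighted frequency equals the plain term frequency, so the sketch reduces to ordinary BM25. Giving the first and last passages larger weights, as in `[2.0, 0.5, 2.0]`, favours documents whose query terms occur near the beginning or the end, which mirrors the observation that motivates BM25P.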

2020 Journal article Open Access

RankEval: Evaluation and investigation of ranking models
Lucchese C., Muntean C. I., Nardini F. M., Perego R., Trani S.
RankEval is a Python open-source tool for the analysis and evaluation of ranking models based on ensembles of decision trees. Learning-to-Rank (LtR) approaches that generate tree ensembles are considered the most effective solution for difficult ranking tasks, and several impactful LtR libraries have been developed with the aim of improving ranking quality and training efficiency. However, these libraries offer little help with hyper-parameter tuning and in-depth analysis of the learned models, and even their implementations of the most popular Information Retrieval (IR) metrics differ, making it difficult to compare different models. RankEval overcomes these limitations by providing a unified environment in which to perform an easy, comprehensive inspection and assessment of ranking models trained using different machine learning libraries. The tool focuses on efficiency, flexibility and extensibility and is fully interoperable with the most popular LtR libraries.
Source: SoftwareX (Amsterdam) 12 (2020). doi:10.1016/j.softx.2020.100614
DOI: 10.1016/j.softx.2020.100614
Project(s): BigDataGrapes via OpenAIRE

See at: SoftwareX Open Access | ISTI Repository | SoftwareX | www.sciencedirect.com | CNR ExploRA

2020 Conference article Embargo

High-quality prediction of tourist movements using temporal trajectories in graphs
Moghtasedi S., Muntean C. I., Nardini F. M., Grossi R., Marino A.
In this paper, we study the problem of predicting the next position of a tourist given his history. In particular, we propose a model to identify the next point of interest that a tourist will visit, making use of the similarity between trajectories on a graph and taking into account the spatio-temporal aspect of trajectories. We compare our method with a well-known machine learning-based technique, as well as with a popularity baseline, using three public real-world datasets. Our experimental results show that our technique effectively outperforms state-of-the-art machine learning-based methods, providing results that are at least twice as accurate.
Source: ASONAM 2020 - The 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 348–352, Online conference, 7-10/12/2020
DOI: 10.1109/asonam49781.2020.9381450

See at: ieeexplore.ieee.org Restricted | xplorestaging.ieee.org | CNR ExploRA

2019 Conference article Closed Access

Enhanced news retrieval: passages lead the way!
Catena M., Nardini F. M., Frieder O., Perego R., Muntean C. I., Tonellotto N.
We observe that the most relevant terms in unstructured news articles are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of news articles. Our experimentation, conducted using three publicly available news datasets, demonstrates that BM25P markedly outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1.
Source: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1269–1272, Paris, France, 21-25 July 2019
DOI: 10.1145/3331184.3331373

See at: dl.acm.org Restricted | doi.org | CNR ExploRA

2018 Report Open Access

BASMATI - D3.5 Server- and Client-side Applications Adaptation and Reconfiguration: Design and Specification
Dazzi P., Carlini E., De Lira V. M., Munteanu C.
This report provides a description of the mechanisms, tools, and algorithms used to support application adaptation and reconfiguration in the BASMATI brokerage platform. At the core of this support lies the BASMATI Enriched Application Model (BEAM), the XML-based language in which an application is modelled and represented in BASMATI. The design principles behind the BEAM (namely: compatibility, extensibility, decomposability) are the prerequisites for efficient and effective geo-placement of services and applications on top of federated Cloud resources. The BEAM is made available to all the components of the platform by the Application Repository, which works as a centralization point for the BEAMs of all the applications. The decomposability of BEAM is exploited by the Decision Maker, whose task is to proactively and reactively adapt the application according to the behaviour of users and resources by means of advanced placement algorithms.
Source: Project report, BASMATI, Deliverable D3.5, 2018
Project(s): BASMATI via OpenAIRE

See at: ISTI Repository Open Access | CNR ExploRA

2017 Conference article Restricted

Social Media Image Recognition for Food Trend Analysis
Amato G., Bolettieri P., Monteiro De Lira V., Muntean C. I., Perego R., Renso C.
An increasing number of people share their thoughts and images of their lives on social media platforms. People are exposed to food in their everyday lives and share online what they are eating by means of photos taken of their dishes. The hashtag #foodporn is constantly among the popular hashtags on Twitter, and food photos are the second most popular subject on Instagram after selfies. The system that we propose, WorldFoodMap, captures the stream of food photos from social media and, thanks to a CNN food image classifier, identifies the categories of food that people are sharing. By collecting food images from the Twitter stream and associating a food category and location with them, WorldFoodMap makes it possible to investigate and interactively visualize the popularity and trends of shared food all over the world.
Source: SIGIR 2017 - 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1333–1336, Tokyo, Japan, 7 - 11 August, 2017
DOI: 10.1145/3077136.3084142
Project(s): SoBigData via OpenAIRE

See at: dl.acm.org Restricted | doi.org | CNR ExploRA

2017 Conference article Restricted

RankEval: an evaluation and analysis framework for learning-to-rank solutions
Lucchese C., Muntean C. I., Nardini F. M., Perego R., Trani S.
In this demo paper we propose RankEval, an open-source tool for the analysis and evaluation of Learning-to-Rank (LtR) models based on ensembles of regression trees. Gradient Boosted Regression Trees (GBRT) is a flexible statistical learning technique for classification and regression that represents the state of the art for training effective LtR solutions. Indeed, the success of GBRT fostered the development of several open-source LtR libraries targeting efficiency of the learning phase and effectiveness of the resulting models. However, these libraries offer only very limited help for the tuning and evaluation of the trained models. In addition, the implementations provided for even the most traditional IR evaluation metrics differ from library to library, making the objective evaluation and comparison of trained models a difficult task. RankEval addresses these issues by providing a common ground for LtR libraries that offers useful and interoperable tools for a comprehensive comparison and in-depth analysis of ranking models.
Source: SIGIR '17 - 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1281–1284, Tokyo, Japan, 9-11 August 2017
DOI: 10.1145/3077136.3084140
Project(s): SoBigData via OpenAIRE

See at: dl.acm.org Restricted | doi.org | CNR ExploRA
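
The RankEval entries note that even traditional IR metrics are implemented differently across LtR libraries. One well-known source of such discrepancies is the gain function used in NDCG: some implementations use the exponential gain 2^rel - 1, others the linear gain rel. The minimal Python sketch that follows computes NDCG@k under both conventions on the same ranking to show that they yield different scores. It is only an illustration, not RankEval's API; the function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and it assumes the ideal ranking is built from the retrieved list alone, which is itself one of several conventions in use.

```python
import math


def dcg_at_k(relevances, k, exponential_gain=True):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    total = 0.0
    for rank, rel in enumerate(relevances[:k], start=1):
        # Two common gain conventions that different libraries adopt.
        gain = (2 ** rel - 1) if exponential_gain else rel
        # Logarithmic rank discount: rank 1 -> log2(2) = 1.
        total += gain / math.log2(rank + 1)
    return total


def ndcg_at_k(relevances, k, exponential_gain=True):
    """NDCG@k, normalising by the DCG of the ideally sorted retrieved list."""
    ideal = sorted(relevances, reverse=True)
    idcg = dcg_at_k(ideal, k, exponential_gain)
    if idcg == 0:
        # No relevant documents retrieved: define NDCG as 0.
        return 0.0
    return dcg_at_k(relevances, k, exponential_gain) / idcg
```

For the ranking `[1, 3, 0, 2]` with k = 3, the exponential gain gives an NDCG of about 0.577 while the linear gain gives about 0.607, so two libraries that differ only in this convention would report different effectiveness for the same model.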

2017 Journal article Open Access

Perception of social phenomena through the multidimensional analysis of online social networks
Coletto M., Esuli A., Lucchese C., Muntean C. I., Nardini F. M., Perego R., Renso C.
We propose an analytical framework aimed at investigating different views of the discussions about polarized topics that occur in Online Social Networks (OSNs). The framework supports the analysis along multiple dimensions, i.e., time, space and sentiment, of the opposing views about a controversial topic emerging in an OSN. To assess its usefulness in mining insights about social phenomena, we apply it to two different Twitter case studies: the discussions about the refugee crisis and the United Kingdom European Union membership referendum. These complex and contentious topics are very important issues for EU citizens and prompted a multitude of Twitter users to take sides and actively participate in the discussions. Our framework allows scalable monitoring of the raw stream of relevant tweets and automatically enriches them with location information (user and mentioned locations) and sentiment polarity (positive vs. negative). The analyses we conducted show how the framework captures the differences in positive and negative user sentiment over time and space. The resulting knowledge can support the understanding of complex dynamics by identifying variations in the perception of specific events and locations.
Source: Online Social Networks and Media 1 (2017): 14–32. doi:10.1016/j.osnem.2017.03.001
DOI: 10.1016/j.osnem.2017.03.001
Project(s): SoBigData via OpenAIRE

See at: ISTI Repository Open Access | Online Social Networks and Media Restricted | www.sciencedirect.com | CNR ExploRA

2017 Conference article Restricted

Sentiment spreading: an epidemic model for lexicon-based sentiment analysis on Twitter
Pollacci L., Sirbu A., Giannotti F., Pedreschi D., Lucchese C., Muntean C. I.
While sentiment analysis has received significant attention in recent years, problems still arise when tools are applied to microblogging content. This is because the text to be analysed typically consists of very short messages lacking structure and semantic context. At the same time, the amount of text produced by online platforms is enormous, so simple, fast and effective methods are needed to study sentiment in these data efficiently. Lexicon-based methods, which use a predefined dictionary of terms tagged with sentiment valences to evaluate sentiment in longer sentences, can be a valid approach. Here we present a method based on epidemic spreading to automatically extend the dictionary used in lexicon-based sentiment analysis, starting from a reduced dictionary and large amounts of Twitter data. The resulting dictionary is shown to contain valences that correlate well with human-annotated sentiment, and to produce tweet sentiment classifications comparable to those of the original dictionary, with the advantage of being able to tag more tweets than the original. The method is easily extensible to various languages and applicable to large amounts of data.
Source: AI*IA Conference of the Italian Association for Artificial Intelligence, pp. 114–127, Bari, Italy, 14-17 November 2017
DOI: 10.1007/978-3-319-70169-1_9
Project(s): SoBigData via OpenAIRE

See at: Lecture Notes in Computer Science Restricted | Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari | link.springer.com | CNR ExploRA