
2012 Contribution to book Restricted

Exploring the meaning behind Twitter hashtags through clustering.
Muntean C. I., Morar G. A., Moldovan D.
Social networks are generators of large amounts of data produced by users, who are not limited with respect to the content of the information they exchange. The data generated can be a good indicator of trends and topic preferences among users. In our paper we focus on analyzing and representing hashtags by the corpus in which they appear. We cluster a large set of hashtags using K-means on MapReduce in order to process data in a distributed manner. Our intention is to retrieve connections that might exist between different hashtags and their textual representation, and grasp their semantics through the main topics they occur with.
Source: BIS 2012 - Business Information Systems Workshops. Revised papers, edited by Witold Abramowicz, John Domingue, Krzysztof Węcel, pp. 231–242. London: Springer, 2012
DOI: 10.1007/978-3-642-34228-8_22

See at: doi.org Restricted | gateway.webofknowledge.com | link.springer.com | CNR ExploRA

2011 Conference article Open Access

Raising engagement in e-learning through gamification
Muntean C.
Games are part of day-to-day life, entertaining users, but at the same time modelling behaviors. By applying game mechanics and dynamics to tasks and e-learning processes we can increase user engagement with an e-learning application and its specific tasks. While having multiple uses in commercial practices, gamification implies well-established techniques similar to those found in games. We will take a closer look at the ones that are appropriate to the learning process, and in particular to e-learning, and analyze relevant examples.

See at: CNR IRIS Open Access | CNR IRIS Restricted

2020 Journal article Open Access

Crime and its fear in social media
Prieto Curiel R., Cresci S., Muntean C., Bishop S. R.
Social media posts incorporate real-time information that has, elsewhere, been exploited to predict social trends. This paper considers whether such information can be useful in relation to crime and fear of crime. A large number of tweets were collected from the 18 largest Spanish-speaking countries in Latin America, over a period of 70 days. These tweets are then classified as being crime-related or not and additional information is extracted, including the type of crime and, where possible, any geo-location at a city level. From the analysis of collected data, it is established that around 15 out of every 1000 tweets have text related to a crime, or fear of crime. The frequency of tweets related to crime is then compared against the number of murders, the murder rate, or the level of fear of crime as recorded in surveys. Results show that, like mass media, such as newspapers, social media suffer from a strong bias towards violent or sexual crimes. Furthermore, social media messages are not highly correlated with crime. Thus, social media are shown not to be highly useful for detecting trends in crime itself; what they do demonstrate is rather a reflection of the level of the fear of crime.
Source: PALGRAVE COMMUNICATIONS, vol. 6 (issue 1)
DOI: 10.1057/s41599-020-0430-7
Project(s): CIMPLEX via OpenAIRE, SoBigData via OpenAIRE

2020 Conference article Open Access

High-quality prediction of tourist movements using temporal trajectories in graphs
Moghtasedi S., Muntean C., Nardini F. M., Grossi R., Marino A.
In this paper, we study the problem of predicting the next position of a tourist given his history. In particular, we propose a model to identify the next point of interest that a tourist will visit in the future, by making use of similarity between trajectories on a graph and taking into account the spatial-temporal aspect of trajectories. We compare our method with a well-known machine learning-based technique, as well as with a popularity baseline, using three public real-world datasets. Our experimental results show that our technique outperforms state-of-the-art machine learning-based methods, providing results that are at least twice as accurate.
DOI: 10.1109/asonam49781.2020.9381450

See at: CNR IRIS Open Access | ieeexplore.ieee.org | CNR IRIS Restricted | xplorestaging.ieee.org

2021 Conference article Open Access

MICROS: Mixed-Initiative ConveRsatiOnal Systems Workshop
Mele I, Muntean Ci, Aliannejadi M, Voskarides N
The 1st edition of the workshop on Mixed-Initiative ConveRsatiOnal Systems (MICROS@ECIR2021) aims at investigating and collecting novel ideas and contributions in the field of conversational systems. Users often fulfill their information needs using smartphones and home assistants. This has revolutionized the way users access online information, thus posing new challenges compared to traditional search and recommendation. The first edition of MICROS will have a particular focus on mixed-initiative conversational systems. Indeed, conversational systems need to be proactive, proposing not only answers but also possible interpretations for ambiguous or vague requests.
DOI: 10.1007/978-3-030-72240-1_86
DOI: 10.48550/arxiv.2101.10219

2022 Conference article Open Access

The 2nd workshop on Mixed-Initiative ConveRsatiOnal Systems (MICROS)
Mele I, Muntean Ci, Aliannejadi M, Voskarides N
The Mixed-Initiative ConveRsatiOnal Systems workshop (MICROS) aims at bringing novel ideas and investigating new solutions on conversational assistant systems. The increasing popularity of personal assistant systems, as well as smartphones, has changed the way users access online information, posing new challenges for information seeking and filtering. MICROS has a particular focus on mixed-initiative conversational systems, namely, systems that can provide answers in a proactive way (e.g., asking for clarification or proposing possible interpretations for ambiguous and vague requests). We invite people working on conversational systems or interested in the workshop topics to send us their position and research manuscripts.
DOI: 10.1145/3511808.3557938

2023 Conference article Open Access

A spatial approach to predict performance of conversational search systems
Faggioli G, Ferro N, Muntean C, Perego R, Tonellotto N
Recent advancements in Information Retrieval and Natural Language Processing have led to significant developments in the way users interact with search engines, with traditional one-shot textual queries being replaced by multi-turn conversations. As a highly interactive search scenario, Conversational Search (CS) can significantly benefit from Query Performance Prediction (QPP) techniques. However, the application of QPP in the CS domain is a relatively new field and requires proper framing. This study proposes a set of spatial-based QPP models, designed to work effectively in the conversational search domain, where dense neural retrieval models are the most common approach and query cutoffs are small. The proposed QPP approaches are shown to improve the predictive performance over the state-of-the-art in different scenarios and collections, highlighting the utility of QPP in the CS domain.
Source: CEUR WORKSHOP PROCEEDINGS, pp. 41-46. Pisa, Italy, 8-9/06/2023

See at: ceur-ws.org Open Access | CNR IRIS | CNR IRIS Restricted | CNR IRIS

2013 Conference article Open Access

Learning to shorten query sessions.
Muntean C, Nardini F M, Silvestri F, Sydow M
We propose the use of learning to rank techniques to shorten query sessions by maximizing the probability that the query we predict is the final query of the current search session. We present a preliminary evaluation showing that this approach is a promising research direction.

See at: dl.acm.org Open Access | CNR IRIS | CNR IRIS Restricted

2020 Conference article Open Access

Topic propagation in conversational search
Mele I, Muntean Ci, Nardini Fm, Perego R, Tonellotto N, Frieder O
In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions, i.e., utterances. Starting from a given topic, the conversation evolves through user utterances and system replies. The retrieval of documents relevant to a given utterance in a conversation is challenging due to ambiguity of natural language and to the difficulty of detecting possible topic shifts and semantic relationships among utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing: (i) topic-aware utterance rewriting, (ii) retrieval of candidate passages for the rewritten utterances, and (iii) neural-based re-ranking of candidate passages. We present a comprehensive experimental evaluation of the architecture assessed in terms of traditional IR metrics at small cutoffs. Experimental results show the effectiveness of our techniques that achieve an improvement of up to 0.28 (+93%) for P@1 and 0.19 (+89.9%) for nDCG@3 w.r.t. the CAsT baseline.
DOI: 10.1145/3397271.3401268
DOI: 10.48550/arxiv.2004.14054
Project(s): BigDataGrapes via OpenAIRE

2020 Journal article Restricted

Weighting passages enhances accuracy
Muntean C., Nardini F. M., Perego R., Tonellotto N., Frieder O.
We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR.
Source: ACM TRANSACTIONS ON INFORMATION SYSTEMS, vol. 39 (issue 2)
DOI: 10.1145/3428687

See at: ACM Transactions on Information Systems Restricted | CNR IRIS | CNR IRIS

2021 Journal article Restricted

Adaptive utterance rewriting for conversational search
Mele I, Muntean Ci, Nardini Fm, Perego R, Tonellotto N, Frieder O
In a conversational context, a user converses with a system through a sequence of natural-language questions, i.e., utterances. Starting from a given subject, the conversation evolves through sequences of user utterances and system replies. The retrieval of documents relevant to an utterance is difficult due to informal use of natural language in speech and the complexity of understanding the semantic context coming from previous utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing in order: (i) automatic utterance understanding and rewriting, (ii) first-stage retrieval of candidate passages for the rewritten utterances, and (iii) neural re-ranking of candidate passages. By understanding the conversational context, we propose adaptive utterance rewriting strategies based on the current utterance and the dialogue evolution of the user with the system. A classifier identifies those utterances lacking context information as well as the dependencies on the previous utterances. Experimentally, we evaluate the proposed architecture in terms of traditional information retrieval metrics at small cutoffs. Results demonstrate the effectiveness of our techniques, achieving an improvement of up to 0.6512 for P@1 and 0.4484 for nDCG@3 w.r.t. the CAsT baseline.
Source: INFORMATION PROCESSING & MANAGEMENT, vol. 58 (issue 6)
DOI: 10.1016/j.ipm.2021.102682
Project(s): BigDataGrapes via OpenAIRE

See at: Information Processing & Management Restricted | Information Processing & Management | CNR IRIS | CNR IRIS

2023 Conference article Restricted

A geometric framework for query performance prediction in conversational search
Faggioli G., Ferro N., Muntean C., Perego R., Tonellotto N.
Thanks to recent advances in IR and NLP, the way users interact with search engines is evolving rapidly, with multi-turn conversations replacing traditional one-shot textual queries. Given its interactive nature, Conversational Search (CS) is one of the scenarios that can benefit the most from Query Performance Prediction (QPP) techniques. QPP for the CS domain is a relatively new field and lacks proper framing. In this study, we address this gap by proposing a framework for the application of QPP in the CS domain and use it to evaluate the performance of predictors. We characterize what it means to predict the performance in the CS scenario, where information needs are not independent queries but a series of closely related utterances. We identify three main ways to use QPP models in the CS domain: as a diagnostic tool, as a way to adjust the system's behaviour during a conversation, or as a way to predict the system's performance on the next utterance. Due to the lack of established evaluation procedures for QPP in the CS domain, we propose a protocol to evaluate QPPs for each of the use cases. Additionally, we introduce a set of spatial-based QPP models designed to work best in the conversational search domain, where dense neural retrieval models are the most common approaches and query cutoffs are typically small. We show how the proposed QPP approaches significantly improve the predictive performance over the state-of-the-art in different scenarios and collections.
DOI: 10.1145/3539618.3591625
Project(s): SoBigData-PlusPlus via OpenAIRE

See at: dl.acm.org Restricted | CNR IRIS | CNR IRIS

2023 Conference article Open Access

Commonsense injection in conversational systems: an adaptable framework for query expansion
Rocchietti G., Frieder O., Muntean Cristina, Nardini F. M., Perego R.
Recent advancements in conversational agents are leading a paradigm shift in how people search for their information needs, from text queries to entire spoken conversations. This paradigm shift poses a new challenge: a single question may lack the context driven by the entire conversation. We propose and evaluate a framework to deal with multi-turn conversations with the injection of commonsense knowledge. Specifically, we propose a novel approach for conversational search that uses pre-trained large language models and commonsense knowledge bases to enrich queries with relevant concepts. Our framework comprises a generator of candidate concepts related to the context of the conversation and a selector for deciding which candidate concept to add to the current utterance to improve retrieval effectiveness. We use the TREC CAsT datasets and ConceptNet to show that our framework improves retrieval performance by up to 82% in terms of Recall@200 and up to 154% in terms of NDCG@3 as compared to the performance achieved by the original utterances in the conversations.
DOI: 10.1109/wi-iat59888.2023.00013
Project(s): EFRA via OpenAIRE

See at: CNR IRIS Open Access | ieeexplore.ieee.org | ISTI Repository | CNR IRIS Restricted | CNR IRIS

2023 Conference article Open Access

Rewriting conversational utterances with instructed large language models
Galimzhanova E, Muntean Ci, Nardini Fm, Perego R, Rocchietti G
Many recent studies have shown the ability of large language models (LLMs) to achieve state-of-the-art performance on many NLP tasks, such as question answering, text summarization, coding, and translation. In some cases, the results provided by LLMs are on par with those of human experts. These models' most disruptive innovation is their ability to perform tasks via zero-shot or few-shot prompting. This capability has been successfully exploited to train instructed LLMs, where reinforcement learning with human feedback is used to guide the model to follow the user's requests directly. In this paper, we investigate the ability of instructed LLMs to improve conversational search effectiveness by rewriting user questions in a conversational setting. We study which prompts provide the most informative rewritten utterances that lead to the best retrieval performance. Reproducible experiments are conducted on publicly available TREC CAsT datasets. The results show that rewriting conversational utterances with instructed LLMs achieves significant improvements of up to 25.2% in MRR, 31.7% in Precision@1, 27% in NDCG@3, and 11.5% in Recall@500 over state-of-the-art techniques.
DOI: 10.1109/wi-iat59888.2023.00014
Project(s): EFRA via OpenAIRE

See at: CNR IRIS Open Access | ieeexplore.ieee.org | ISTI Repository | CNR IRIS Restricted | CNR IRIS

2022 Journal article Open Access

Caching historical embeddings in conversational search
Frieder O., Mele I., Muntean C., Nardini F. M., Perego R., Tonellotto N.
Rapid response, namely low latency, is fundamental in search applications; it is particularly so in interactive search sessions, such as those encountered in conversational settings. An observation with a potential to reduce latency asserts that conversational queries exhibit a temporal locality in the lists of documents retrieved. Motivated by this observation, we propose and evaluate a client-side document embedding cache, improving the responsiveness of conversational search systems. By leveraging state-of-the-art dense retrieval models to abstract document and query semantics, we cache the embeddings of documents retrieved for a topic introduced in the conversation, as they are likely relevant to successive queries. Our document embedding cache implements an efficient metric index, answering nearest-neighbor similarity queries by estimating the approximate result sets returned. We demonstrate the efficiency achieved using our cache via reproducible experiments based on TREC CAsT datasets, achieving a hit rate of up to 75% without degrading answer quality. Our achieved high cache hit rates significantly improve the responsiveness of conversational systems while likewise reducing the number of queries managed on the search back-end.
Source: ACM TRANSACTIONS ON THE WEB, vol. 18 (issue 4)
DOI: 10.1145/3578519
DOI: 10.48550/arxiv.2211.14155

2019 Conference article Restricted

Enhanced news retrieval: passages lead the way!
Catena M, Nardini Fm, Frieder O, Perego R, Muntean Ci, Tonellotto N
We observe that the most relevant terms in unstructured news articles are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of news articles. Our experimentation, conducted using three publicly available news datasets, demonstrates that BM25P markedly outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1.
DOI: 10.1145/3331184.3331373

See at: dl.acm.org Restricted | doi.org | CNR IRIS | CNR IRIS

2020 Journal article Open Access

RankEval: Evaluation and investigation of ranking models
Lucchese C., Muntean C., Nardini F. M., Perego R., Trani S.
RankEval is an open-source Python tool for the analysis and evaluation of ranking models based on ensembles of decision trees. Learning-to-Rank (LtR) approaches that generate tree-ensembles are considered the most effective solution for difficult ranking tasks, and several impactful LtR libraries have been developed aimed at improving ranking quality and training efficiency. However, these libraries are not very helpful in terms of hyper-parameter tuning and in-depth analysis of the learned models, and even the implementations of the most popular Information Retrieval (IR) metrics differ among them, thus making it difficult to compare different models. RankEval overcomes these limitations by providing a unified environment in which to perform an easy, comprehensive inspection and assessment of ranking models trained using different machine learning libraries. The tool focuses on ensuring efficiency, flexibility and extensibility and is fully interoperable with the most popular LtR libraries.
Source: SOFTWAREX, vol. 12
DOI: 10.1016/j.softx.2020.100614
Project(s): BigDataGrapes via OpenAIRE

2024 Patent Restricted

Caching historical embeddings in conversational search
Frieder O., Mele I., Muntean C., Nardini F. M., Perego R., Tonellotto N.
A method and system are described for improving the speed and efficiency of obtaining conversational search results. A user may speak a phrase to perform a conversational search or a series of phrases to perform a series of searches. These spoken phrases may be enriched by context and then converted into a query embedding. A similarity between the query embedding and document embeddings is used to determine the search results including a query cutoff number of documents and a cache cutoff number of documents. A second search phrase may use the cache of documents along with comparisons of the returned documents and the first query embedding to determine the quality of the cache for responding to the second search query. If the results are high-quality then the search may proceed much more rapidly by applying the second query only to the cached documents rather than to the server.

See at: CNR IRIS Restricted | CNR IRIS

2020 Journal article Open Access

Human migration: the big data perspective
Sîrbu A, Andrienko G, Andrienko N, Boldrini C, Conti M, Giannotti F, Guidotti R, Bertoli S, Kim J, Muntean Ci, Pappalardo L, Passarella A, Pedreschi D, Pollacci L, Pratesi F, Sharma R
How can big data help to understand the migration phenomenon? In this paper, we try to answer this question through an analysis of various phases of migration, comparing traditional and novel data sources and models at each phase. We concentrate on three phases of migration, at each phase describing the state of the art and recent developments and ideas. The first phase includes the journey, and we study migration flows and stocks, providing examples where big data can have an impact. The second phase discusses the stay, i.e. migrant integration in the destination country. We explore various data sets and models that can be used to quantify and understand migrant integration, with the final aim of providing the basis for the construction of a novel multi-level integration index. The last phase is related to the effects of migration on the source countries and the return of migrants.
Source: INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, vol. 11, pp. 341-360
DOI: 10.1007/s41060-020-00213-5
Project(s): SoBigData via OpenAIRE

2024 Conference article Open Access

LongDoc summarization using instruction-tuned large language models for food safety regulations
Rocchietti G., Rulli C., Randl K., Muntean C., Nardini F. M., Perego R., Trani S., Karvounis M., Janostik J.
We design and implement a summarization pipeline for regulatory documents, focusing on two main objectives: creating two silver standard datasets using instruction-tuned large language models (LLMs) and finetuning smaller LLMs to perform summarization of regulatory text. In the first task, we employ state-of-the-art models, Cohere C4AI Command-R-4bit and Llama-3-8B, to generate summaries of regulatory documents. These generated summaries serve as ground-truth data for the second task, where we finetune three general-purpose LLMs to specialize in high-quality summary generation for specific documents while reducing the computational requirements. Specifically, we finetune two Google Flan-T5 models using datasets generated by Llama-3-8B and Cohere C4AI, and we create a quantized (4-bit) version of Google Gemma 2-B based on summaries from Cohere C4AI. Additionally, we initiated a pilot activity involving legal experts from SGS-Digicomply to validate the effectiveness of our summarization pipeline.
Source: CEUR WORKSHOP PROCEEDINGS, vol. 3802, pp. 33-42. Udine, Italy, 5-6/09/2024
Project(s): EFRA via OpenAIRE

See at: ceur-ws.org Open Access | CNR IRIS | CNR IRIS Restricted