17 result(s)
2025 Journal article Open Access OPEN
ChatGPT versus modest large language models: an extensive study on benefits and drawbacks for conversational search
Rocchietti G., Rulli C., Nardini F. M., Muntean Cristina Ioana, Perego R., Frieder O.
Large Language Models (LLMs) are effective at modeling the syntactic and semantic content of text, making them a strong choice for conversational query rewriting. While previous approaches proposed NLP-based custom models requiring significant engineering effort, our approach is straightforward and conceptually simpler. Not only do we improve effectiveness over the current state of the art, but we also address cost and efficiency. We explore the use of pre-trained LLMs fine-tuned to generate quality user query rewrites, aiming to reduce computational costs while maintaining or improving retrieval effectiveness. As a first contribution, we study various prompting approaches - including zero-, one-, and few-shot methods - with ChatGPT (e.g., gpt-3.5-turbo). We observe an increase in the quality of rewrites leading to improved retrieval. We then fine-tune smaller open LLMs on the query rewriting task. Our results demonstrate that our fine-tuned models, including the smallest with 780 million parameters, achieve better performance during the retrieval phase than gpt-3.5-turbo. To fine-tune the selected models, we used the QReCC dataset, which is specifically designed for query rewriting tasks. For evaluation, we used the TREC CAsT datasets to assess the retrieval effectiveness of the rewrites of both gpt-3.5-turbo and our fine-tuned models. Our findings show that fine-tuning LLMs on conversational query rewriting datasets can be more effective than relying on generic instruction-tuned models or traditional query reformulation techniques.
Source: IEEE ACCESS, vol. 13, pp. 15253-15271
DOI: 10.1109/access.2025.3529741


See at: IEEE Access Open Access | IEEE Access Open Access | CNR IRIS Open Access | ieeexplore.ieee.org Open Access | CNR IRIS Restricted
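
The abstract above studies zero-, one-, and few-shot prompting for conversational query rewriting. A minimal sketch of how such a few-shot prompt could be assembled (the instruction wording and example conversations are invented for illustration; the paper's actual prompts may differ):

```python
def build_rewrite_prompt(history, query, examples):
    """Assemble a few-shot prompt asking an LLM to rewrite a
    conversational query into a self-contained one."""
    parts = ["Rewrite the last question so it can be understood without the conversation."]
    for ex_history, ex_query, ex_rewrite in examples:
        parts.append("Conversation: " + " | ".join(ex_history))
        parts.append(f"Question: {ex_query}")
        parts.append(f"Rewrite: {ex_rewrite}")
    parts.append("Conversation: " + " | ".join(history))
    parts.append(f"Question: {query}")
    parts.append("Rewrite:")           # the model completes from here
    return "\n".join(parts)

# Invented one-shot example: the rewrite resolves the pronoun "it".
examples = [(["Tell me about the Eiffel Tower."],
             "When was it built?",
             "When was the Eiffel Tower built?")]
prompt = build_rewrite_prompt(["Who wrote Dune?"], "When was it published?", examples)
```

With zero examples the same function yields a zero-shot prompt; the study compares these regimes before fine-tuning smaller models on QReCC.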


2025 Conference article Open Access OPEN
Efficient approximate nearest neighbor search on a raspberry Pi
Martinico S., Nardini F. M., Rulli C., Venturini R.
Approximate Nearest Neighbors (ANN) search is a core task in Information Retrieval. However, the high computational demands and reliance on expensive infrastructures limit broader contributions to ANN research. Enabling efficient and effective ANN search on low-resource devices would allow researchers in low-income countries to participate in the ANN community, thereby democratizing the field. Despite its potential, the IR literature offers little work on the feasibility of ANN search under resource constraints. In this proposal, we explore efficient solutions for large-scale ANN search on low-resource devices. We report preliminary experiments highlighting current limitations and outlining future challenges.
DOI: 10.1145/3726302.3730268
Project(s): EFRA via OpenAIRE


See at: dl.acm.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2025 Conference article Open Access OPEN
Neural lexical search with learned sparse retrieval
Yates A., Lassance C., Rulli C., Yang E., Macavaney S., Singh Siddharth A. K., Nguyen T., Lei Y.
Learned Sparse Retrieval (LSR) techniques use neural machinery to represent queries and documents as learned bags of words. In contrast with other neural retrieval techniques, such as generative retrieval and dense retrieval, LSR has been shown to be a remarkably robust, transferable, and efficient family of methods for retrieving high-quality search results. This half-day tutorial aims to provide an extensive overview of LSR, ranging from its fundamentals to the latest emerging techniques. By the end of the tutorial, attendees will be familiar with the important design decisions of an LSR system, know how to apply them to text and other modalities, and understand the latest techniques for retrieving with them efficiently. Website: https://lsr-tutorial.github.io
DOI: 10.1145/3726302.3731693


See at: dl.acm.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted
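
The tutorial describes LSR systems that represent queries and documents as learned bags of words and score them by weighted term overlap. A minimal sketch of that scoring step, with invented term weights:

```python
def lsr_score(query_vec, doc_vec):
    """Relevance = dot product over the terms shared by two learned bags of words."""
    return sum(w * doc_vec[t] for t, w in query_vec.items() if t in doc_vec)

# Invented learned weights: term -> importance score.
q = {"neural": 1.2, "search": 0.8}
d = {"neural": 0.9, "retrieval": 1.1, "search": 0.5}
score = lsr_score(q, d)  # 1.2*0.9 + 0.8*0.5 = 1.48
```

Because only shared terms contribute, such scores can be computed with a classical inverted index, which is what makes LSR efficient.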


2025 Conference article Open Access OPEN
Effective inference-free retrieval for learned sparse representations
Nardini F. M., Nguyen T., Rulli C., Venturini R., Yates A.
Learned Sparse Retrieval (LSR) is an effective IR approach that exploits pre-trained language models for encoding text into a learned bag of words. Several efforts in the literature have shown that sparsity is key to enabling a good trade-off between the efficiency and effectiveness of the query processor. To induce the right degree of sparsity, researchers typically use regularization techniques when training LSR models. Recently, new efficient inverted-index-based retrieval engines have been proposed, leading to a natural question: has the role of regularization changed in training LSR models? In this paper, we conduct an extended evaluation of regularization approaches for LSR where we discuss their effectiveness, efficiency, and out-of-domain generalization capabilities. We first show that regularization can be relaxed to produce more effective LSR encoders. We also show that query encoding is now the bottleneck limiting the overall query processor performance. To remove this bottleneck, we advance the state-of-the-art of inference-free LSR by proposing Learned Inference-free Retrieval (Li-Lsr). At training time, Li-Lsr learns a score for each token, casting the query encoding step into a seamless table lookup. Our approach yields state-of-the-art effectiveness for both in-domain and out-of-domain evaluation, surpassing Splade-v3-Doc by 1 point of MRR@10 on MsMarco and 1.8 points of nDCG@10 on Beir.
DOI: 10.1145/3726302.3730185
Project(s): EFRA via OpenAIRE


See at: dl.acm.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted
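
Li-Lsr is described as learning one score per token so that query encoding becomes a table lookup instead of a transformer forward pass. A toy sketch of that idea (the token scores below are invented, not learned):

```python
# Hypothetical learned token-score table: one scalar per vocabulary token,
# trained offline so that no model inference is needed at query time.
token_score = {"sparse": 1.4, "retrieval": 1.1, "the": 0.05}

def encode_query(tokens):
    """Inference-free query encoding: a table lookup per token."""
    vec = {}
    for t in tokens:
        if t in token_score:
            vec[t] = vec.get(t, 0.0) + token_score[t]
    return vec

q = encode_query(["sparse", "retrieval"])
```

Documents are still encoded with the full model offline; only the latency-critical query side is reduced to a lookup.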


2025 Conference article Open Access OPEN
kANNolo: sweet and smooth approximate k-nearest neighbors search
Delfino L., Erriquez D., Martinico S., Nardini F. M., Rulli C., Venturini R.
Approximate Nearest Neighbors (ANN) search is a crucial task in several applications like recommender systems and information retrieval. Current state-of-the-art ANN libraries, although being performance-oriented, often lack modularity and ease of use. This makes them less suitable for easy prototyping and testing of research ideas, an important feature to enable. We address these limitations by introducing kANNolo, a novel, research-oriented ANN library written in Rust and explicitly designed to combine usability with performance effectively. kANNolo is the first ANN library that supports dense and sparse vector representations made available on top of different similarity measures, e.g., Euclidean distance and inner product. Moreover, it also supports vector quantization techniques, e.g., Product Quantization, on top of the indexing strategies implemented. These functionalities are managed through Rust traits, allowing shared behaviors to be handled abstractly. This abstraction ensures flexibility and facilitates an easy integration of new components. In this work, we detail the architecture of kANNolo and demonstrate that its flexibility does not compromise performance. The experimental analysis shows that kANNolo achieves state-of-the-art performance in terms of speed-accuracy trade-off while allowing fast and easy prototyping, thus making kANNolo a valuable tool for advancing ANN research. Source code available on GitHub: https://github.com/TusKANNy/kannolo.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 15575, pp. 400-406. Lucca, Italy, 6–10/04/2025
DOI: 10.1007/978-3-031-88717-8_29
DOI: 10.48550/arxiv.2501.06121
Project(s): EFRA via OpenAIRE


See at: arXiv.org e-Print Archive Open Access | dl.acm.org Open Access | CNR IRIS Open Access | doi.org Restricted | doi.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted
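
The abstract notes that kANNolo manages similarity measures through Rust traits, so indexes can work against an abstract interface. This is not kANNolo's actual API; a hypothetical Python analogue of the same design, using abstract base classes in place of traits:

```python
from abc import ABC, abstractmethod

class Similarity(ABC):
    """Analogue of a Rust trait: shared behavior handled abstractly."""
    @abstractmethod
    def score(self, a, b): ...

class InnerProduct(Similarity):
    def score(self, a, b):
        return sum(x * y for x, y in zip(a, b))

class NegEuclidean(Similarity):
    def score(self, a, b):
        # Negate the distance so that "higher score = closer".
        return -sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest(query, vectors, sim: Similarity):
    """Any index can rank candidates through the abstract interface."""
    return max(range(len(vectors)), key=lambda i: sim.score(query, vectors[i]))

idx = nearest([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1]], InnerProduct())
```

Swapping the measure (or a quantized representation) requires no change to the search routine, which is the modularity the library advertises.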


2025 Conference article Open Access OPEN
Investigating the scalability of approximate sparse retrieval algorithms to massive datasets
Bruch S., Nardini F. M., Rulli C., Venturini R., Venuta L.
Learned sparse text embeddings have gained popularity due to their effectiveness in top-k retrieval and inherent interpretability. Their distributional idiosyncrasies, however, have long hindered their use in real-world retrieval systems. That changed with the recent development of approximate algorithms that leverage the distributional properties of sparse embeddings to speed up retrieval. Nonetheless, in much of the existing literature, evaluation has been limited to datasets with only a few million documents such as MsMarco. It remains unclear how these systems behave on much larger datasets and what challenges lurk in larger scales. To bridge that gap, we investigate the behavior of state-of-the-art retrieval algorithms on massive datasets. We compare and contrast the recently-proposed Seismic and graph-based solutions adapted from dense retrieval. We extensively evaluate Splade embeddings of 138M passages from MsMarco-v2 and report indexing time and other efficiency and effectiveness metrics.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 15574, pp. 437-445. Lucca, Italy, 06-10/04/2025
DOI: 10.1007/978-3-031-88714-7_43
Project(s): EFRA via OpenAIRE


See at: CNR IRIS Open Access | link.springer.com Open Access | doi.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2025 Other Open Access OPEN
ISTI-day 2025 Proceedings
Del Corso G., Pedrotti A., Federico G., Gennaro C., Carrara F., Amato G., Di Benedetto M., Gabrielli E., Belli D., Matrullo Z., Miori V., Tolomei G., Waheed T., Marchetti E., Calabrò A., Rossetti G., Stella M., Cazabet R., Abramski K., Cau E., Citraro S., Failla A., Mesina V., Morini V., Pansanella V., Colantonio S., Germanese D., Pascali M. A., Bianchi L., Messina N., Falchi F., Barsellotti L., Pacini G., Cassese M., Puccetti G., Esuli A., Volpi L., Moreo A., Sebastiani F., Sperduti G., Nguyen D., Broccia G., Ter Beek M. H., Ferrari A., Massink M., Belmonte G., Ciancia V., Papini O., Canapa G., Catricalà B., Manca M., Paternò F., Santoro C., Zedda E., Gallo S., Maenza S., Mattioli A., Simeoli L., Rucci D., Carlini E., Dazzi P., Kavalionak H., Mordacchini M., Rulli C., Muntean Cristina Ioana, Nardini F. M., Perego R., Rocchietti G., Lettich F., Renso C., Pugliese C., Casini G., Haldimann J., Meyer T., Assante M., Candela L., Dell'Amico A., Frosini L., Mangiacrapa F., Oliviero A., Pagano P., Panichi G., Peccerillo B., Procaccini M., Mannocci A., Manghi P., Lonetti F., Kang D., Di Giandomenico F., Jee E., Lazzini G., Conti F., Scopigno R., D'Acunto M., Moroni D., Cafiso M., Paradisi P., Callieri M., Pavoni G., Corsini M., De Falco A., Sala F., Saraceni Q., Gattiglia G.
ISTI-Day is an annual information and networking event organized by the Institute of Information Science and Technologies "A. Faedo" (ISTI) of the Italian National Research Council (CNR). The event features an opening talk by the Director of the DIITET Department (Emilio F. Campana) as well as an overview of the Institute's activities presented by the ISTI Director (Roberto Scopigno). These institutional segments are complemented by dedicated presentations and round tables featuring former staff members, as well as internal and external collaborators. To foster a network of knowledge and collaboration among newcomers, the 2025 ISTI-Day edition also includes a large poster session that provides a comprehensive overview of current research activities. Each of the 13 laboratories contributes 1–3 posters, highlighting the most innovative work and offering early-career researchers a platform for discussion. These proceedings include the posters selected for ISTI-Day 2025, reflecting the diverse and innovative nature of the Institute's research.

See at: CNR IRIS Open Access | www.isti.cnr.it Open Access | CNR IRIS Restricted


2025 Journal article Open Access OPEN
Neural network compression using binarization and few full-precision weights
Nardini F. M., Rulli C., Trani S., Venturini R.
Quantization and pruning are two effective Deep Neural Network model compression methods. In this paper, we propose Automatic Prune Binarization (APB), a novel compression technique combining quantization with pruning. APB enhances the representational capability of binary networks using a few full-precision weights. Our technique jointly maximizes the accuracy of the network while minimizing its memory impact by deciding whether each weight should be binarized or kept in full precision. We show how to efficiently perform a forward pass through layers compressed using APB by decomposing it into a binary and a sparse-dense matrix multiplication. Moreover, we design two novel efficient algorithms for extremely quantized matrix multiplication on CPU, leveraging highly efficient bitwise operations. The proposed algorithms are 6.9× and 1.5× faster than available state-of-the-art solutions. We extensively evaluate APB on two widely adopted model compression datasets, namely CIFAR-10 and ImageNet. APB delivers a better accuracy/memory trade-off compared to state-of-the-art methods based on i) quantization, ii) pruning, and iii) a combination of pruning and quantization. APB also outperforms quantization in the accuracy/efficiency trade-off, being up to 2× faster than the 2-bit quantized model with no loss in accuracy.
Source: INFORMATION SCIENCES, vol. 716
DOI: 10.1016/j.ins.2025.122251
DOI: 10.2139/ssrn.4927691
DOI: 10.48550/arxiv.2306.08960
Project(s): EFRA via OpenAIRE


See at: arXiv.org e-Print Archive Open Access | Information Sciences Open Access | CNR IRIS Open Access | www.sciencedirect.com Open Access | ZENODO Open Access | Software Heritage Restricted | Software Heritage Restricted | doi.org Restricted | doi.org Restricted | GitHub Restricted | GitHub Restricted | GitHub Restricted | GitHub Restricted | GitHub Restricted | CNR IRIS Restricted
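
APB is described as splitting each layer into a binarized part plus a few full-precision weights, so a forward pass decomposes into a binary and a sparse-dense product. A dense NumPy sketch of that decomposition under an assumed magnitude threshold (the real method learns the split and uses bitwise kernels, not dense arithmetic):

```python
import numpy as np

def apb_decompose(W, threshold):
    """Keep large-magnitude weights in full precision; binarize the rest to +/-alpha."""
    keep = np.abs(W) > threshold
    sparse_fp = np.where(keep, W, 0.0)                 # few full-precision weights
    alpha = np.abs(W[~keep]).mean() if (~keep).any() else 0.0
    binary = np.where(keep, 0.0, np.sign(W) * alpha)   # binarized remainder
    return binary, sparse_fp

def apb_forward(x, binary, sparse_fp):
    """Forward pass = binary product + sparse-dense product (both dense here for clarity)."""
    return binary @ x + sparse_fp @ x

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
binary, sparse_fp = apb_decompose(W, threshold=1.0)
y = apb_forward(rng.normal(size=3), binary, sparse_fp)
```

Every binarized entry shares a single magnitude alpha, so it can be stored in one bit plus a scale, while the few kept weights stay in a sparse full-precision matrix.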


2025 Conference article Open Access OPEN
Efficient conversational search via topical locality in dense retrieval
Muntean Cristina Ioana, Nardini F. M., Perego R., Rocchietti G., Rulli C.
Pre-trained language models have been widely exploited to learn dense representations of documents and queries for information retrieval. While previous efforts have primarily focused on improving effectiveness and user satisfaction, response time remains a critical bottleneck of conversational search systems. To address this, we exploit the topical locality inherent in conversational queries, i.e., the tendency of queries within a conversation to focus on related topics. By leveraging query embedding similarities, we dynamically restrict the search space to semantically relevant document clusters, reducing computational complexity without compromising retrieval quality. We evaluate our approach on the TREC CAsT 2019 and 2020 datasets using multiple embedding models and vector indexes, achieving improvements in processing speed of up to 10.3× with little loss in performance (4.3× without any loss). Our results show that the proposed system effectively handles complex, multi-turn queries with high precision and efficiency, offering a practical solution for real-time conversational search.
DOI: 10.1145/3726302.3730186
DOI: 10.48550/arxiv.2504.21507
Project(s): EFRA via OpenAIRE, "Future Artificial Intelligence Research" - Spoke 1 "Human-centered AI"


See at: arXiv.org e-Print Archive Open Access | dl.acm.org Open Access | CNR IRIS Open Access | doi.org Restricted | doi.org Restricted | Archivio della Ricerca - Università di Pisa Restricted | CNR IRIS Restricted
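
The approach restricts search to the document clusters whose centroids are most similar to the query embedding, exploiting topical locality. A toy sketch of such cluster-restricted retrieval (embeddings, clusters, and the probe count are invented):

```python
import numpy as np

def cluster_restricted_search(query, doc_embs, doc_cluster, centroids, n_probe=1, k=2):
    """Rank only documents in the n_probe clusters closest to the query."""
    sims = centroids @ query
    probed = np.argsort(-sims)[:n_probe]        # clusters on the conversation's topic
    cand = [i for i, c in enumerate(doc_cluster) if c in probed]
    scores = doc_embs[cand] @ query             # score only the restricted candidates
    order = np.argsort(-scores)[:k]
    return [cand[i] for i in order]

centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_embs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
doc_cluster = [0, 0, 1]                         # invented cluster assignments
top = cluster_restricted_search(np.array([1.0, 0.0]), doc_embs, doc_cluster, centroids)
```

Queries later in the same conversation tend to hit the same probed clusters, which is where the reported speedups come from.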


2024 Conference article Open Access OPEN
LongDoc summarization using instruction-tuned large language models for food safety regulations
Rocchietti G., Rulli C., Randl K., Muntean C., Nardini F. M., Perego R., Trani S., Karvounis M., Janostik J.
We design and implement a summarization pipeline for regulatory documents, focusing on two main objectives: creating two silver standard datasets using instruction-tuned large language models (LLMs) and fine-tuning smaller LLMs to perform summarization of regulatory text. In the first task, we employ state-of-the-art models, Cohere C4AI Command-R-4bit and Llama-3-8B, to generate summaries of regulatory documents. These generated summaries serve as ground-truth data for the second task, where we fine-tune three general-purpose LLMs to specialize in high-quality summary generation for specific documents while reducing the computational requirements. Specifically, we fine-tune two Google Flan-T5 models using datasets generated by Llama-3-8B and Cohere C4AI, and we create a quantized (4-bit) version of Google Gemma 2-B based on summaries from Cohere C4AI. Additionally, we initiated a pilot activity involving legal experts from SGS-Digicomply to validate the effectiveness of our summarization pipeline.
Source: CEUR WORKSHOP PROCEEDINGS, vol. 3802, pp. 33-42. Udine, Italy, 5-6/09/2024
Project(s): EFRA via OpenAIRE

See at: ceur-ws.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2024 Conference article Open Access OPEN
Seismic: efficient and effective retrieval over learned sparse representation
Bruch S., Nardini F. M., Rulli C., Venturini R.
Learned sparse representations form an attractive class of contextual embeddings for text retrieval thanks to their effectiveness and interpretability. Retrieval over sparse embeddings remains challenging due to the distributional differences between learned embeddings and term frequency-based lexical models of relevance, such as BM25. Recognizing this challenge, recent research trades off exactness for efficiency, moving to approximate retrieval systems. In this work, we propose a novel organization of the inverted index that enables fast yet effective approximate retrieval over learned sparse embeddings. Our approach organizes inverted lists into geometrically-cohesive blocks, each equipped with a summary vector. During query processing, we quickly determine if a block must be evaluated using the summaries. Experiments on the Splade and E-Splade embeddings on the Ms Marco and NQ datasets show that our approach is up to 21× faster than the winning (graph-based) submissions to the BigANN Challenge.
Source: CEUR WORKSHOP PROCEEDINGS, vol. 3802. Udine, Italy, 05-06/09/2024

See at: ceur-ws.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2024 Conference article Open Access OPEN
Pairing clustered inverted indexes with κ-NN graphs for fast approximate retrieval over learned sparse representations
Bruch S., Nardini F. M., Rulli C., Venturini R.
Learned sparse representations form an effective and interpretable class of embeddings for text retrieval. While exact top-k retrieval over such embeddings faces efficiency challenges, a recent algorithm called Seismic has enabled remarkably fast, highly-accurate approximate retrieval. Seismic statically prunes inverted lists, organizes each list into geometrically-cohesive blocks, and augments each block with a summary vector. At query time, each inverted list associated with a query term is traversed one block at a time in an arbitrary order, with the inner product between the query and summaries determining if a block must be evaluated. When a block is deemed promising, its documents are fully evaluated with a forward index. Seismic is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and significantly outperforms the winning graph-based submissions to the BigANN 2023 Challenge. In this work, we speed up Seismic further by introducing two innovations to its query processing subroutine. First, we traverse blocks in order of importance, rather than arbitrarily. Second, we take the list of documents retrieved by Seismic and expand it to include the neighbors of each document using an offline k-regular nearest neighbor graph; the expanded list is then ranked to produce the final top-k set. Experiments on two public datasets show that our extension, named SeismicWave, can reach almost-exact accuracy levels and is up to 2.2× faster than Seismic.
DOI: 10.1145/3627673.3679977
DOI: 10.48550/arxiv.2408.04443
Project(s): EFRA via OpenAIRE


See at: arXiv.org e-Print Archive Open Access | dl.acm.org Open Access | CNR IRIS Open Access | doi.org Restricted | doi.org Restricted | CNR IRIS Restricted
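
SeismicWave's second innovation expands Seismic's candidate list with each document's neighbors from an offline k-NN graph before re-ranking. A toy sketch of that expansion step (graph and scores are invented):

```python
def expand_with_neighbors(retrieved, knn_graph, scores, k):
    """Expand a candidate list with each document's graph neighbors, then re-rank."""
    pool = set(retrieved)
    for doc in retrieved:
        pool.update(knn_graph.get(doc, []))   # neighbors from the offline k-NN graph
    return sorted(pool, key=lambda d: -scores[d])[:k]

# Invented toy data: doc 2 is only reachable through the graph,
# yet it outscores doc 1 once pulled into the pool.
knn_graph = {0: [2], 1: [0]}
scores = {0: 0.9, 1: 0.5, 2: 0.8}
top = expand_with_neighbors([0, 1], knn_graph, scores, k=2)
```

The graph is built once offline, so the query-time overhead is just a few lookups plus scoring the enlarged pool.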


2024 Contribution to conference Restricted
Distilled neural networks for efficient learning to rank (Extended Abstract)
Nardini F. M., Rulli C., Trani S., Venturini R.
Recent studies in Learning to Rank (LtR) have shown the possibility of effectively distilling a neural network from an ensemble of regression trees. This fully enables the use of neural-based ranking models in query processors of modern Web search engines. Nevertheless, ensembles of regression trees outperform neural models both in terms of efficiency and effectiveness on CPU. In this paper, we propose a framework to design and train neural networks outperforming ensembles of regression trees. After distilling the networks from tree-based models, we exploit an efficiency-oriented pruning technique that works by sparsifying the most computationally intensive layers of the model. Moreover, we develop inference time predictors, which help devise neural network architectures that match the desired efficiency requirements. Comprehensive experiments on two public learning-to-rank datasets show that the neural networks produced with our novel approach are competitive in terms of effectiveness-efficiency trade-off when compared with tree-based ensembles by providing up to 4× inference time speed-up without degradation of the ranking quality.
Source: PROCEEDINGS - INTERNATIONAL CONFERENCE ON DATA ENGINEERING, pp. 5693-5694. Utrecht, Netherlands, 13-16/05/2024
DOI: 10.1109/icde60146.2024.00478


See at: doi.org Restricted | CNR IRIS Restricted | ieeexplore.ieee.org Restricted | CNR IRIS Restricted


2024 Conference article Open Access OPEN
Efficient multi-vector dense retrieval with bit vectors
Nardini F. M., Rulli C., Venturini R.
Dense retrieval techniques employ pre-trained large language models to build high-dimensional representations of queries and passages. These representations compute the relevance of a passage with respect to a query using efficient similarity measures. Multi-vector representations show improved effectiveness but come with a one-order-of-magnitude increase in memory footprint and query latency by encoding queries and documents on a per-token level. Recently, PLAID addressed these challenges by introducing a centroid-based term representation to reduce the memory impact of multi-vector systems. By exploiting a centroid interaction mechanism, PLAID filters out non-relevant documents, reducing the cost of successive ranking stages. This paper proposes "Efficient Multi-Vector Dense Retrieval with Bit Vectors" (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval. First, EMVB employs a highly efficient pre-filtering step of passages using optimized bit vectors. Second, the computation of the centroid interaction happens column-wise, leveraging SIMD instructions to reduce latency. Third, EMVB uses Product Quantization (PQ) to reduce the memory footprint of storing vector representations while allowing for fast late interaction. Finally, we introduce a per-document term filtering method that further improves the efficiency of the final step. Experiments on MS MARCO and LoTTE demonstrate that EMVB is up to 2.8× faster and reduces the memory footprint by 1.8× without any loss in retrieval accuracy compared to PLAID.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 14609, pp. 3-17. Glasgow, UK, 24–28/03/2024
DOI: 10.1007/978-3-031-56060-6_1
Project(s): EFRA via OpenAIRE


See at: CNR IRIS Open Access | link.springer.com Open Access | doi.org Restricted | Archivio della Ricerca - Università di Pisa Restricted | CNR IRIS Restricted | CNR IRIS Restricted
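
EMVB's first step prefilters passages with optimized bit vectors over centroid ids. A toy sketch of the idea using Python integers as bitsets (centroid assignments are invented; the real system uses SIMD-friendly layouts and further ranking stages):

```python
def make_bitvector(centroid_ids):
    """Pack the set of centroids touched by a passage into an integer bitset."""
    bv = 0
    for c in centroid_ids:
        bv |= 1 << c
    return bv

def prefilter(passages, query_centroids):
    """Keep passages sharing at least one closest-centroid with the query."""
    qbv = make_bitvector(query_centroids)
    return [pid for pid, bv in passages.items() if bv & qbv]

# Invented data: p1 touches centroids {0, 3}, p2 touches {5};
# the query's token embeddings fall near centroids {3, 4}.
passages = {"p1": make_bitvector([0, 3]), "p2": make_bitvector([5])}
survivors = prefilter(passages, [3, 4])
```

The bitwise AND discards non-relevant passages in a few machine instructions, so the expensive late-interaction scoring runs only on the survivors.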


2024 Conference article Open Access OPEN
Efficient inverted indexes for approximate retrieval over learned sparse representations
Bruch S., Nardini F. M., Rulli C., Venturini R.
Learned sparse representations form an attractive class of contextual embeddings for text retrieval. That is so because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with inverted indexes, however, retrieval over sparse embeddings remains challenging. That is due to the distributional differences between learned embeddings and term frequency-based lexical models of relevance such as BM25. Recognizing this challenge, a great deal of research has gone into, among other things, designing retrieval algorithms tailored to the properties of learned sparse representations, including approximate retrieval systems. In fact, this task featured prominently in the latest BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on a large benchmark dataset by throughput and recall. In this work, we propose a novel organization of the inverted index that enables fast yet effective approximate retrieval over learned sparse embeddings. Our approach organizes inverted lists into geometrically-cohesive blocks, each equipped with a summary vector. During query processing, we quickly determine if a block must be evaluated using the summaries. As we show experimentally, single-threaded query processing using our method, Seismic, reaches sub-millisecond per-query latency on various sparse embeddings of the Ms Marco dataset while maintaining high recall. Our results indicate that Seismic is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and further outperforms the winning (graph-based) submissions to the BigANN Challenge by a significant margin.
DOI: 10.1145/3626772.3657769
DOI: 10.48550/arxiv.2404.18812
Project(s): EFRA via OpenAIRE


See at: arXiv.org e-Print Archive Open Access | dl.acm.org Open Access | CNR IRIS Open Access | doi.org Restricted | doi.org Restricted | CNR IRIS Restricted
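
Seismic organizes inverted lists into blocks, each with a summary vector, and evaluates a block only when the query-summary inner product is promising. A toy sketch of that skipping logic (block layout and threshold are invented, and real Seismic traverses per-term inverted lists rather than a flat block list):

```python
def seismic_scan(query, blocks, threshold):
    """Fully score a block's documents only if its summary clears the threshold."""
    results = {}
    for summary, docs in blocks:
        # Upper-bound estimate: inner product of the query with the block summary.
        bound = sum(w * summary.get(t, 0.0) for t, w in query.items())
        if bound < threshold:
            continue                      # skip the whole block cheaply
        for doc_id, vec in docs:
            results[doc_id] = sum(w * vec.get(t, 0.0) for t, w in query.items())
    return results

# Toy blocks: each summary is the coordinate-wise maximum of its documents,
# so the bound never underestimates a document's true score.
blocks = [({"a": 1.0}, [("d1", {"a": 1.0})]),
          ({"b": 2.0}, [("d2", {"b": 2.0})])]
scores = seismic_scan({"a": 1.0}, blocks, threshold=0.5)
```

Because most blocks fail the cheap summary test, only a small fraction of documents is ever fully scored, which is where the reported speedups come from.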


2024 Conference article Open Access OPEN
Efficient and effective multi-vector dense retrieval with EMVB
Nardini F. M., Rulli C., Venturini R.
Dense retrieval techniques utilize large pre-trained language models to construct a high-dimensional representation of queries and passages. These representations assess the relevance of a passage concerning a query through efficient similarity measures. Multi-vector representations, while enhancing effectiveness, cause a one-order-of-magnitude increase in memory footprint and query latency by encoding queries and documents on a per-token level. The current state-of-the-art approach, namely PLAID, has introduced a centroid-based term representation to mitigate the memory impact of multi-vector systems. By employing a centroid interaction mechanism, PLAID filters out non-relevant documents, reducing the cost of subsequent ranking stages. This paper introduces "Efficient Multi-Vector dense retrieval with Bit vectors" (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval. Firstly, EMVB utilizes an optimized bit vector pre-filtering step for passages, enhancing efficiency. Secondly, the computation of centroid interaction occurs column-wise, leveraging SIMD instructions to reduce latency. Thirdly, EMVB incorporates Product Quantization (PQ) to decrease the memory footprint of storing vector representations while facilitating fast late interaction. Lastly, a per-document term filtering method is introduced, further improving the efficiency of the final step. Experiments conducted on MS MARCO and LoTTE demonstrate that EMVB achieves up to a 2.8× speed improvement while reducing the memory footprint by 1.8×, without compromising retrieval accuracy compared to PLAID.
Source: CEUR WORKSHOP PROCEEDINGS, vol. 3741, pp. 281-289. Villasimius, Sud Sardegna, Italy (virtual due to Covid-19 pandemic), 23-26/06/2024

See at: ceur-ws.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2022 Journal article Open Access OPEN
Distilled neural networks for efficient learning to rank
Nardini Fm, Rulli C, Trani S, Venturini R
Recent studies in Learning to Rank have shown the possibility of effectively distilling a neural network from an ensemble of regression trees. This result leads neural networks to become a natural competitor of tree-based ensembles on the ranking task. Nevertheless, ensembles of regression trees outperform neural models both in terms of efficiency and effectiveness, particularly when scoring on CPU. In this paper, we propose an approach for speeding up neural scoring time by applying a combination of Distillation, Pruning and Fast Matrix multiplication. We employ knowledge distillation to learn shallow neural networks from an ensemble of regression trees. Then, we exploit an efficiency-oriented pruning technique that performs a sparsification of the most computationally-intensive layers of the neural network, which is then scored with optimized sparse matrix multiplication. Moreover, by studying both dense and sparse high performance matrix multiplication, we develop a scoring time prediction model which helps in devising neural network architectures that match the desired efficiency requirements. Comprehensive experiments on two public learning-to-rank datasets show that neural networks produced with our novel approach are competitive at any point of the effectiveness-efficiency trade-off when compared with tree-based ensembles, providing up to 4× scoring time speed-up without affecting the ranking quality.
Source: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (ONLINE), vol. 35 (issue 5), pp. 4695-4712
DOI: 10.1109/tkde.2022.3152585


See at: CNR IRIS Open Access | ieeexplore.ieee.org Open Access | ISTI Repository Open Access | CNR IRIS Restricted | CNR IRIS Restricted
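
The pipeline distills a tree ensemble into a neural network and then sparsifies its most computationally intensive layers so they can be scored with sparse matrix multiplication. A minimal sketch of the magnitude-pruning step (selecting the threshold as a simple global quantile is an assumption; the paper's efficiency-oriented technique is more involved):

```python
import numpy as np

def prune_layer(W, sparsity):
    """Magnitude pruning: zero out the smallest-magnitude fraction of weights."""
    k = int(W.size * sparsity)
    if k == 0:
        return W.copy()
    cut = np.sort(np.abs(W), axis=None)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(W) > cut, W, 0.0)

W = np.array([[0.9, -0.1],
              [0.05, 1.2]])
Wp = prune_layer(W, sparsity=0.5)   # the two smallest-magnitude weights are zeroed
```

The resulting sparse layer can then be handed to an optimized sparse-dense kernel, which is what yields the reported scoring speedups.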