2026 Conference article Open Access OPEN
Multivector Reranking in the era of strong first-stage retrievers
Martinico Silvio, Nardini Franco Maria, Rulli Cosimo, Venturini Rossano
Learned multivector representations power modern search systems with strong retrieval effectiveness, but their real-world use is limited by the high cost of exhaustive token-level retrieval. Therefore, most systems adopt a gather-and-refine strategy, where a lightweight gather phase selects candidates for full scoring. However, this approach requires expensive searches over large token-level indexes and often misses the documents that would rank highest under full similarity. In this paper, we reproduce several state-of-the-art multivector retrieval methods on two publicly available datasets, providing a clear picture of the current multivector retrieval field and observing the inefficiency of token-level gathering. Building on top of that, we show that replacing the token-level gather phase with a single-vector document retriever (specifically, a learned sparse retriever, LSR) produces a smaller and more semantically coherent candidate set. This recasts the gather-and-refine pipeline into the well-established two-stage retrieval architecture. As retrieval latency decreases, query encoding with two neural encoders becomes the dominant computational bottleneck. To mitigate this, we integrate recent inference-free LSR methods, demonstrating that they preserve the retrieval effectiveness of the dual-encoder pipeline while substantially reducing query encoding time. Finally, we investigate multiple reranking configurations that balance efficiency, memory, and effectiveness, and we introduce two optimization techniques that prune low-quality candidates early. Empirical results show that these techniques improve retrieval efficiency by up to 1.8× with no loss in quality. Overall, our two-stage approach achieves over 24× speedup over the state-of-the-art multivector retrieval systems, while maintaining comparable or superior retrieval quality.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 16485, pp. 49-65. Delft, The Netherlands, 29/03-02/04/2026
DOI: 10.1007/978-3-032-21324-2_4
See at: CNR IRIS Open Access | link.springer.com Open Access | doi.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted
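
The two-stage idea in the abstract above (a first-stage retriever proposes candidates, then full multivector similarity reranks them) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `maxsim`, `rerank`, and `doc_token_store` are hypothetical, the candidate list stands in for the output of any first-stage retriever, and the score is the usual late-interaction sum of per-query-token maxima.

```python
import numpy as np


def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Late-interaction score: each query token picks its best-matching
    document token (by inner product); the maxima are summed."""
    sims = query_tokens @ doc_tokens.T  # shape: (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())


def rerank(query_tokens, candidates, doc_token_store, k):
    """Score only the first-stage candidates and keep the top k."""
    scored = [(doc_id, maxsim(query_tokens, doc_token_store[doc_id]))
              for doc_id in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data (assumed): 2-dimensional token embeddings for three documents.
doc_token_store = {
    0: np.array([[1.0, 0.0], [0.0, 1.0]]),
    1: np.array([[0.5, 0.5]]),
    2: np.array([[1.0, 0.0]]),
}
query_tokens = np.array([[1.0, 0.0], [0.0, 1.0]])

# Pretend the first stage returned documents 1 and 0 (in that order).
top = rerank(query_tokens, candidates=[1, 0], doc_token_store=doc_token_store, k=1)
```

Only the candidates are scored with the expensive token-level similarity, which is why the size and quality of the candidate set matter so much for end-to-end latency.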


2026 Conference article Open Access OPEN
Forward index compression for learned sparse retrieval
Bruch Sebastian, Fontana Martino, Nardini Franco Maria, Rulli Cosimo, Venturini Rossano
Text retrieval using learned sparse representations of queries and documents has, over the years, evolved into a highly effective approach to search. It is thanks to recent advances in approximate nearest neighbor search, with the emergence of highly efficient algorithms such as the inverted index-based Seismic and the graph-based Hnsw, that retrieval with sparse representations became viable in practice. In this work, we scrutinize the efficiency of sparse retrieval algorithms and focus particularly on the size of a data structure that is common to all algorithmic flavors and that constitutes a substantial fraction of the overall index size: the forward index. In particular, we seek compression techniques to reduce the storage footprint of the forward index without compromising search quality or inner product computation latency. In our examination with various integer compression techniques, we report that StreamVByte achieves the best trade-off between memory footprint, retrieval accuracy, and latency. We then improve StreamVByte by introducing DotVByte, a new algorithm tailored to inner product computation. Experiments on MsMarco show that our improvements lead to significant space savings while maintaining retrieval efficiency.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 16484, pp. 444-451. Delft, The Netherlands, 29/03-02/04/2026
DOI: 10.1007/978-3-032-21300-6_35
DOI: 10.48550/arxiv.2602.05445
See at: arXiv.org e-Print Archive Open Access | CNR IRIS Open Access | link.springer.com Open Access | doi.org Restricted | doi.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted
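
To make the forward-index compression setting above concrete, here is a small sketch using classic byte-aligned variable-byte (VByte) coding of delta-gapped component ids. It is deliberately not StreamVByte or DotVByte, whose layouts are not described in this listing; the function names and the layout (encoded ids plus a plain list of values) are assumptions chosen for illustration.

```python
def vbyte_encode(sorted_ids):
    """Encode strictly increasing component ids as VByte-coded gaps."""
    out = bytearray()
    prev = 0
    for component in sorted_ids:
        gap = component - prev
        prev = component
        while gap >= 128:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation bit set
            gap >>= 7
        out.append(gap)  # final byte of this gap, continuation bit clear
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode: rebuild the component ids from the gaps."""
    ids, prev, value, shift = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            ids.append(prev)
            value, shift = 0, 0
    return ids


def inner_product(query, encoded_ids, values):
    """Dot product between a query (dict: component id -> weight) and one
    forward-index entry stored as (encoded component ids, values)."""
    return sum(query.get(c, 0.0) * v
               for c, v in zip(vbyte_decode(encoded_ids), values))


ids = [3, 130, 30000]
encoded = vbyte_encode(ids)  # 5 bytes instead of three 4-byte integers
score = inner_product({3: 2.0, 30000: 0.5}, encoded, [1.0, 4.0, 2.0])
```

The decode step sits on the critical path of every inner product, which is why the abstract treats compression ratio and inner product latency as a joint trade-off.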


2026 Conference article Restricted
Evaluating the efficiency and effectiveness of learned sparse retrieval with the lsr_benchmark
Frobe Maik, Schlatt Ferdinand, Rulli Cosimo, Hagen Tim, Merker Jan Heinrich, Hendriksen Gijs, Lassance Carlos, Nardini Franco Maria, Venturini Rossano, Potthast Martin
Learned sparse retrieval (LSR) models exhibit varying trade-offs between effectiveness and efficiency. But while standard tools exist for evaluating LSR effectiveness, there is none for evaluating efficiency. Also, datasets with high-quality relevance judgments are too large for repeated efficiency experiments, e.g., on different hardware configurations. To promote the evaluation of LSR models in terms of their effectiveness and efficiency, we introduce the lsr_benchmark, which measures retrieval efficiency at each step of an LSR pipeline (document embedding, indexing, query embedding, and retrieval) as well as its overall effectiveness. To ensure tractability and extensibility, we apply current corpus subsampling methods to eleven TREC tasks, precompute embeddings with eleven LSR models per task, and evaluate eight retrieval engines as baselines. For the benchmark’s hosted version, a modular API, along with tools for evaluating effectiveness and efficiency, facilitates the submission of new approaches. Our experiments show that the chosen embedding model significantly affects the efficiency of a retrieval engine and that LSR is more effective but less efficient than BM25, an efficiency gap that our benchmark now tracks as new LSR models are published.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 16486, pp. 528-543. Delft, The Netherlands, 29/03-02/04/2026
DOI: 10.1007/978-3-032-21321-1_57
See at: doi.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted | link.springer.com Restricted


2026 Conference article Open Access OPEN
Neural lexical search with learned sparse retrieval
Yates Andrew, Lassance Carlos, Rulli Cosimo, Yang Eugene, Macavaney Sean, Singh Siddharth A. K., Nguyen Thong, Lei Yibin
Learned Sparse Retrieval (LSR) techniques use neural machinery to represent queries and documents as learned bags of words. In contrast with other neural retrieval techniques, such as generative retrieval and dense retrieval, LSR has been shown to be a remarkably robust, transferable, and efficient family of methods for retrieving high-quality search results. This half-day tutorial aims to provide an extensive overview of LSR, ranging from its fundamentals to the latest emerging techniques. By the end of the tutorial, attendees will be familiar with the important design decisions of an LSR model, know how to apply them to text and other modalities, and understand the latest techniques for retrieving with them efficiently. Website: https://lsr-tutorial.github.io
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 16486, pp. 35-43. Delft, The Netherlands, 29/03-02/04/2026
DOI: 10.1007/978-3-032-21321-1_5
DOI: 10.1145/3726302.3731693
Project(s): HEFPA via OpenAIRE
See at: IRIS Cnr Open Access | IRIS Cnr Open Access | Universiteit van Amsterdam (UvA) Institutional Repository UvA-DARE Open Access | Universiteit van Amsterdam (UvA) Institutional Repository UvA-DARE Open Access | IRIS Cnr Open Access | doi.org Restricted | DBLP Restricted | CNR IRIS Restricted | CNR IRIS Restricted | CNR IRIS Restricted | link.springer.com Restricted


2025 Journal article Open Access OPEN
ChatGPT versus modest large language models: an extensive study on benefits and drawbacks for conversational search
Rocchietti G., Rulli C., Nardini F. M., Muntean Cristina Ioana, Perego R., Frieder O.
Large Language Models (LLMs) are effective in modeling the syntactic and semantic content of text, making them a strong choice for conversational query rewriting. While previous approaches proposed NLP-based custom models, requiring significant engineering effort, our approach is straightforward and conceptually simpler. Not only do we improve effectiveness over the current state-of-the-art, but we also address the cost and efficiency aspects. We explore the use of pre-trained LLMs fine-tuned to generate quality user query rewrites, aiming to reduce computational costs while maintaining or improving retrieval effectiveness. As a first contribution, we study various prompting approaches (including zero-, one-, and few-shot methods) with ChatGPT (e.g., gpt-3.5-turbo). We observe an increase in the quality of rewrites leading to improved retrieval. We then fine-tuned smaller open LLMs on the query rewriting task. Our results demonstrate that our fine-tuned models, including the smallest with 780 million parameters, achieve better performance during the retrieval phase than gpt-3.5-turbo. To fine-tune the selected models, we used the QReCC dataset, which is specifically designed for query rewriting tasks. For evaluation, we used the TREC CAsT datasets to assess the retrieval effectiveness of the rewrites of both gpt-3.5-turbo and our fine-tuned models. Our findings show that fine-tuning LLMs on conversational query rewriting datasets can be more effective than relying on generic instruction-tuned models or traditional query reformulation techniques.
Source: IEEE ACCESS, vol. 13, pp. 15253-15271
DOI: 10.1109/access.2025.3529741
See at: IEEE Access Open Access | IEEE Access Open Access | CNR IRIS Open Access | ieeexplore.ieee.org Open Access | CNR IRIS Restricted


2025 Conference article Open Access OPEN
Efficient approximate nearest neighbor search on a Raspberry Pi
Martinico S., Nardini F. M., Rulli C., Venturini R.
Approximate Nearest Neighbors (ANN) search is a core task in Information Retrieval. However, the high computational demands and reliance on expensive infrastructures limit broader contributions to ANN research. Enabling efficient and effective ANN search on low-resource devices would allow researchers in low-income countries to participate in the ANN community, thereby democratizing the field. Despite its potential, the IR literature offers little work on the feasibility of ANN search under resource constraints. In this proposal, we explore efficient solutions for large-scale ANN search on low-resource devices. We report preliminary experiments highlighting current limitations and outlining future challenges.
DOI: 10.1145/3726302.3730268
Project(s): EFRA via OpenAIRE
See at: dl.acm.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2025 Conference article Open Access OPEN
Neural lexical search with learned sparse retrieval
Yates A., Lassance C., Rulli C., Yang E., Macavaney S., Singh Siddharth A. K., Nguyen T., Lei Y.
Learned Sparse Retrieval (LSR) techniques use neural machinery to represent queries and documents as learned bags of words. In contrast with other neural retrieval techniques, such as generative retrieval and dense retrieval, LSR has been shown to be a remarkably robust, transferable, and efficient family of methods for retrieving high-quality search results. This half-day tutorial aims to provide an extensive overview of LSR, ranging from its fundamentals to the latest emerging techniques. By the end of the tutorial, attendees will be familiar with the important design decisions of an LSR system, know how to apply them to text and other modalities, and understand the latest techniques for retrieving with them efficiently. Website: https://lsr-tutorial.github.io
DOI: 10.1145/3726302.3731693
See at: dl.acm.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2025 Conference article Open Access OPEN
Effective inference-free retrieval for learned sparse representations
Nardini F. M., Nguyen T., Rulli C., Venturini R., Yates A.
Learned Sparse Retrieval (LSR) is an effective IR approach that exploits pre-trained language models for encoding text into a learned bag of words. Several efforts in the literature have shown that sparsity is key to enabling a good trade-off between the efficiency and effectiveness of the query processor. To induce the right degree of sparsity, researchers typically use regularization techniques when training LSR models. Recently, new efficient inverted index-based retrieval engines have been proposed, leading to a natural question: has the role of regularization changed in training LSR models? In this paper, we conduct an extended evaluation of regularization approaches for LSR where we discuss their effectiveness, efficiency, and out-of-domain generalization capabilities. We first show that regularization can be relaxed to produce more effective LSR encoders. We also show that query encoding is now the bottleneck limiting the overall query processor performance. To remove this bottleneck, we advance the state-of-the-art of inference-free LSR by proposing Learned Inference-free Retrieval (Li-Lsr). At training time, Li-Lsr learns a score for each token, casting the query encoding step into a seamless table lookup. Our approach yields state-of-the-art effectiveness for both in-domain and out-of-domain evaluation, surpassing Splade-v3-Doc by 1 point of MRR@10 on MsMarco and 1.8 points of nDCG@10 on Beir.
DOI: 10.1145/3726302.3730185
Project(s): EFRA via OpenAIRE
See at: dl.acm.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted
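
The phrase "casting the query encoding step into a seamless table lookup" in the abstract above can be illustrated with a toy sketch. Everything here is assumed for illustration: the vocabulary, the per-token scores (which in an inference-free model would be learned at training time), the whitespace tokenizer, and the function names. It is not the Li-Lsr implementation.

```python
def encode_query(query, vocab, token_scores):
    """Build a sparse query vector with no neural forward pass: each known
    token contributes its precomputed score from a lookup table."""
    vector = {}
    for token in query.lower().split():  # assumption: whitespace tokenization
        token_id = vocab.get(token)
        if token_id is not None:
            vector[token_id] = token_scores[token_id]
    return vector


def score(query_vector, doc_vector):
    """Inner product between two sparse vectors stored as dicts."""
    return sum(w * doc_vector.get(t, 0.0) for t, w in query_vector.items())


# Hypothetical vocabulary and per-token scores.
vocab = {"sparse": 0, "retrieval": 1, "the": 2}
token_scores = [1.5, 2.0, 0.1]

q = encode_query("The sparse retrieval", vocab, token_scores)
s = score(q, {0: 1.0, 1: 0.5})  # document vector from a (neural) document encoder
```

Because the query side reduces to dictionary lookups, only documents need a neural encoder, and that cost is paid once at indexing time.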


2025 Conference article Open Access OPEN
kANNolo: sweet and smooth approximate k-nearest neighbors search
Delfino L., Erriquez D., Martinico S., Nardini F. M., Rulli C., Venturini R.
Approximate Nearest Neighbors (ANN) search is a crucial task in several applications like recommender systems and information retrieval. Current state-of-the-art ANN libraries, although being performance-oriented, often lack modularity and ease of use. This translates into them not being fully suitable for easy prototyping and testing of research ideas, an important feature to enable. We address these limitations by introducing kANNolo, a novel, research-oriented ANN library written in Rust and explicitly designed to combine usability with performance effectively. kANNolo is the first ANN library that supports dense and sparse vector representations made available on top of different similarity measures, e.g., euclidean distance and inner product. Moreover, it also supports vector quantization techniques, e.g., Product Quantization, on top of the indexing strategies implemented. These functionalities are managed through Rust traits, allowing shared behaviors to be handled abstractly. This abstraction ensures flexibility and facilitates an easy integration of new components. In this work, we detail the architecture of kANNolo and demonstrate that its flexibility does not compromise performance. The experimental analysis shows that kANNolo achieves state-of-the-art performance in terms of speed-accuracy trade-off while allowing fast and easy prototyping, thus making kANNolo a valuable tool for advancing ANN research. Source code available on GitHub: https://github.com/TusKANNy/kannolo.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 15575, pp. 400-406. Lucca, Italy, 06-10/04/2025
DOI: 10.1007/978-3-031-88717-8_29
DOI: 10.48550/arxiv.2501.06121
Project(s): EFRA via OpenAIRE
See at: arXiv.org e-Print Archive Open Access | dl.acm.org Open Access | CNR IRIS Open Access | doi.org Restricted | doi.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2025 Conference article Open Access OPEN
Investigating the scalability of approximate sparse retrieval algorithms to massive datasets
Bruch S., Nardini F. M., Rulli C., Venturini R., Venuta L.
Learned sparse text embeddings have gained popularity due to their effectiveness in top-k retrieval and inherent interpretability. Their distributional idiosyncrasies, however, have long hindered their use in real-world retrieval systems. That changed with the recent development of approximate algorithms that leverage the distributional properties of sparse embeddings to speed up retrieval. Nonetheless, in much of the existing literature, evaluation has been limited to datasets with only a few million documents such as MsMarco. It remains unclear how these systems behave on much larger datasets and what challenges lurk in larger scales. To bridge that gap, we investigate the behavior of state-of-the-art retrieval algorithms on massive datasets. We compare and contrast the recently-proposed Seismic and graph-based solutions adapted from dense retrieval. We extensively evaluate Splade embeddings of 138M passages from MsMarco-v2 and report indexing time and other efficiency and effectiveness metrics.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 15574, pp. 437-445. Lucca, Italy, 06-10/04/2025
DOI: 10.1007/978-3-031-88714-7_43
Project(s): EFRA via OpenAIRE
See at: CNR IRIS Open Access | link.springer.com Open Access | doi.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2025 Other Open Access OPEN
ISTI-day 2025 Proceedings
Del Corso G., Pedrotti A., Federico G., Gennaro C., Carrara F., Amato G., Di Benedetto M., Gabrielli E., Belli D., Matrullo Zoe, Miori V., Tolomei Gabriele, Waheed T., Marchetti E., Calabrò Antonello., Rossetti G., Stella Massimo, Cazabet Rémy, Abramski K., Cau E., Citraro S., Failla A., Mesina V., Morini V., Pansanella V., Colantonio S., Germanese D., Pascali M. A., Bianchi L., Messina N., Falchi F., Barsellotti L., Pacini G., Cassese M., Puccetti G., Esuli A., Volpi L., Moreo Alejandro, Sebastiani F., Sperduti G., Nguyen Dong, Broccia G., Ter Beek M. H., Ferrari A., Massink M., Belmonte Gina, Ciancia V., Papini O., Canapa G., Catricalà B., Manca M., Paternò F., Santoro C., Zedda E., Gallo S., Maenza S., Mattioli A., Simeoli L., Rucci D., Carlini E., Dazzi P., Kavalionak H., Mordacchini M., Rulli C., Muntean Cristina Ioana, Nardini F. M., Perego R., Rocchietti G., Lettich F., Renso C., Pugliese C., Casini G., Haldimann Jonas, Meyer Thomas, Assante M., Candela L., Dell'Amico A., Frosini L., Mangiacrapa F., Oliviero A., Pagano P., Panichi G., Peccerillo B., Procaccini M., Mannocci A., Manghi P., Lonetti F., Kang Dongjae, Di Giandomenico F., Jee Eunkyoung, Lazzini G., Conti F., Scopigno R., D'Acunto M., Moroni D., Cafiso M., Paradisi P., Callieri M., Pavoni G., Corsini M., De Falco A., Sala F., Saraceni Q., Gattiglia Gabriele
ISTI-Day is an annual information and networking event organized by the Institute of Information Science and Technologies "A. Faedo" (ISTI) of the Italian National Research Council (CNR). This event features an opening talk of the Director of the Dept. DIITET (Emilio F. Campana) as well as an overview of the Institute's activities presented by the ISTI Director (Roberto Scopigno). Those institutional segments are complemented by dedicated presentations and round tables featuring former staff members, as well as internal and external collaborators. To foster a network of knowledge and collaboration among newcomers, the 2025 ISTI Day edition also includes a large poster session that provides a comprehensive overview of current research activities. Each of the 13 laboratories contributes 1–3 posters, highlighting the most innovative work and offering early-career researchers a platform for discussion. Thus these proceedings include the posters selected for ISTI-Day 2025, reflecting the diverse and innovative nature of the Institute's research.

See at: CNR IRIS Open Access | www.isti.cnr.it Open Access | CNR IRIS Restricted


2025 Journal article Open Access OPEN
Neural network compression using binarization and few full-precision weights
Nardini F. M., Rulli C., Trani S., Venturini R.
Quantization and pruning are two effective Deep Neural Network model compression methods. In this paper, we propose Automatic Prune Binarization (APB), a novel compression technique combining quantization with pruning. APB enhances the representational capability of binary networks using a few full-precision weights. Our technique jointly maximizes the accuracy of the network while minimizing its memory impact by deciding whether each weight should be binarized or kept in full precision. We show how to efficiently perform a forward pass through layers compressed using APB by decomposing it into a binary and a sparse-dense matrix multiplication. Moreover, we design two novel efficient algorithms for extremely quantized matrix multiplication on CPU, leveraging highly efficient bitwise operations. The proposed algorithms are 6.9× and 1.5× faster than available state-of-the-art solutions. We extensively evaluate APB on two widely adopted model compression datasets, namely CIFAR-10 and ImageNet. APB delivers a better accuracy/memory trade-off than state-of-the-art methods based on i) quantization, ii) pruning, and iii) a combination of pruning and quantization. APB also outperforms quantization in the accuracy/efficiency trade-off, being up to 2× faster than the 2-bit quantized model with no loss in accuracy.
Source: INFORMATION SCIENCES, vol. 716
DOI: 10.1016/j.ins.2025.122251
DOI: 10.2139/ssrn.4927691
DOI: 10.48550/arxiv.2306.08960
Project(s): EFRA via OpenAIRE
See at: arXiv.org e-Print Archive Open Access | Information Sciences Open Access | CNR IRIS Open Access | www.sciencedirect.com Open Access | ZENODO Open Access | Software Heritage Restricted | Software Heritage Restricted | doi.org Restricted | doi.org Restricted | GitHub Restricted | GitHub Restricted | GitHub Restricted | GitHub Restricted | GitHub Restricted | CNR IRIS Restricted


2025 Conference article Open Access OPEN
Efficient conversational search via topical locality in dense retrieval
Muntean Cristina Ioana, Nardini F. M., Perego R., Rocchietti G., Rulli C.
Pre-trained language models have been widely exploited to learn dense representations of documents and queries for information retrieval. While previous efforts have primarily focused on improving effectiveness and user satisfaction, response time remains a critical bottleneck of conversational search systems. To address this, we exploit the topical locality inherent in conversational queries, i.e., the tendency of queries within a conversation to focus on related topics. By leveraging query embedding similarities, we dynamically restrict the search space to semantically relevant document clusters, reducing computational complexity without compromising retrieval quality. We evaluate our approach on the TREC CAsT 2019 and 2020 datasets using multiple embedding models and vector indexes, achieving improvements in processing speed of up to 10.3× with little loss in performance (4.3× without any loss). Our results show that the proposed system effectively handles complex, multi-turn queries with high precision and efficiency, offering a practical solution for real-time conversational search.
DOI: 10.1145/3726302.3730186
DOI: 10.48550/arxiv.2504.21507
Project(s): EFRA via OpenAIRE, Future Artificial Intelligence Research - Spoke 1 "Human-centered AI"
See at: arXiv.org e-Print Archive Open Access | dl.acm.org Open Access | CNR IRIS Open Access | doi.org Restricted | doi.org Restricted | Archivio della Ricerca - Università di Pisa Restricted | CNR IRIS Restricted


2025 Conference article Open Access OPEN
FoodSafeSum: enabling natural language processing applications for food safety document summarization and analysis
Bakagianni J., Randl K., Rocchietti G., Rulli C., Nardini F. M., Henriksson A., Trani S., Romanova A., Pavlopoulos J.
Food safety demands timely detection, regulation, and public communication, yet the lack of structured datasets hinders Natural Language Processing (NLP) research. We present and release a new dataset of human-written and Large Language Model (LLM)-generated summaries of food safety documents, plus food safety related metadata. We evaluate its utility on three NLP tasks directly reflecting food safety practices: multilabel classification for organizing documents into domain-specific categories; document retrieval for accessing regulatory and scientific evidence; and question answering via retrieval-augmented generation that improves factual accuracy. We show that LLM summaries perform comparably to or better than human ones across tasks. We also demonstrate clustering of summaries for event tracking and compliance monitoring. This dataset enables NLP applications that support core food safety practices, including the organization of regulatory and scientific evidence, monitoring of compliance issues, and communication of risks to the public.
DOI: 10.18653/v1/2025.findings-emnlp.911
Project(s): MIS
See at: aclanthology.org Open Access | CNR IRIS Open Access | doi.org Restricted | CNR IRIS Restricted


2024 Conference article Open Access OPEN
LongDoc summarization using instruction-tuned large language models for food safety regulations
Rocchietti G., Rulli C., Randl K., Muntean C., Nardini F. M., Perego R., Trani S., Karvounis M., Janostik J.
We design and implement a summarization pipeline for regulatory documents, focusing on two main objectives: creating two silver standard datasets using instruction-tuned large language models (LLMs) and finetuning smaller LLMs to perform summarization of regulatory text. In the first task, we employ state-of-the-art models, Cohere C4AI Command-R-4bit and Llama-3-8B, to generate summaries of regulatory documents. These generated summaries serve as ground-truth data for the second task, where we finetune three general-purpose LLMs to specialize in high-quality summary generation for specific documents while reducing the computational requirements. Specifically, we finetune two Google Flan-T5 models using datasets generated by Llama-3-8B and Cohere C4AI, and we create a quantized (4-bit) version of Google Gemma 2-B based on summaries from Cohere C4AI. Additionally, we initiated a pilot activity involving legal experts from SGS-Digicomply to validate the effectiveness of our summarization pipeline.
Source: CEUR WORKSHOP PROCEEDINGS, vol. 3802, pp. 33-42. Udine, Italy, 05-06/09/2024
Project(s): EFRA via OpenAIRE

See at: ceur-ws.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2024 Conference article Open Access OPEN
Seismic: efficient and effective retrieval over learned sparse representation
Bruch S., Nardini F. M., Rulli C., Venturini R.
Learned sparse representations form an attractive class of contextual embeddings for text retrieval thanks to their effectiveness and interpretability. Retrieval over sparse embeddings remains challenging due to the distributional differences between learned embeddings and term frequency-based lexical models of relevance, such as BM25. Recognizing this challenge, recent research trades off exactness for efficiency, moving to approximate retrieval systems. In this work, we propose a novel organization of the inverted index that enables fast yet effective approximate retrieval over learned sparse embeddings. Our approach organizes inverted lists into geometrically-cohesive blocks, each equipped with a summary vector. During query processing, we quickly determine if a block must be evaluated using the summaries. Experiments on the Splade and E-Splade embeddings on the Ms Marco and NQ datasets show that our approach is up to 21× faster than the winning (graph-based) submissions to the BigANN Challenge.
Source: CEUR WORKSHOP PROCEEDINGS, vol. 3802. Udine, Italy, 05-06/09/2024

See at: ceur-ws.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2024 Conference article Open Access OPEN
Pairing clustered inverted indexes with κ-NN graphs for fast approximate retrieval over learned sparse representations
Bruch S., Nardini F. M., Rulli C., Venturini R.
Learned sparse representations form an effective and interpretable class of embeddings for text retrieval. While exact top-k retrieval over such embeddings faces efficiency challenges, a recent algorithm called Seismic has enabled remarkably fast, highly-accurate approximate retrieval. Seismic statically prunes inverted lists, organizes each list into geometrically-cohesive blocks, and augments each block with a summary vector. At query time, each inverted list associated with a query term is traversed one block at a time in an arbitrary order, with the inner product between the query and summaries determining if a block must be evaluated. When a block is deemed promising, its documents are fully evaluated with a forward index. Seismic is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and significantly outperforms the winning graph-based submissions to the BigANN 2023 Challenge. In this work, we speed up Seismic further by introducing two innovations to its query processing subroutine. First, we traverse blocks in order of importance, rather than arbitrarily. Second, we take the list of documents retrieved by Seismic and expand it to include the neighbors of each document using an offline k-regular nearest neighbor graph; the expanded list is then ranked to produce the final top-k set. Experiments on two public datasets show that our extension, named SeismicWave, can reach almost-exact accuracy levels and is up to 2.2x faster than Seismic.
DOI: 10.1145/3627673.3679977
DOI: 10.48550/arxiv.2408.04443
Project(s): EFRA via OpenAIRE
See at: arXiv.org e-Print Archive Open Access | dl.acm.org Open Access | CNR IRIS Open Access | doi.org Restricted | doi.org Restricted | CNR IRIS Restricted


2024 Contribution to conference Restricted
Distilled neural networks for efficient learning to rank: (Extended Abstract)
Nardini F. M., Rulli C., Trani S., Venturini R.
Recent studies in Learning to Rank (LtR) have shown the possibility of effectively distilling a neural network from an ensemble of regression trees. This fully enables the use of neural-based ranking models in query processors of modern Web search engines. Nevertheless, ensembles of regression trees outperform neural models both in terms of efficiency and effectiveness on CPU. In this paper, we propose a framework to design and train neural networks outperforming ensembles of regression trees. After distilling the networks from tree-based models, we exploit an efficiency-oriented pruning technique that works by sparsifying the most computationally intensive layers of the model. Moreover, we develop inference time predictors, which help devise neural network architectures that match the desired efficiency requirements. Comprehensive experiments on two public learning-to-rank datasets show that the neural networks produced with our novel approach are competitive in terms of effectiveness-efficiency trade-off when compared with tree-based ensembles by providing up to 4x inference time speed-up without degradation of the ranking quality.
Source: PROCEEDINGS - INTERNATIONAL CONFERENCE ON DATA ENGINEERING, pp. 5693-5694. Utrecht, Netherlands, 13-16/05/2024
DOI: 10.1109/icde60146.2024.00478
See at: doi.org Restricted | CNR IRIS Restricted | ieeexplore.ieee.org Restricted | CNR IRIS Restricted


2024 Conference article Open Access OPEN
Efficient multi-vector dense retrieval with bit vectors
Nardini F. M., Rulli C., Venturini R.
Dense retrieval techniques employ pre-trained large language models to build high-dimensional representations of queries and passages. These representations compute the relevance of a passage with respect to a query using efficient similarity measures. Multi-vector representations show improved effectiveness but come with a one-order-of-magnitude increase in memory footprint and query latency by encoding queries and documents on a per-token level. Recently, PLAID addressed these challenges by introducing a centroid-based term representation to reduce the memory impact of multi-vector systems. By exploiting a centroid interaction mechanism, PLAID filters out non-relevant documents, reducing the cost of successive ranking stages. This paper proposes "Efficient Multi-Vector Dense Retrieval with Bit Vectors" (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval. First, EMVB employs a highly efficient pre-filtering step of passages using optimized bit vectors. Second, the computation of the centroid interaction happens column-wise, leveraging SIMD instructions to reduce latency. Third, EMVB uses Product Quantization (PQ) to reduce the memory footprint of storing vector representations while allowing for fast late interaction. Finally, we introduce a per-document term filtering method that further improves the efficiency of the final step. Experiments on MS MARCO and LoTTE demonstrate that EMVB is up to 2.8× faster and reduces the memory footprint by 1.8× without any loss in retrieval accuracy compared to PLAID.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 14609, pp. 3-17. Glasgow, UK, 24-28/03/2024
DOI: 10.1007/978-3-031-56060-6_1
Project(s): EFRA via OpenAIRE
See at: CNR IRIS Open Access | link.springer.com Open Access | doi.org Restricted | Archivio della Ricerca - Università di Pisa Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2024 Conference article Open Access OPEN
Efficient inverted indexes for approximate retrieval over learned sparse representations
Bruch S., Nardini F. M., Rulli C., Venturini R.
Learned sparse representations form an attractive class of contextual embeddings for text retrieval. That is so because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with inverted indexes, however, retrieval over sparse embeddings remains challenging. That is due to the distributional differences between learned embeddings and term frequency-based lexical models of relevance such as BM25. Recognizing this challenge, a great deal of research has gone into, among other things, designing retrieval algorithms tailored to the properties of learned sparse representations, including approximate retrieval systems. In fact, this task featured prominently in the latest BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on a large benchmark dataset by throughput and recall. In this work, we propose a novel organization of the inverted index that enables fast yet effective approximate retrieval over learned sparse embeddings. Our approach organizes inverted lists into geometrically-cohesive blocks, each equipped with a summary vector. During query processing, we quickly determine if a block must be evaluated using the summaries. As we show experimentally, single-threaded query processing using our method, Seismic, reaches sub-millisecond per-query latency on various sparse embeddings of the Ms Marco dataset while maintaining high recall. Our results indicate that Seismic is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and further outperforms the winning (graph-based) submissions to the BigANN Challenge by a significant margin.
DOI: 10.1145/3626772.3657769
DOI: 10.48550/arxiv.2404.18812
Project(s): EFRA via OpenAIRE
See at: arXiv.org e-Print Archive Open Access | dl.acm.org Open Access | CNR IRIS Open Access | doi.org Restricted | doi.org Restricted | CNR IRIS Restricted
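
The block-and-summary organization described in the abstract above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Seismic implementation: the summary is taken to be the coordinate-wise maximum of the block's vectors (an upper bound on the inner product for non-negative weights), a block is skipped when its summary score cannot beat the current k-th best score, and all names are hypothetical. Static pruning, clustering, and summary quantization are left out.

```python
import heapq


def sparse_dot(a, b):
    """Inner product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(c, 0.0) for c, v in a.items())


def summarize(block, forward_index):
    """Assumed summary: coordinate-wise maximum over the block's documents."""
    summary = {}
    for doc_id in block:
        for c, v in forward_index[doc_id].items():
            summary[c] = max(summary.get(c, 0.0), v)
    return summary


def search(query, inverted_index, forward_index, k):
    heap, seen = [], set()  # min-heap of (score, doc_id)
    for term in query:
        for block, summary in inverted_index.get(term, []):
            # Skip the block if its summary cannot beat the k-th best score.
            if len(heap) == k and sparse_dot(query, summary) <= heap[0][0]:
                continue
            for doc_id in block:
                if doc_id in seen:
                    continue
                seen.add(doc_id)
                s = sparse_dot(query, forward_index[doc_id])  # exact score
                if len(heap) < k:
                    heapq.heappush(heap, (s, doc_id))
                elif s > heap[0][0]:
                    heapq.heapreplace(heap, (s, doc_id))
    return sorted(heap, reverse=True), seen


# Toy index (assumed data): four documents, two blocks per inverted list.
forward_index = {0: {0: 1.0, 1: 0.5}, 1: {0: 0.9}, 2: {0: 0.1}, 3: {0: 0.2, 1: 0.1}}
lists = {0: [[0, 1], [2, 3]], 1: [[0], [3]]}
inverted_index = {t: [(b, summarize(b, forward_index)) for b in blocks]
                  for t, blocks in lists.items()}

results, evaluated = search({0: 1.0, 1: 1.0}, inverted_index, forward_index, k=2)
```

In this toy run, documents 2 and 3 are never scored because their blocks' summaries fall below the current threshold, which is the source of the speedup: most of the index is ruled out by cheap summary products before any forward-index lookup.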