Faggioli Guglielmo, Ferro Nicola, Perego Raffaele, Tonellotto Nicola
Information retrieval
Dense Information Retrieval (IR) systems rely on neural networks to embed documents and queries in a low-dimensional latent space. Among Dense IR approaches, bi-encoders are particularly popular, as they achieve state-of-the-art performance and allow documents and queries to be encoded efficiently. By construction, however, these systems represent all documents and queries using the same set of dimensions. In this article, we introduce the Manifold Clustering (MC) hypothesis, which states that, for each query, there exists a query-dependent manifold of the original embedding space where the query and the documents relevant to it cluster more effectively. We empirically validate the MC hypothesis by showing that it is possible to find a query-dependent linear subspace of the original embedding space in which high retrieval effectiveness is achieved.
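To make the idea of a query-dependent linear subspace concrete, the following is a minimal sketch, not the procedure described in the article. It assumes inner-product scoring, an axis-aligned subspace (a subset of the original dimensions), and a pseudo-relevance-feedback heuristic for estimating dimension importance. The function name `prune_and_score` and the parameters `keep_fraction` and `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def prune_and_score(query, docs, keep_fraction=0.5, k=3):
    """Rescore documents in a query-dependent, axis-aligned subspace.

    query: array of shape (d,); docs: array of shape (n, d).
    Returns (pruned_scores, kept_dimensions).
    """
    # First-stage scores in the full embedding space (inner product).
    full_scores = docs @ query

    # Treat the k highest-scoring documents as pseudo-relevant.
    top_k = np.argsort(-full_scores)[:k]
    centroid = docs[top_k].mean(axis=0)

    # Assumed importance estimate: per-dimension contribution of the
    # query to its inner product with the pseudo-relevant centroid.
    importance = query * centroid

    # Keep the highest-importance dimensions and zero out the rest.
    n_keep = max(1, int(round(keep_fraction * query.shape[0])))
    kept = np.sort(np.argsort(-importance)[:n_keep])
    mask = np.zeros_like(query)
    mask[kept] = 1

    # Scoring with the masked query equals scoring in the subspace.
    pruned_scores = docs @ (query * mask)
    return pruned_scores, kept
```

Calling `prune_and_score(q, D, keep_fraction=0.25)` with a query vector `q` and a document matrix `D` returns the rescored documents together with the indices of the retained dimensions. Masking the query rather than slicing the document matrix leaves the document index untouched, which is one reason an axis-aligned subspace is convenient for a sketch like this.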
Source: ACM TRANSACTIONS ON INFORMATION SYSTEMS, vol. 44 (issue 1), pp. 1-34
@article{oai:iris.cnr.it:20.500.14243/562501,
title = {Getting off the DIME: dimension pruning via dimension importance estimation for dense information retrieval},
author = {Faggioli Guglielmo and Ferro Nicola and Perego Raffaele and Tonellotto Nicola},
doi = {10.1145/3765619},
journal = {ACM Transactions on Information Systems},
volume = {44},
number = {1},
pages = {1--34},
year = {2026}
}