Busolin F., Lucchese C., Nardini F. M., Orlando S., Perego R., Trani S., Veneri A.
Efficient re-ranking with cross-encoders via early exit
Pre-trained language models based on transformer networks are highly effective for document re-ranking in ad-hoc search. Among these, cross-encoders stand out for their effectiveness, as they process query-document pairs through the entire transformer network to compute ranking scores. However, this traversal is computationally expensive. To address this, prior work has explored early-exit strategies, enabling the model to terminate the traversal of query-document pairs. These techniques rely on learned classifiers, placed after each transformer block, that decide if a query-document pair can be dropped. Diverging from previous approaches, we propose Similarity-based Early Exit (SEE), a novel, non-learned strategy that exploits the similarities between query and document token embeddings to early-terminate the inference of documents that will most likely be non-relevant to the query. Even though SEE can be used after every transformer block, we show that the best advantage is achieved when applied before the first transformer block, thus saving most of the inference cost for the query-document pairs. Reproducible experiments on 17 public datasets covering in-domain and out-of-domain evaluation show that SEE can be effectively applied to four different cross-encoders, achieving speedups of up to 3.5× with a limited loss in ranking effectiveness.
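The abstract does not specify the exact scoring function SEE uses, but the core idea of a non-learned, similarity-based pre-filter applied before the first transformer block can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `see_prefilter`, the mean-of-max-cosine aggregation, and the `keep_fraction` parameter are all illustrative assumptions.

```python
import numpy as np

def see_prefilter(query_emb, doc_embs, keep_fraction=0.5):
    """Illustrative similarity-based early exit before the first
    transformer block (not the paper's exact scoring function).

    Each document is scored by the mean, over query tokens, of the
    maximum cosine similarity between that query token's embedding
    and any document token embedding. Only the top `keep_fraction`
    of documents proceed to the full cross-encoder forward pass.
    """
    # Normalize query token embeddings to unit length for cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=-1, keepdims=True)
    scores = []
    for d in doc_embs:
        d = d / np.linalg.norm(d, axis=-1, keepdims=True)
        sim = q @ d.T                          # (q_tokens, d_tokens) cosine matrix
        scores.append(sim.max(axis=1).mean())  # best-matching doc token per query token
    order = np.argsort(scores)[::-1]           # most similar documents first
    n_keep = max(1, int(len(doc_embs) * keep_fraction))
    return [int(i) for i in order[:n_keep]]    # indices of documents kept for re-ranking
```

Because only the input token embeddings are needed, documents dropped here skip the entire transformer traversal, which is where the abstract's reported speedups come from.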
Publisher: Association for Computing Machinery
@inproceedings{oai:iris.cnr.it:20.500.14243/562499,
title = {Efficient re-ranking with cross-encoders via early exit},
author = {Busolin F. and Lucchese C. and Nardini F. M. and Orlando S. and Perego R. and Trani S. and Veneri A.},
publisher = {Association for Computing Machinery},
doi = {10.1145/3726302.3729962},
year = {2025}
}