2020
Conference article  Open Access

Query-level early exit for additive learning-to-rank ensembles

Lucchese C., Nardini F. M., Orlando S., Perego R., Trani S.

Learning to rank  Efficiency/effectiveness trade-offs  Query-level early exit  Additive regression trees  Information Retrieval (cs.IR)  Machine Learning (cs.LG)  FOS: Computer and information sciences  Settore INF/01 - Informatica  68P20

Search engine ranking pipelines commonly rely on large ensembles of machine-learned decision trees. Tight constraints on query response time have recently motivated researchers to investigate algorithms that speed up the traversal of the additive ensemble or terminate early the evaluation of documents unlikely to rank among the top k. In this paper, we investigate the novel problem of query-level early exiting: deciding whether it is profitable to stop the traversal of the ranking ensemble for all the candidate documents of a query, returning a ranking based on the additive scores computed by only an initial portion of the ensemble. Beyond the obvious benefits for query latency and throughput, we study the possible positive impact on ranking effectiveness. To this end, we measure the actual contribution of incremental portions of the tree ensemble to the ranking of the top-k documents scored for a given query. Our main finding is that queries exhibit different behaviors as scores are accumulated during the traversal of the ensemble, and that query-level early stopping can remarkably improve ranking quality. We present a reproducible and comprehensive experimental evaluation on two public datasets, showing that query-level early exiting achieves an overall gain of up to 7.5% in NDCG@10 with a speedup of the scoring process of up to 2.2x.
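To illustrate the general idea, the following is a minimal sketch of query-level early exit over an additive tree ensemble. It is not the authors' method: the stopping criterion (`is_stable`, halting when the top-k ordering stops changing between sentinel points) and all names are illustrative assumptions, and each "tree" is simulated as a plain scoring function.

```python
# Hypothetical sketch of query-level early exit for an additive ensemble.
# Trees are grouped into blocks; scores are accumulated block by block, and
# after each block an assumed stopping criterion decides whether to halt
# scoring for ALL candidate documents of the query at once.

def is_stable(prev_ranking, curr_ranking, k):
    """Assumed criterion: exit once the top-k ordering no longer changes."""
    return prev_ranking[:k] == curr_ranking[:k]

def score_with_early_exit(docs, tree_blocks, k=10):
    """Accumulate additive scores block by block; exit early for the whole
    query once the top-k ranking stabilizes. Returns the final ranking and
    the number of trees actually evaluated."""
    scores = {d: 0.0 for d in docs}
    prev_ranking = None
    trees_used = 0
    for block in tree_blocks:
        for tree in block:
            for d in docs:
                scores[d] += tree(d)  # additive contribution of one tree
            trees_used += 1
        ranking = sorted(docs, key=lambda d: scores[d], reverse=True)
        if prev_ranking is not None and is_stable(prev_ranking, ranking, k):
            break  # query-level early exit: skip all remaining blocks
        prev_ranking = ranking
    return ranking, trees_used
```

With monotone toy trees the top-k ranking stabilizes after the second block, so the remaining blocks are never evaluated; this is the source of the latency and throughput gains the paper reports.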

Source: 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020), pp. 2033–2036, Online Conference, 25–30 July 2020



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:440220,
	title = {Query-level early exit for additive learning-to-rank ensembles},
	author = {Lucchese C. and Nardini F. M. and Orlando S. and Perego R. and Trani S.},
	doi = {10.1145/3397271.3401256},
	eprint = {2004.14641},
	booktitle = {43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020)},
	pages = {2033--2036},
	year = {2020}
}

BigDataGrapes
Big Data to Enable Global Disruption of the Grapevine-powered Industries

