2020
Conference article (Open Access)

Efficient document re-ranking for transformers by precomputing term representations

MacAvaney S., Nardini F. M., Perego R., Tonellotto N., Goharian N., Frieder O.

Information Retrieval (cs.IR); FOS: Computer and information sciences; Computer Science - Information Retrieval; Neural ranking; Contextualized language models; Ranking efficiency; Pre-trained transformer networks

Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their high computational cost makes them prohibitively expensive to use in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42x speedup on web document ranking), making these networks more practical to use in a real-time ranking scenario. Specifically, we precompute part of the document term representations at indexing time (without a query) and merge them with the query representation at query time to compute the final ranking score. Because the token representations are large, we also propose an effective approach to reduce the storage requirement by training a compression layer to match attention scores. Our compression technique reduces the required storage by up to 95% and can be applied without a substantial degradation in ranking performance.
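
The abstract above outlines the mechanism at a high level. Below is a minimal PyTorch sketch of the layer-splitting idea only, assuming a generic stack of encoder layers: the class name, the split_layer parameter, the first-token scoring head, and the use of nn.TransformerEncoderLayer are illustrative assumptions rather than the authors' implementation, and the trained compression layer described in the abstract is omitted.

    # Hypothetical sketch of the PreTTR layer-splitting idea (not the authors' code).
    import torch
    import torch.nn as nn

    class SplitTransformerRanker(nn.Module):
        def __init__(self, num_layers=12, hidden=768, heads=12, split_layer=6):
            super().__init__()
            self.split_layer = split_layer  # layers below this index never see the query
            self.layers = nn.ModuleList(
                nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
                for _ in range(num_layers)
            )
            self.score = nn.Linear(hidden, 1)  # toy ranking head over the first token

        def index_document(self, doc_emb):
            # Indexing time: no query is available, so only the first `split_layer`
            # layers are applied; their output would be (compressed and) stored.
            x = doc_emb
            for layer in self.layers[:self.split_layer]:
                x = layer(x)
            return x  # precomputed document term representations

        def rank(self, query_emb, stored_doc):
            # Query time: the query passes through the early layers on its own,
            # then joins the stored document term representations for the rest.
            q = query_emb
            for layer in self.layers[:self.split_layer]:
                q = layer(q)
            x = torch.cat([q, stored_doc], dim=1)   # (batch, q_len + d_len, hidden)
            for layer in self.layers[self.split_layer:]:
                x = layer(x)
            return self.score(x[:, 0])              # relevance score

In this sketch, index_document is run once per document offline, so at query time only the query-side pass through the early layers and the joint pass through the remaining layers are paid, which is where the reported latency reduction comes from.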

Source: 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 49–58, online, 25–30 July 2020



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:440216,
	title = {Efficient document re-ranking for transformers by precomputing term representations},
	author = {MacAvaney S. and Nardini F. M. and Perego R. and Tonellotto N. and Goharian N. and Frieder O.},
	doi = {10.1145/3397271.3401093},
	booktitle = {43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 49–58, online, 25-30 July, 2020},
	year = {2020}
}

Project: BigDataGrapes – Big Data to Enable Global Disruption of the Grapevine-powered Industries