2020
Conference article · Open Access

Expansion via prediction of importance with contextualization

MacAvaney S., Nardini F. M., Perego R., Tonellotto N., Goharian N., Frieder O.

Document representation · Information Retrieval (cs.IR) · Query representation · FOS: Computer and information sciences · Computer Science - Information Retrieval · Efficient ranking · Neural ranking

The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction of Importance with Contextualization). We show that EPIC significantly outperforms prior importance-modeling and document expansion approaches. We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches. Specifically, EPIC achieves an MRR@10 of 0.304 on the MS-MARCO passage ranking dataset with 78ms average query latency on commodity hardware. We also find that the latency is further reduced to 68ms by pruning document representations, with virtually no difference in effectiveness.
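To make the three-step description above concrete, the following is a minimal sketch of a lexicon-grounded importance ranker in the spirit of EPIC. It is not the authors' released implementation: the bert-base-uncased checkpoint, the softplus weighting, the max-pooling over tokens, and the uniform query-term weights are illustrative assumptions based only on the abstract.

# Sketch of a lexicon-grounded "term importance" ranker in the spirit of EPIC.
# NOT the paper's implementation: model choice, softplus weighting, pooling,
# and the uniform query weights below are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-uncased"  # assumption; EPIC fine-tunes a contextualized LM
tok = AutoTokenizer.from_pretrained(MODEL)
mlm = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def passage_vector(text: str) -> torch.Tensor:
    """Expand a passage into a |V|-dimensional importance vector.
    Each dimension corresponds to a vocabulary term, so the representation is
    interpretable and can be pre-computed (and pruned) at index time."""
    enc = tok(text, return_tensors="pt", truncation=True)
    logits = mlm(**enc).logits                      # (1, seq_len, |V|): per-token scores over the lexicon
    scores = torch.nn.functional.softplus(logits)   # keep importances non-negative (assumption)
    return scores.max(dim=1).values.squeeze(0)      # pool over tokens -> one weight per vocabulary term

@torch.no_grad()
def query_vector(text: str) -> torch.Tensor:
    """Sparse query representation: non-zero only on the query's own terms."""
    enc = tok(text, return_tensors="pt", truncation=True)
    vec = torch.zeros(tok.vocab_size)
    vec[enc["input_ids"].squeeze(0)] = 1.0          # uniform weights here; EPIC predicts contextual term importance
    return vec

def score(query: str, passage_vec: torch.Tensor) -> float:
    """Relevance = dot product in lexicon space, so every matching term's contribution is inspectable."""
    return float(torch.dot(query_vector(query), passage_vec))

# Usage: pre-compute passage vectors offline at index time, then score queries cheaply at query time.
pv = passage_vector("The Manhattan Project developed the first nuclear weapons during WWII.")
print(score("who developed the atomic bomb", pv))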

Source: 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1573–1576, online, 25–30 July 2020



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:440218,
	title = {Expansion via prediction of importance with contextualization},
	author = {MacAvaney S. and Nardini F. M. and Perego R. and Tonellotto N. and Goharian N. and Frieder O.},
	doi = {10.1145/3397271.3401262 and 10.48550/arxiv.2004.14245},
	booktitle = {43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1573–1576, online, 25-30 July, 2020},
	year = {2020}
}

Project: BigDataGrapes (Big Data to Enable Global Disruption of the Grapevine-powered Industries)

