2020
Conference object  Open Access

Efficient and effective query auto-completion

Gog S., Pibiri G. E., Venturini R.

Computer Science - Information Retrieval  Efficiency  Ebay  Query Auto-Completion 

Query Auto-Completion (QAC) is an ubiquitous feature of modern textual search systems, suggesting possible ways of completing the query being typed by the user. Efficiency is crucial to make the system have a real-time responsiveness when operating in the million-scale search space. Prior work has extensively advocated the use of a trie data structure for fast prefix-search operations in compact space. However, searching by prefix has little discovery power in that only completions that are prefixed by the query are returned. This may impact negatively the effectiveness of the QAC system, with a consequent monetary loss for real applications like Web Search Engines and eCommerce. In this work we describe the implementation that empowers a new QAC system at eBay, and discuss its efficiency/effectiveness in relation to other approaches at the state-of-the-art. The solution is based on the combination of an inverted index with succinct data structures, a much less explored direction in the literature. This system is replacing the previous implementation based on Apache SOLR that was not always able to meet the required service-level-agreement.

Source: ACM Conference on Research and Development in Information Retrieval, pp. 2271–2280, 25/07/2020-30/07/2020


Citations

[1] Ziv Bar-Yossef and Naama Kraus. 2011. Context-sensitive query auto-completion. In Proceedings of the 20th international conference on World wide web. ACM, 107-116.
[2] Holger Bast and Ingmar Weber. 2006. Type less, find more: fast autocompletion search with a succinct index. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 364-371.
[3] Sumit Bhatia, Debapriyo Majumdar, and Prasenjit Mitra. 2011. Query suggestions in the absence of query logs. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. 795-804.
[4] Fei Cai, Maarten De Rijke, and others. 2016. A survey of query auto completion in information retrieval. Foundations and Trends® in Information Retrieval 10, 4 (2016), 273-363.
[5] Huanhuan Cao, Daxin Jiang, Jian Pei, Qi He, Zhen Liao, Enhong Chen, and Hang Li. 2008. Context-aware query suggestion by mining click-through and session data. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 875-883.
[6] Surajit Chaudhuri and Raghav Kaushik. 2009. Extending autocompletion to tolerate errors. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. ACM, 707-718.
[7] Giovanni Di Santo, Richard McCreadie, Craig Macdonald, and Iadh Ounis. 2015. Comparing approaches for query autocompletion. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 775-778.
[8] Peter Elias. 1974. Eficient Storage and Retrieval by Content and Address of Static Files. J. ACM 21, 2 (1974), 246-260.
[9] Robert Mario Fano. 1971. On the number of bits required to implement an associative memory. Memorandum 61, Computer Structures Group, MIT (1971).
[10] Johannes Fischer and Volker Heun. 2011. Space-eficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40, 2 (2011), 465- 492.
[11] Edward Fredkin. 1960. Trie memory. Commun. ACM 3, 9 (1960), 490-499.
[12] Bo-June Paul Hsu and Giuseppe Ottaviano. 2013. Space-eficient data structures for top-k completion. In Proceedings of the 22nd international conference on World Wide Web. ACM, 583-594.
[13] Microsoft Inc. 2006. MSN Query Log, https://www.microsoft.com/en-us/research/ people/nickcr.
[14] Shengyue Ji, Guoliang Li, Chen Li, and Jianhua Feng. 2009. Eficient interactive fuzzy keyword search. In Proceedings of the 18th international conference on World wide web. 371-380.
[15] Rosie Jones, Benjamin Rey, Omid Madani, and Wiley Greiner. 2006. Generating query substitutions. In Proceedings of the 15th international conference on World Wide Web. 387-396.
[16] Unni Krishnan, Alistair Mofat, and Justin Zobel. 2017. A taxonomy of query auto completion modes. In Proceedings of the 22nd Australasian Document Computing
[17] Miguel A Martínez-Prieto, Nieves Brisaboa, Rodrigo Cánovas, Francisco Claude, and Gonzalo Navarro. 2016. Practical compressed string dictionaries. Information Systems 56 (2016), 73-108.
[18] Bhaskar Mitra and Nick Craswell. 2015. Query auto-completion for rare prefixes. In Proceedings of the 24th ACM international on conference on information and knowledge management. ACM, 1755-1758.
[19] Bhaskar Mitra, Milad Shokouhi, Filip Radlinski, and Katja Hofmann. 2014. On user interactions with query auto-completion. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. ACM, 1055-1058.
[20] Alistair Mofat and Lang Stuiver. 2000. Binary Interpolative Coding for Efective Index Compression. Information Retrieval Journal 3, 1 (2000), 25-47.
[21] Alistair Mofat and Justin Zobel. 1996. Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems (TOIS) 14, 4 (1996), 349-379.
[22] Shanmugavelayutham Muthukrishnan. 2002. Eficient algorithms for document retrieval problems. In Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 657-666.
[23] Giuseppe Ottaviano and Rossano Venturini. 2014. Partitioned elias-fano indexes. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. ACM, 273-282.
[24] Greg Pass, Abdur Chowdhury, and Cayley Torgeson. 2006. A picture of search. In International Conference on Scalable Information Systems, Vol. 152. 1.
[25] Giulio Ermanno Pibiri. 2019. On Slicing Sorted Integer Sequences. CoRR abs/1907.01032 (2019). arXiv:1907.01032 http://arxiv.org/abs/1907.01032
[26] Giulio Ermanno Pibiri, Matthias Petri, and Alistair Mofat. 2019. Fast DictionaryBased Compression for Inverted Indexes. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. ACM, 6-14.
[27] Giulio Ermanno Pibiri and Rossano Venturini. 2017. Eficient data structures for massive n-gram datasets. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 615-624.
[28] Giulio Ermanno Pibiri and Rossano Venturini. 2019. Handling Massive N-Gram Datasets Eficiently. ACM Transactions on Information Systems (TOIS) 37, 2 (2019), 25.
[29] Giulio Ermanno Pibiri and Rossano Venturini. 2019. On optimally partitioning variable-byte codes. IEEE Transactions on Knowledge and Data Engineering (2019).
[30] Giulio Ermanno Pibiri and Rossano Venturini. 2019. Techniques for Inverted Index Compression. CoRR abs/1908.10598 (2019). arXiv:1908.10598 http://arxiv. org/abs/1908.10598
[31] Jef Plaisance, Nathan Kurz, and Daniel Lemire. 2015. Vectorized VByte Decoding. In International Symposium on Web Algorithms.
[32] Falk Scholer, Hugh E Williams, John Yiannis, and Justin Zobel. 2002. Compression of inverted indexes for fast query evaluation. In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 222-229.
[33] Milad Shokouhi. 2013. Learning to personalize query auto-completion. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval. ACM, 103-112.
[34] Milad Shokouhi and Kira Radinsky. 2012. Time-sensitive query auto-completion. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, 601-610.
[35] J. Zhang, X. Long, and T. Suel. 2008. Performance of compressed inverted list caching in search engines. In International World Wide Web Conference (WWW). 387-396.


Back to previous page