Journal article  Open Access

Efficient query processing for scalable web search

Tonellotto N., Macdonald C., Ounis I.

Computer Science (miscellaneous); learning-to-rank techniques; Architectures for IR; Information Systems; query processing infrastructures; Performance issues for IR systems; query efficiency prediction; Web search

Search engines are exceptionally important tools for accessing information in today's world. In satisfying the information needs of millions of users, the effectiveness (the quality of the search results) and the efficiency (the speed at which the results are returned to the users) of a search engine are two goals that form a natural trade-off, as techniques that improve the effectiveness of the search engine can also make it less efficient. Meanwhile, search engines continue to evolve rapidly, with larger indexes, more complex retrieval strategies and growing query volumes. Hence, there is a need for the development of efficient query processing infrastructures that make appropriate sacrifices in effectiveness in order to make gains in efficiency. This survey comprehensively reviews the foundations of search engines, from index layouts to basic term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing strategies, while also presenting the latest trends in the literature on efficient query processing, including coherent and systematic reviews of techniques such as dynamic pruning and impact-sorted posting lists, as well as their variants and optimisations. Our explanations of query processing strategies, for instance the WAND and BMW dynamic pruning algorithms, are presented with illustrative figures showing how the processing state changes as the algorithms progress. Moreover, acknowledging the recent trends in applying a cascading infrastructure within search systems, this survey describes techniques for efficiently integrating effective learned models, such as those obtained from learning-to-rank techniques. The survey also covers the selective application of query processing techniques, often achieved by predicting the response times of the search engine (known as query efficiency prediction), and making per-query trade-offs between efficiency and effectiveness to ensure that the required retrieval speed targets can be met. Finally, the survey concludes with a summary of open directions in efficient search infrastructures, namely the use of signatures, real-time, energy-efficient and modern hardware and software architectures.

Source: Foundations and trends in information retrieval 12 (2018): 319–500. doi:10.1561/1500000057

Publisher: Now Publishers, Hanover, Mass., United States of America


BibTeX entry
	title = {Efficient query processing for scalable web search},
	author = {Tonellotto N. and Macdonald C. and Ounis I.},
	publisher = {Now Publishers, Hanover, Mass., United States of America},
	doi = {10.1561/1500000057 and 10.1561/9781680835434 and 10.5281/zenodo.3268358 and 10.5281/zenodo.3268359},
	journal = {Foundations and trends in information retrieval},
	volume = {12},
	pages = {319--500},
	year = {2018}

Big Data to Enable Global Disruption of the Grapevine-powered Industries