2020
Journal article  Open Access

Topical result caching in web search engines

Mele I., Tonellotto N., Frieder O., Perego R.

Topic modeling  Management Science and Operations Research  Library and Information Sciences  Information Systems  Information Retrieval (cs.IR)  Computer Science - Information Retrieval  Databases (cs.DB)  FOS: Computer and information sciences  Caching  Computer Science Applications  Computer Science - Databases  Efficiency  Media Technology 

Caching search results is employed in information retrieval systems to expedite query processing and reduce back-end server workload. Motivated by the observation that queries belonging to different topics have different temporal-locality patterns, we investigate a novel caching model called STD (Static-Topic-Dynamic cache), a refinement of the traditional SDC (Static-Dynamic Cache) that stores in a static cache the results of popular queries and manages the dynamic cache with a replacement policy for intercepting the temporal variations in the query stream. Our proposed caching scheme includes another layer for topic-based caching, where the entries are allocated to different topics (e.g., weather, education). The results of queries characterized by a topic are kept in the fraction of the cache dedicated to it. This permits to adapt the cache-space utilization to the temporal locality of the various topics and reduces cache misses due to those queries that are neither sufficiently popular to be in the static portion nor requested within short-time intervals to be in the dynamic portion. We simulate different configurations for STD using two real-world query streams. Experiments demonstrate that our approach outperforms SDC with an increase up to 3% in terms of hit rates, and up to 36% of gap reduction w.r.t. SDC from the theoretical optimal caching algorithm.

Source: Information processing & management 57 (2020): 1–21. doi:10.1016/j.ipm.2019.102193

Publisher: Pergamon,, New York , Regno Unito


[1] Sadiye Alici, Ismail Sengor Altingovde, Rifat Ozcan, Berkant Barla Cambazoglu, and O zgur Ulusoy. Adaptive time-to-live strategies for query result caching in web search engines. In Proceedings of the 34th European Conference on Advances in Information Retrieval, ECIR '12, pages 401{412, Berlin, Heidelberg, 2012. Springer-Verlag.
[2] Ismail Sengor Altingovde, Rifat Ozcan, Berkant Barla Cambazoglu, and O zgur Ulusoy. Second chance: a hybrid approach for dynamic result caching in search engines. In Proceedings of the 33rd European Conference on Information Retrieval Research, ECIR '11, pages 510{516, Berlin, Heidelberg, 2011. Springer-Verlag.
[3] Ismail Sengor Altingovde, Rifat Ozcan, and O zgur Ulusoy. A cost-aware strategy for query result caching in web search engines. In Proceedings of the 31th European Conference on Information Retrieval Research, ECIR '09, pages 628{636, Berlin, Heidelberg, 2009. Springer-Verlag.
[4] Aris Anagnostopoulos, Luca Becchetti, Ilaria Bordino, Stefano Leonardi, Ida Mele, and Piotr Sankowski. Stochastic query covering for fast approximate document retrieval. ACM Transactions on Information Systems, 33(3):11:1{11:35, 2015.
[5] Ricardo Baeza-Yates, Flavio Junqueira, Vassilis Plachouras, and Hans Friedrich Witschel. Admission policies for caches of search engine results. In Nivio Ziviani and Ricardo Baeza-Yates, editors, String Processing and Information Retrieval, pages 74{85, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.
[6] Ricardo Baeza-Yates and Felipe Saint-Jean. A three level search engine index based in query log distribution. In Mario A. Nascimento, Edleno S. de Moura, and Arlindo L. Oliveira, editors, String Processing and Information Retrieval, pages 56{65, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.
[7] Luiz Andre Barroso, Je rey Dean, and Urs Holzle. Web search for a planet: The Google cluster architecture. IEEE Micro, 23(2):22{28, 2003.
[8] Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, David Grossman, and Ophir Frieder. Hourly analysis of a very large topically categorized web query log. In Proceedings of the 27th International Conference on Research and Development in Information Retrieval, SIGIR '04, pages 321{328, New York, NY, USA, 2004. ACM.
[9] Steven M. Beitzel, Eric C. Jensen, Ophir Frieder, David Grossman, David D. Lewis, Abdur Chowdhury, and Aleksandr Kolcz. Automatic web query classi cation using labeled and unlabeled training data. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '05, pages 581{582, New York, NY, USA, 2005. ACM.
[10] Laszlo A. Belady, Randolph A. Nelson, and Gerald S. Shedler. An anomaly in space-time characteristics of certain programs running in a paging machine. Communications of the ACM, 12(6):349{353, June 1969.
[11] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation. Journal on Machine Learning Research, 3:993{1022, 2003.
[12] Andrei Z. Broder, Marcus Fontoura, Evgeniy Gabrilovich, Amruta Joshi, Vanja Josifovski, and Tong Zhang. Robust classi cation of rare queries using web knowledge. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '07, pages 231{238, New York, NY, USA, 2007. ACM.
[13] Berkant Barla Cambazoglu, Ismail Sengor Altingovde, Rifat Ozcan, and O zgur Ulusoy. Cache-based query processing for search engines. ACM Transactions on the Web, 6(4):14:1{14:24, 2012.
[14] Berkant Barla Cambazoglu and Ricardo A. Baeza-Yates. Scalability Challenges in Web Search Engines. Synthesis Lectures on Information Concepts, Retrieval, and Services. Morgan & Claypool Publishers, 2015.

Metrics



Back to previous page
BibTeX entry
@article{oai:it.cnr:prodotti:416327,
	title = {Topical result caching in web search engines},
	author = {Mele I. and Tonellotto N. and Frieder O. and Perego R.},
	publisher = {Pergamon,, New York , Regno Unito},
	doi = {10.1016/j.ipm.2019.102193 and 10.48550/arxiv.2001.03010},
	journal = {Information processing \& management},
	volume = {57},
	pages = {1–21},
	year = {2020}
}

BigDataGrapes
Big Data to Enable Global Disruption of the Grapevine-powered Industries


OpenAIRE