103 result(s)
Page Size: 10, 20, 50
Export: bibtex, xml, json, csv
Order by:

CNR Author operator: and / or
more
Typology operator: and / or
Language operator: and / or
Date operator: and / or
more
Rights operator: and / or
2020 Journal article Open Access OPEN

Topical result caching in web search engines
Mele I., Tonellotto N., Frieder O., Perego R.
Caching search results is employed in information retrieval systems to expedite query processing and reduce back-end server workload. Motivated by the observation that queries belonging to different topics have different temporal-locality patterns, we investigate a novel caching model called STD (Static-Topic-Dynamic cache), a refinement of the traditional SDC (Static-Dynamic Cache) that stores in a static cache the results of popular queries and manages the dynamic cache with a replacement policy for intercepting the temporal variations in the query stream. Our proposed caching scheme includes another layer for topic-based caching, where the entries are allocated to different topics (e.g., weather, education). The results of queries characterized by a topic are kept in the fraction of the cache dedicated to it. This permits to adapt the cache-space utilization to the temporal locality of the various topics and reduces cache misses due to those queries that are neither sufficiently popular to be in the static portion nor requested within short-time intervals to be in the dynamic portion. We simulate different configurations for STD using two real-world query streams. Experiments demonstrate that our approach outperforms SDC with an increase up to 3% in terms of hit rates, and up to 36% of gap reduction w.r.t. SDC from the theoretical optimal caching algorithm.Source: Information processing & management 57 (2020): 1–21. doi:10.1016/j.ipm.2019.102193
DOI: 10.1016/j.ipm.2019.102193
Project(s): BigDataGrapes via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | Information Processing & Management Open Access | ISTI Repository Open Access | Information Processing & Management Restricted | Information Processing & Management Restricted | Information Processing & Management Restricted | Information Processing & Management Restricted | Information Processing & Management Restricted | Information Processing & Management Restricted | CNR ExploRA Restricted | Information Processing & Management Restricted


2019 Conference article Open Access OPEN

Performance analysis of WebRTC-based video streaming over power constrained platforms
Bacco M., Catena M., De Cola T., Gotta A., Tonellotto N.
This work analyses the use of the Web Real-Time Communications (WebRTC) framework on resource-constrained platforms. WebRTC is a consolidated solution for real-time video streaming, and it is an appealing solution in a wide range of application scenarios. We focus our attention on those in which power consumption, size and weight are of paramount importance because of the so-called Size, Weight and Power (SWaP) requirements, such as the use case of Unmanned Aerial Vehicles (UAVs) delivering real-time video streams over WebRTC to peers on the ground. The testbed described in this work shows that the power consumption can be reduced by changing WebRTC default settings while maintaining comparable video quality.Source: Globecom 2018 - 2018 IEEE Global Communications Conference, Abu Dhabi, United Arab Emirates, 9-13 December 2018
DOI: 10.1109/glocom.2018.8647375
DOI: 10.5281/zenodo.2705727
DOI: 10.5281/zenodo.2705728
Project(s): BigDataGrapes via OpenAIRE

See at: ISTI Repository Open Access | zenodo.org Open Access | zenodo.org Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | elib.dlr.de Restricted | ieeexplore.ieee.org Restricted | CNR ExploRA Restricted | xplorestaging.ieee.org Restricted


2019 Journal article Open Access OPEN

Parallel Traversal of Large Ensembles of Decision Trees
Lettich F., Lucchese C., Nardini F. M., Orlando S., Perego R., Tonellotto N., Venturini R.
Machine-learnt models based on additive ensembles of regression trees are currently deemed the best solution to address complex classification, regression, and ranking tasks. The deployment of such models is computationally demanding: to compute the final prediction, the whole ensemble must be traversed by accumulating the contributions of all its trees. In particular, traversal cost impacts applications where the number of candidate items is large, the time budget available to apply the learnt model to them is limited, and the users' expectations in terms of quality-of-service is high. Document ranking in web search, where sub-optimal ranking models are deployed to find a proper trade-off between efficiency and effectiveness of query answering, is probably the most typical example of this challenging issue. This paper investigates multi/many-core parallelization strategies for speeding up the traversal of large ensembles of regression trees thus obtaining machine-learnt models that are, at the same time, effective, fast, and scalable. Our best results are obtained by the GPU-based parallelization of the state-of-the-art algorithm, with speedups of up to 102.6x.Source: IEEE transactions on parallel and distributed systems (Print) 30 (2019): 2075–2089. doi:10.1109/TPDS.2018.2860982
DOI: 10.1109/tpds.2018.2860982
DOI: 10.5281/zenodo.2668379
DOI: 10.5281/zenodo.2668378
Project(s): BigDataGrapes via OpenAIRE

See at: Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari Open Access | ISTI Repository Open Access | ZENODO Open Access | IEEE Transactions on Parallel and Distributed Systems Open Access | IEEE Transactions on Parallel and Distributed Systems Restricted | IEEE Transactions on Parallel and Distributed Systems Restricted | ieeexplore.ieee.org Restricted | IEEE Transactions on Parallel and Distributed Systems Restricted | IEEE Transactions on Parallel and Distributed Systems Restricted | CNR ExploRA Restricted | IEEE Transactions on Parallel and Distributed Systems Restricted


2019 Patent Open Access OPEN

Cache optimization via topics in web search engines
Frieder O., Mele I., Perego R., Tonellotto N.
Source: US 10503792, Internazionale, 2019

See at: ISTI Repository Open Access | patentimages.storage.googleapis.com | CNR ExploRA


2019 Conference article Open Access OPEN

Enhanced news retrieval: passages lead the way!
Catena M., Nardini F. M., Frieder O., Perego R., Muntean C. I., Tonellotto N.
We observe that most relevant terms in unstructured news articles are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of news articles. Our experimentation, conducted using three publicly available news datasets, demonstrates that BM25P markedly outperforms BM25 in term of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1.Source: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1269–1272, Parigi, Francia, 21-25 July 2019
DOI: 10.1145/3331184.3331373

See at: dl.acm.org Open Access | CNR ExploRA Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | dl.acm.org Restricted


2019 Conference article Restricted

Multiple query processing via logic function factoring
Catena M., Tonellotto N.
Some extensions to search systems require support for multiple query processing. This is the case with query variations, i.e., different query formulations of the same information need. The results of their processing can be fused together to improve effectiveness, but this requires to traverse more than once the query terms' posting lists, thus prolonging the multiple query processing time. In this work, we propose an approach to optimize the processing of query variations to reduce their overall response time. Similarly to the standard Boolean model, we firstly represent a group of query variations as a logic function where Boolean variables represent query terms. We then apply factoring to such function, in order to produce a more compact but logically equivalent representation. The factored form is used to process the query variations in a single pass over the inverted index. We experimentally show that our approach can improve by up to 1.95× the mean processing time of a multiple query with no statistically significant degradation in terms of NDCG@10.Source: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 937–940, Parigi, Francia, 21-25 July, 2019
DOI: 10.1145/3331184.3331297
Project(s): BigDataGrapes via OpenAIRE

See at: academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | dl.acm.org Restricted | dl.acm.org Restricted | CNR ExploRA Restricted


2018 Journal article Open Access OPEN

Dataset popularity prediction for caching of CMS big data
Meoni M., Perego R., Tonellotto N.
The Compact Muon Solenoid (CMS) experiment at the European Organization for Nuclear Research (CERN) deploys its data collections, simulation and analysis activities on a distributed computing infrastructure involving more than 70 sites worldwide. The historical usage data recorded by this large infrastructure is a rich source of information for system tuning and capacity planning. In this paper we investigate how to leverage machine learning on this huge amount of data in order to discover patterns and correlations useful to enhance the overall efficiency of the distributed infrastructure in terms of CPU utilization and task completion time. In particular we propose a scalable pipeline of components built on top of the Spark engine for large-scale data processing, whose goal is collecting from different sites the dataset access logs, organizing them into weekly snapshots, and training, on these snapshots, predictive models able to forecast which datasets will become popular over time. The high accuracy achieved indicates the ability of the learned model to correctly separate popular datasets from unpopular ones. Dataset popularity predictions are then exploited within a novel data caching policy, called PPC (Popularity Prediction Caching). We evaluate the performance of PPC against popular caching policy baselines like LRU (Least Recently Used). The experiments conducted on large traces of real dataset accesses show that PPC outperforms LRU reducing the number of cache misses up to 20% in some sites.Source: Journal of grid computing 16 (2018): 211–228. doi:10.1007/s10723-018-9436-4
DOI: 10.1007/s10723-018-9436-4

See at: ISTI Repository Open Access | Journal of Grid Computing Restricted | Journal of Grid Computing Restricted | Journal of Grid Computing Restricted | Journal of Grid Computing Restricted | Journal of Grid Computing Restricted | Journal of Grid Computing Restricted | CNR ExploRA Restricted


2018 Conference article Open Access OPEN

Efficient energy management in distributed web search
Catena M., Frieder O., Tonellotto N.
Distributed Web search engines (WSEs) require warehouse-scale computers to deal with the ever-increasing size of the Web and the large amount of user queries they daily receive. The energy consumption of this infrastructure has a major impact on the economic profitability of WSEs. Recently several approaches to reduce the energy consumption of WSEs have been proposed. Such solutions leverage dynamic voltage and frequency scaling techniques in modern CPUs to adapt the WSEs' query processing to the incoming query traffic without negative impacts on latencies. A state-of-the-art research approach is the PESOS (Predictive Energy Saving Online Scheduling) algorithm, which can reduce the energy consumption of a WSE' single server by up to 50%. We evaluate PESOS on a simulated distributed WSE composed of a thousand of servers, and we compare its performance w.r.t. an industry-level baseline, called PEGASUS. Our results show that PESOS can reduce the CPU energy consumption of a distributed WSE by up to 18% with respect to PEGASUS, while providing query response times which are in line with user expectations.Source: 27th ACM International Conference on Information and Knowledge Management, pp. 1555–1558, Torino, Italia, 22-26/10/2018
DOI: 10.1145/3269206.3269263
DOI: 10.5281/zenodo.2710863
DOI: 10.5281/zenodo.2710864
Project(s): BigDataGrapes via OpenAIRE

See at: ISTI Repository Open Access | ZENODO Open Access | zenodo.org Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | dl.acm.org Restricted | dl.acm.org Restricted | CNR ExploRA Restricted


2018 Conference article Open Access OPEN

Efficient query processing infrastructures: A half-day tutorial at SIGIR 2018
Tonellotto N., Macdonald C.
Typically, techniques that benefit effectiveness of information retrieval (IR) systems have a negative impact on efficiency. Yet, with the large scale of Web search engines, there is a need to deploy efficient query processing techniques to reduce the cost of the infrastructure required. This tutorial aims to provide a detailed overview of the infrastructure of an IR system devoted to the efficient yet effective processing of user queries. This tutorial guides the attendees through the main ideas, approaches and algorithms developed in the last 30 years in query processing. In particular, we illustrate, with detailed examples and simplified pseudo-code, the most important query processing strategies adopted in major search engines, with a particular focus on dynamic pruning techniques. Moreover, we present and discuss the state-of-the-art innovations in query processing, such as impact-sorted and blockmax indexes. We also describe how modern search engines exploit such algorithms with learning-to-rank (LtR) models to produce effective results, exploiting new approaches in LtR query processing. Finally, this tutorial introduces query efficiency predictors for dynamic pruning, and discusses their main applications to scheduling, routing, selective processing and parallelisation of query processing, as deployed by a major search engine.Source: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), pp. 1403–1406, 8-12/07/2018
DOI: 10.1145/3209978.3210191

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | dl.acm.org Restricted | dl.acm.org Restricted | CNR ExploRA Restricted


2018 Journal article Open Access OPEN

Efficient query processing for scalable web search
Tonellotto N., Macdonald C., Ounis I.
Search engines are exceptionally important tools for accessing information in today's world. In satisfying the information needs of millions of users, the effectiveness (the quality of the search results) and the efficiency (the speed at which the results are returned to the users) of a search engine are two goals that form a natural trade-off, as techniques that improve the effectiveness of the search engine can also make it less efficient. Meanwhile, search engines continue to rapidly evolve, with larger indexes, more complex retrieval strategies and growing query volumes. Hence, there is a need for the development of efficient query processing infrastructures that make appropriate sacrifices in effectiveness in order to make gains in efficiency. This survey comprehensively reviews the foundations of search engines, from index layouts to basic term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing strategies, while also providing the latest trends in the literature in efficient query processing, including the coherent and systematic reviews of techniques such as dynamic pruning and impact-sorted posting lists as well as their variants and optimisations. Our explanations of query processing strategies, for instance the WAND and BMW dynamic pruning algorithms, are presented with illustrative figures showing how the processing state changes as the algorithms progress. Moreover, acknowledging the recent trends in applying a cascading infrastructure within search systems, this survey describes techniques for efficiently integrating effective learned models, such as those obtained from learning-to-rank techniques. The survey also covers the selective application of query processing techniques, often achieved by predicting the response times of the search engine (known as query efficiency prediction), and making per-query tradeoffs between efficiency and effectiveness to ensure that the required retrieval speed targets can be met. Finally, the survey concludes with a summary of open directions in efficient search infrastructures, namely the use of signatures, real-time, energy-efficient and modern hardware and software architectures.Source: Foundations and trends in information retrieval 12 (2018): 319–500. doi:10.1561/1500000057
DOI: 10.1561/1500000057
DOI: 10.1561/9781680835434
DOI: 10.5281/zenodo.3268358
DOI: 10.5281/zenodo.3268359
Project(s): BigDataGrapes via OpenAIRE

See at: Enlighten Open Access | Archivio della Ricerca - Università di Pisa Open Access | ZENODO Open Access | Foundations and Trends® in Information Retrieval Open Access | CNR ExploRA | www.nowpublishers.com


2018 Contribution to book Unknown

Popularity-based caching of CMS datasets
Meoni M., Perego R., Tonellotto N.
The distributed monitoring infrastructure of the Compact Muon Solenoid (CMS) experiment at the European Organization for Nuclear Research (CERN) records on a Hadoop infrastructures a broad variety of computing and storage logs. They represent a valuable source of information for system tuning and capacity planning. In this paper we analyze machine learning (ML) techniques on large amount of traces to discover patterns and correlations useful to classify the popularity of experiment-related datasets. We implement a scalable pipeline of Spark components which collect the dataset access logs from heterogeneous monitoring sources and group them into weekly snapshots organized by CMS sites. Predictive models are trained on these snapshots and forecast which dataset will become popular over time. Dataset popularity predictions are then used to experiment a novel strategy of data caching, called Popularity Prediction Caching (PPC). We compare the hit rates of PPC with those produced by well known caching policies. We demonstrate how the performance improvement is as high as 20% in some sites.Source: Parallel Computing is Everywhere, edited by Bassini S., Danelutto M., Dazzi P., Joubert G.R., Peters F., pp. 221–231, 2018
DOI: 10.3233/978-1-61499-843-3-221

See at: ebooks.iospress.nl | CNR ExploRA


2017 Conference article Restricted

Multicore/Manycore parallel traversal of large forests of regression trees
Lettich F., Lucchese C., Nardini F. M., Orlando S., Perego R., Tonellotto N., Venturini R.
Machine-learnt models based on additive ensembles of binary regression trees are currently considered one of the best solutions to address complex classification, regression, and ranking tasks. To evaluate these complex models over a continuous stream of data items with high throughput requirements, we need to optimize, and possibly parallelize, the traversal of thousands of trees, each including hundreds of nodes.Document ranking in Web Search is a typical example of this challenging scenario, where complex tree-based models are used to score query-document pairs and finally rank lists of document results for each incoming query (a.k.a. Learning-to-Rank). In this extended abstract, we briefly discuss some preliminary results concerning the parallelization strategies for QUICKSCORER - indeed the state-of-art scoring algorithm that exploits ensembles of decision trees - by using multicore CPUs (with SIMD coprocessors) and manycore GPUs. We show that QUICKSCORER, which transforms the traversal of thousands of decision trees in a linear access to array data structures, can be parallelized very effectively, by achieving very interesting speedups.Source: HPCS 2017 - International Conference on High Performance Computing & Simulation, pp. 915–915, Genoa, Italy, 17-21 July 2017
DOI: 10.1109/hpcs.2017.154

See at: academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | doi.org Restricted | ieeexplore.ieee.org Restricted | iris.unive.it Restricted | CNR ExploRA Restricted | xplorestaging.ieee.org Restricted


2017 Journal article Open Access OPEN

Energy-efficient query processing in web search engines
Catena M., Tonellotto N.
Web search engines are composed by thousands of query processing nodes, i.e., servers dedicated to process user queries. Such many servers consume a significant amount of energy, mostly accountable to their CPUs, but they are necessary to ensure low latencies, since users expect sub-second response times (e.g., 500 ms). However, users can hardly notice response times that are faster than their expectations. Hence, we propose the Predictive Energy Saving Online Scheduling Algorithm ( PESOS) to select the most appropriate CPU frequency to process a query on a per-core basis. PESOS aims at process queries by their deadlines, and leverage high-level scheduling information to reduce the CPU energy consumption of a query processing node. PESOS bases its decision on query efficiency predictors, estimating the processing volume and processing time of a query. We experimentally evaluate PESOS upon the TREC ClueWeb09B collection and the MSN2006 query log. Results show that PESOS can reduce the CPU energy consumption of a query processing node up to similar to 48 percent compared to a system running at maximum CPU core frequency. PESOS outperforms also the best state-of-the-art competitor with a similar to 20 percent energy saving, while the competitor requires a fine parameter tuning and it may incurs in uncontrollable latency violations.Source: IEEE transactions on knowledge and data engineering (Print) 29 (2017): 1412–1425. doi:10.1109/TKDE.2017.2681279
DOI: 10.1109/tkde.2017.2681279

See at: Archivio della Ricerca - Università di Pisa Open Access | IEEE Transactions on Knowledge and Data Engineering Restricted | IEEE Transactions on Knowledge and Data Engineering Restricted | IEEE Transactions on Knowledge and Data Engineering Restricted | ieeexplore.ieee.org Restricted | IEEE Transactions on Knowledge and Data Engineering Restricted | CNR ExploRA Restricted | IEEE Transactions on Knowledge and Data Engineering Restricted | IEEE Transactions on Knowledge and Data Engineering Restricted


2017 Conference article Open Access OPEN

Upper bound approximations for BlockMaxWand
Macdonald C., Tonellotto N.
BlockMaxWand is a recent advance on the Wand dynamic pruning technique, which allows efficient retrieval without any e.ectiveness degradation to rank K. However, while BMW uses docid-sorted indices, it relies on recording the upper bound of the term weighting model scores for each block of postings in the inverted index. Such a requirement can be disadvantageous in situations such as when an index must be updated. In this work, we examine the appropriateness of upper-bound approximation - which have previously been shown suitable for Wand- in providing efficient retrieval for BMW. Experiments on the ClueWeb12 category B13 corpus using 5000 queries from a real search engine's query log demonstrate that BMW still provides benefits w.r.t. Wand when approximate upper bounds are used, and that, if approximations on upper bounds are tight, BMW with approximate upper bounds can provide eficiency gains w.r.t. Wand with exact upper bounds, in particular for queries of short to medium length.Source: ICTIR 2017 - 3rd ACM International Conference on the Theory of Information Retrieval, pp. 273–276, Amsterdam, The Netherlands, 1-4 October, 2017
DOI: 10.1145/3121050.3121094

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | dl.acm.org Restricted | dl.acm.org Restricted | eprints.gla.ac.uk Restricted | CNR ExploRA Restricted


2017 Conference article Open Access OPEN

QuickScorer: efficient traversal of large ensembles of decision trees
Lucchese C., Nardini F. M., Orlando S., Perego R., Tonellotto N., Venturini R.
Machine-learnt models based on additive ensembles of binary regression trees are currently deemed the best solution to address complex classification, regression, and ranking tasks. Evaluating these models is a computationally demanding task as it needs to traverse thousands of trees with hundreds of nodes each. The cost of traversing such large forests of trees significantly impacts their application to big and stream input data, when the time budget available for each prediction is limited to guarantee a given processing throughput. Document ranking in Web search is a typical example of this challenging scenario, where the exploitation of tree-based models to score query-document pairs, and finally rank lists of documents for each incoming query, is the state-of-art method for ranking (a.k.a. Learning-to-Rank). This paper presents QuickScorer, a novel algorithm for the traversal of huge decision trees ensembles that, thanks to a cache- and CPU-aware design, provides a 9 speedup over best competitors.Source: ECML PKDD - Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 383–387, Skopje, Macedonia, 18-22 September, 2017
DOI: 10.1007/978-3-319-71273-4_36

See at: arpi.unipi.it Open Access | academic.microsoft.com Restricted | arpi.unipi.it Restricted | arpi.unipi.it Restricted | dblp.uni-trier.de Restricted | iris.unive.it Restricted | link.springer.com Restricted | link.springer.com Restricted | link.springer.com Restricted | CNR ExploRA Restricted | rd.springer.com Restricted


2017 Conference article Open Access OPEN

Efficient & Effective Selective Query Rewriting with Efficiency Predictions
Macdonald C., Tonellotto N., Ounis I.
To enhance e'ectiveness, a user's query can be rewritten internally by the search engine in many ways, for example by applying proximity, or by expanding the query with related terms. However, approaches that benefit e'ectiveness offten have a negative impact on effciency, which has impacts upon the user satisfaction, if the query is excessively slow. In this paper, we propose a novel framework for using the predicted execution time of various query rewritings to select between alternatives on a per-query basis, in a manner that ensures both e'ectiveness and effciency. In particular, we propose the prediction of the execution time of ephemeral (e.g., proximity) posting lists generated from uni-gram inverted index posting lists, which are used in establishing the permissible query rewriting alternatives that may execute in the allowed time. Experiments examining both the e'ectiveness and efficiency of the proposed approach demonstrate that a 49% decrease in mean response time (and 62% decrease in 95th-percentile response time) can be attained without significantly hindering the e'ectiveness of the search engine.Source: SIGIR '17 - 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 495–504, Shinjuku, Tokyo, Japan, 7-11 July, 2017
DOI: 10.1145/3077136.3080827

See at: Enlighten Open Access | ISTI Repository Open Access | academic.microsoft.com Restricted | core.ac.uk Restricted | dblp.uni-trier.de Restricted | dl.acm.org Restricted | dl.acm.org Restricted | dl.acm.org Restricted | doi.org Restricted | eprints.gla.ac.uk Restricted | CNR ExploRA Restricted


2017 Conference article Restricted

Faster blockmaxwand with variable-sized blocks
Mallia A., Ottaviano G., Porciani E., Tonellotto N., Venturini R.
Query processing is one of the main bo.lenecks in large-scale search engines. Retrieving the top k most relevant documents for a given query can be extremely expensive, as it involves scoring large amounts of documents. Several dynamic pruning techniques have been introduced in the literature to tackle this problem, such as BlockMaxWAND, which splits the inverted index into constantsized blocks and stores the maximum document-Term scores per block; this information can be used during query execution to safely skip low-score documents, producing many-fold speedups over exhaustive methods. We introduce a re.nement for BlockMaxWANDthat uses variablesized blocks, rather than constant-sized. We set up the problem of deciding the block partitioning as an optimization problem which maximizes how accurately the block upper bounds represent the underlying scores, and describe an effcient algorithm to .nd an approximate solution, with provable approximation guarantees. .rough an extensive experimental analysis we show that our method significantly outperforms the state of the art roughly by a factor 2x. We also introduce a compressed data structure to represent the additional block information, providing a compression ratio of roughly 50%, while incurring only a small speed degradation, no more than 10% with respect to its uncompressed counterpart.Source: SIGIR '17 - 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 625–634, Shinjuku, Tokyo, Japan, 7-11 July, 2017
DOI: 10.1145/3077136.3080780

See at: academic.microsoft.com Restricted | arpi.unipi.it Restricted | core.ac.uk Restricted | dblp.uni-trier.de Restricted | dl.acm.org Restricted | dl.acm.org Restricted | doi.acm.org Restricted | CNR ExploRA Restricted


2017 Conference article Open Access OPEN

Recent advances in energy efficient query processing
Catena M., Tonellotto N.
Web search companies distribute their infrastructures and operations across several, geographically distant data centers. This distributed architecture facilitates high performance query processing, which is fundamental for the success of a Web search engine. At the same time, data centers require an huge amount of electricity to operate their computing resources. In this extended abstract, we briefly discuss our recent works for improving the energy effciency of query processing systems. Firstly, we introduce a novel query forwarding algorithm which exploits green energy sources to reduce the electricity expenditure and carbon footprint of Web search engines. Then, we propose to delegate the CPU power management from a server' operative system directly to the query processing application, to reduce the energy consumption of a search engine's servers. Finally, we introduce PESOS, a scheduling algorithm which manages the CPU power consumption on a per-query basis while considering query latency constraints.Source: IIR 2017 - 8th Italian Information Retrieval Workshop, pp. 125–128, Lugano, Switzerland, 05-07 June, 2017

See at: ceur-ws.org Open Access | CNR ExploRA Open Access


2016 Report Restricted

Joint modeling of arrival process and length distribution of queries in Web search engines
Cassarà P., Colucci M., Gotta A., Tonellotto N.
This paper proposes a novel fitting procedure via non-parametric kernel- based models of the probability mass function of a discrete arrival process, derived from real traffic traces of queries to a Web search engine. Most of the adopted estimation techniques for probability mass functions are based on parameter estimations for a given family of probability distri- bution functions. Conversely, the proposed procedure, jointly with a kernel-based model of the probability distribution function, doesn't need any assumptions about membership to a families of distributions, or about parameters. The fitting procedure based on the Generalized Cross-Entropy resolves a Quadratic Programming Problem. Furthermore, the estimated probability mass function can be expressed in a closed form, as a weighted sum of kernel functions. We also examine the performance of the proposed procedure via numer- ical experiments and present an example of traffic analysis with real data traffic. Results show that our estimation of the probability mass function, closely matches the empirical probability mass function. Precisely, through the procedure, both temporal and statistical characteristics, such as auto- correlation, long-range dependence, and skewness, can be well approximated.Source: ISTI Technical reports, 2016

See at: CNR ExploRA Restricted


2016 Conference article Restricted

Exploiting green energy to reduce the operational costs of multi-center Web search engines
Blanco R., Catena M., Tonellotto N.
Carbon dioxide emissions resulting from fossil fuels (brown energy) combustion are the main cause of global warming due to the greenhouse effect. Large IT companies have recently increased their efforts in reducing the carbon dioxide footprint originated from their data center electricity consumption. On one hand, better infrastructure and modern hardware allow for a more efficient usage of electric resources. On the other hand, data-centers can be powered by renewable sources (green energy) that are both environmental friendly and economically convenient. In this paper, we tackle the problem of targeting the usage of green energy to minimize the expenditure of running multi-center Web search engines, i.e., systems composed by multiple, geographically remote, computing facilities. We propose a mathematical model to minimize the operational costs of multi-center Web search engines by exploiting renewable energies whenever available at different locations. Using this model, we design an algorithm which decides what fraction of the incoming query load arriving into one processing facility must be forwarded to be processed at different sites to use green energy sources. We experiment using real traffic from a large search engine and we compare our model against state of the art baselines for query forwarding. Our experimental results show that the proposed solution maintains an high query throughput, while reducing by up to ~25% the energy operational costs of multi-center search engines. Additionally, our algorithm can reduce the brown energy consumption by almost 6% when energy-proportional servers are employed.Source: 25th International Conference on World Wide Web, pp. 1237–1247, Montreal, Canada, 11-15 April 2016
DOI: 10.1145/2872427.2883021

See at: academic.microsoft.com Restricted | core.ac.uk Restricted | dblp.uni-trier.de Restricted | dl.acm.org Restricted | dl.acm.org Restricted | dl.acm.org Restricted | CNR ExploRA Restricted