2005
Contribution to conference
Open Access
A proposal for a generic grid scheduling architecture
Tonellotto N., Wieder P., Yahyapour R. In the past years, many Grids have been deployed and have become commodity systems in production environments. While several Grid scheduling systems have already been implemented, they still provide only "ad hoc" and domain-specific solutions to the problem of scheduling resources in a Grid; no common and generic Grid scheduling system has emerged yet. In this work we identify generic features of three common Grid scheduling scenarios, and we introduce a single entity, which we call a scheduling instance, that can be used as a building block for the scheduling solutions presented. We identify the behaviour that a scheduling instance must exhibit in order to be composed with other instances into the Grid scheduling systems discussed, as well as its interactions with other Grid functionalities. This work can be used as a foundation for designing common Grid scheduling infrastructures. Source: CoreGrid Integration Workshop, Pisa, 28-30 November 2005
See at:
ISTI Repository | CNR ExploRA
2010
Conference article
Unknown
Efficient dynamic pruning with proximity support
Tonellotto N., Macdonald C., Ounis I. Modern retrieval approaches apply not just single-term weighting models when ranking documents - instead, proximity weighting models are in common use, which reward the co-occurrence of pairs of query terms in close proximity to each other in documents. The adoption of these proximity weighting models can cause a computational overhead when documents are scored, negatively impacting the efficiency of the retrieval process. In this paper, we discuss the integration of proximity weighting models into efficient dynamic pruning strategies. In particular, we propose to modify document-at-a-time strategies to include proximity scoring without any modifications to pre-existing index structures. Our resulting two-stage dynamic pruning strategies only consider single query terms during first-stage pruning, but can early terminate the proximity scoring of a document if it can be shown that it will never be retrieved. We empirically examine the efficiency benefits of our approach using a large Web test collection of 50 million documents and 10,000 queries from a real query log. Our results show that our proposed two-stage dynamic pruning strategies are considerably more efficient than the original strategies, particularly for queries of 3 or more terms. Source: SIGIR 2010 - Workshop on Large Scale Distributed Search, pp. 31–35, Geneva, Switzerland, July 2010
See at:
CNR ExploRA
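A minimal sketch (not from the paper) of the two-stage idea described in the abstract above: candidate documents are first scored on single terms only, and the expensive proximity scoring of a candidate is skipped whenever even the most optimistic proximity contribution could not lift it into the current top-k. The flat candidate representation and the helper names are illustrative assumptions.

import heapq

def two_stage_rank(candidates, k, prox_upper_bound, prox_score):
    # candidates: (docid, single_term_score) pairs from a first-stage DAAT pass.
    # prox_upper_bound: largest proximity contribution any document can receive.
    # prox_score(docid): exact (expensive) proximity contribution of a document.
    top = []  # min-heap of (score, docid) holding the current top-k
    for docid, base in candidates:
        threshold = top[0][0] if len(top) == k else float("-inf")
        if base + prox_upper_bound <= threshold:
            continue  # cannot enter the top-k even with the best proximity boost
        score = base + prox_score(docid)
        if len(top) < k:
            heapq.heappush(top, (score, docid))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, docid))
    return sorted(top, reverse=True)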
2006
Journal article
Unknown
High Level Grid Programming with ASSIST
Aldinucci M., Coppola M., Danelutto M., Tonellotto N., Vanneschi M., Zoccolo C. The development of efficient Grid applications usually requires writing huge portions of code directly at the level of abstraction provided by the underlying Grid middleware. In this work we discuss an alternative approach that raises the level of abstraction used when programming Grid applications. Our approach requires programmers just to describe in a qualitative way the kind of parallelism they want to express. Then, compiler tools, loader tools and the run-time system take complete care of running the application on a Grid target architecture. This allows most of the cumbersome tasks related to Grid targeting and management to be moved from the programmer's responsibility to the tools. This paper introduces the structured parallel programming environment ASSIST, whose design is aimed at raising the level of abstraction in Grid programming, and discusses how it can support transparent Grid programming while implementing Grid adaptivity. Source: Computational Methods in Science and Technology 12 (2006): 21–32.
See at:
CNR ExploRA
2011
Journal article
Restricted
Upper bound approximations for dynamic pruning
Macdonald C., Ounis I., Tonellotto N. Dynamic pruning strategies for information retrieval systems can increase querying efficiency without decreasing effectiveness by using upper bounds to safely omit scoring documents that are unlikely to make the final retrieved set. Often, such upper bounds are pre-calculated at indexing time for a given weighting model. However, this precludes changing, adapting or training the weighting model without recalculating the upper bounds. Instead, upper bounds should be approximated at querying time from various statistics of each term to allow on-the-fly adaptation of the applied retrieval strategy. This article, by using uniform notation, formulates the problem of determining a term upper bound given a weighting model and discusses the limitations of existing approximations. Moreover, we propose an upper-bound approximation using a constrained nonlinear maximization problem. We prove that our proposed upper-bound approximation does not impact the retrieval effectiveness of several modern weighting models from various different families. We also show the applicability of the approximation for the Markov Random Field proximity model. Finally, we empirically examine how the accuracy of the upper-bound approximation impacts the number of postings scored and the resulting efficiency in the context of several large Web test collections. Source: ACM transactions on information systems (Online) 29 (2011). doi:10.1145/2037661.2037662
DOI: 10.1145/2037661.2037662 | Project(s): S-CUBE
See at:
ACM Transactions on Information Systems | CNR ExploRA
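As a worked illustration of the query-time upper-bound idea in the abstract above (a simplified stand-in for the paper's constrained maximization, not its exact formulation): for BM25 the per-term score grows with the term frequency and shrinks with the document length, so a bound can be obtained by evaluating the model at the largest term frequency observed in the posting list and at the smallest document length compatible with it.

import math

def bm25_upper_bound(max_tf, df, num_docs, avg_doc_len, k1=1.2, b=0.75):
    # max_tf: largest within-document frequency of the term (kept in the lexicon);
    # the bound is the score of a hypothetical document of length max_tf, i.e.
    # the shortest document that can contain max_tf occurrences of the term.
    idf = math.log((num_docs - df + 0.5) / (df + 0.5))
    doc_len = max_tf  # feasibility constraint: document length >= term frequency
    tf_part = (max_tf * (k1 + 1)) / (max_tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_part

# Hypothetical statistics, for illustration only:
print(bm25_upper_bound(max_tf=37, df=12000, num_docs=50000000, avg_doc_len=900))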
2011
Conference article
Open Access
Query efficiency prediction for dynamic pruning
Tonellotto Nicola, Ounis Iadh, Macdonald Craig. Dynamic pruning strategies retain effectiveness yet permit efficient retrieval by pruning - i.e. not fully scoring all postings of all documents matching a given query. However, the amount of pruning possible for a query can vary, resulting in queries with similar properties (query length, total numbers of postings) taking different amounts of time to retrieve search results. In this work, we investigate the causes of inefficient queries, identifying reasons such as the balance between the informativeness of query terms, and the distribution of retrieval scores within the posting lists. Moreover, we note the advantages of being able to predict the efficiency of a query, and propose various query efficiency predictors. Using 10,000 queries and the TREC ClueWeb09 category B corpus for evaluation, we find that combining predictors using regression can accurately predict query response time. Source: 9th Workshop on Large-scale and Distributed Information Retrieval, LSDS-IR' 11, pp. 3–8, Glasgow, UK, 24-28 October 2011
DOI: 10.1145/2064730.2064734
See at:
www.dcs.gla.ac.uk | doi.org | dl.acm.org | CNR ExploRA
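An illustrative sketch of pre-retrieval query efficiency predictors of the kind discussed in the abstract above; the specific features below are plausible examples computed from lexicon statistics, not the exact set proposed in the paper.

import math

def efficiency_predictors(term_stats, num_docs):
    # term_stats: one (doc_freq, term_upper_bound) pair per query term.
    dfs = [df for df, _ in term_stats]
    idfs = [math.log(num_docs / df) for df in dfs]
    bounds = [ub for _, ub in term_stats]
    return {
        "query_length": len(term_stats),
        "total_postings": sum(dfs),          # work to traverse without pruning
        "max_postings": max(dfs),
        "min_idf": min(idfs),                # least informative query term
        "idf_spread": max(idfs) - min(idfs), # balance of term informativeness
        "max_term_upper_bound": max(bounds),
    }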
2011
Conference article
Restricted
On upper bounds for dynamic pruning
Tonellotto Nicola, Ounis Iadh, Macdonald Craig. Dynamic pruning strategies enhance the efficiency of search engines, by making use of term upper bounds to decide when a document will not make the final set of k retrieved documents. After discussing different approaches for obtaining term upper bounds, we propose the use of multiple least upper bounds. Experiments are conducted on the TREC ClueWeb09 corpus, to measure the accuracy of different upper bounds. Source: Advances in Information Retrieval Theory. Third International Conference, ICTIR 2011, pp. 313–317, Bertinoro, Italy, 12-14 September 2011
DOI: 10.1007/978-3-642-23318-0_29
See at:
doi.org | www.springerlink.com | CNR ExploRA
2011
Conference article
Restricted
Effect of different docid orderings on dynamic pruning retrieval strategies
Tonellotto N., Ounis I., Macdonald C. Document-at-a-time (DAAT) dynamic pruning strategies for information retrieval systems such as MaxScore and Wand can increase querying efficiency without decreasing effectiveness. Both work on posting lists sorted by ascending document identifier (docid). The order in which docids are assigned - and hence the order of postings in the posting lists - is known to have a noticeable impact on posting list compression. However, the resulting impact on dynamic pruning strategies is not well understood. In this poster, we examine the impact of different docid orderings on the efficiency of these strategies, by experimenting using the TREC ClueWeb09 corpus. We find that while the number of postings scored by dynamic pruning strategies does not markedly vary across docid orderings, the ordering still has a marked impact on mean query response time. Moreover, when docids are assigned by lexicographical URL ordering, the benefit to response time is more pronounced for Wand than for MaxScore. Source: 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR' 11, pp. 1179–1180, Beijing, China, 24-28 July 2011
DOI: 10.1145/2009916.2010108
See at:
dl.acm.org | doi.org | CNR ExploRA
2012
Conference article
Open Access
Scheduling queries across replicas
Freire A., Macdonald C., Tonellotto N., Ounis I., Cacheda F. For increased efficiency, an information retrieval system can split its index into multiple shards, and then replicate these shards across many query servers. For each new query, an appropriate replica for each shard must be selected, such that the query is answered as quickly as possible. Typically, the replica with the lowest number of queued queries is selected. However, not every query takes the same time to execute, particularly if a dynamic pruning strategy is applied by each query server. Hence, the replica's queue length is an inaccurate indicator of its workload, and can result in inefficient usage of the replicas. In this work, we propose that improved replica selection can be obtained by using query efficiency prediction to measure the expected workload of a replica. Experiments are conducted using 2.2k queries, over various numbers of shards and replicas for the large GOV2 collection. Our results show that query waiting and completion times can be markedly reduced, showing that accurate response time predictions can improve scheduling accuracy and attesting to the benefit of the proposed scheduling algorithm. Source: 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1139–1140, Portland, OR, USA, 12-16 August 2012
DOI: 10.1145/2348283.2348508
See at:
www.dcs.gla.ac.uk | dl.acm.org | doi.org | CNR ExploRA
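A minimal sketch (illustrative, not the paper's algorithm) contrasting the baseline queue-length selection with selection by predicted workload, as described in the abstract above: each shard routes the query to the replica whose queued work, measured with predicted response times, is expected to finish first.

def pick_replica_by_queue(replicas):
    # Baseline: the replica with the fewest queued queries.
    return min(range(len(replicas)), key=lambda r: len(replicas[r]))

def pick_replica_by_predicted_work(replicas, predict_time):
    # replicas: list of queues of pending queries; predict_time(q): predicted
    # response time of query q. Pick the replica with the smallest backlog.
    backlog = lambda queue: sum(predict_time(q) for q in queue)
    return min(range(len(replicas)), key=lambda r: backlog(replicas[r]))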
2012
Conference article
Open Access
Learning to predict response times for online query scheduling
Macdonald C., Tonellotto N., Ounis I. Dynamic pruning strategies permit efficient retrieval by not fully scoring all postings of the documents matching a query - without degrading the retrieval effectiveness of the top-ranked results. However, the amount of pruning achievable for a query can vary, resulting in queries taking different amounts of time to execute. Knowing in advance the execution time of queries would permit the exploitation of online algorithms to schedule queries across replicated servers in order to minimise the average query waiting and completion times. In this work, we investigate the impact of dynamic pruning strategies on query response times, and propose a framework for predicting the efficiency of a query. Within this framework, we analyse the accuracy of several query efficiency predictors across 10,000 queries submitted to in-memory inverted indices of a 50-million-document Web crawl. Our results show that combining multiple efficiency predictors with regression can accurately predict the response time of a query before it is executed. Moreover, using the efficiency predictors to facilitate online scheduling algorithms can result in a 22% reduction in the mean waiting time experienced by queries before execution, and a 7% reduction in the mean completion time experienced by users. Source: 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 621–630, Portland, OR, USA, 12-16 August 2012
DOI: 10.1145/2348283.2348367 | Project(s): SMART
See at:
www.dcs.gla.ac.uk | dl.acm.org | doi.org | CNR ExploRA
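A minimal sketch of combining several efficiency predictors with regression to estimate response times before execution, as in the abstract above; the plain least-squares fit and the feature layout are illustrative assumptions, not the exact learning setup of the paper.

import numpy as np

def fit_response_time_model(features, observed_times):
    # features: (n_queries, n_predictors) matrix of pre-retrieval predictors;
    # observed_times: measured response times of the training queries.
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # intercept column
    weights, *_ = np.linalg.lstsq(X, observed_times, rcond=None)
    return weights

def predict_response_time(weights, feature_vector):
    return float(np.dot(np.append(feature_vector, 1.0), weights))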
2012
Conference article
Restricted
Effect of dynamic pruning safety on learning to rank effectiveness
Macdonald C., Tonellotto N., Ounis I. A dynamic pruning strategy, such as Wand, enhances retrieval efficiency without degrading effectiveness to a given rank K, known as safe-to-rank-K. However, it is also possible for Wand to obtain more efficient but unsafe retrieval without actually significantly degrading effectiveness. On the other hand, in a modern search engine setting, dynamic pruning strategies can be used to efficiently obtain the set of documents to be re-ranked by the application of a learned model in a learning to rank setting. No work has examined the impact of safeness on the effectiveness of the learned model. In this work, we investigate the impact of Wand safeness through experiments using 150 TREC Web track topics. We find that unsafe Wand is biased towards documents with lower docids, thereby impacting effectiveness. Source: 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1051–1052, Portland, OR, USA, 12-16 August 2012
DOI: 10.1145/2348283.2348464
See at:
dl.acm.org | doi.org | CNR ExploRA
2013
Conference article
Open Access
Efficient and effective retrieval using selective pruning
Tonellotto N., Macdonald C., Ounis I. Retrieval can be made more efficient by deploying dynamic pruning strategies such as Wand, which do not degrade effectiveness up to a given rank. It is possible to increase the efficiency of such techniques by pruning more 'aggressively'. However, this may reduce effectiveness. In this work, we propose a novel selective framework that determines the appropriate amount of pruning aggressiveness on a per-query basis, thereby increasing overall efficiency without significantly reducing overall effectiveness. We postulate two hypotheses about the queries that should be pruned more aggressively, which generate two approaches within our framework, based on query performance predictors and query efficiency predictors, respectively. We thoroughly experiment to ascertain the efficiency and effectiveness impacts of the proposed approaches, as part of a search engine deploying state-of-the-art learning to rank techniques. Our results on 50 million documents of the TREC ClueWeb09 collection show that by using query efficiency predictors to target inefficient queries, a 36% reduction in mean response time and a 50% reduction in the response times experienced by the slowest 10% of queries can be achieved while still ensuring effectiveness. Source: WSDM 2013 - Sixth ACM International Conference on Web Search and Data Mining, pp. 63–72, Roma, Italy, 4-8 February 2013
DOI: 10.1145/2433396.2433407
See at:
www.dcs.gla.ac.uk | dl.acm.org | doi.org | CNR ExploRA
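An illustrative decision rule in the spirit of the selective framework above: queries predicted to be slow are retrieved with a more aggressive (unsafe) pruning setting, while the rest keep the safe configuration. The threshold and parameter values are invented for illustration only.

def select_pruning_rank(predicted_time_ms, safe_k=1000, aggressive_k=100,
                        slow_threshold_ms=200.0):
    # Return the rank K up to which dynamic pruning is kept safe: queries
    # predicted to exceed the threshold trade a little effectiveness for latency.
    return aggressive_k if predicted_time_ms > slow_threshold_ms else safe_k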
2013
Conference article
Restricted
Hybrid query scheduling for a replicated search engine
Freire A., Macdonald C., Tonellotto N., Ounis I., Cacheda F. Search engines use replication and distribution of large indices across many query servers to achieve efficient retrieval. Under high query load, queries can be scheduled to replicas that are expected to be idle soonest, facilitated by the use of predicted query response times. However, the overhead of making response time predictions can hinder the usefulness of query scheduling under low query load. In this paper, we propose a hybrid scheduling approach that combines the scheduling methods appropriate for both low and high load conditions, and can adapt in response to changing conditions. We deploy a simulation framework, which is prepared with actual and predicted response times for real Web search queries for one full day. Our experiments using different numbers of shards and replicas of the 50 million document ClueWeb09 corpus show that hybrid scheduling can reduce the average waiting times of one day of queries by 68% under high load conditions and by 7% under low load conditions w.r.t. traditional scheduling methods. Source: ECIR 2013 - Advances in Information Retrieval. 35th European Conference on IR Research, pp. 435–446, Moscow, Russia, 24-27 March 2013
DOI: 10.1007/978-3-642-36973-5_37
See at:
doi.org | link.springer.com | CNR ExploRA
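A minimal sketch of a hybrid selection rule in the spirit of the paper above: under light load the cheap queue-length heuristic is used, and the more expensive prediction-based choice is only applied when the system is busy. The load threshold and helper names are illustrative assumptions.

def hybrid_pick_replica(replicas, predict_time, busy_queries_per_replica=4):
    # replicas: list of queues of pending queries; predict_time(q): predicted
    # response time of query q.
    pending = sum(len(q) for q in replicas)
    if pending < busy_queries_per_replica * len(replicas):
        # Low load: queue length is a good enough (and cheap) indicator.
        return min(range(len(replicas)), key=lambda r: len(replicas[r]))
    # High load: estimate each replica's backlog with predicted response times.
    backlog = lambda q: sum(predict_time(query) for query in q)
    return min(range(len(replicas)), key=lambda r: backlog(replicas[r]))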
2014
Conference article
Open Access
A self-adapting latency/power tradeoff model for replicated search engines
Freire A., Macdonald C., Tonellotto N., Ounis I., Cacheda F. For many search settings, distributed/replicated search engines deploy a large number of machines to ensure efficient retrieval. This paper investigates how the power consumption of a replicated search engine can be automatically reduced when the system has low contention, without compromising its efficiency. We propose a novel self-adapting model to analyse the trade-off between latency and power consumption for distributed search engines. When query volumes are high and there is contention for the resources, the model automatically increases the necessary number of active machines in the system to maintain acceptable query response times. On the other hand, when the load of the system is low and the queries can be served easily, the model is able to reduce the number of active machines, leading to power savings. The model bases its decisions on examining the current and historical query loads of the search engine. Our proposal is formulated as a general dynamic decision problem, which can be quickly solved by dynamic programming in response to changing query loads. Thorough experiments are conducted to validate the usefulness of the proposed adaptive model using historical Web search traffic submitted to a commercial search engine. Our results show that our proposed self-adapting model can achieve an energy saving of 33% while only degrading mean query completion time by 10 ms compared to a baseline that provisions replicas based on a previous day's traffic. Source: WSDM'14 - 7th ACM International Conference on Web Search and Data Mining, New York, USA, 24-28 February 2014
DOI: 10.1145/2556195.2556246 | Project(s): MIDAS, SMART
See at:
Enlighten | terrierteam.dcs.gla.ac.uk | dl.acm.org | doi.org | CNR ExploRA
2014
Conference article
Restricted
Towards green information retrieval: studying the suitability of queueing theory in reducing power consumption
Freire A., Macdonald C., Tonellotto N., Ounis I., Cacheda F. Many efforts are being dedicated to reducing the power consumed by large data centres. However, less research has focused on developing environmentally-friendly search engines. This paper contributes to Green Information Retrieval by defining a mathematical model to automatically vary the number of powered-on servers of a replicated search engine, based on the load of the system. Following general-purpose data centres, we base our approach on Queueing Theory, in order to study its behaviour when applied to a search engine. Results show how our model can achieve energy savings while maintaining excellent latency values, and it also allows us to study the shortcomings of applying Queueing Theory to Information Retrieval. Source: 3rd Spanish Conference on Information Retrieval, Universidade da Coruña, Spain, 19-20 June 2014
See at:
ceri2014.udc.es | CNR ExploRA
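A small self-contained illustration (not the paper's model) of the queueing-theoretic reasoning the abstract refers to: approximating a replicated query server as an M/M/c system, the Erlang-C formula gives the expected waiting time, and the smallest number of powered-on replicas meeting a latency target can be found by a linear scan. The arrival and service rates in the example are hypothetical.

import math

def erlang_c_wait(arrival_rate, service_rate, servers):
    # Expected waiting time in queue for an M/M/c system (Erlang C).
    a = arrival_rate / service_rate                  # offered load in Erlangs
    if a >= servers:
        return float("inf")                          # unstable system
    rho = a / servers
    summation = sum(a ** k / math.factorial(k) for k in range(servers))
    tail = a ** servers / (math.factorial(servers) * (1 - rho))
    p_wait = tail / (summation + tail)               # probability of queueing
    return p_wait / (servers * service_rate - arrival_rate)

def min_servers(arrival_rate, service_rate, max_wait):
    # Smallest number of powered-on replicas keeping the mean wait below max_wait.
    c = max(1, math.ceil(arrival_rate / service_rate))
    while erlang_c_wait(arrival_rate, service_rate, c) > max_wait:
        c += 1
    return c

# Hypothetical load: 150 queries/s, 20 queries/s per replica, 20 ms target wait.
print(min_servers(arrival_rate=150.0, service_rate=20.0, max_wait=0.020))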
2015
Conference article
Open Access
Load-sensitive CPU Power Management for Web Search Engines
Catena M., Macdonald C., Tonellotto N. Web search engine companies require power-hungry data centers with thousands of servers to efficiently perform searches on a large scale. This permits the search engines to serve high arrival rates of user queries with low latency, but poses economical and environmental concerns due to the power consumption of the servers. Existing power saving techniques sacrifice the raw performance of a server for reduced power absorption, by scaling the frequency of the server's CPU according to its utilization. For instance, current Linux kernels include frequency governors, i.e. mechanisms designed to dynamically throttle the CPU operational frequency. However, such general-domain techniques work at the operating system level and have no knowledge about the querying operations of the server. In this work, we propose to delegate CPU power management to search engine-specific governors. These can leverage knowledge coming from the querying operations, such as the query server utilization and load. By exploiting such additional knowledge, we can appropriately throttle the CPU frequency, thereby reducing the query server power consumption. Experiments are conducted upon the TREC ClueWeb09 corpus and the query stream from the MSN 2006 query log. Results show that we can reduce a server's power consumption by up to ~24%, with only limited drawbacks in effectiveness w.r.t. a system running at maximum CPU frequency to promote query processing quality. Source: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 751–754, Santiago, Chile, 09-13/08/2015
DOI: 10.1145/2766462.2767809
See at:
Enlighten | ISTI Repository | dl.acm.org | doi.org | CNR ExploRA
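A minimal sketch of a load-sensitive, search-specific governor of the kind proposed above: the CPU frequency is picked from the available steps according to the current utilisation of the query server, so a lightly loaded server runs slower and saves power. The frequency steps and the set_frequency_khz hook are placeholders, not an actual cpufreq interface.

def choose_frequency(utilization, available_khz):
    # Map utilisation (0.0-1.0, e.g. busy fraction or queue occupancy over the
    # last interval) to one of the available frequency steps, sorted ascending.
    utilization = min(max(utilization, 0.0), 1.0)
    index = min(int(utilization * len(available_khz)), len(available_khz) - 1)
    return available_khz[index]

def governor_tick(utilization, available_khz, set_frequency_khz):
    # Called periodically; set_frequency_khz stands in for whatever mechanism
    # the host OS exposes for frequency scaling.
    set_frequency_khz(choose_frequency(utilization, available_khz))

# Example with hypothetical frequency steps (kHz):
steps = [1200000, 1800000, 2400000, 3000000]
governor_tick(0.35, steps, set_frequency_khz=lambda khz: print("set", khz, "kHz"))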
2015
Contribution to conference
Restricted
A study on query energy consumption in web search engines
Catena M., Tonellotto N. Commercial web search engines are usually deployed on data centers, which leverage thousands of servers to efficiently answer queries on a large scale. Thanks to these distributed infrastructures, search engines can quickly serve high query volumes. However, the energy consumed by these many servers poses economical and environmental challenges for the Web search engine companies. To tackle such challenges, we advocate the importance of quantifying the energy consumption of a search engine. Therefore, in this study we experimentally analyze energy consumption on a per-query basis. Our aim is to evaluate how much energy is consumed by a search server to answer a single query, i.e., its query energy consumption. To perform such measurements, experiments are conducted using the TREC ClueWeb09 collection and the MSN 2006 query log. Results suggest that solving a query requires an amount of energy directly proportional to the query processing time. Source: 6th Italian Information Retrieval Workshop, Cagliari, Italy, 25-26/05/2015
See at:
ceur-ws.org | CNR ExploRA
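The proportionality reported above can be summarised as a back-of-the-envelope relation; the numbers below are purely illustrative and are not measurements from the study.

\[
E_{\text{query}} \approx P_{\text{server}} \cdot t_{\text{query}},
\qquad \text{e.g. } 200\,\mathrm{W} \times 0.05\,\mathrm{s} = 10\,\mathrm{J}.
\]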
2011
Report
Unknown
Adaptive instantiation of service workflows using a chemical approach
Tonellotto N., Giordano M., Di Napoli C. Service oriented technologies allow Service Based Applications (SBAs) to be easily built by composing independent services available in a network and provided by many actors under conditions that may change in time. Therefore services need to be dynamically selected and composed when an SBA is required, along with parameters representing the service delivery conditions. In this paper we propose to use a chemical computational approach to model the process of selecting the required service functionalities with the required conditions as an evolving and always running middleware mechanism. The chemical evolving behaviour of the middleware makes it possible to take into account environmental changes coming from both the providers' and users' side. Source: ISTI Technical reports, 2011
See at:
CNR ExploRA
2016
Conference article
Restricted
Exploiting green energy to reduce the operational costs of multi-center Web search engines
Blanco R., Catena M., Tonellotto N. Carbon dioxide emissions resulting from fossil fuel (brown energy) combustion are the main cause of global warming due to the greenhouse effect. Large IT companies have recently increased their efforts in reducing the carbon dioxide footprint originated from their data center electricity consumption. On one hand, better infrastructure and modern hardware allow for a more efficient usage of electric resources. On the other hand, data centers can be powered by renewable sources (green energy) that are both environmentally friendly and economically convenient. In this paper, we tackle the problem of targeting the usage of green energy to minimize the expenditure of running multi-center Web search engines, i.e., systems composed of multiple, geographically remote, computing facilities. We propose a mathematical model to minimize the operational costs of multi-center Web search engines by exploiting renewable energies whenever available at different locations. Using this model, we design an algorithm which decides what fraction of the incoming query load arriving at one processing facility must be forwarded to be processed at different sites to use green energy sources. We experiment using real traffic from a large search engine and we compare our model against state-of-the-art baselines for query forwarding. Our experimental results show that the proposed solution maintains a high query throughput, while reducing by up to ~25% the energy operational costs of multi-center search engines. Additionally, our algorithm can reduce the brown energy consumption by almost 6% when energy-proportional servers are employed. Source: 25th International Conference on World Wide Web, pp. 1237–1247, Montreal, Canada, 11-15 April 2016
DOI: 10.1145/2872427.2883021
See at:
dl.acm.org | doi.org | CNR ExploRA
2017
Conference article
Open Access
Upper bound approximations for BlockMaxWand
Macdonald C., Tonellotto N. BlockMaxWand is a recent advance on the Wand dynamic pruning technique, which allows efficient retrieval without any effectiveness degradation to rank K. However, while BMW uses docid-sorted indices, it relies on recording the upper bound of the term weighting model scores for each block of postings in the inverted index. Such a requirement can be disadvantageous in situations such as when an index must be updated. In this work, we examine the appropriateness of upper-bound approximations - which have previously been shown suitable for Wand - in providing efficient retrieval for BMW. Experiments on the ClueWeb12 category B13 corpus using 5000 queries from a real search engine's query log demonstrate that BMW still provides benefits w.r.t. Wand when approximate upper bounds are used, and that, if the approximations of the upper bounds are tight, BMW with approximate upper bounds can provide efficiency gains w.r.t. Wand with exact upper bounds, in particular for queries of short to medium length. Source: ICTIR 2017 - 3rd ACM International Conference on the Theory of Information Retrieval, pp. 273–276, Amsterdam, The Netherlands, 1-4 October 2017
DOI: 10.1145/3121050.3121094
See at:
Enlighten | ISTI Repository | dl.acm.org | doi.org | CNR ExploRA
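A minimal sketch of the block-level check at the heart of BlockMaxWand, using (possibly approximate) upper bounds as discussed in the abstract above: a candidate pivot document is fully scored only if the sum of the block-level bounds of the query terms can beat the current top-k threshold. The data-structure layout is an illustrative assumption.

import bisect

class BlockMaxList:
    # Per-term block metadata: blocks sorted by last docid, each carrying the
    # (possibly approximate) maximum term score within the block.
    def __init__(self, block_last_docids, block_max_scores):
        self.lasts = block_last_docids
        self.maxes = block_max_scores

    def current_block_max(self, docid):
        i = bisect.bisect_left(self.lasts, docid)
        return self.maxes[i] if i < len(self.lasts) else 0.0

def passes_block_max_check(pivot_doc, term_blocks, threshold):
    # Shallow test: score pivot_doc fully only if the summed block-level
    # bounds of the query terms can exceed the heap entry threshold.
    optimistic = sum(t.current_block_max(pivot_doc) for t in term_blocks)
    return optimistic > threshold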