Page 1 of 6

2005 Other Open Access

A proposal for a generic grid scheduling architecture
Tonellotto N, Wieder P, Yahyapour R
In the past years, many Grids have been deployed and became commodity systems in production environments. While several Grid scheduling systems have already been implemented, they still provide only ``ad hoc'' and domain-specific solutions to the problem of scheduling resources in a Grid. However, no common and generic Grid scheduling system has emerged yet. In this work we identify generic features of three common Grid scheduling scenarios, and we introduce a single entity that we call scheduling instance that can be used as a building block for the scheduling solutions presented. We identify the behavior that a scheduling instance must exhibit in order to be composed with other instances to build Grid scheduling systems discussed, and their interactions with other Grid functionalities. This work can be used as a foundation for designing common Grid scheduling infrastructures.

See at: CNR IRIS Open Access | ISTI Repository | CNR IRIS Restricted

2010 Conference article Restricted

Efficient dynamic pruning with proximity support
Tonellotto N, Macdonald C, Ounis I
Modern retrieval approaches apply not just single-term weighting models when ranking documents - instead, proximity weighting models are in common use, which highly score the co-occurrence of pairs of query terms in close proximity to each other in documents. The adoption of these proximity weighting models can cause a computational overhead when documents are scored, negatively impacting the efficiency of the retrieval process. In this paper, we discuss the integration of proximity weighting models into efficient dynamic pruning strategies. In particular, we propose to modify document-at-a-time strategies to include proximity scoring without any modifications to pre-existing index structures. Our resulting two-stage dynamic pruning strategies only consider single query terms during first stage pruning, but can early terminate the proximity scoring of a document if it can be shown that it will never be retrieved. We empirically examine the efficiency benefits of our approach using a large Web test collection of 50 million documents and 10,000 queries from a real query log. Our results show that our proposed two-stage dynamic pruning strategies are considerably more efficient than the original strategies, particularly for queries of 3 or more terms.

See at: CNR IRIS Restricted | CNR IRIS

2006 Other Restricted

C++ : esercizi di programmazione sulle classi. Manuale per un corso universitario
Lopriore L, Tonellotto N
Raccolta di esercizi sul linguaggio di programmazione C++ con particolare riferimento allo sviluppo e all'implementazione di strutture dati astratte utilizzando meccanismi orientati agli oggetti.

See at: CNR IRIS Restricted | CNR IRIS

2011 Contribution to book Restricted

Adaptive instantiation of service workflows using a chemical approach
Di Napoli C, Giordano M, Németh Z, Tonellotto N
Service oriented technologies allow Service Based Applications (SBAs) to be easily built by composing independent services available in a network and provided by many actors under conditions that may change in time. Therefore services need to be dynamically selected and composed when an SBA is required along with parameters representing the service delivery conditions. In this paper we propose to use a chemical computational approach to model the process of selecting the required service functionalities with the required conditions as an evolving and always running middleware mechanism. The chemical evolving behaviour of the middleware allows to take into account environmental changes coming from both the providers and users side.DOI: 10.1007/978-3-642-21878-1_31
Project(s): S-CUBE via OpenAIRE

Metrics:

See at: doi.org Restricted | CNR IRIS | CNR IRIS | link.springer.com

2009 Other Restricted

Setting up a Mac OS X 10.4 cluster for distributed computing
Tonellotto N, Bartoli G, Bravi M
In this document we outline a simple procedure for setting up a Mac OS X 10.4 cluster for distributed computing. The cluster will consist of eight XServe G4 headless nodes running Mac OS X 10.4 Client and a single XServe G4 node running Mac OS X 10.4 Server. The user requirements for the cluster are simple: a centralized management of the accounts on the cluster nodes and the sharing of the users' home directories, with no special security requirement. This document should be a usable reference for a system administrator to set up other Mac OS X clusters with similar requirements.

See at: CNR IRIS Restricted | CNR IRIS

2010 Conference article Restricted

Using chemical reactions to model service composition
Di Napoli C, Giordano M, Németh Z, Tonellotto N
Internet is evolving from a network of computers and information into a network of services allowing applications to be built by selecting services and composing them in a loosely coupled manner. These Service Based Applications (SBA) are composed of a number of possibly independent services that are provided by many actors under different conditions (like price, time to deliver, and so on). Service provision conditions may change in time depending on provider policies or other environmental changes, so it is necessary to organize compositions of services on demand in response to dynamic requirements and circumstances. In this paper we propose to use a chemical computational model to address this problem by decoupling the process of finding services composing an SBA requested by a user, from their actual enactment. An SBA request is described in terms of an abstract workflow where only service functionalities of the single components and their execution order (i.e. the application control flow) are specified, along with parameters representing the conditions under which the user expects the application to be delivered. The proposed approach allows to model the process of instantiating the required functionalities with actual service implementations as an evolving and always running middleware mechanism that can take into account the current state of the context when the composition is required. Furthermore, the evolutionary nature of the chemical system provides a form of adaptation since once compositions of services are computed with the available services, new compositions can be computed as soon as new services become available or the conditions of existing ones change.DOI: 10.1145/1809036.1809047
Metrics:

See at: dl.acm.org Restricted | ACM Digital Library | doi.org | CNR IRIS | CNR IRIS

2009 Conference article Restricted

A chemical metaphor to model service selection for composition of services
Di Napoli C, Giordano M, Németh Z, Tonellotto N
In the context of Internet of Services (IoS), Service Based Applications are composed of a number of possibly independent services that are available in a network and provided by many actors under different conditions (like price, time to deliver, and so on). Service provision conditions may change in time depending on provider policies, and as such they cannot be statically advertised together with the service description. In this paper we propose and investigate the possibility to use the chemical computational model to address the problem of finding compositions of services that satisfy time constraints coming from the structure of an abstract workflow against the time availability associated to each service component.

See at: CNR IRIS Restricted | CNR IRIS

2011 Journal article Restricted

Upper bound approximations for dynamic pruning
Macdonald C, Ounis I, Tonellotto N
Dynamic pruning strategies for information retrieval systems can increase querying efficiency without decreasing effectiveness by using upper bounds to safely omit scoring documents that are unlikely to make the final retrieved set. Often, such upper bounds are pre-calculated at indexing time for a given weighting model. However, this precludes changing, adapting or training the weighting model without recalculating the upper bounds. Instead, upper bounds should be approximated at querying time from various statistics of each term to allow on-the-fly adaptation of the applied retrieval strategy. This article, by using uniform notation, formulates the problem of determining a term upper-bound given a weighting model and discusses the limitations of existing approximations. Moreover, we propose an upper-bound approximation using a constrained nonlinear maximization problem. We prove that our proposed upper-bound approximation does not impact the retrieval effectiveness of several modern weighting models from various different families. We also show the applicability of the approximation for the Markov Random Field proximity model. Finally, we empirically examine how the accuracy of the upper-bound approximation impacts the number of postings scored and the resulting efficiency in the context of several large Web test collections.Source: ACM TRANSACTIONS ON INFORMATION SYSTEMS (ONLINE), vol. 29 (issue 4)
DOI: 10.1145/2037661.2037662
Project(s): S-CUBE via OpenAIRE

Metrics:

See at: ACM Transactions on Information Systems Restricted | CNR IRIS | CNR IRIS

2011 Conference article Open Access

Query efficiency prediction for dynamic pruning
Tonellotto Nicola, Ounis Iadh, Macdonald Craig
Dynamic pruning strategies are effective yet permit efficient retrieval by pruning - i.e. not fully scoring all postings of all documents matching a given query. However, the amount of pruning possible for a query can vary, resulting in queries with similar properties (query length, total numbers of postings) taking different amounts of time to retrieve search results. In this work, we investigate the causes for inefficient queries, identifying reasons such as the balance between informativeness of query terms, and the distribution of retrieval scores within the posting lists. Moreover, we note the advantages in being able to predict the efficiency of a query, and propose various query efficiency predictors. Using 10,000 queries and the TREC ClueWeb09 category B corpus for evaluation, we find that combining predictors using regression can accurately predict query response time.DOI: 10.1145/2064730.2064734
Metrics:

See at: www.dcs.gla.ac.uk Open Access | dl.acm.org Restricted | doi.org | CNR IRIS

2011 Conference article Restricted

On upper bounds for dynamic pruning
Tonellotto Nicola, Ounis Iadh, Macdonald Craig
Dynamic pruning strategies enhance the efficiency of search engines, by making use of term upper bounds to decide when a document will not make the final set of k retrieved documents. After discussing different approaches for obtaining term upper bounds, we propose the use of multiple least upper bounds. Experiments are conducted on the TREC ClueWeb09 corpus, to measure the accuracy of different upper bounds.DOI: 10.1007/978-3-642-23318-0_29
Metrics:

See at: doi.org Restricted | CNR IRIS | CNR IRIS | www.springerlink.com

2011 Conference article Restricted

Effect of different docid orderings on dynamic pruning retrieval strategies
Tonellotto N, Ounis I, Macdonald C
Document-at-a-time (DAAT) dynamic pruning strategies for information retrieval systems such as textsc{MaxScore} and textsc{Wand} can increase querying efficiency without decreasing effectiveness. Both work on posting lists sorted by ascending document identifier (docid). The order in which docids are assigned -- and hence the order of postings in the posting lists -- is known to have a noticeable impact on posting list compression. However, the resulting impact on dynamic pruning strategies is not well understood. In this poster, we examine the impact on the efficiency of these strategies across different docid orderings, by experimenting using the TREC ClueWeb09 corpus. We find that while the number of postings scored by dynamic pruning strategies do not markedly vary for different docid orderings, the ordering still has a marked impact on mean query response time. Moreover, when docids are assigned by lexicographical URL ordering, the benefit to response time for is more pronounced for textsc{Wand} than for textsc{MaxScore}.DOI: 10.1145/2009916.2010108
Metrics:

See at: dl.acm.org Restricted | doi.org | CNR IRIS | CNR IRIS

2012 Conference article Open Access

Scheduling queries across replicas
Freire A, Macdonald C, Tonellotto N, Ounis I, Cacheda F
For increased efficiency, an information retrieval system can split its index into multiple shards, and then replicate these shards across many query servers. For each new query, an appropriate replica for each shard must be selected, such that the query is answered as quickly as possible. Typically, the replica with the lowest number of queued queries is selected. However, not every query takes the same time to execute, particularly if a dynamic pruning strategy is applied by each query server. Hence, the replica's queue length is an inaccurate indicator of the workload of a replica, and can result in inefficient usage of the replicas. In this work, we propose that improved replica selection can be obtained by using query efficiency prediction to measure the expected workload of a replica. Experiments are conducted using 2.2k queries, over various numbers of shards and replicas for the large GOV2 collection. Our results show that query waiting and completion times can be markedly reduced, showing that accurate response time predictions can improve scheduling accuracy and attesting the benefit of the proposed scheduling algorithm.DOI: 10.1145/2348283.2348508
Metrics:

See at: www.dcs.gla.ac.uk Open Access | dl.acm.org Restricted | doi.org | CNR IRIS | CNR IRIS

2012 Conference article Open Access

Learning to predict response times for online query scheduling
Macdonald C, Tonellotto N, Ounis I
Dynamic pruning strategies permit efficient retrieval by not fully scoring all postings of the documents matching a query - without degrading the retrieval effectiveness of the topranked results. However, the amount of pruning achievable for a query can vary, resulting in queries taking different amounts of time to execute. Knowing in advance the execution time of queries would permit the exploitation of online algorithms to schedule queries across replicated servers in order to minimise the average query waiting and completion times. In this work, we investigate the impact of dynamic pruning strategies on query response times, and propose a framework for predicting the efficiency of a query. Within this framework, we analyse the accuracy of several query efficiency predictors across 10,000 queries submitted to in-memory inverted indices of a 50-million-document Web crawl. Our results show that combining multiple efficiency predictors with regression can accurately predict the response time of a query before it is executed. Moreover, using the efficiency predictors to facilitate online scheduling algorithms can result in a 22% reduction in the mean waiting time experienced by queries before execution, and a 7% reduction in the mean completion time experienced by users.DOI: 10.1145/2348283.2348367
Project(s): SMART via OpenAIRE

Metrics:

See at: www.dcs.gla.ac.uk Open Access | dl.acm.org Restricted | doi.org | CNR IRIS | CNR IRIS

2012 Conference article Restricted

Effect of dynamic pruning safety on learning to rank effectiveness
Macdonald C, Tonellotto N, Ounis I
A dynamic pruning strategy, such as Wand, enhances retrieval efficiency without degrading effectiveness to a given rank K, known as safe-to-rank-K. However, it is also possible for Wand to obtain more efficient but unsafe retrieval without actually significantly degrading effectiveness. On the other hand, in a modern search engine setting, dynamic pruning strategies can be used to efficiently obtain the set of documents to be re-ranked by the application of a learned model in a learning to rank setting. No work has examined the impact of safeness on the effectiveness of the learned model. In this work, we investigate the impact of Wand safeness through experiments using 150 TREC Web track topics. We find that unsafe Wand is biased towards documents with lower docids, thereby impacting effectivenessDOI: 10.1145/2348283.2348464
Metrics:

See at: dl.acm.org Restricted | doi.org | CNR IRIS | CNR IRIS

2013 Conference article Open Access

Efficient and effective retrieval using selective pruning
Tonellotto N, Macdonald C, Ounis I
Retrieval can be made more efficient by deploying dynamic pruning strategies such as Wand, which do not degrade effectiveness up to a given rank. It is possible to increase the efficiency of such techniques by pruning more 'aggressively'. However, this may reduce effectiveness. In this work, we propose a novel selective framework that determines the appropriate amount of pruning aggressiveness on a per-query basis, thereby increasing overall efficiency without significantly reducing overall effectiveness. We postulate two hypotheses about the queries that should be pruned more aggressively, which generate two approaches within our framework, based on query performance predictors and query efficiency predictors, respectively. We thoroughly experiment to ascertain the efficiency and effectiveness impacts of the proposed approaches, as part of a search engine deploying state-of-the-art learning to rank techniques. Our results on 50 million documents of the TREC ClueWeb09 collection show that by using query efficiency predictors to target inefficient queries, we observe that a 36% reduction in mean response time and a 50% reduction of the response times experienced by the slowest 10% of queries can be achieved while still ensuring effectiveness.DOI: 10.1145/2433396.2433407
Metrics:

See at: www.dcs.gla.ac.uk Open Access | dl.acm.org Restricted | doi.org | CNR IRIS | CNR IRIS

2013 Conference article Restricted

Hybrid query scheduling for a replicated search engine
Freire A, Macdonald C, Tonellotto N, Ounis I, Cacheda F
Search engines use replication and distribution of large indices across many query servers to achieve efficient retrieval. Under high query load, queries can be scheduled to replicas that are expected to be idle soonest, facilitated by the use of predicted query response times. However, the overhead of making response time predictions can hinder the usefulness of query scheduling under low query load. In this paper, we propose a hybrid scheduling approach that combines the scheduling methods appropriate for both low and high load conditions, and can adapt in response to changing conditions. We deploy a simulation framework, which is prepared with actual and predicted response times for real Web search queries for one full day. Our experiments using different numbers of shards and replicas of the 50 million document ClueWeb09 corpus show that hybrid scheduling can reduce the average waiting times of one day of queries by 68% under high load conditions and by 7% under low load conditions w.r.t. traditional scheduling methods.DOI: 10.1007/978-3-642-36973-5_37
Metrics:

See at: doi.org Restricted | CNR IRIS | CNR IRIS | link.springer.com

2014 Conference article Open Access

A self-adapting latency/power tradeoff model for replicated search engines
Freire A, Macdonald C, Tonellotto N, Ounis I, Cacheda F
For many search settings, distributed/replicated search en- gines deploy a large number of machines to ensure efficient retrieval. This paper investigates how the power consump- tion of a replicated search engine can be automatically re- duced when the system has low contention, without com- promising its efficiency. We propose a novel self-adapting model to analyse the trade-off between latency and power consumption for distributed search engines. When query volumes are high and there is contention for the resources, the model automatically increases the necessary number of active machines in the system to maintain acceptable query response times. On the other hand, when the load of the sys- tem is low and the queries can be served easily, the model is able to reduce the number of active machines, leading to power savings. The model bases its decisions on exam- ining the current and historical query loads of the search engine. Our proposal is formulated as a general dynamic decision problem, which can be quickly solved by dynamic programming in response to changing query loads. Thor- ough experiments are conducted to validate the usefulness of the proposed adaptive model using historical Web search traffic submitted to a commercial search engine. Our results show that our proposed self-adapting model can achieve an energy saving of 33% while only degrading mean query com- pletion time by 10 ms compared to a baseline that provisions replicas based on a previous day's traffic.DOI: 10.1145/2556195.2556246
Project(s): MIDAS via OpenAIRE

, SMART

Metrics:

2014 Conference article Restricted

Towards green information retrieval: studying the suitability of queueing theory in reducing power consumption
Freire A, Macdonald C, Tonellotto N, Ounis I, Cacheda F
Many efforts are being dedicated to reduce the power consumed by large data centres. However, less research has focused on developing environmentally-friendly search engines. This paper contributes to Green Information Retrieval by defining a mathematical model to automatically vary the number of powered-on servers of a replicated search engine, based on the load of the system. Following general-purpose data centres, we base our approach in Queueing Theory, in order to study its behaviour when being applied into a search engine. Results show how our model can achieve energy savings by maintaining excellent values for the latency, and it also allows us to study the shortcomings of Queueing Theory into Information Retrieval.

See at: ceri2014.udc.es Restricted | CNR IRIS | CNR IRIS

2015 Conference article Open Access

Load-sensitive CPU Power Management for Web Search Engines
Catena M, Macdonald C, Tonellotto N
Web search engine companies require power-hungry data centers with thousands of servers to efficiently perform searches on a large scale. This permits the search engines to serve high arrival rates of user queries with low latency, but poses economical and environmental concerns due to the power consumption of the servers. Existing power saving techniques sacrifice the raw performance of a server for reduced power absorption, by scaling the frequency of the server's CPU according to its utilization. For instance, current Linux kernels include frequency governors i.e., mechanisms designed to dynamically throttle the CPU operational frequency. However, such general-domain techniques work at the operating system level and have no knowledge about the querying operations of the server. In this work, we propose to delegate CPU power management to search engine-specific governors. These can leverage knowledge coming from the querying operations, such as the query server utilization and load. By exploiting such additional knowledge, we can appropriately throttle the CPU frequency thereby reducing the query server power consumption. Experiments are conducted upon the TREC ClueWeb09 corpus and the query stream from the MSN 2006 query log. Results show that we can reduce up to ~24% a server power consumption, with only limited drawbacks in effectiveness w.r.t. a system running at maximum CPU frequency to promote query processing quality.DOI: 10.1145/2766462.2767809
Metrics:

2015 Conference article Restricted

A study on query energy consumption in web search engines
Catena M, Tonellotto N
Commercial web search engines are usually deployed on data centers, which leverage thousands of servers to eficiently answer queries on a large scale. Thanks to these distributed infrastructures, search engines can quickly serve high query volumes. However, the energy consumed by these many servers poses economical and environmental challenges for the Web search engine companies. To tackle such challenges, we advocate the importance of quantifying the energy consumption of a search engine. Therefore, in this study we experimentally analyze energy consumption on a per query basis. Our aim is to evaluate how much energy is consumed by a search server to answer a single query, i.e, its query energy consumption. To perform such measurements, experiments are conducted using the TREC ClueWeb09 collection and the MSN 2006 query log. Results suggest that solving queries require an amount of energy directly proportional to the query processing time.Source: CEUR WORKSHOP PROCEEDINGS. Cagliari, Italy, 25-26/05/2015

See at: ceur-ws.org Restricted | CNR IRIS | CNR IRIS