2004
Conference article
Metadata Only Access
Scheduling and load balancing
Luque E, Castaños Jg, Markatos Ep, Perego R
Scheduling and load balancing techniques are key to the performance of applications executed in parallel and distributed environments, and to the efficient utilization of these computational resources. Research in this field has a long history and is well consolidated. Nevertheless, the evolution of parallel and distributed systems toward clusters, computational grids, and global computing environments introduces challenging new problems that require a new generation of scheduling and load balancing algorithms. Topic 3 in Euro-Par 2004 covers all aspects of scheduling and load balancing, from application and system levels to theoretical foundations and practical tools. All these aspects are addressed by the contributed papers.
See at:
CNR IRIS
2004
Conference article
Restricted
Statistical properties of transactional databases
Palmerini P, Orlando S, Perego R
Most of the complexity of common data mining tasks is due to the unknown amount of information contained in the data being mined. The more patterns and correlations such data contain, the more resources are needed to extract them. This is confirmed by the fact that, in general, no single algorithm is best for a given data mining task on every possible kind of input dataset. Rather, to achieve good performance, strategies and optimizations have to be adopted according to the dataset-specific characteristics. For example, one typical distinction in transactional databases is between sparse and dense datasets. In this paper we consider Frequent Set Counting as a case study for data mining algorithms. We propose a statistical analysis of the properties of transactional datasets that allows for a characterization of dataset complexity. We show how such a characterization can be used in many fields, from performance prediction to optimization.
See at:
CNR IRIS
| portal.acm.org
2006
Conference article
Restricted
Mining frequent closed itemsets out-of-core
Lucchese C, Orlando S, Perego R
Extracting frequent itemsets is an important task in many data mining applications. When data are very large, it becomes mandatory to perform the mining task with an external memory algorithm, but only a few such algorithms have been proposed so far. Since the result set of all the frequent itemsets is also likely to be undesirably large, condensed representations, such as closed itemsets, have recently gained a lot of attention. In this paper we discuss the limitations of the partitioning techniques adopted by external memory algorithms for extracting all the frequent itemsets when they are applied to closed itemset mining. The main issue is that the closedness of an itemset cannot be evaluated using only the local knowledge available in a single partition of the input dataset. A further step is thus needed to correctly merge the partial results. We introduce the first algorithm for mining closed itemsets out of core. The algorithm exploits a divide-et-impera approach, where the input dataset is split into smaller partitions such that they can be not only loaded but also mined entirely in main memory. Moreover, we devised a simple technique, based on a new theoretical result, that allows us to reduce the problem of merging partial solutions to an external memory sorting problem.
See at:
CNR IRIS
| www.siam.org
2001
Conference article
Restricted
Enhancing the apriori algorithm for frequent set counting
Orlando S, Palmerini P, Perego R
In this paper we propose DCP, a new algorithm for solving the Frequent Set Counting problem, which enhances Apriori. Our goal was to optimize the initial iterations of Apriori, i.e., the most time-consuming ones when datasets characterized by short or medium-length frequent patterns are considered. The main improvements regard the use of an innovative method for storing candidate sets of items and counting their support, and the exploitation of effective pruning techniques which significantly reduce the size of the dataset as execution progresses.
See at:
CNR IRIS
2007
Other
Metadata Only Access
See at:
CNR IRIS
2017
Book
Open Access
Proceedings of the 8th Italian Information Retrieval Workshop
Crestani F, Di Noia T, Perego R
This volume contains the papers presented at IIR'17: 8th Italian Information Retrieval Workshop, held on June 5-7, 2017 in Lugano, Switzerland. The purpose of the Italian Information Retrieval (IIR) workshop series is to provide a forum for stimulating and disseminating research in information retrieval, where Italian researchers (especially young ones) and researchers affiliated with Italian institutions can network and discuss their research results in an informal way. Previous IIR workshops took place in Venice (2016), Cagliari (2015), Rome (2014), Pisa (2013), Bari (2012), Milan (2011) and Padua (2010).
Source: CEUR WORKSHOP PROCEEDINGS
See at:
ceur-ws.org
| CNR IRIS
| ISTI Repository
2018
Journal article
Open Access
From Evaluating to Forecasting Performance: How to Turn Information Retrieval, Natural Language Processing and Recommender Systems into Predictive Sciences (Dagstuhl Perspectives Workshop 17442)
Ferro N, Fuhr N, Grefenstette G, Konstan Ja, Castells P, Daly Em, Declerck T, Ekstrand Md, Geyer W, Gonzalo J, Kuflik T, Lindén K, Magnini B, Nie Jy, Perego R, Shapira B, Soboroff I, Tintarev N, Verspoor K, Willemsen Mc, Zobel J
We describe the state of the art in performance modeling and prediction for Information Retrieval (IR), Natural Language Processing (NLP) and Recommender Systems (RecSys), along with its shortcomings and strengths. We present a framework for further research, identifying five major problem areas: understanding measures, performance analysis, making underlying assumptions explicit, identifying application features determining performance, and the development of prediction models describing the relationship between assumptions, features and resulting performance.
Source: DAGSTUHL MANIFESTOS, vol. 7 (issue 1), pp. 96-139
DOI: 10.4230/dagman.7.1.96
See at:
drops.dagstuhl.de
| CNR IRIS
| ISTI Repository
1992
Other
Metadata Only Access
See at:
CNR IRIS
2021
Conference article
Restricted
Hierarchical dependence-aware evaluation measures for conversational search
Faggioli G, Ferrante M, Ferro N, Perego R, Tonellotto N
Conversational agents are drawing a lot of attention in the information retrieval (IR) community, thanks in part to the advancements in language understanding enabled by large contextualized language models. IR researchers have long recognized the importance of a sound evaluation of new approaches. Yet, the development of evaluation techniques for conversational search is still an overlooked problem. Currently, most evaluation approaches rely on procedures directly drawn from ad-hoc search evaluation, treating utterances in a conversation as independent events, as if they were just separate topics, instead of accounting for the conversation context. We overcome this issue by proposing a framework for defining evaluation measures that are aware of the conversation context and the semantic dependencies between utterances. In particular, we model conversations as Directed Acyclic Graphs (DAGs), where self-explanatory utterances are root nodes, while anaphoric utterances are linked to the utterances that contain their missing semantic information. Then, we propose a family of hierarchical dependence-aware aggregations of the evaluation metrics driven by the conversational graph. In our experiments, we show that utterances from the same conversation are 20% more correlated than utterances from different conversations. Thanks to the proposed framework, we are able to include such correlation in our aggregations and be more accurate when determining which pairs of conversational systems are deemed significantly different.
DOI: 10.1145/3404835.3463090
See at:
dl.acm.org
| CNR IRIS