2010 Other Unknown
CIP PSP-BPN ASSETS project: Advanced Search Service and Enhanced Technological Solutions for the Europeana Digital Library
Lucchese C., Perego R., Silvestri F., Tonellotto N.
ASSETS aims to improve the usability of the Europeana Digital Library platform by designing, implementing and deploying large-scale, scalable services for search and browsing. These services include: efficient storage and indexing; search based on metadata and on content similarity; advanced ranking algorithms; browsing through semantic cross-links; and semi-automatic ingestion of metadata requiring normalization, cleaning, knowledge extraction and mapping to a common structure.

See at: CNR ExploRA


2010 Report Unknown
VISITO - G3.1 Progress report of the VISITO Tuscany project
Amato G., Falchi F., Bolettieri P., Lucchese C., Scopigno R., La Torre F., Minelli S., Tavanti F., Scartoni R., Salvadori S., Zanetti N., Loschiavo D.
This document describes the results obtained by the VISITO-Tuscany project in its first eight months of work. After a brief overview of the project's general objectives, it highlights the results expected at the end of the eighth month and sets out the work actually carried out and the results achieved, in detail for each of the planned activities.
Source: Project report, VISITO Tuscany, pp. 1–35, 2010

See at: CNR ExploRA


2010 Report Unknown
VISITO - Components for extracting features from images
Lucchese C., Venturini R.
Efficient information retrieval relies on data indexing techniques to answer the queries submitted by users both efficiently and effectively. In practice, indexes are an abstraction of the information based on a bag of features, i.e., the set of the most important characteristics (features) of a generic object, whether a text document or a multimedia object such as a digital photograph. Many types of features can be associated with an image, and extracting them with dedicated software is computationally expensive. If the number of images is large, the total time required on a conventional computer becomes prohibitive. It is important to stress, however, that the feature set of each image does not depend on the other images. As a consequence, efficient feature extraction techniques can be designed on top of parallel computing. The aim of this activity is therefore to develop components that make feature extraction efficient by using state-of-the-art high-performance computing technologies such as, for example, cloud computing. This document illustrates the features of interest for the VISITO project and describes the software developed to extract them. Since this is a strongly technical deliverable, we have chosen to write the rest of the document in English.
Source: Project report, VISITO Tuscany, 2010
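Because each image's features depend only on that image, extraction is an embarrassingly parallel map. The sketch below illustrates only that property; the toy feature (`intensity_histogram`, a coarse grey-level histogram) and the function names are hypothetical stand-ins, not the descriptors or software of the VISITO project.

```python
from concurrent.futures import ThreadPoolExecutor

def intensity_histogram(pixels, bins=4):
    """Hypothetical stand-in feature: count 8-bit pixel values per equal-width bin."""
    width = 256 // bins
    hist = [0] * bins
    for value in pixels:
        hist[min(value // width, bins - 1)] += 1
    return hist

def extract_all(images, workers=4):
    """Map the extractor over all images concurrently.

    Tasks never communicate, since no image's features depend on another
    image; results are returned in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(intensity_histogram, images))

if __name__ == "__main__":
    images = [[0, 64, 128, 255], [10, 20, 30, 40], [200, 210, 220, 230]]
    print(extract_all(images))  # [[1, 1, 1, 1], [4, 0, 0, 0], [0, 0, 0, 4]]
```

Threads keep the sketch portable. A pure-Python extractor is CPU-bound and limited by the GIL, so real speed-ups would come from a process pool, native extractors, or distributing the same independent tasks across cloud machines.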

See at: CNR ExploRA


2010 Report Unknown
VISITO - Development of the data indexing component
Lucchese C., Venturini R.
The purpose of the data management system is to enable fast and efficient retrieval of the metadata associated with tourist Points of Interest (PoI) and with photos. This document describes the metadata associated with PoIs and photos and presents the functionality provided by the implemented system. For each function, the strategies and tools used to implement it are described in detail.
Source: Project report, VISITO Tuscany, 2010
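A common way to support fast keyword lookup over PoI and photo metadata is an inverted index. The following is a minimal sketch under that assumption; the class name, record ids, field names and whitespace tokenizer are all hypothetical and do not describe the VISITO schema or implementation.

```python
from collections import defaultdict

class MetadataIndex:
    """Toy inverted index from metadata keywords to PoI/photo ids (hypothetical)."""

    def __init__(self):
        self.records = {}                 # record id -> metadata dict
        self.postings = defaultdict(set)  # keyword -> set of record ids

    def add(self, record_id, metadata):
        self.records[record_id] = metadata
        for value in metadata.values():
            for token in str(value).lower().split():
                self.postings[token].add(record_id)

    def search(self, query):
        """Return the ids whose metadata contain every query keyword (AND)."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        # Copy the first posting list so intersections never mutate the index.
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result

if __name__ == "__main__":
    idx = MetadataIndex()
    idx.add("poi:1", {"name": "Leaning Tower", "city": "Pisa"})
    idx.add("photo:7", {"caption": "Tower at sunset", "city": "Pisa"})
    print(sorted(idx.search("pisa tower")))  # ['photo:7', 'poi:1']
```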

See at: CNR ExploRA


2010 Report Unknown
VISITO Tuscany - Architecture design of the VISITO Tuscany platform v1
Atzori M., Bazzoni G., Bolettieri P., La Torre F., Loschiavo D., Lucchese C., Manfrin S., Martinelli F., Melani A., Naldi C., Pironi A., Rubichi A., Venturini R., Zanetti N.
This document belongs to Operational Objective 2 of the VISITO Tuscany project, which covers the design of the entire system. In particular, it describes the system architecture according to the Reference Model for Open Distributed Processing (RM-ODP), which defines five viewpoints: enterprise, information, computational, engineering and technology.
Source: Project report, VISITO Tuscany, 2010

See at: CNR ExploRA


2010 Conference article Unknown
Detecting task-based query sessions using collaborative knowledge
Lucchese C., Orlando S., Perego R., Silvestri F., Tolomei G.
Our research challenge is to provide a mechanism for splitting a long-term log of queries submitted to a Web Search Engine (WSE) into user task-based sessions. The hypothesis is that some query sessions entail the concept of a user task. We present an approach that relies on a centroid-based and a density-based clustering algorithm, which consider query inter-arrival times and use a novel distance function that accounts for the lexical content of queries and exploits the collaborative knowledge collected by Wiktionary and Wikipedia.
Source: 2010 International Workshop on Intelligent Web Interaction, Toronto, Canada, 31 August 2010
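To make the idea concrete, here is a deliberately simplified sketch: the log is first cut at long inter-arrival gaps, then queries inside each time-gap session are grouped greedily by lexical (Jaccard) distance. The 30-minute gap, the 0.8 threshold and the greedy single-link grouping are assumptions for illustration; the paper's centroid- and density-based algorithms and its Wiktionary/Wikipedia-based distance are not reproduced here.

```python
def jaccard_distance(q1, q2):
    """1 - |A & B| / |A | B| over the lowercase term sets of two queries."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def split_by_time(log, max_gap=1800):
    """Cut a time-ordered log of (seconds, query) pairs at gaps > max_gap (assumed 30 min)."""
    sessions, current, last_t = [], [], None
    for t, q in log:
        if last_t is not None and t - last_t > max_gap:
            sessions.append(current)
            current = []
        current.append((t, q))
        last_t = t
    if current:
        sessions.append(current)
    return sessions

def group_into_tasks(session, threshold=0.8):
    """Greedy single-link grouping: a query joins the first task holding a
    lexically close query, otherwise it starts a new task."""
    tasks = []
    for t, q in session:
        for task in tasks:
            if any(jaccard_distance(q, other) <= threshold for _, other in task):
                task.append((t, q))
                break
        else:
            tasks.append([(t, q)])
    return tasks

if __name__ == "__main__":
    log = [(0, "cheap flights rome"), (60, "hotel rome"), (120, "python list sort"),
           (180, "flights rome paris"), (10000, "weather pisa")]
    for session in split_by_time(log):
        print([[q for _, q in task] for task in group_into_tasks(session)])
```

Interleaved tasks (the "python list sort" query sits between two travel queries) are why task detection needs more than the time gap alone.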

See at: CNR ExploRA


2010 Contribution to book Restricted
Workshop Report - LSDS-IR'10
Lucchese C., Blanco R., Cambazoglu B.
The size of the Web and the user bases of search systems continue to grow exponentially. Consequently, providing subsecond query response times and high query throughput becomes quite challenging for large-scale information retrieval systems. Distributing different aspects of search (e.g., crawling, indexing, and query processing) is essential to achieve scalability in large-scale information retrieval systems. The 8th Workshop on Large-Scale Distributed Systems for Information Retrieval (LSDS-IR'10) provided a venue to discuss current research challenges and identify new directions for distributed information retrieval. The workshop featured two industry talks and six research paper presentations. The hot topics at this year's workshop were collection selection architectures, the application of MapReduce to information retrieval problems, similarity search, geographically distributed web search, and optimization techniques for search efficiency.
DOI: 10.1145/1924475.1924486

See at: dl.acm.org Restricted | ACM SIGIR Forum Restricted | CNR ExploRA


2010 Report Unknown
VISITO Tuscany - Report with the detailed functional specification of the VISITO Tuscany platform
Bolettieri P., Benedetti L., La Torre F., Loschiavo D., Lucchese C., Lungarotti F., Salvadori S., Scopigno R., Venturini R.
This document is the report containing the detailed functional specification produced within the VISITO Tuscany project from month 4 to month 5, under Operational Objective 2, Activity A2.1 "Report with the detailed functional specification of the VISITO Tuscany platform". The report describes in detail the functionality offered by the system, paying particular attention to identifying functions of considerable importance to potential users that are not yet provided by other systems and that can be realized by capitalizing on the synergy among the members of the project consortium and on their previous work.
Source: Project report, VISITO Tuscany, 2010

See at: CNR ExploRA


2010 Journal article Closed Access
Rights protection of trajectory datasets with nearest-neighbor preservation
Lucchese C., Vlachos M., Yu P. S., Rajan D.
Companies frequently outsource datasets to mining firms, and academic institutions create repositories or share datasets to promote research collaboration. Still, many practitioners have reservations about sharing or outsourcing datasets, primarily for fear of losing the principal rights over the dataset. This work presents a way of convincingly claiming ownership rights over a trajectory dataset without, at the same time, destroying the salient dataset characteristics that are important for accurate search operations and data-mining tasks. The digital watermarking methodology that we present imperceptibly distorts a collection of sequences, effectively embedding a secret key, while preserving as far as possible the neighborhood of each object, which is vital for operations such as similarity search, classification, or clustering. A key contribution of this methodology is a technique for discovering the maximum distortion that still maintains such desirable properties. We demonstrate both analytically and empirically that the proposed dataset marking techniques can withstand a number of attacks (such as translation, rotation, noise addition, etc.) and can therefore provide a robust framework for facilitating the secure dissemination of trajectory datasets.
Source: The VLDB journal 19 (2010): 531–556. doi:10.1007/s00778-010-0178-6
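The core idea of key-based marking that preserves nearest neighbours can be illustrated with a toy example. In this sketch each sequence gets a pseudo-random ±delta perturbation seeded by a secret key, the largest delta that leaves every object's nearest neighbour unchanged is found by trying a list of candidate values, and ownership is checked by correlating the observed distortion with the key pattern. One-dimensional sequences, brute-force search and all function names are assumptions; the paper's embedding and its analytical bound on the maximum distortion are more refined than this.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(data):
    """Index of each sequence's nearest neighbour (brute force)."""
    return [min((j for j in range(len(data)) if j != i),
                key=lambda j: euclidean(s, data[j]))
            for i, s in enumerate(data)]

def key_pattern(key, length, index):
    """Pseudo-random +/-1 pattern derived from the secret key and object index."""
    rng = random.Random(f"{key}:{index}")
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed(data, key, delta):
    """Add delta * key pattern to every point of every sequence."""
    return [[x + delta * p for x, p in zip(s, key_pattern(key, len(s), i))]
            for i, s in enumerate(data)]

def max_safe_delta(data, key, candidates):
    """Largest candidate distortion that keeps every nearest neighbour unchanged."""
    original = nearest_neighbors(data)
    safe = 0.0
    for delta in sorted(candidates):
        if nearest_neighbors(embed(data, key, delta)) != original:
            break
        safe = delta
    return safe

def detect(original, suspect, key):
    """Normalised correlation between the observed distortion and the key pattern
    (about 1.0 for the right key, near 0 for an unrelated key)."""
    num = den = 0.0
    for i, (o, s) in enumerate(zip(original, suspect)):
        for x, y, p in zip(o, s, key_pattern(key, len(o), i)):
            num += (y - x) * p
            den += abs(y - x)
    return num / den if den else 0.0

if __name__ == "__main__":
    data = [[0.0] * 8, [1.0] * 8, [10.0] * 8]
    delta = max_safe_delta(data, "secret", [0.01, 0.05, 0.1, 5.0])
    marked = embed(data, "secret", delta)
    print(delta, detect(data, marked, "secret"))
```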
DOI: 10.1007/s00778-010-0178-6

See at: The VLDB Journal Restricted | www.springerlink.com Restricted | CNR ExploRA


2010 Contribution to conference Restricted
A generative pattern model for mining binary datasets
Lucchese C., Perego R., Orlando S.
In many application fields, huge binary datasets modeling real-life phenomena are produced daily. These datasets record observations of some events, and people are often interested in mining them to recognize recurrent patterns. However, discovering the most important patterns is very challenging. For example, these patterns may overlap, or be related only to a particular subset of the observations. Moreover, the mining can be hindered by the presence of noise. In this paper, we introduce a generative pattern model and an associated cost model for evaluating the goodness of the set of patterns extracted from a binary dataset. We propose an efficient algorithm, named GPM, for the discovery of the most relevant patterns according to the model. We show that the proposed model generalizes other approaches and supports the discovery of high-quality patterns.
Source: 25th ACM Symposium On Applied Computing, pp. 1109–1110, Crans Montana, Switzerland, March 22–26, 2010
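One simple reading of a "generative pattern model" for binary data treats each pattern as a block of rows × columns set to 1. Overlapping blocks are combined with Boolean OR, and noise then flips individual cells. The sketch below samples a dataset under exactly that assumption. It is illustrative only and is not the GPM model or cost model as defined in the paper; the function name and parameters are hypothetical.

```python
import random

def generate(n_rows, n_cols, patterns, noise=0.0, seed=0):
    """Sample a binary dataset from patterns plus noise (illustrative model).

    Each pattern is a pair (rows, cols): every cell in rows x cols is set
    to 1, overlapping patterns combine with Boolean OR, and each cell is
    then flipped independently with probability `noise`.
    """
    rng = random.Random(seed)
    data = [[0] * n_cols for _ in range(n_rows)]
    for rows, cols in patterns:
        for i in rows:
            for j in cols:
                data[i][j] = 1
    for i in range(n_rows):
        for j in range(n_cols):
            if rng.random() < noise:
                data[i][j] ^= 1
    return data

if __name__ == "__main__":
    overlapping = [({0, 1}, {0, 1}), ({1, 2}, {1, 2})]
    for row in generate(3, 3, overlapping):
        print(row)
```

Mining then works in the reverse direction: given only the noisy matrix, it recovers a small set of (rows, cols) blocks that explains it well.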
DOI: 10.1145/1774088.1774320

See at: dl.acm.org Restricted | doi.org Restricted | CNR ExploRA


2010 Conference article Restricted
Mining top-K patterns from binary datasets in presence of noise
Lucchese C., Orlando S., Perego R.
The discovery of patterns in binary datasets has many applications, e.g., in electronic commerce, TCP/IP networking, Web usage logging, etc. Still, this is a very challenging task in many respects: overlapping vs. non-overlapping patterns, presence of noise, and extraction of only the most important patterns. In this paper we formalize the problem of discovering the Top-K patterns from binary datasets in the presence of noise as the minimization of a novel cost function. Following the Minimum Description Length principle, the proposed cost function favors succinct pattern sets that may approximately describe the input data. We propose a greedy algorithm for the discovery of Patterns in Noisy Datasets, named PaNDa, and show that it outperforms related techniques on both synthetic and real-world data.
Source: Tenth SIAM International Conference on Data Mining, pp. 165–176, Columbus, Ohio, USA, April 29 – May 1, 2010
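An MDL-flavoured cost of this kind can be sketched as follows. A pattern set is charged its description size (number of rows plus columns of each pattern) plus the number of noise cells, i.e. the cells where the OR of the patterns disagrees with the data. A greedy loop then adds the candidate that lowers the cost most. This exact cost and the selection over a fixed candidate list are assumptions for illustration. PaNDa's actual cost function and its greedy pattern-extension procedure are defined in the paper.

```python
def coverage(patterns):
    """Cells covered by the Boolean OR of (rows, cols) patterns."""
    cells = set()
    for rows, cols in patterns:
        cells.update((i, j) for i in rows for j in cols)
    return cells

def cost(data, patterns):
    """Assumed cost: pattern description size + number of noise cells."""
    ones = {(i, j) for i, row in enumerate(data) for j, v in enumerate(row) if v}
    noise = len(ones ^ coverage(patterns))  # missed 1s plus spurious 1s
    model = sum(len(rows) + len(cols) for rows, cols in patterns)
    return model + noise

def greedy_select(data, candidates):
    """Repeatedly add the candidate that lowers the cost most; stop when none helps."""
    chosen, best = [], cost(data, [])
    while True:
        scored = [(cost(data, chosen + [c]), k)
                  for k, c in enumerate(candidates) if c not in chosen]
        if not scored:
            break
        new_cost, k = min(scored)
        if new_cost >= best:
            break
        chosen.append(candidates[k])
        best = new_cost
    return chosen, best
```

For a 4 × 4 block of ones with one missing cell, the single full block costs 8 (rows plus columns) plus 1 noise cell, which beats describing all 15 ones as noise (cost 15). Adding smaller overlapping blocks would only increase the model size.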

See at: www.siam.org Restricted | CNR ExploRA


2010 Conference article Closed Access
Document similarity self-join with MapReduce
Lucchese C., Baraglia R., De Francisci Morales G.
Given a collection of objects, the Similarity Self-Join problem requires discovering all pairs of objects whose similarity is above a user-defined threshold. In this paper we focus on document collections, which are characterized by a sparseness that allows effective pruning strategies. Our contribution is a new parallel algorithm within the MapReduce framework. This work borrows from the state of the art in serial algorithms for similarity join and in MapReduce-based techniques for set-similarity join. The proposed algorithm shows that it is possible to leverage a distributed file system to support communication patterns that do not naturally fit the MapReduce framework. Scalability is achieved by introducing a partitioning strategy able to overcome memory bottlenecks. Experimental evidence on real-world data shows that our algorithm outperforms the state of the art by a factor of 4.5.
Source: IEEE International Conference on Data Mining, pp. 731–736, Sydney, Australia, December 14–17, 2010
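To show how a similarity self-join maps onto MapReduce at all, here is a single-process simulation of the basic term-partitioned scheme. The map step emits postings per term, the reduce step emits partial dot products for every document pair that shares the term, and a second aggregation sums the partials and applies the threshold. This is the plain baseline without the pruning and partitioning that the paper contributes, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(docs):
    """Emit (term, (doc_id, weight)) for every term of every document."""
    for doc_id, vector in docs.items():
        for term, weight in vector.items():
            yield term, (doc_id, weight)

def shuffle(pairs):
    """Group values by key, as the MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Per term: emit a partial dot product for each document pair sharing it."""
    for term, postings in groups.items():
        for (d1, w1), (d2, w2) in combinations(sorted(postings), 2):
            yield (d1, d2), w1 * w2

def similarity_self_join(docs, threshold):
    """Sum partial products per pair and keep pairs at or above the threshold."""
    partial = shuffle(reduce_phase(shuffle(map_phase(docs))))
    result = {}
    for pair, contributions in partial.items():
        score = sum(contributions)
        if score >= threshold:
            result[pair] = score
    return result

if __name__ == "__main__":
    docs = {"a": {"x": 0.6, "y": 0.8}, "b": {"x": 0.6, "y": 0.8},
            "c": {"z": 1.0}, "d": {"x": 1.0}}
    print(similarity_self_join(docs, 0.7))
```

The quadratic blow-up of pairs in long posting lists is exactly the cost that pruning strategies for sparse document collections aim to cut.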
DOI: 10.1109/icdm.2010.70

See at: doi.org Restricted | CNR ExploRA


2010 Other Unknown
POR FESR 2007-2013 VISITO Tuscany project: VIsual Support to Interactive TOurism in Tuscany
Lucchese C., Venturini R., Dazzi P., Ferrini R., Perego R., Baraglia R., Tonellotto N., Versienti L.
The project aims to create an infrastructure providing user-centric access to the artistic and cultural heritage of the art cities of Tuscany. The system to be realized will manage, in an integrated manner, both historical-artistic information and other requests reflecting visitors' interests. In particular, the project aims to develop advanced technologies for managing photographic material related to the cultural heritage of the Tuscan cities of art, in order to realize a personalized electronic guide designed to provide better use of and access to our artistic heritage. The goal is to offer novel immersive tourist services through both new-generation mobile devices and the Internet. Pilot tests are planned in the cities of Florence, Pisa and San Gimignano to evaluate the developed system.

See at: CNR ExploRA


2010 Journal article Closed Access
Mining@home: towards a public-resource computing framework for distributed data mining
Lucchese C., Mastroianni C., Orlando S., Talia D.
Several classes of scientific and commercial applications require the execution of a large number of independent tasks. One highly successful and low-cost mechanism for acquiring the necessary computing power for these applications is the 'public-resource computing', or 'desktop Grid', paradigm, which exploits the computational power of private computers. So far, this paradigm has not been applied to data mining applications for two main reasons. First, it is not straightforward to decompose a data mining algorithm into truly independent sub-tasks. Second, the large volume of the involved data makes it difficult to handle the communication costs of a parallel paradigm. This paper introduces a general framework for distributed data mining applications called Mining@home. In particular, we focus on one of the main data mining problems: the extraction of closed frequent itemsets from transactional databases. We show that it is possible to decompose this problem into independent tasks, which, however, need to share a large volume of data. We thus introduce a data-intensive computing network, which adopts a P2P topology based on super-peers with caching capabilities, aiming to support the dissemination of large amounts of information. Finally, we evaluate the execution of a pattern extraction task on such a network.
Source: Concurrency and computation (Online) 22 (2010): 658–682. doi:10.1002/cpe.1545
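The tension between independent tasks and shared data can be seen in a toy decomposition. Here the search for closed frequent itemsets is split by the smallest item of each itemset, so every task produces a disjoint part of the output, yet every task must read the whole transaction database to compute supports and closures. The partitioning rule, the brute-force enumeration and the function names are assumptions for illustration, not the Mining@home algorithm.

```python
from itertools import combinations

def closure(itemset, transactions):
    """(intersection of all transactions containing itemset, support), or None."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering), len(covering)

def mine_task(first_item, transactions, items, min_support):
    """Independent task: closed frequent itemsets whose smallest item is first_item.
    It needs the full dataset, which is the data that tasks must share."""
    rest = [i for i in items if i > first_item]
    found = {}
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            candidate = frozenset((first_item,) + combo)
            result = closure(candidate, transactions)
            if result is None:
                continue
            closed, support = result
            if support >= min_support and closed == candidate:
                found[candidate] = support
    return found

def mine_all(transactions, min_support):
    """Run one task per item; each call could run on a different volunteer node."""
    items = sorted(set().union(*transactions))
    results = {}
    for k in items:
        results.update(mine_task(k, transactions, items, min_support))
    return results

if __name__ == "__main__":
    db = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
    for itemset, support in mine_all(db, 2).items():
        print(sorted(itemset), support)
```

Since every closed itemset has exactly one smallest item, the task outputs never overlap and can be merged without coordination.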
DOI: 10.1002/cpe.1545
Project(s): S-CUBE via OpenAIRE

See at: Concurrency and Computation Practice and Experience Restricted | onlinelibrary.wiley.com Restricted | CNR ExploRA


2010 Journal article Open Access OPEN
Building a web-scale image similarity search system
Batko M., Falchi F., Lucchese C., Novak D., Perego R., Rabitti F., Sedmidubsky J., Zezula P.
As the number of digital images grows fast and Content-based Image Retrieval (CBIR) gains in popularity, CBIR systems should leap towards Web-scale datasets. In this paper, we report on our experience in building an experimental similarity search system on a test collection of more than 50 million images. The first big challenge we faced was obtaining a collection of images of this scale with the corresponding descriptive features. We tackled the non-trivial process of image crawling and extraction of several MPEG-7 descriptors. The result of this effort is a test collection, the first of its scale, made available to the research community for experiments and comparisons. The second challenge was to develop indexing and searching mechanisms able to scale to the target size and to answer similarity queries in real time. We achieved this goal by creating sophisticated centralized and distributed structures based purely on the metric space model of data. Combining them resulted in an extremely flexible and scalable solution. In this paper, we study in detail the performance of this technology and how it evolves as the data volume grows by three orders of magnitude. The results of the experiments are very encouraging and promising for future applications.
Source: Multimedia tools and applications 47 (2010): 599–629. doi:10.1007/s11042-009-0339-z
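Indexes built "purely on the metric space model" can rely only on a distance function and the triangle inequality. The basic trick shared by many such structures is pivot filtering. For any pivot p, d(q, o) ≥ |d(q, p) − d(o, p)|, so an object can be discarded without computing its distance to the query whenever this lower bound exceeds the query radius. The sketch below shows only that idea. Euclidean distance stands in for MPEG-7 descriptor distances, at least one pivot is assumed, and the class is hypothetical rather than the centralized or distributed structures described in the paper.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class PivotIndex:
    """Range search with pivot-based filtering (requires at least one pivot)."""

    def __init__(self, objects, pivots, distance=euclidean):
        self.objects = objects
        self.pivots = pivots
        self.distance = distance
        # Object-to-pivot distances are computed once, at indexing time.
        self.table = [[distance(o, p) for p in pivots] for o in objects]

    def range_query(self, query, radius):
        """Return (objects within radius, number of full distance computations)."""
        q_to_p = [self.distance(query, p) for p in self.pivots]
        hits, computed = [], 0
        for obj, row in zip(self.objects, self.table):
            # Triangle-inequality lower bound on d(query, obj).
            lower_bound = max(abs(qp - op) for qp, op in zip(q_to_p, row))
            if lower_bound > radius:
                continue  # pruned without computing d(query, obj)
            computed += 1
            if self.distance(query, obj) <= radius:
                hits.append(obj)
        return hits, computed

if __name__ == "__main__":
    objs = [(0, 0), (1, 0), (10, 10), (11, 10), (0, 1)]
    index = PivotIndex(objs, pivots=[(0, 0), (10, 10)])
    print(index.range_query((0, 0), 1.5))
```

In this example, the two far-away objects are pruned using only precomputed pivot distances. At Web scale, that saving in expensive descriptor comparisons is what makes real-time answers plausible.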
DOI: 10.1007/s11042-009-0339-z

See at: ISTI Repository Open Access | Multimedia Tools and Applications Restricted | www.springerlink.com Restricted | CNR ExploRA


2010 Contribution to book Restricted
Preserving privacy in Web recommender systems
Perego R., Baraglia R., Lucchese C., Orlando S., Silvestri F.
The rapid growth of the Web has led to the development of new solutions in the Web recommender or personalization domain, aimed at helping users satisfy their information needs. The main goal of this chapter is to survey some of the recommender system proposals that have appeared in the literature, and to evaluate these proposals from the point of view of privacy preservation. Then, as an example of a privacy-preserving approach to recommendations, we present ?SUGGEST, a privacy-enhanced system that allows for creating serendipity recommendations without breaching users' privacy. ?SUGGEST helps users navigate through a Web site by providing dynamically generated links to relevant pages that have not yet been visited. The knowledge base on which the recommendation model is built is updated incrementally without tracking user sessions. This feature is particularly important when users do not trust the system and do not want to disclose their complete activity records or preferences. In this case, users may adopt techniques that prevent server-based session reconstruction without worsening the accuracy of the model extracted by ?SUGGEST. As an additional contribution, we show that ?SUGGEST does not allow malicious users to track or detect users' activity or preferences.
Source: Privacy-Aware Knowledge Discovery: Novel Applications and New Techniques, edited by Francesco Bonchi (Yahoo! Research, Barcelona, Spain) and Elena Ferrari (University of Insubria, Italy), pp. 369–389. London: CRC Press - Taylor & Francis Group, 2010

See at: www.crcpress.com Restricted | CNR ExploRA