67 result(s)
2007 Conference article Unknown
Multilingual Search for Cultural Heritage Archives via Combining Multiple Translation Resource
Jones G., Zhang Y., Newman E., Fantino F., Debole F.
Material in Cultural Heritage (CH) archives may be in various languages, requiring a facility for effective multilingual search. The specialised language often associated with CH content introduces problems for automatic translation to support search applications. The MultiMatch project is focused on enabling users to interact with CH content across different media types and languages. We present results from a MultiMatch study exploring various translation techniques for the CH domain. Our experiments examine translation techniques for the English language CLEF 2006 Cross-Language Speech Retrieval (CL-SR) task using Spanish, French and German queries. Results compare effectiveness of our query translation against a monolingual baseline and show improvement when combining a domain-specific translation lexicon with a standard machine translation system. Source: ACL 2007 Workshop on Language Technology for Cultural Heritage Data, pp. 81–88, Prague, Czech Republic, 28 June 2007

See at: CNR ExploRA

2008 Conference article Open Access
The MultiMatch project: multilingual/multimedia access to cultural heritage on the Web
Marlow J., Clough P., Ireson N., Cigarrán Recuero J., Artiles J., Debole F.
The EU-funded MultiMatch project aims to overcome language barriers, and media and distribution problems currently affecting access to on-line cultural heritage material. Partners are developing a vertical search engine able to harvest heterogeneous information from distributed sources and present it in a synthesized manner. To design such a system, user requirements were initially gathered and then translated into specific design features to ensure that the search engine developed was consistent with user needs. This paper presents these user requirements, the initial design of the MultiMatch system, and technical discussion of the system architecture and components used to turn these design implications into a working interactive prototype. Following this, we discuss user evaluation and present results from an initial user study. These are being used, in addition to other input, to drive the functionality and design of the final system. Source: Museums and the Web 2008 - International conference for culture and heritage on line, Montreal, Canada, 9-12 April 2008

See at: www.archimuse.com Open Access | CNR ExploRA

2019 Other Unknown
SEBD 2019 web site
Debole F.
Web site for SEBD 2019, the 27th edition of the conference (SEBD - Sistemi Evoluti per Basi di Dati)

See at: CNR ExploRA | sebd2019.isti.cnr.it

2005 Journal article Open Access
An Analysis of the relative hardness of reuters-21578 subsets
Debole F., Sebastiani F.
The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task are beneficial to research on this task, because they allow different researchers to experimentally compare their own systems by comparing the results they have obtained on this benchmark. The Reuters-21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last 10 years. However, the benefits that this has brought about have somehow been limited by the fact that different researchers have 'carved' different subsets out of this collection and tested their systems on one of these subsets only; systems that have been tested on different Reuters-21578 subsets are thus not readily comparable. In this article, we present a systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have, or will be, tested on these different subsets. Source: Journal of the American Society for Information Science and Technology (Print) 56 (2005): 584–596. doi:10.1002/asi.20147
DOI: 10.1002/asi.20147

See at: Journal of the American Society for Information Science and Technology Open Access | Journal of the American Society for Information Science and Technology Restricted | onlinelibrary.wiley.com Restricted | CNR ExploRA

2006 Conference article Open Access
The DELOS testbed for choosing a digital preservation strategy
Strodl S., Rauber A., Rauch C., Hofman H., Debole F., Amato G.
With the rapid technological changes, digital preservation, i.e. the endeavor to provide long-term access to digital objects, is turning into one of the most pressing challenges to ensure the survival of our digital artefacts. A set of strategies has been proposed, with a range of tools supporting parts of digital preservation actions. Yet, with requirements on which strategy to follow and which tools to employ being different for each setting, depending e.g. on object characteristics or institutional requirements, deciding which solution to implement has turned into a crucial decision. This paper presents the DELOS Digital Preservation Testbed. It provides an approach to make informed and accountable decisions on which solution to implement in order to preserve digital objects for a given purpose. It is based on Utility Analysis to evaluate the performance of various solutions against well-defined objectives, and facilitates repeatable experiments in a standardized laboratory setting. Source: ICADL 2006 - 9th International Conference on Asian Digital Libraries, pp. 323–332, Kyoto, Japan, 27-30/11/2006
DOI: 10.1007/11931584_35

See at: ISTI Repository Open Access | doi.org Restricted | link.springer.com Restricted | CNR ExploRA

2007 Journal article Restricted
Evaluating preservation strategies for electronic theses and dissertations
Strodl S., Becker C., Neumayer R., Rauber A., Nicchiarelli Bettelli E., Kaiser M., Hofman H., Neuroth H., Strathmann S., Debole F., Amato G.
Digital preservation has turned into a pressing challenge for institutions having the obligation to preserve digital objects over years. A range of tools exist today to support the variety of preservation strategies such as migration or emulation. Yet, different preservation requirements across institutions and settings make the decision on which solution to implement very difficult. The Austrian National Library will have to preserve electronic theses and dissertations provided as PDF files and is thus investigating potential preservation solutions. The DELOS Digital Preservation Testbed is used to evaluate various alternatives with respect to specific requirements. It provides an approach to make informed and accountable decisions on which solution to implement in order to preserve digital objects for a given purpose. We analyse the performance of various preservation strategies with respect to the specified requirements for the preservation of master theses and present the results. Source: Lecture notes in computer science 4877 (2007): 238–247. doi:10.1007/978-3-540-77088-6_23
DOI: 10.1007/978-3-540-77088-6_23

See at: doi.org Restricted | www.springerlink.com Restricted | CNR ExploRA

2004 Journal article Open Access
Supervised term weighting for automated text categorization
Debole F., Sebastiani F.
Researchers from ISTI-CNR, Pisa, aim at producing better text classification methods through the use of supervised learning techniques in the generation of the internal representations of the texts. Source: ERCIM news 56 (2004): 55–56.

See at: www.ercim.org Open Access | CNR ExploRA

2003 Conference article Unknown
Supervised term weighting for automated text categorization
Debole F., Sebastiani F.
The construction of a text classifier usually involves (i) a phase of term selection, in which the most relevant terms for the classification task are identified, (ii) a phase of term weighting, in which document weights for the selected terms are computed, and (iii) a phase of classifier learning, in which a classifier is generated from the weighted representations of the training documents. This process involves an activity of supervised learning, in which information on the membership of training documents in categories is used. Traditionally, supervised learning enters only phases (i) and (iii). In this paper we propose instead that learning from training data should also affect phase (ii), i.e. that information on the membership of training documents to categories be used to determine term weights. We call this idea supervised term weighting (STW). As an example, we propose a number of "supervised variants" of tfidf weighting, obtained by replacing the idf function with the function that has been used in phase (i) for term selection. We present experimental results obtained on the standard Reuters-21578 benchmark with one classifier learning method (support vector machines), three term selection functions (information gain, chi-square, and gain ratio), and both local and global term selection and weighting. Source: SAC-03, 18th ACM Symposium on Applied Computing, pp. 784–788, Melbourne, US, March 9-12, 2003

See at: CNR ExploRA

2004 Conference article Open Access
An analysis of the relative hardness of reuters-21578 subsets
Debole F., Sebastiani F.
The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task are beneficial to research on this task, since they allow different researchers to experimentally compare their own systems by comparing the results they have obtained on this benchmark. The Reuters-21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last ten years. However, the benefits that this has brought about have somehow been limited by the fact that different researchers have 'carved' different subsets out of this collection, and tested their systems on one of these subsets only; systems that have been tested on different Reuters-21578 subsets are thus not readily comparable. In this paper we present a systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative difficulty of these subsets, thus establishing an indirect means for comparing TC systems that have, or will be, tested on these different subsets. Source: LREC-04 - 4th International Conference on Language Resources and Evaluation, pp. 971–974, Lisbon, Portugal, 26-28 May 2004

See at: ISTI Repository Open Access | www.lrec-conf.org Open Access | CNR ExploRA

2004 Contribution to book Restricted
Supervised term weighting for automated text categorization
Debole F., Sebastiani F.
The construction of a text classifier usually involves (i) a phase of term selection, in which the most relevant terms for the classification task are identified, (ii) a phase of term weighting, in which document weights for the selected terms are computed, and (iii) a phase of classifier learning, in which a classifier is generated from the weighted representations of the training documents. This process involves an activity of supervised learning, in which information on the membership of training documents in categories is used. Traditionally, supervised learning enters only phases (i) and (iii). In this paper we propose instead that learning from the training data should also affect phase (ii), i.e. that information on the membership of training documents to categories be used to determine term weights. We call this idea supervised term weighting (STW). As an example of STW, we propose a number of supervised variants of tfidf weighting, obtained by replacing the idf function with the function that has been used in phase (i) for term selection. The use of STW allows the terms that are distributed most differently in the positive and negative examples of the categories of interest to be weighted highest. We present experimental results obtained on the standard Reuters-21578 benchmark with three classifier learning methods (Rocchio, k-NN, and support vector machines), three term selection functions (information gain, chi-square, and gain ratio), and both local and global term selection and weighting. Source: Text Mining and its Applications, edited by Spiros Sirmakessis, pp. 81–97. Heidelberg: Physica Verlag, 2004

See at: www.isti.cnr.it Restricted | CNR ExploRA
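
The supervised term weighting idea in the abstract above can be illustrated with a minimal sketch. It assumes chi-square as the term selection function that replaces idf, a log-scaled tf, and cosine normalization; these choices, the toy corpus, and the function names `chi_square` and `stw_vector` are illustrative assumptions, not the authors' exact formulation.

```python
import math
from collections import Counter

def chi_square(docs, labels, term):
    """Chi-square statistic of `term` with respect to one binary category."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y)          # term present, positive doc
    b = sum(1 for d, y in zip(docs, labels) if term in d and not y)      # term present, negative doc
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y)      # term absent, positive doc
    d_neg = n - a - b - c                                                # term absent, negative doc
    denom = (a + c) * (b + d_neg) * (a + b) * (c + d_neg)
    return n * (a * d_neg - b * c) ** 2 / denom if denom else 0.0

def stw_vector(doc, docs, labels):
    """tf x chi-square weights for one document (assumed variant), cosine-normalized."""
    tf = Counter(doc)
    weights = {t: (1 + math.log(f)) * chi_square(docs, labels, t) for t, f in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights

# Hypothetical toy training set: tokenized documents and labels for one category.
docs = [["wheat", "price"], ["wheat", "farm"], ["oil", "price"], ["oil", "barrel"]]
labels = [1, 1, 0, 0]
vector = stw_vector(docs[0], docs, labels)
print(vector)
```

In this toy set, "price" occurs equally often in positive and negative documents, so its supervised weight vanishes, whereas an idf-based weight would not distinguish it from "wheat".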

2009 Conference article Unknown
Searching and browsing film archives. The European Film Gateway Approach
Debole F., Savino P., Eckes G.
Metadata describing items in European film archives are very different, so that it is difficult to have uniform access to videos coming from many different archives. These and other relevant issues regarding interoperability among different archives are addressed within the EFG (European Film Gateway) Best Practices Network funded by the European Commission, which aims at enabling Europe's Film Archives and cinémathèques to contribute their rich and valuable collections to the EUROPEANA digital library. Source: 4th International Congress on Science and Technology on the Safeguard of Cultural Heritage in the Mediterranean Basin, pp. 359–364, Cairo, Egypt, 6-8 December, 2009
Project(s): EFG1914

See at: CNR ExploRA

2003 Report Open Access
An Analysis of the Relative Hardness of Reuters-21578 Subsets
Debole F., Sebastiani F.
The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task are beneficial to research on this task, since they allow different researchers to experimentally compare their own systems by comparing the results they have obtained on this benchmark. The Reuters-21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last ten years. However, the benefits that this has brought about have somehow been limited by the fact that different researchers have 'carved' different subsets out of this collection, and tested their systems on one of these subsets only; systems that have been tested on different Reuters-21578 subsets are thus not readily comparable. In this paper we present a systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have, or will be, tested on these different subsets. Source: ISTI Technical reports, 2003

See at: ISTI Repository Open Access | CNR ExploRA

2005 Conference article Open Access
A native XML database supporting approximate match search
Amato G., Debole F.
XML is becoming the standard representation format for metadata. Metadata for multimedia documents, as for instance MPEG-7, require approximate match search functionalities to be supported in addition to exact match search. As an example, consider image search performed by using MPEG-7 visual descriptors. It does not make sense to search for images that are exactly equal to a query image. Rather, images similar to a query image are more likely to be searched. We present the architecture of an XML search engine where special techniques are used to integrate approximate and exact match search functionalities. Source: ECDL 2005 - 9th European Conference on Research and Advanced Technology for Digital Libraries, pp. 69–80, Vienna, Austria, 18-23/09/2005
DOI: 10.1007/11551362_7

See at: ISTI Repository Open Access | www.nmis.isti.cnr.it Open Access | doi.org Restricted | link.springer.com Restricted | CNR ExploRA

2017 Journal article Open Access
Mapping the ARIADNE catalogue data model to CIDOC CRM: Bridging resource discovery and item-level access
Aloia N., Debole F., Felicetti A., Galluccio I., Theodoridou M.
ARIADNE is a European project aiming to integrate existing archaeological research infrastructures, services and distributed datasets, and to develop new technologies and tools to improve archaeological research methodology. The ARIADNE registry contains information about resources available among the various partners of the project and the metadata repository, which contains item level information of these resources. In order to provide an advanced discovery mechanism combining both item level and registry level information we propose a mapping from the ARIADNE Catalog Data Model, the model of the ARIADNE registry, to the CIDOC CRM, the underlying model of the metadata repository. The paper will present the requirements that led to the choice of different models for the registry and the metadata repository, will elaborate on the mapping, and will propose an integrated interface for information discovery and presentation. Source: SCIRES-IT (Roma) 7 (2017): 1–8. doi:10.2423/i22394303v7n1p1
DOI: 10.2423/i22394303v7n1p1

See at: ISTI Repository Open Access | www.sciresit.it Open Access | CNR ExploRA

2003 Conference article Open Access
Tree Signatures for XML Querying and Navigation
Zezula P., Amato G., Debole F., Rabitti F.
In order to accelerate execution of various matching and navigation operations on collections of XML documents, a new indexing structure, based on tree signatures, is proposed. We show that XML tree structures can be efficiently represented as ordered sequences of preorder and postorder ranks, on which extended string matching techniques can easily solve the tree matching problem. We also show how to apply tree signatures in query processing and demonstrate that a speedup of up to one order of magnitude can be achieved over the containment join strategy. Other alternatives of using the tree signatures in intelligent XML searching are outlined in the conclusions. Source: Xsym 2003, pp. 149–163, Berlin, Germany, 8 September 2003
DOI: 10.1007/978-3-540-39429-7_10

See at: www.nmis.isti.cnr.it Open Access | doi.org Restricted | link.springer.com Restricted | www.scopus.com Restricted | CNR ExploRA
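
The representation of a tree by preorder and postorder ranks, mentioned in the abstract above, can be sketched as follows. This is a simplified illustration, not the paper's full signature structure: the 0-based ranks, the tuple layout, and the function names `tree_signature` and `is_ancestor` are assumptions. It relies on the standard property that a node u is an ancestor of a node v exactly when u precedes v in preorder and follows it in postorder.

```python
import xml.etree.ElementTree as ET

def tree_signature(root):
    """Return a list of (tag, preorder_rank, postorder_rank), ordered by preorder rank."""
    signature = []
    post_counter = [0]

    def visit(node):
        pre = len(signature)
        signature.append(None)          # reserve this node's slot at its preorder position
        for child in node:
            visit(child)
        signature[pre] = (node.tag, pre, post_counter[0])
        post_counter[0] += 1            # postorder rank is assigned after all descendants

    visit(root)
    return signature

def is_ancestor(u, v):
    """True if entry u is an ancestor of entry v, judged from the ranks alone."""
    return u[1] < v[1] and u[2] > v[2]

# Hypothetical sample document.
sig = tree_signature(ET.fromstring("<a><b><c/></b><d/></a>"))
a, b, c, d = sig
print(sig)
print(is_ancestor(a, c), is_ancestor(b, d))
```

Because the ancestor test needs only two integer comparisons per pair, structural relationships can be checked on the flat sequence without traversing the original tree.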

2003 Conference article Open Access
YAPI: Yet Another Path Index for XML searching
Amato G., Debole F., Zezula P., Rabitti F.
As many metadata are encoded in XML, and many digital libraries need to manage XML documents, efficient techniques for searching in such formatted data are required. In order to efficiently process path expressions with wildcards on XML data, a new path index is proposed. Extensive evaluation confirms better performance with respect to other techniques proposed in the literature. An extension of the proposed technique to deal with the content of XML documents in addition to their structure is also discussed. Source: ECDL 03 - 7th European Conference on Research and Advanced Technology for Digital Libraries, pp. 176–187, Trondheim, Norway, 2-17/08/2003
DOI: 10.1007/978-3-540-45175-4_17

See at: ISTI Repository Open Access | www.nmis.isti.cnr.it Open Access | doi.org Restricted | link.springer.com Restricted | www.scopus.com Restricted | CNR ExploRA

2008 Journal article Restricted
The MultiMatch prototype: multilingual/multimedia search for cultural heritage objects
Amato G., Debole F., Peters C., Savino P.
MultiMatch is a 30 month targeted research project under the Sixth Framework Programme, supported by the unit for Content, Learning and Cultural Heritage (Digicult) of the Information Society DG. MultiMatch is developing a multimedia/multilingual search engine designed specifically for the access, organization and personalized presentation of cultural heritage information. The demonstration will present the MultiMatch system prototype. Source: Lecture notes in computer science 5173 (2008): 385–387. doi:10.1007/978-3-540-87599-4_40
DOI: 10.1007/978-3-540-87599-4_40

See at: doi.org Restricted | www.springerlink.com Restricted | CNR ExploRA

2003 Conference article Unknown
A path index for efficient XML path expression processing
Amato G., Debole F., Zezula P., Rabitti F.
XML is a de facto standard for data representation and exchange on the Internet; storing and querying XML repositories has therefore become an important issue. Several XML query languages are based on the use of path expressions containing optional wildcards. This poses a new problem, given that traditional query processing approaches have been proven not to be efficient in this case. We propose a new path index to efficiently process path expressions with wildcards on XML data. Extensive evaluation confirms better performance with respect to other techniques proposed in the literature. An extension of the proposed technique to deal with the content of XML documents in addition to their structure is also discussed. Source: SEBD 2003, pp. 21–28, Cetraro, June 24-27, 2003

See at: CNR ExploRA

2007 Other Unknown
MultiMatch - Multilingual/Multimedia Access to Cultural Heritage - EU IST FP6 Project
Amato G., Peters C. A., Savino P., Debole F.
The aim of the MultiMatch project is to enable users to explore and interact with online internet-accessible CH content, across media types and language boundaries, in ways that do justice to the multitude of existing perspectives. This will be achieved through the development of a search engine targeted for the access, organisation and personalized presentation of cultural heritage information.

See at: CNR ExploRA

2009 Conference article Unknown
MultiMatch - Multilingual / Multimedia Access to Cultural Heritage
Amato G., Debole F., Peters C., Savino P.
Our shared cultural heritage (CH) is an essential part of our European identity, transcending cultural and language barriers. The aim of the MultiMatch project is to enable users to explore and interact with online internet-accessible CH content, across media types and language boundaries, in ways that do justice to the multitude of existing perspectives. This has been achieved through the development of a search engine targeted for the access, organisation and personalized presentation of cultural heritage information. MultiMatch aims at complex, heterogeneous digital object retrieval and presentation. Source: 5th Italian Research Conference on Digital Libraries, pp. 162–165, Padova, Italy, 29-30 January 2009

See at: CNR ExploRA