2010 Conference article Unknown
PP-Index: using permutation prefixes for efficient and scalable similarity search (Extended Abstract)
Esuli A.
The Permutation Prefix Index (PP-Index) is a data structure that supports efficient approximate similarity search. It is a permutation-based index, i.e., it represents each indexed object by "its view of the surrounding world": a list of the elements of a set of reference objects, sorted by their distance from the indexed object. In its basic formulation, the PP-Index is biased toward efficiency. We show how the effectiveness can reach optimal levels just by adopting two "boosting" strategies: multiple index search and multiple query search, which both have nice parallelization properties. We study both the efficiency and the effectiveness properties of the PP-Index, experimenting with collections of sizes up to one hundred million objects, represented in a very high-dimensional similarity space.
Source: 18th Italian Symposium on Advanced Database Systems, pp. 318–325, Rimini, Italy, 20-23 June 2010

See at: CNR ExploRA
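The permutation-based representation described in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance, hand-picked 2-D reference objects, and a flat dictionary keyed by permutation prefix; the names `permutation_prefix`, `references`, `objects`, and `buckets` are hypothetical, and the sketch does not reproduce the actual PP-Index data structure or its boosting strategies.

```python
import math
from collections import defaultdict


def permutation_prefix(obj, references, prefix_len):
    """Indices of the prefix_len reference objects closest to obj, nearest first."""
    order = sorted(range(len(references)),
                   key=lambda i: math.dist(obj, references[i]))
    return tuple(order[:prefix_len])


# Hypothetical 2-D data; the reference objects are chosen by hand.
references = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
objects = {"a": (1.0, 2.0), "b": (2.0, 1.0), "c": (9.0, 8.0), "d": (1.0, 3.0)}

# Group objects by permutation prefix: objects sharing a prefix see the
# reference objects in the same order and are treated as likely neighbours.
buckets = defaultdict(list)
for name, point in objects.items():
    buckets[permutation_prefix(point, references, 2)].append(name)

# A query is mapped to its own prefix and compared only with that bucket.
query_prefix = permutation_prefix((1.5, 2.5), references, 2)
candidates = buckets[query_prefix]
```

In this toy setting the query shares its prefix with objects "a" and "d", so only those two would be compared against it with the real distance function.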


2009 Conference article Unknown
PP-Index: using permutation prefixes for efficient and scalable approximate similarity search
Esuli A.
We present the Permutation Prefix Index (PP-Index), an index data structure that supports efficient approximate similarity search. The PP-Index belongs to the family of permutation-based indexes, which represent each indexed object by "its view of the surrounding world", i.e., a list of the elements of a set of reference objects sorted by their distance from the indexed object. In its basic formulation, the PP-Index is strongly biased toward efficiency, treating effectiveness as a secondary aspect. We show how the effectiveness can easily reach optimal levels just by adopting two "boosting" strategies: multiple index search and multiple query search. Such strategies have nice parallelization properties that allow the search process to be distributed in order to keep efficiency high. We study both the efficiency and the effectiveness properties of the PP-Index. We report experiments on collections of sizes up to one hundred million images, represented in a very high-dimensional similarity space based on the combination of five MPEG-7 visual descriptors.
Source: 7th Workshop on Large-Scale Distributed Systems for Information Retrieval, pp. 17–24, Boston, USA, 23 July 2009

See at: CNR ExploRA


2008 Contribution to conference Open Access
Annotating WordNet synsets by sentiment-related information: issues and potential solutions
Esuli A.
Many works in sentiment analysis have focused on the problem of subjectivity detection, at various levels: from terms (or term senses), as in the automatic annotation of lexical resources, to fragments of text, as in opinion extraction, to entire documents, as in sentiment classification. At all these levels, the two dimensions that have been investigated most actively are polarity ("positive/negative") and force ("strong/mild/weak" expression of positivity or negativity). In the SentiWordNet project we made a first attempt at automatically adding information concerning these two dimensions to WordNet. In other, more recent research we have explored a further dimension of subjective language, i.e., attitude type, which distinguishes, for example, between moral appreciation ("honest") and aesthetic appreciation ("beautiful"). We think that endowing WordNet with annotations pertaining to these three dimensions (polarity + force + attitude type) would make WordNet an even more valuable resource for sentiment analysis. Adding this information to WordNet would not be an easy task, for at least two reasons. One is the sheer size of the resource; this might call, at least initially, for a semi-automatic approach, along the lines of the SentiWordNet or the "WordNet Evocation" projects. The other is the choice of the taxonomy of sentiment types, which needs to compromise between conceptual subtlety and real-world applicability. For our recent work on attitude type we have adopted a taxonomy of attitude types originally defined in Martin and White's Appraisal Theory; however, other potentially interesting alternatives have been developed, e.g., in the EU-funded Simple project. However, we conjecture that even this three-dimensional specification of the sentiment-related properties of synsets might not be sufficient for application purposes, at least for some parts of speech.
For example, it is conceivable that a verb's polarity should not be characterized as positive or negative tout court, but that a distinction should be made as to which semantic role of the verb such polarity is bestowed upon. For instance, the verbs "torture" and "discard" both have a negative slant; however, while "torture" casts a negative character on the subject of the action (and on the action itself), "discard" typically casts a negative character on the direct object of the action. Such distinctions should be accounted for in a lexicon, especially in order to make it useful for opinion extraction applications.
Source: Fourth Global WordNet Conference, Szeged, Hungary, 22-25 January 2008

See at: ISTI Repository Open Access | www.inf.u-szeged.hu Open Access | CNR ExploRA


2008 Contribution to journal Open Access
Automatic generation of lexical resources for opinion mining: models, algorithms and applications
Esuli A.
Opinion mining is a recent discipline at the crossroads of Information Retrieval and Computational Linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. It has a rich set of applications, ranging from tracking users' opinions about products or about political candidates as expressed in online forums, to customer relationship management. Functional to the extraction of opinions from text is the determination of the relevant entities of the language that are used to express opinions, and their opinion-related properties; for example, determining that the term "beautiful" casts a positive connotation on its subject. In this thesis we investigate the automatic recognition of opinion-related properties of terms. This results in building opinion-related lexical resources, which can be used in opinion mining applications.
Source: SIGIR forum 42 (2008): 105–106.

See at: www.sigir.org Open Access | CNR ExploRA


2009 Conference article Restricted
MiPai: using the PP-Index to build an efficient and scalable similarity search system
Esuli A.
MiPai is an image search system that provides visual similarity search and text-based search functionalities. The similarity search functionality is implemented by means of the Permutation Prefix Index (PP-Index), a novel data structure for approximate similarity search. The text-based search functionality is based on a traditional inverted list index data structure. MiPai also provides a combined visual similarity/text search function.
Source: Second International Workshop on Similarity Search and Applications, pp. 146–148, Prague, Czech Republic, 29-30 August 2009
DOI: 10.1109/sisap.2009.14

See at: doi.org Restricted | ieeexplore.ieee.org Restricted | CNR ExploRA


2010 Software Unknown
MP-Boost++
Esuli A.
MPBoost++ is a C++ implementation of MPBoost, a variant of the multi-label AdaBoost.MH algorithm that improves its efficacy and efficiency by performing a multiple pivot selection at each boosting iteration.

See at: CNR ExploRA | www.esuli.it


2008 Doctoral thesis Open Access
Automatic generation of lexical resources for opinion mining: models, algorithms and applications
Esuli A.
Opinion mining is a recent discipline at the crossroads of Information Retrieval and Computational Linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. It has a rich set of applications, ranging from tracking users' opinions about products or about political candidates as expressed in online forums, to customer relationship management. Functional to the extraction of opinions from text is the determination of the relevant entities of the language that are used to express opinions, and their opinion-related properties; for example, determining that the term "beautiful" casts a positive connotation on its subject. In this thesis we investigate the automatic recognition of opinion-related properties of terms. This results in building opinion-related lexical resources, which can be used in opinion mining applications. We start from the (relatively) simple problem of determining the orientation of subjective terms. We propose an original semi-supervised term classification model that is based on the quantitative analysis of the glosses of such terms, i.e., the definitions that these terms are given in on-line dictionaries. This method outperforms all known methods when tested on the recognized standard benchmarks for this task. We show how our method is capable of producing good results on more complex tasks, such as discriminating subjective terms (e.g., good) from objective ones (e.g., green), or classifying terms on a fine-grained attitude taxonomy. We then propose a relevant refinement of the task, i.e., distinguishing the opinion-related properties of distinct term senses. We present SentiWordNet, a novel high-quality, high-coverage lexical resource, where each one of the 115,424 senses contained in WordNet has been automatically evaluated on the three dimensions of positivity, negativity, and objectivity.
We also propose an original and effective use of random-walk models to rank term senses by their positivity or negativity. The random-walk algorithms we present also have great application potential outside the opinion mining area, for example in word sense disambiguation tasks. A result of this experience is the generation of an improved version of SentiWordNet. We finally evaluate and compare the various versions of SentiWordNet we present here with other opinion-related lexical resources well known in the literature, experimenting with their use in an Opinion Extraction application. We show that the use of SentiWordNet produces a significant improvement with respect to the baseline system, which does not use any specialized lexical resource, and also with respect to the use of other opinion-related lexical resources.

See at: etd.adm.unipi.it Open Access | CNR ExploRA


2010 Report Unknown
Use of permutation prefixes for efficient and scalable approximate similarity search
Esuli A.
We present the Permutation Prefix Index (PP-Index), an index data structure that supports efficient approximate similarity search. The PP-Index belongs to the family of permutation-based indexes, which represent each indexed object by "its view of the surrounding world", i.e., a list of the elements of a set of reference objects sorted by their distance from the indexed object. In its basic formulation, the PP-Index is strongly biased toward efficiency. We show how the effectiveness can easily reach optimal levels just by adopting two "boosting" strategies: multiple index search and multiple query search, which both have nice parallelization properties. We study both the efficiency and the effectiveness properties of the PP-Index, experimenting with collections of sizes up to one hundred million objects, represented in a very high-dimensional similarity space.
Source: ISTI Technical reports, 2010

See at: CNR ExploRA


2012 Journal article Open Access
Use of permutation prefixes for efficient and scalable approximate similarity search
Esuli A.
We present the Permutation Prefix Index (PP-Index), an index data structure that supports efficient approximate similarity search. The PP-Index belongs to the family of permutation-based indexes, which represent each indexed object by "its view of the surrounding world", i.e., a list of the elements of a set of reference objects sorted by their distance from the indexed object. In its basic formulation, the PP-Index is strongly biased toward efficiency. We show how the effectiveness can easily reach optimal levels just by adopting two "boosting" strategies: multiple index search and multiple query search, which both have nice parallelization properties. We study both the efficiency and the effectiveness properties of the PP-Index, experimenting with collections of sizes up to one hundred million objects, represented in a very high-dimensional similarity space. This work is a revised and extended version of Esuli (2009b), presented at the 2009 LSDS-IR Workshop, held in Boston.
Source: Information processing & management 48 (2012): 889–902. doi:10.1016/j.ipm.2010.11.011
DOI: 10.1016/j.ipm.2010.11.011

See at: ISTI Repository Open Access | Information Processing & Management Restricted | www.sciencedirect.com Restricted | CNR ExploRA


2013 Report Open Access
The User Feedback on SentiWordNet
Esuli A.
With the release of SentiWordNet 3.0 the related Web interface has been restyled and improved in order to allow users to submit feedback on the SentiWordNet entries, in the form of the suggestion of alternative triplets of values for an entry. This paper reports on the release of the user feedback collected so far and on the plans for the future.
Source: ISTI Technical reports, 2013

See at: ISTI Repository Open Access | swn.isti.cnr.it Open Access | CNR ExploRA


2014 Software Unknown
MiPai
Esuli A.
This is the repository for the MiPai project, which provides a reference implementation of the Permutation Prefix Index (PP-Index), along with index and search example programs for various data types.

See at: CNR ExploRA


2014 Software Unknown
TreeBoost
Esuli A.
TreeBoost is a Java implementation of TreeBoost.MH, a variant of the multi-label AdaBoost.MH algorithm that exploits the hierarchical relations among categories to improve both the efficacy and the efficiency of the classifier.

See at: CNR ExploRA


2015 Journal article Open Access
Optimizing text quantifiers for multivariate loss functions.
Esuli A., Sebastiani F.
We address the problem of quantification, a supervised learning task whose goal is, given a class, to estimate the relative frequency (or prevalence) of the class in a dataset of unlabeled items. Quantification has several applications in data and text mining, such as estimating the prevalence of positive reviews in a set of reviews of a given product or estimating the prevalence of a given support issue in a dataset of transcripts of phone calls to tech support. So far, quantification has been addressed by learning a general-purpose classifier, counting the unlabeled items that have been assigned the class, and tuning the obtained counts according to some heuristics. In this article, we depart from the tradition of using general-purpose classifiers and use instead a supervised learning model for structured prediction, capable of generating classifiers directly optimized for the (multivariate and nonlinear) function used for evaluating quantification accuracy. The experiments that we have run on 5,500 binary high-dimensional datasets (averaging more than 14,000 documents each) show that this method is more accurate, more stable, and more efficient than existing state-of-the-art quantification methods.
Source: ACM transactions on knowledge discovery from data 9 (2015). doi:10.1145/2700406
DOI: 10.1145/2700406
DOI: 10.48550/arxiv.1502.05491
Project(s): SoBigData via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | ACM Transactions on Knowledge Discovery from Data Open Access | ISTI Repository Open Access | dl.acm.org Restricted | ACM Transactions on Knowledge Discovery from Data Restricted | doi.org Restricted | CNR ExploRA
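The traditional pipeline that this abstract contrasts itself with (classify, count, then correct the count) can be sketched as follows. This is only an illustration of that baseline, not of the structured-prediction method proposed in the article. The correction shown is the widely used "adjusted classify and count" heuristic, which is one possible choice of heuristic; the function names and the example rates are hypothetical.

```python
def classify_and_count(predictions):
    """Fraction of items that a binary classifier assigned to the class."""
    return sum(predictions) / len(predictions)


def adjusted_count(observed, tpr, fpr):
    """Correct an observed prevalence using the classifier's true and false
    positive rates, assumed to have been estimated on held-out labeled data."""
    if tpr == fpr:
        # The classifier carries no information; no correction is possible.
        return observed
    estimate = (observed - fpr) / (tpr - fpr)
    # Clip to the valid range of a prevalence.
    return min(1.0, max(0.0, estimate))


# Hypothetical predictions on ten unlabeled items (1 = assigned to the class).
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
cc = classify_and_count(predictions)
acc = adjusted_count(cc, tpr=0.8, fpr=0.1)
```

With these made-up rates, the raw count of 30% is revised downward, since part of the positive predictions is expected to be false positives.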


2016 Conference article Open Access
ISTI-CNR at SemEval-2016 Task 4: quantification on an ordinal scale
Esuli A.
This paper describes the participation of ISTI-CNR in Task 4 of SemEval 2016. Among the five subtasks, special attention has been paid to the five-point scale quantification subtask. The quantification method we propose is based on the observation that a standard document-by-document regression method usually has a bias towards assigning high prevalence labels. Our method models such bias with a linear model, in order to compensate for it and to produce the quantification estimates.
Source: SemEval 2016 - 10th International Workshop on Semantic Evaluation, pp. 92–95, San Diego, USA, 16-17 June 2016
DOI: 10.18653/v1/s16-1011

See at: aclanthology.org Open Access | www.aclweb.org Open Access | doi.org Restricted | CNR ExploRA


2021 Software Unknown
TwiGet
Esuli A.
TwiGet is a Python package for managing queries on the filtered stream of the Twitter API and for collecting tweets from it. It can be used as a command line tool (twiget-cli) or as a Python class (TwiGet).
Project(s): AI4Media via OpenAIRE

See at: github.com | CNR ExploRA


2022 Journal article Open Access
ICS: total freedom in manual text classification supported by unobtrusive machine learning
Esuli A.
We present the Interactive Classification System (ICS), a web-based application that supports the activity of manual text classification. The application uses machine learning to continuously fit automatic classification models that are in turn used to actively support its users with classification suggestions. The key requirement we have established for the development of ICS is to give its users total freedom of action: they can at any time modify any classification schema and any label assignment, possibly reusing any relevant information from previous activities. We investigate how this requirement challenges the typical scenarios faced in machine learning research, which instead give no active role to humans or place them into very constrained roles, e.g., on-demand labeling in active learning processes, and always assume some degree of batch processing of data. We satisfy the "total freedom" requirement by designing an unobtrusive machine learning model, i.e., the machine learning component of ICS acts as an unobtrusive observer of the users that never interrupts them, continuously adapts and updates its models in response to their actions, and is always available to perform automatic classifications. Our efficient implementation of the unobtrusive machine learning model combines various machine learning methods and technologies, such as hash-based feature mapping, random indexing, online learning, active learning, and asynchronous processing.
Source: IEEE access 10 (2022): 64741–64760. doi:10.1109/ACCESS.2022.3184009
DOI: 10.1109/access.2022.3184009
Project(s): AI4Media via OpenAIRE, ARIADNEplus via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: IEEE Access Open Access | ieeexplore.ieee.org Open Access | ISTI Repository Open Access | ZENODO Open Access | CNR ExploRA


2023 Journal article Open Access
The interactive classification system
Esuli A.
ISTI-CNR released a new web application for the manual and automatic classification of documents. Human annotators collaboratively label documents with machine learning algorithms that learn from annotators' actions and support the activity with classification suggestions. The platform supports the early stages of document labelling, with the ability to change the classification scheme on the go and to reuse and adapt existing classifiers.
Source: ERCIM news (2023): 34–35.
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ercim-news.ercim.eu Open Access | ISTI Repository Open Access | CNR ExploRA


2009 Conference article Open Access
Active learning strategies for multi-label text classification
Esuli A., Sebastiani F.
Active learning refers to the task of devising a ranking function that, given a classifier trained from relatively few training examples, ranks a set of additional unlabeled examples in terms of how much further information they would carry, once manually labeled, for retraining a (hopefully) better classifier. Research on active learning in text classification has so far concentrated on single-label classification; active learning for multi-label classification, instead, has either been tackled in a simulated (and, we contend, non-realistic) way, or neglected tout court. In this paper we aim to fill this gap by examining a number of realistic strategies for tackling active learning for multi-label classification. Each such strategy consists of a rule for combining the outputs returned by the individual binary classifiers as a result of classifying a given unlabeled document. We present the results of extensive experiments in which we test these strategies on two standard text classification datasets.
Source: ECIR'09 - 31st European Conference on Information Retrieval, pp. 102–113, Toulouse, France, 7-9 April 2009
DOI: 10.1007/978-3-642-00958-7_12

See at: nmis.isti.cnr.it Open Access | ISTI Repository Open Access | doi.org Restricted | link.springer.com Restricted | CNR ExploRA
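The idea of a rule that combines the outputs of the individual binary classifiers into one ranking score per document can be sketched as below. The two rules shown (minimum and average absolute score) are illustrative assumptions and are not claimed to be the strategies evaluated in the paper; `rank_for_labeling`, `average`, and the scores are hypothetical, with each score read as a signed confidence whose magnitude grows with certainty.

```python
def average(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def rank_for_labeling(scores_by_doc, combine=min):
    """Order documents from least to most confidently classified.

    scores_by_doc maps a document id to one signed score per binary
    classifier; combine turns the absolute scores into a single value.
    """
    combined = {doc: combine(abs(s) for s in scores)
                for doc, scores in scores_by_doc.items()}
    return sorted(combined, key=combined.get)


# Hypothetical outputs of three binary classifiers on three documents.
scores = {
    "d1": [2.0, -1.5, 0.9],
    "d2": [0.1, 3.0, -2.0],
    "d3": [-0.5, 0.6, 0.4],
}

by_min = rank_for_labeling(scores, combine=min)
by_avg = rank_for_labeling(scores, combine=average)
```

The two rules disagree here: the minimum rule puts "d2" first because one classifier is very unsure about it, while the average rule puts "d3" first because all classifiers are only mildly confident about it.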


2009 Journal article Open Access
Automatically determining attitude type and force for sentiment analysis
Argamon S., Bloom K., Esuli A., Sebastiani F.
Recent work in sentiment analysis has begun to apply fine-grained semantic distinctions between expressions of attitude as features for textual analysis. Such methods, however, require the construction of large and complex lexicons, giving values for multiple sentiment-related attributes to many different lexical items. For example, a key attribute is what type of attitude is expressed by a lexical item; e.g., beautiful expresses appreciation of an object's quality, while evil expresses a negative judgement of social behavior. In this paper we describe a method for the automatic determination of complex sentiment-related attributes such as attitude type and force, by applying supervised learning to WordNet glosses. Experimental results show that the method achieves good effectiveness, and is therefore well-suited to contexts in which these lexicons need to be generated from scratch.
Source: Lecture notes in computer science 5603 (2009): 218–231. doi:10.1007/978-3-642-04235-5_19
DOI: 10.1007/978-3-642-04235-5_19

See at: nmis.isti.cnr.it Open Access | doi.org Restricted | www.springerlink.com Restricted | CNR ExploRA


2009 Conference article Unknown
Encoding ordinal features into binary features for text classification
Esuli A., Sebastiani F.
We propose a method by means of which supervised learning algorithms that only accept binary input can be extended to use ordinal (i.e., integer-valued) input. This is much needed in text classification, since it thus becomes possible to endow these learning devices with term frequency information, rather than just information on the presence/absence of the term in the document. We test two different learners based on "boosting", and show that the use of our method allows them to obtain effectiveness gains. We also show that one of these boosting methods, once endowed with the representations generated by our method, outperforms an SVM learner with tfidf-weighted input.
Source: 31st European Conference on Information Retrieval - ECIR'09, pp. 771–775, Toulouse, FR, 7-9 April 2009

See at: CNR ExploRA
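One simple way to expose an integer-valued term frequency to a learner that only accepts binary input is a threshold encoding, sketched below. This is an assumption made for illustration and is not claimed to be the encoding proposed in the paper; `encode_ordinal` and `max_level` are hypothetical names.

```python
def encode_ordinal(value, max_level):
    """Encode a non-negative integer as max_level binary features.

    Feature k (for k = 1 .. max_level) is 1 when value >= k, so larger
    values switch on a superset of the features of smaller values.
    """
    return [1 if value >= k else 0 for k in range(1, max_level + 1)]


# Hypothetical term frequencies for one term in four documents.
term_frequencies = [0, 1, 2, 5]
encoded = [encode_ordinal(tf, max_level=3) for tf in term_frequencies]
```

The first feature alone reproduces plain presence/absence of the term, while the remaining features add the frequency information; frequencies above `max_level` are all mapped to the same code.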