8 result(s)
2017 Journal article Open Access OPEN
Natural language requirements processing: a 4D vision
Ferrari A, Dell'Orletta F, Esuli A, Gervasi V, Gnesi S
Natural language processing (NLP) and requirements engineering (RE) have had a long relationship, yet their combined use isn't well established in industrial practice. This situation should soon change. The future evolution of the application of NLP technologies in RE can be viewed from four dimensions: discipline, dynamism, domain knowledge, and datasets.
Source: IEEE SOFTWARE, vol. 34 (issue 6), pp. 28-35

See at: CNR IRIS Open Access | ieeexplore.ieee.org Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2013 Conference article Restricted
Mining commonalities and variabilities from natural language documents
Ferrari A, Spagnolo Go, Dell'Orletta F
A company that wishes to enter an established market with a new, competitive product is required to analyse the product solutions of the competitors. Identifying and comparing the features provided by the other vendors might greatly help during the market analysis. However, mining common and variant features from the publicly available documents of the competitors is a time-consuming and error-prone task. In this paper, we suggest employing a natural language processing approach based on contrastive analysis to identify commonalities and variabilities from the brochures of a group of vendors. We present a first step towards a practical application of the approach, in the context of the market of Communications-Based Train Control (CBTC) systems.
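As a rough illustration of the contrastive-analysis idea described above, the sketch below ranks terms by how much more frequent they are in a domain corpus (e.g., vendor brochures) than in a general-language corpus. The scoring function (frequency ratio with add-one smoothing) is an assumption for illustration only; the paper's actual term-ranking method may differ.

```python
from collections import Counter
import math

def contrastive_terms(domain_docs, general_docs, top_k=5):
    """Rank terms by how much more frequent they are in the domain
    corpus than in a general-language corpus (a simple frequency-ratio
    take on contrastive analysis)."""
    dom = Counter(w for d in domain_docs for w in d.lower().split())
    gen = Counter(w for d in general_docs for w in d.lower().split())
    dom_total = sum(dom.values())
    gen_total = sum(gen.values())

    def score(w):
        p_dom = dom[w] / dom_total
        # add-one smoothing so unseen general-corpus words don't divide by zero
        p_gen = (gen[w] + 1) / (gen_total + len(gen))
        return p_dom * math.log(p_dom / p_gen)

    ranked = sorted(((w, score(w)) for w in dom), key=lambda x: -x[1])
    return [w for w, _ in ranked[:top_k]]
```

Domain-specific words (e.g., railway terms in CBTC brochures) rise to the top because they are rare or absent in the general corpus.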

See at: dl.acm.org Restricted | CNR IRIS Restricted


2017 Dataset Metadata Only Access
T4SA: Twitter for Sentiment Analysis
Carrara F, Cimino A, Cresci S, Dell'Orletta F, Falchi F, Vadicamo L, Tesconi M
T4SA is intended for training and testing image sentiment analysis approaches. It contains slightly less than a million tweets, corresponding to about 1.5M images. We initially collected about 3.4M tweets corresponding to about 4M images. We classified the sentiment polarity of the texts (as described in Section 4) and selected the tweets having the most confident textual sentiment predictions to build our Twitter for Sentiment Analysis (T4SA) dataset. The dataset is publicly available at: http://www.t4sa.it/
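The confidence-based selection step described above can be sketched as a simple filter: only tweets whose textual sentiment prediction is confident enough are kept, and their images inherit the text's polarity as a (noisy) label. The dict field names and threshold below are hypothetical, not the dataset's actual schema.

```python
def select_confident(tweets, threshold=0.9):
    """Keep only tweets whose textual sentiment prediction is confident
    enough; their images then receive the text's polarity as a weak label.
    `tweets` is a list of dicts with 'image', 'polarity', and 'confidence'
    keys (illustrative field names)."""
    return [(t["image"], t["polarity"]) for t in tweets
            if t["confidence"] >= threshold]
```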

See at: CNR IRIS Restricted | www.t4sa.it Restricted


2014 Conference article Restricted
Measuring and improving the completeness of natural language requirements
Ferrari A, Dell'Orletta F, Spagnolo Go, Gnesi S
[Context and motivation] System requirements specifications are normally written in natural language. These documents are required to be complete with respect to the input documents of the requirements definition phase, such as preliminary specifications, transcripts of meetings with the customers, etc. In other words, they shall include all the relevant concepts and all the relevant interactions among concepts expressed in the input documents. [Question/Problem] Means are required to measure and improve the completeness of the requirements with respect to the input documents. [Principal idea/results] To measure this completeness, we propose two metrics that take into account the relevant terms of the input documents, and the relevant relationships among terms. Furthermore, to improve the completeness, we present a natural language processing tool named Completeness Assistant for Requirements (CAR), which supports the definition of the requirements: the tool helps the requirements engineer in discovering relevant concepts and interactions. [Contribution] We have performed a pilot test with CAR, which shows that the tool can help improve the completeness of the requirements with respect to the input documents. The study has also shown that CAR is actually useful in the identification of specific/alternative system behaviours that might be overlooked without the tool. © 2014 Springer International Publishing Switzerland.
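One plausible reading of the two completeness metrics mentioned above can be sketched as coverage ratios: the fraction of relevant input-document terms that appear in the requirements, and the fraction of relevant term pairs that co-occur in at least one requirement. These formulations are assumptions for illustration; the paper's actual metric definitions may differ.

```python
def term_coverage(requirements, input_terms):
    """Fraction of relevant input-document terms that appear somewhere
    in the requirements."""
    req_words = set(w for r in requirements for w in r.lower().split())
    covered = [t for t in input_terms if t.lower() in req_words]
    return len(covered) / len(input_terms)

def pair_coverage(requirements, input_pairs):
    """Fraction of relevant term pairs (concepts that interact in the
    input documents) that co-occur in at least one requirement."""
    covered = 0
    for a, b in input_pairs:
        if any(a.lower() in r.lower() and b.lower() in r.lower()
               for r in requirements):
            covered += 1
    return covered / len(input_pairs)
```

A low pair-coverage score would point the requirements engineer at interactions described in the input documents but never specified.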

See at: CNR IRIS Restricted | link.springer.com Restricted


2015 Conference article Open Access OPEN
CMT and FDE: tools to bridge the gap between natural language documents and feature diagrams
Ferrari A, Spagnolo G O, Gnesi S, Dell'Orletta F
A business subject who wishes to enter an established technological market is required to accurately analyse the features of the products of the different competitors. Such features are normally accessible through natural language (NL) brochures, or NL Web pages, which describe the products to potential customers. Building a feature model that hierarchically summarises the different features available in competing products can bring relevant benefits in market analysis. A company can easily visualise existing features, and reason about aspects that are not covered by the available solutions. However, designing a feature model starting from publicly available documents of existing products is a time consuming and error-prone task. In this paper, we present two tools, namely Commonality Mining Tool (CMT) and Feature Diagram Editor (FDE), which can jointly support the feature model definition process. CMT allows mining common and variant features from NL descriptions of existing products, by leveraging a natural language processing (NLP) approach based on contrastive analysis, which allows identifying domain-relevant terms from NL documents. FDE takes the commonalities and variabilities extracted by CMT, and renders them in a visual form. Moreover, FDE allows the graphical design and refinement of the final feature model, by means of an intuitive GUI.
Project(s): LEARN PAD via OpenAIRE
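The feature-model rendering step can be pictured with a minimal sketch: a nested structure of features, with variant (optional) features marked. The nested-dict encoding and the '?' marker are illustrative assumptions, not FDE's actual data model or notation.

```python
def render_feature_tree(node, depth=0):
    """Render a feature model as indented text lines, marking variant
    features with a trailing '?'. `node` is a dict with 'name' and
    optional 'variant' and 'children' keys (illustrative encoding)."""
    name = node["name"] + ("?" if node.get("variant") else "")
    lines = ["  " * depth + name]
    for child in node.get("children", []):
        lines.extend(render_feature_tree(child, depth + 1))
    return lines
```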

See at: dl.acm.org Open Access | CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2017 Conference article Open Access OPEN
Cross-media learning for image sentiment analysis in the wild
Vadicamo L, Carrara F, Falchi F, Cimino A, Dell'Orletta F, Cresci S, Tesconi M
Much progress has been made in the field of sentiment analysis in the past years. Researchers have long relied on textual data for this task, and only recently have they started investigating approaches to predict sentiments from multimedia content. With the increasing amount of data shared on social media, there is also a rapidly growing interest in approaches that work "in the wild", i.e. that are able to deal with uncontrolled conditions. In this work, we faced the challenge of training a visual sentiment classifier starting from a large set of user-generated and unlabeled contents. In particular, we collected more than 3 million tweets containing both text and images, and we leveraged the sentiment polarity of the textual contents to train a visual sentiment classifier. To the best of our knowledge, this is the first time that a cross-media learning approach is proposed and tested in this context. We assessed the validity of our model by conducting comparative studies and evaluations on a benchmark for visual sentiment analysis. Our empirical study shows that although the text associated to each image is often noisy and weakly correlated with the image content, it can be profitably exploited to train a deep Convolutional Neural Network that effectively predicts the sentiment polarity of previously unseen images.

See at: CNR IRIS Open Access | ieeexplore.ieee.org Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2024 Conference article Open Access OPEN
AI "news" content farms are easy to make and hard to detect: a case study in Italian
Puccetti G., Rogers A., Alzetta C., Dell'Orletta F., Esuli A.
Large Language Models (LLMs) are increasingly used as 'content farm' models (CFMs), to generate synthetic text that could pass for real news articles. This is already happening even for languages that do not have high-quality monolingual LLMs. We show that fine-tuning Llama (v1), mostly trained on English, on as little as 40K Italian news articles, is sufficient for producing news-like texts that native speakers of Italian struggle to identify as synthetic. We investigate three LLMs and three methods of detecting synthetic texts (log-likelihood, DetectGPT, and supervised classification), finding that they all perform better than human raters, but they are all impractical in the real world (requiring either access to token likelihood information or a large dataset of CFM texts). We also explore the possibility of creating a proxy CFM: an LLM fine-tuned on a similar dataset to one used by the real 'content farm'. We find that even a small amount of fine-tuning data suffices for creating a successful detector, but we need to know which base LLM is used, which is a major challenge. Our results suggest that there are currently no practical methods for detecting synthetic news-like texts 'in the wild', while generating them is too easy. We highlight the urgency of more NLP research on this problem.
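The log-likelihood detection method mentioned above can be sketched as follows: score a text by its average per-token log-likelihood under a language model, and flag texts the model finds suspiciously predictable. The `logprob` callback and the threshold are stand-ins for a real causal LM's token scores, used here only to show the shape of the computation.

```python
def avg_log_likelihood(tokens, logprob):
    """Average per-token log-likelihood of a token sequence.
    `logprob(prev_tokens, token)` is a stand-in for a real language
    model's conditional log-probability (illustrative interface)."""
    total = 0.0
    for i, tok in enumerate(tokens):
        total += logprob(tokens[:i], tok)
    return total / len(tokens)

def flag_synthetic(tokens, logprob, threshold=-2.0):
    """Texts the model finds 'too predictable' (high average
    log-likelihood) are flagged as possibly machine-generated."""
    return avg_log_likelihood(tokens, logprob) > threshold
```

As the abstract notes, this requires access to token likelihood information, which is exactly what makes the method impractical against closed models.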

See at: aclanthology.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2020 Conference article Open Access OPEN
Using NLP to support terminology extraction and domain scoping: report on the H2020 DESIRA project
Bacco Fm, Brunori G, Dell'Orletta F, Ferrari A
The ongoing phenomenon of digitisation is changing social and work life, with tangible effects on the socio-economic context. Understanding the impact, opportunities, and threats of digital transformation requires the identification of viewpoints from a large diversity of stakeholders, from policy makers to domain experts, and from engineers to common citizens. The DESIRA (Digitisation: Economic and Social Impacts in Rural Areas) EU H2020 project considers rural areas, with a strong focus on agricultural and forestry activities, and aims at assessing the impact of digital technologies in those domains by involving a large number of stakeholders, all across Europe, around 20 focal questions. Given the involvement of stakeholders with diverse backgrounds and skills, a primary goal of the project is to develop domain-specific and interactive reference taxonomies (i.e., structured classifications of terms) to facilitate common understanding of the technologies currently in use in each domain. The taxonomies, which aim at easing the learning of the meaning of technical and domain-specific terms, are going to be exploited by the stakeholders in 20 Living Labs built around the focal questions. This report paper focuses on the semi-automatic development of the taxonomies through natural language processing (NLP) techniques based on context-specific term extraction. Furthermore, we crawl Wikipedia to enrich the taxonomies with additional categories and definitions. We plan to validate the taxonomies through field studies within the Living Labs.
Project(s): DESIRA via OpenAIRE

See at: ceur-ws.org Open Access | CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted