2017
Conference article  Restricted

Towards a dataset for natural language requirements processing

Ferrari A., Spagnolo G. O., Gnesi S.

Natural language processing  Natural language requirements  Requirements classifications  Requirements document

[Context and motivation] Recent breakthroughs in natural language processing (NLP) techniques can provide the requirements engineering (RE) community with powerful tools that can help address specific tasks of natural language (NL) requirements analysis, such as traceability, ambiguity detection, and requirements classification, to name a few. [Question/problem] However, modern NLP techniques are mainly statistical and need large NL requirements datasets to support appropriate training, testing, and validation of the techniques. The RE community has experimented with NLP for a long time, but datasets were often proprietary, or limited to a few software projects for which requirements were publicly available. Hence, replication of the experiments and generalization of the results have always been an issue. [Principal idea/results] Our near-future commitment is to provide a publicly available NL requirements dataset. [Contribution] To this end, we are collecting requirements documents from the Web and representing them in a common XML format. In this paper, we present the current version of the dataset, together with our agenda concerning formatting, extension, and annotation of the dataset.

Source: joint REFSQ Workshops, Doctoral Symposium, Research Method Track, and Poster Track, 27/02/2017



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:382379,
	title = {Towards a dataset for natural language requirements processing},
	author = {Ferrari A. and Spagnolo G. O. and Gnesi S.},
	booktitle = {joint REFSQ Workshops, Doctoral Symposium, Research Method Track, and Poster Track, 27/02/2017},
	year = {2017}
}
Also available from

ceur-ws.org  Restricted