Corbara S., Chulvi Ferriols B., Rosso P., Moreo A.
Political speech, Text distortion, Authorship identification
Authorship Identification is the branch of authorship analysis concerned with uncovering the author of a written document. Methods devised for Authorship Identification typically employ stylometry (the analysis of unconscious traits that authors exhibit while writing), and are expected not to base their inferences on the topics the authors usually write about (as reflected in their past production). In this paper, we present a series of experiments evaluating feature sets based on rhythmic and psycholinguistic patterns for Authorship Verification and Attribution in Spanish political language, combined with different text distortion approaches that actively mask the underlying topic. We feed these feature sets to an SVM learner, and show that they lead to results comparable to those obtained by the BETO transformer when the latter is trained on the original text, i.e., when it can potentially learn from topical information.
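As a rough illustration of what masking the topic through text distortion can look like, the Python sketch below keeps a handful of function words and replaces every other word with asterisks of the same length. This is only one common style of distortion and is not necessarily any of the variants evaluated in the paper; the function name `distort` and the tiny `FUNCTION_WORDS` list are assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny list of Spanish function words.
# A real experiment would rely on a much larger, curated list.
FUNCTION_WORDS = {
    "el", "la", "los", "las", "de", "que", "y", "en",
    "a", "un", "una", "no", "se", "por", "con", "para",
}

# Matches runs of word characters (accented letters included in Python 3).
TOKEN = re.compile(r"\w+")


def distort(text, keep=FUNCTION_WORDS):
    """Mask every word not in `keep` with '*' characters of equal length.

    Punctuation and whitespace are left untouched, so word lengths and
    sentence rhythm survive while topical vocabulary is hidden.
    """
    def mask(match):
        word = match.group(0)
        return word if word.lower() in keep else "*" * len(word)

    return TOKEN.sub(mask, text)


if __name__ == "__main__":
    print(distort("El gobierno de la ciudad no se rinde."))
    # prints: El ******** de la ****** no se *****.
```

Features extracted from text distorted in this way cannot depend on content words, which is the property that makes a feature set topic-agnostic.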
Source: NLDB 2022 - 27th International Conference on Applications of Natural Language to Information Systems, pp. 394–402, Valencia, Spain, 15-17 June 2022
@inproceedings{oai:it.cnr:prodotti:472052,
  title = {Investigating topic-agnostic features for authorship tasks in Spanish political speeches},
  author = {Corbara S. and Chulvi Ferriols B. and Rosso P. and Moreo A.},
  doi = {10.1007/978-3-031-08473-7_36},
  booktitle = {NLDB 2022 - 27th International Conference on Applications of Natural Language to Information Systems, pp. 394–402, Valencia, Spain, 15-17/6/2022},
  year = {2022}
}