2021
Conference article  Open Access

Garbled-word embeddings for jumbled text

Sperduti G., Moreo A., Sebastiani F.

Keywords: Garbled-Word Embeddings, Garbled Words, Misspellings, Distributional Semantic Models

"Aoccdrnig to a reasrech at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny itmopnrat tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe". We investigate the extent to which this phenomenon applies to computers as well. Our hypothesis is that computers are able to learn distributed word representations that are resilient to character reshuffling, without incurring a significant loss in performance in tasks that use these representations. If our hypothesis is confirmed, this may form the basis for a new and more efficient way of encoding character-based representations of text in deep learning, and one that may prove especially robust to misspellings, or to corruption of text due to OCR. This paper discusses some fundamental psycho-linguistic aspects that lie at the basis of the phenomenon we investigate, and reports on a preliminary proof of concept of the above idea.
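The reshuffling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the function names `garble` and `shuffle_invariant_key` are hypothetical, and using sorted interior letters as a reshuffling-invariant key is an assumption made only to show what "resilient to character reshuffling" could mean for a word-level lookup.

```python
import random


def garble(word: str, rng: random.Random) -> str:
    """Shuffle the interior letters of a word, keeping first and last fixed."""
    if len(word) <= 3:
        return word  # fewer than two interior letters: nothing to shuffle
    interior = list(word[1:-1])
    rng.shuffle(interior)
    return word[0] + "".join(interior) + word[-1]


def shuffle_invariant_key(word: str) -> str:
    """Hypothetical canonical form: first letter, sorted interior, last letter.

    Every garbled variant of a word maps to the same key, so an embedding
    table indexed by this key would be unaffected by interior reshuffling.
    """
    if len(word) <= 3:
        return word
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only for repeatability
    for word in ["according", "research", "letter"]:
        jumbled = garble(word, rng)
        # The key is identical for the original and the jumbled form.
        assert shuffle_invariant_key(jumbled) == shuffle_invariant_key(word)
        print(word, "->", jumbled)
```

One consequence of such a key is that distinct words with the same first letter, last letter, and interior letters (for example "salt" and "slat") collide, which is one reason a learned representation might be preferred over a fixed canonical form.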

Source: IIR 2021 - 11th Italian Information Retrieval Workshop, Bari, Italy, 13-15/09/21



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:457946,
	title = {Garbled-word embeddings for jumbled text},
	author = {Sperduti G. and Moreo A. and Sebastiani F.},
	booktitle = {IIR 2021 - 11th Italian Information Retrieval Workshop, Bari, Italy, 13-15/09/21},
	year = {2021}
}

Also available from: ceur-ws.org (Open Access)