2024
Conference article  Open Access

Preprocessing of recto-verso printed documents based on neural networks for text analysis

Savino P, Tonazzini A

Optical character recognition; Recto-verso documents; Shallow multilayer neural networks; Ancient document text analysis; Degraded document binarization

Among the many and varied kinds of damage affecting ancient documents, the penetration of ink from one side of the page to the other is one of the most frequent and invasive. In this work, we are interested in binarizing such degraded documents for the application of OCR or other automatic text analysis tools, which can help philologists and palaeographers in text transcription. We previously proposed a data model that roughly describes this damage for front-to-back documents, and used it to generate an artificial training set that can teach a shallow neural network to classify the pixels on both sides as clean or corrupt. We show that this joint processing of the two sides of the document can significantly improve binarization, and therefore OCR and other text analysis tasks, compared with processing each side separately using the same information.
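The abstract does not give the authors' data model or network configuration, so the following is only a toy sketch of the general idea under stated assumptions: a hypothetical see-through model (each observed side is its own ink plus an attenuated, blurred, left-right mirrored copy of the other side's ink), synthetic page pairs used as an artificial training set, and a one-hidden-layer network that labels each pixel using both its own value and the value of the pixel directly behind it on the other side. The names `random_page`, `degrade`, `joint_features`, `train_mlp` and `classify`, the parameter `alpha`, and the simplified labels ("genuine ink of this side" vs. everything else) are all illustrative assumptions, not the method from the paper.

```python
import numpy as np


def random_page(rng, shape=(64, 64), strokes=30):
    """Hypothetical 'clean' page as ink density (0 = paper, >0 = ink).
    Short horizontal bars of varying darkness stand in for text strokes."""
    page = np.zeros(shape)
    for _ in range(strokes):
        r = rng.integers(0, shape[0] - 2)
        c = rng.integers(0, shape[1] - 12)
        page[r:r + 2, c:c + rng.integers(3, 12)] = rng.uniform(0.4, 1.0)
    return page


def blur3(img):
    """3x3 box blur with edge padding (assumed stand-in for ink diffusion)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def degrade(recto, verso, alpha=0.7):
    """Assumed see-through model: own ink plus an attenuated, blurred,
    mirrored copy of the other side's ink, clipped to [0, 1]."""
    obs_r = np.clip(recto + alpha * blur3(np.fliplr(verso)), 0.0, 1.0)
    obs_v = np.clip(verso + alpha * blur3(np.fliplr(recto)), 0.0, 1.0)
    return obs_r, obs_v


def joint_features(obs_a, obs_b):
    """One row per pixel of side A: (its value, value of the pixel behind it on B)."""
    return np.stack([obs_a.ravel(), np.fliplr(obs_b).ravel()], axis=1)


def train_mlp(X, y, hidden=8, lr=0.5, epochs=1000, seed=0):
    """Shallow network (tanh hidden layer, sigmoid output) trained by full-batch
    gradient descent on mean binary cross-entropy; returns params and losses."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
        p_safe = np.clip(p, 1e-7, 1 - 1e-7)
        losses.append(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe)))
        g = (p - y) / len(y)                    # d(loss)/d(logit)
        gH = np.outer(g, W2) * (1.0 - H ** 2)   # backprop through tanh
        W2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return (W1, b1, W2, b2), losses


def classify(params, obs_a, obs_b):
    """Binary map for side A: True where the pixel is judged genuine ink of A."""
    W1, b1, W2, b2 = params
    H = np.tanh(joint_features(obs_a, obs_b) @ W1 + b1)
    return ((H @ W2 + b2) > 0.0).reshape(obs_a.shape)


# Artificial training set: both sides of several synthetic degraded pairs.
rng = np.random.default_rng(1)
X_parts, y_parts = [], []
for _ in range(4):
    recto, verso = random_page(rng), random_page(rng)
    obs_r, obs_v = degrade(recto, verso)
    X_parts += [joint_features(obs_r, obs_v), joint_features(obs_v, obs_r)]
    y_parts += [(recto > 0).ravel(), (verso > 0).ravel()]
X = np.vstack(X_parts)
y = np.concatenate(y_parts).astype(float)
params, losses = train_mlp(X, y)

# Binarize the recto of an unseen degraded pair, using both sides jointly.
test_r, test_v = degrade(random_page(rng), random_page(rng))
pred = classify(params, test_r, test_v)
```

The only thing that makes this "joint" processing is the second feature column: a single-side classifier would see just `obs_a`, whereas here faint genuine ink and strong show-through can be told apart partly by what lies behind each pixel. No accuracy figures are claimed for this toy setup.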

BibTeX entry
@inproceedings{oai:it.cnr:prodotti:490208,
	title = {Preprocessing of recto-verso printed documents based on neural networks for text analysis},
	author = {Savino P and Tonazzini A},
	doi = {10.1109/cist56084.2023.10409970},
	year = {2024}
}