2 result(s)
2026 Journal article Open Access
Enhancing token boundary detection in disfluent speech
Srivastava Manu, Ferro Marcello, Pirrelli Vito, Coro Gianpaolo
This paper presents an open-source Automatic Speech Recognition (ASR) pipeline optimised for disfluent Italian read speech, designed to enhance both transcription accuracy and token boundary precision in low-resource settings. The study aims to address the difficulty that conventional ASR systems face in capturing the temporal irregularities of disfluent reading, which are crucial for psycholinguistic and clinical analyses of fluency. Building upon the WhisperX framework, the proposed system replaces the neural Voice Activity Detection module with an energy-based segmentation algorithm designed to preserve prosodic cues such as pauses and hesitations. A dual-alignment strategy integrates two complementary phoneme-level ASR models to correct onset–offset asymmetries, while a bias-compensation post-processing step mitigates systematic timing errors. Evaluation on the READLET (child read speech) and CLIPS (adult read speech) corpora shows consistent improvements over baseline systems, confirming enhanced robustness in boundary detection and transcription under disfluent conditions. The results demonstrate that the proposed architecture provides a general, language-independent framework for accurate alignment and disfluency-aware ASR. The approach can support downstream analyses of reading fluency and speech planning, contributing to both computational linguistics and clinical speech research.
Source: INTELLIGENT SYSTEMS WITH APPLICATIONS, vol. 29
DOI: 10.1016/j.iswa.2025.200614
Project(s): READLET
See at: CNR IRIS Open Access | www.sciencedirect.com Open Access | CNR IRIS Restricted
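The energy-based segmentation idea mentioned in the abstract above can be illustrated with a minimal sketch: frames whose RMS energy falls below a threshold are treated as pauses, so pause boundaries survive as segment edges rather than being smoothed away. The frame length, threshold, and function names here are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of energy-based speech segmentation (illustrative only;
# frame_len, threshold, and all names below are assumptions, not the
# pipeline's real parameters).
import math


def frame_rms(samples, frame_len=160):
    """Return per-frame RMS energy for non-overlapping frames."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy_segments(samples, threshold=0.1, frame_len=160):
    """Return (start_frame, end_frame) spans whose RMS exceeds threshold.

    Frames below the threshold are treated as pauses, so pause and
    hesitation boundaries are kept as explicit segment edges.
    """
    energies = frame_rms(samples, frame_len)
    segments, start = [], None
    for i, energy in enumerate(energies):
        if energy >= threshold and start is None:
            start = i                      # speech onset
        elif energy < threshold and start is not None:
            segments.append((start, i))    # speech offset at pause
            start = None
    if start is not None:                  # signal ended mid-speech
        segments.append((start, len(energies)))
    return segments
```

For example, a signal of 320 silent samples, 320 samples at amplitude 0.5, and 320 more silent samples yields a single speech span covering frames 2–4, with the surrounding silence preserved as pauses.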


2025 Journal article Restricted
Oral text reading as a multi-sensory task
Marzi C., Nadalini A., Lento A., Srivastava M., Todesco A., Pirrelli V., Ferro M.
Reading aloud involves the complex interplay of visual, motor and lexical processes. While eye movements have been extensively investigated in the reading literature, less is known about the coordination of voice, eye and finger movements in oral and finger-point reading. Here we propose a multimodal perspective on these dynamics, emphasising the contribution of integrating eye-tracking, finger-tracking, and voice recording to a more comprehensive understanding of reading proficiency. Our results show that finger and eye movements are strongly coupled in early readers. Conversely, skilled readers show a more flexible coordination of sensorimotor signals and a more adaptive sensitivity to prosodic structures, with voice articulation slowing at key structural points, such as chunk heads and sentence-final boundaries. These findings provide novel insights into how multimodal coordination evolves with reading expertise, contributing to a more fine-grained understanding of reading fluency.
Source: LINGUE E LINGUAGGIO, vol. XXIV (issue 1), pp. 141-156
DOI: 10.1418/117447
See at: CNR IRIS Restricted | www.rivisteweb.it Restricted