Cardillo F. A., Debole F., Frontini F., Aelami M., Chahinian N., Conrad S.
Wastewater networks; Computation and Language (cs.CL); LLMs for NER; Named Entity Recognition; Multilingual NLP; FOS: Computer and information sciences; [SDU.STU.HY] Sciences of the Universe [physics]/Earth Sciences/Hydrology; [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing; Annotation projection; Urban hydrology; Domain-specific corpus
Efficient wastewater and stormwater management is essential for sustainable cities. Extracting structured knowledge from reports and regulations is challenging because of domain-specific terminology and multilingual contexts. This work focuses on domain-specific Named Entity Recognition (NER) as a first step towards effective relation and information extraction to support decision making. A multilingual benchmark is crucial for evaluating these methods. This study develops a French-Italian domain-specific text corpus for wastewater management. It evaluates state-of-the-art NER methods, including LLM-based approaches, to provide a reliable baseline for future strategies, and it explores automated annotation projection with a view to extending the corpus to new languages.
Source: COLLOQUIUM IN INFORMATION SCIENCE AND TECHNOLOGY, pp. 226-231. Marrakech, Morocco, 2025
Publisher: Institute of Electrical and Electronics Engineers
@inproceedings{oai:iris.cnr.it:20.500.14243/562981,
title = {Novel benchmark for NER in the wastewater and stormwater domain},
author = {Cardillo, F. A. and Debole, F. and Frontini, F. and Aelami, M. and Chahinian, N. and Conrad, S.},
publisher = {Institute of Electrical and Electronics Engineers},
doi = {10.1109/cist65886.2025.11224095},
note = {Preprint DOI: 10.48550/arxiv.2506.01938},
booktitle = {COLLOQUIUM IN INFORMATION SCIENCE AND TECHNOLOGY},
pages = {226--231},
address = {Marrakech, Morocco},
year = {2025}
}