2024
Conference article  Open Access

LongDoc summarization using instruction-tuned large language models for food safety regulations

Rocchietti G., Rulli C., Randl K., Muntean C., Nardini F. M., Perego R., Trani S., Karvounis M., Janostik J.

Finetuning, Food safety regulations, Large Language Models, Summarization

We design and implement a summarization pipeline for regulatory documents with two main objectives: creating two silver-standard datasets using instruction-tuned large language models (LLMs), and finetuning smaller LLMs to summarize regulatory text. In the first task, we use two state-of-the-art models, Cohere C4AI Command-R-4bit and Llama-3-8B, to generate summaries of regulatory documents. These generated summaries serve as ground-truth data for the second task, in which we finetune three general-purpose LLMs to specialize in generating high-quality summaries of specific documents while reducing computational requirements. Specifically, we finetune two Google Flan-T5 models on the datasets generated by Llama-3-8B and Cohere C4AI, and we create a quantized (4-bit) version of Google Gemma 2-B based on the summaries from Cohere C4AI. Additionally, we initiated a pilot activity with legal experts from SGS-Digicomply to validate the effectiveness of our summarization pipeline.
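
The first task described above, building a silver-standard dataset from teacher-generated summaries, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `SilverExample` record, the `build_silver_dataset` helper, the wording of `INSTRUCTION`, and the toy `first_sentence_teacher`, which stands in for an instruction-tuned LLM such as the ones named in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class SilverExample:
    prompt: str  # instruction plus regulatory text shown to the student model
    target: str  # teacher-generated summary used as the silver label


# Hypothetical prompt wording; the paper's actual prompt is not given here.
INSTRUCTION = "Summarize the following food safety regulation:"


def build_silver_dataset(
    documents: Iterable[str], teacher: Callable[[str], str]
) -> List[SilverExample]:
    """Pair each non-empty document with a summary produced by the teacher."""
    examples: List[SilverExample] = []
    for doc in documents:
        text = doc.strip()
        if not text:
            continue  # skip empty documents
        prompt = f"{INSTRUCTION}\n\n{text}"
        summary = teacher(prompt).strip()
        if summary:  # keep only examples with a usable summary
            examples.append(SilverExample(prompt=prompt, target=summary))
    return examples


def first_sentence_teacher(prompt: str) -> str:
    """Toy stand-in for an instruction-tuned LLM: returns the first sentence."""
    body = prompt.split("\n\n", 1)[1]
    return body.split(". ")[0].rstrip(".") + "."
```

In practice the `teacher` callable would wrap a call to a large instruction-tuned model, and the resulting prompt and target pairs would be the training data for finetuning a smaller model in the second task.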

Source: CEUR WORKSHOP PROCEEDINGS, vol. 3802, pp. 33-42. Udine, Italy, 5-6/09/2024



BibTeX entry
@inproceedings{oai:iris.cnr.it:20.500.14243/525278,
	title = {LongDoc summarization using instruction-tuned large language models for food safety regulations},
	author = {Rocchietti G. and Rulli C. and Randl K. and Muntean C. and Nardini F. M. and Perego R. and Trani S. and Karvounis M. and Janostik J.},
	booktitle = {CEUR WORKSHOP PROCEEDINGS, vol. 3802, pp. 33-42. Udine, Italy, 5-6/09/2024},
	year = {2024}
}

EFRA
Extreme Food Risk Analytics

