1–6 Jun 2025
Taipei International Convention Center (TICC)
Asia/Taipei timezone

Application of large language models for the extraction of information from particle accelerator technical documentation

THPM120
5 Jun 2025, 15:30
2h
Exhibition Hall A _Magpie (TWTC)

Poster Presentation | MC6.D13 Machine Learning | Thursday Poster Session

Speaker

Dr Mariusz Sapinski (Paul Scherrer Institute)

Description

The large body of technical documentation for legacy accelerator systems, combined with the retirement of experienced personnel, underscores the urgent need for efficient methods to preserve and transfer specialized knowledge. This paper explores the application of large language models (LLMs) to automate and enhance the extraction of information from particle accelerator technical documents. By exploiting LLMs, we aim to address the challenge of knowledge retention and enable the retrieval of domain expertise embedded in legacy documentation.
We present initial results of adapting LLMs to this specialized domain. Our evaluation demonstrates the effectiveness of LLMs in extracting, summarizing, and organizing knowledge, thereby reducing the risk of losing valuable insights as personnel retire. We also discuss the limitations of current LLMs, such as limited interpretability and poor handling of rare domain-specific terms, and propose strategies to address them. This work highlights the potential of LLMs to play a key role in preserving institutional knowledge and ensuring continuity in highly specialized fields.

Region represented Europe
Paper preparation format LaTeX

Author

Qing Dai (Paul Scherrer Institute)

Co-authors

Adam Grycner (Google (Switzerland)), Dr Mariusz Sapinski (Paul Scherrer Institute), Rasmus Ischebeck (Paul Scherrer Institute)