Description
Effective knowledge management is essential for minimizing downtime and preserving institutional memory at large-scale accelerator facilities. We present APS RAG, a domain-aware Retrieval-Augmented Generation (RAG)* system currently deployed at the Advanced Photon Source (APS) that synthesizes operational intelligence and supports semantic retrieval across multiple dispersed databases. The system consolidates more than 10,000 unique documents from four live sources: the BELY scientific electronic logbook, operational Microsoft Teams chat, the Integrated Content Management System (ICMS), and the Work Request system. Using the latest frontier LLMs through Argonne's ARGO AI platform, APS RAG runs a specialized query-processing pipeline that performs temporal parsing, domain acronym resolution, and multi-query expansion before generating the final response.
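The query-processing steps described above can be illustrated with a minimal sketch. It is not the APS RAG implementation: the acronym table, the two time-phrase patterns, and the `paraphrase` hook (a stand-in for the LLM-based multi-query expansion performed through ARGO) are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable, List, Optional, Tuple

# Hypothetical acronym glossary; the real APS RAG glossary is not described here.
ACRONYMS = {
    "BPM": "beam position monitor",
    "RF": "radio frequency",
    "SR": "storage ring",
}

_LAST_N = re.compile(r"\b(?:in the )?last (\d+) (day|week)s?\b", re.IGNORECASE)
_YESTERDAY = re.compile(r"\byesterday\b", re.IGNORECASE)
_ACRONYM = re.compile(r"\b(" + "|".join(map(re.escape, ACRONYMS)) + r")\b")


@dataclass
class ProcessedQuery:
    text: str                     # acronym-expanded query with the time phrase removed
    start: Optional[date] = None  # start of the parsed time window, if any
    end: Optional[date] = None    # end of the parsed time window, if any
    variants: List[str] = field(default_factory=list)  # queries sent to retrieval


def _strip_span(query: str, m: re.Match) -> str:
    # Remove the matched time phrase and collapse the leftover whitespace.
    return " ".join((query[: m.start()] + query[m.end():]).split())


def parse_time_window(query: str, today: date) -> Tuple[str, Optional[date], Optional[date]]:
    """Handle two simple English patterns: 'last N days/weeks' and 'yesterday'."""
    m = _LAST_N.search(query)
    if m:
        days = int(m.group(1)) * (7 if m.group(2).lower() == "week" else 1)
        return _strip_span(query, m), today - timedelta(days=days), today
    m = _YESTERDAY.search(query)
    if m:
        day = today - timedelta(days=1)
        return _strip_span(query, m), day, day
    return query, None, None


def expand_acronyms(query: str) -> str:
    # Keep the acronym (useful for keyword search) and append its expansion.
    return _ACRONYM.sub(lambda m: f"{m.group(1)} ({ACRONYMS[m.group(1)]})", query)


def preprocess(
    query: str,
    today: date,
    paraphrase: Callable[[str], List[str]] = lambda q: [],
) -> ProcessedQuery:
    # `paraphrase` is a hypothetical hook standing in for LLM-based multi-query expansion.
    text, start, end = parse_time_window(query, today)
    text = expand_acronyms(text)
    variants = [text] + [p for p in paraphrase(text) if p != text]
    return ProcessedQuery(text, start, end, variants)
```

For example, "BPM drift in the last 2 weeks" with a reference date of 2025-03-15 becomes the query "BPM (beam position monitor) drift" restricted to 2025-03-01 through 2025-03-15. Separating the time window from the query text lets it be applied as a metadata filter rather than matched as keywords.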
To achieve high precision, APS RAG uses a hybrid retrieval architecture that combines dense vector search with keyword search. The two result lists are merged using Reciprocal Rank Fusion (RRF) and then refined by cross-encoder reranking to maximize relevance**. An 800-question evaluation dataset was built using the InPars methodology***, and this evaluation is supplemented with qualitative user feedback. Each final response from APS RAG includes inline citations that display the source document chunk along with a web-accessible link to the original document. Future developments include multimodal integration and agentic knowledge graph capabilities****.
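A minimal sketch of the fusion and reranking stages, assuming each retriever returns a ranked list of chunk IDs. The fusion follows the RRF formula of Cormack et al., score(d) = sum over lists of 1/(k + rank(d)), with k = 60 as used in that paper. The function names and the `score_pair` callable, which stands in for the cross-encoder, are hypothetical and not taken from APS RAG.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of chunk IDs with RRF: score(d) = sum of 1 / (k + rank(d)).

    Ranks start at 1; a chunk missing from a list contributes nothing for that list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def rerank(
    query: str,
    candidates: Sequence[Tuple[str, str]],
    score_pair: Callable[[str, str], float],
    top_n: int,
) -> List[str]:
    """Reorder (chunk_id, text) candidates by a pairwise relevance score and keep top_n.

    `score_pair` is a placeholder for a cross-encoder that scores (query, passage) pairs.
    """
    ranked = sorted(candidates, key=lambda c: score_pair(query, c[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:top_n]]
```

Because RRF uses only rank positions, the dense-similarity and keyword scores never have to be calibrated onto a common scale, which is the usual motivation for RRF in hybrid search. In practice `score_pair` could wrap a pretrained cross-encoder, for instance the `CrossEncoder.predict` method in sentence-transformers, which scores a list of (query, passage) pairs.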
Footnotes
* P. Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," in Proc. NeurIPS'20, 2020, vol. 33, pp. 9459–9474.
** G. V. Cormack, C. L. A. Clarke, and S. Buettcher, "Reciprocal rank fusion outperforms Condorcet and individual rank learning methods," in Proc. SIGIR'09, 2009, pp. 758–759.
*** L. Bonifacio et al., "InPars: Unsupervised Dataset Generation for Information Retrieval," in Proc. SIGIR'22, Madrid, Spain, 2022, pp. 2387–2392.
**** S. Pan et al., "Unifying Large Language Models and Knowledge Graphs: A Roadmap," IEEE Trans. Knowl. Data Eng., 2024, vol. 36, no. 7, pp. 3580–3599.
Funding Agency
Work supported by the U. S. Department of Energy, Office of Science, under Contract No. DE-AC02-06CH11357.
| In which format do you intend to submit your paper? | LaTeX |
|---|---|
| Preprint marking on your proceeding paper | I do not wish my paper to be marked as preprint. |