Better software observability using Tango Controls with OpenTelemetry - experience at MAX IV

WEAG006
24 Sept 2025, 10:15
15m
Grand Ballroom (Palmer House Hilton Chicago)

17 East Monroe Street Chicago, IL 60603, United States of America
Contributed Oral Presentation, WEAG MC10: Software Architecture & Technology Evolution

Speaker

Marcelo Alcocer (MAX IV Laboratory)

Description

Distributed software systems are complex, and interactions that span multiple machines can be difficult to debug and monitor. Log messages alone are not enough for observability: we also need information about the communication between applications, how each one is executing, and its internal state. In practice, applications can be made more observable using software frameworks such as OpenTelemetry. The Tango Controls framework has had built-in support for OpenTelemetry in C++ and Python since version 10.0.0, and we are using it operationally at the MAX IV synchrotron. We provide examples of the traces, trends, and other data available when running at scale on a beamline with hundreds of devices. We report on the compute and performance impact on client and server software applications, as well as practical issues encountered. For the backend servers that ingest and query the telemetry data (running Grafana Tempo for traces and Grafana Loki for logs), we report on the compute resources required.
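
To illustrate how tracing can link the communication between applications, the following is a minimal sketch of trace-context propagation between a client and a server. It uses only the Python standard library and is not the Tango Controls or OpenTelemetry API; the names `SpanContext`, `start_span`, `inject`, and `extract` are hypothetical helpers introduced for this illustration. The header layout follows the W3C Trace Context `traceparent` format (version, trace ID, parent span ID, flags), which OpenTelemetry uses by default for propagation.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical stand-in for a tracing library's span context."""
    trace_id: str                    # 32 hex chars, shared by all spans in a trace
    span_id: str                     # 16 hex chars, unique to this span
    parent_id: Optional[str] = None  # span_id of the caller, if any


def start_span(parent: Optional[SpanContext] = None) -> SpanContext:
    """Start a span: reuse the parent's trace ID, or begin a new trace."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    parent_id = parent.span_id if parent else None
    return SpanContext(trace_id, secrets.token_hex(8), parent_id)


def inject(ctx: SpanContext) -> dict:
    """Serialize the context into a W3C-style traceparent header."""
    # "00" is the format version; "01" marks the trace as sampled.
    return {"traceparent": f"00-{ctx.trace_id}-{ctx.span_id}-01"}


def extract(headers: dict) -> SpanContext:
    """Rebuild the caller's context from the traceparent header."""
    _version, trace_id, span_id, _flags = headers["traceparent"].split("-")
    return SpanContext(trace_id, span_id)


# Client side: start a span and attach its context to the outgoing request.
client_span = start_span()
headers = inject(client_span)

# Server side: continue the same trace as a child of the client's span.
server_span = start_span(parent=extract(headers))
```

Because both spans carry the same trace ID, and the server span records the client span as its parent, a tracing backend can reassemble one request's path across processes and machines.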

Author

Anton Joubert (MAX IV Laboratory)

Co-authors

Mr Benjamin Bertrand (MAX IV Laboratory)
Lin Zhu (MAX IV Laboratory)
