Prometheus system monitoring stack

THPD009
25 Sept 2025, 16:15
1h 30m
Palmer House Hilton Chicago

17 East Monroe Street Chicago, IL 60603, United States of America
Poster Presentation
MC06: Control System Infrastructure and Cyber Security
THPD Posters

Speaker

Trymore Gatsi (National Research Foundation)

Description

The MeerKAT radio telescope, a 64-dish instrument located in South Africa, represents a significant leap in Southern Hemisphere radio astronomy, providing unprecedented sensitivity prior to its integration with the Square Kilometre Array (SKA). The operational efficiency of a complex project like MeerKAT relies heavily on a robust Control and Monitoring (CAM) system, which is underpinned by more than fifty Linux servers. Ensuring the stability and performance of these servers is paramount to maximizing scientific output. This paper details the implementation and benefits of a Prometheus-based monitoring stack designed to provide comprehensive surveillance of the MeerKAT CAM system’s hardware, operating systems, and CAM services. The stack proactively detects issues, triggers alerts, speeds remediation, and improves troubleshooting, ultimately minimizing downtime and protecting critical data.
The monitoring stack for CAM was deployed with Ansible. The Node Exporter application exposes operating system metrics such as node_memory_MemFree_bytes and node_filesystem_free_bytes, as well as our custom CAM service metrics such as karoo_camlog_status and karoo_vault_status, and feeds them into Prometheus. Prometheus processes these metrics and generates alerts using Python scripts; the Alertmanager application then uses these alerts to generate notifications and alarms via the Mattermost messaging application. Grafana transforms the metrics stored in Prometheus into interactive dashboards.
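
As an illustration of how custom service metrics such as karoo_camlog_status could reach Node Exporter, the following is a minimal Python sketch, not the MeerKAT implementation. It assumes the metrics are published through Node Exporter's textfile collector, which the abstract does not state. The function names, the file name cam_services.prom, and the convention that 1 means up and 0 means down are all hypothetical.

```python
import os
import tempfile
from typing import Mapping


def render_status_metrics(statuses: Mapping[str, bool]) -> str:
    """Render service statuses as gauges in the Prometheus text exposition format.

    Assumption: 1 means the service is up, 0 means it is down.
    """
    lines = []
    for name in sorted(statuses):
        lines.append(f"# HELP {name} Service status (1 = up, 0 = down).")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {1 if statuses[name] else 0}")
    return "\n".join(lines) + "\n"


def write_prom_file(text: str, path: str) -> None:
    """Write the metrics file atomically so a scrape never sees a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # mkstemp creates the file as 0600; relax it so another user can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    # Hypothetical statuses; a real script would probe the CAM services.
    statuses = {"karoo_camlog_status": True, "karoo_vault_status": False}
    with tempfile.TemporaryDirectory() as demo_dir:
        target = os.path.join(demo_dir, "cam_services.prom")
        write_prom_file(render_status_metrics(statuses), target)
        with open(target, encoding="utf-8") as handle:
            print(handle.read(), end="")
```

In a deployment along these lines, the target path would sit in the directory passed to Node Exporter via --collector.textfile.directory, and the script would run periodically, for example from cron or a systemd timer. Writing to a temporary file and renaming it keeps each scrape consistent.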

Author

Trymore Gatsi (National Research Foundation)

Co-authors

Nomcebo Makhoba (National Research Foundation)
Xitsembiso Baloyi (South African Radio Astronomy Observatory)
