Description
Particle accelerators generate vast amounts of historical data in their operational logs, yet learning-based control still often relies on risky online optimisation. To make better use of this data and avoid online exploration, we present an offline reinforcement learning (RL) workflow. First, we use XSuite to generate high-fidelity trajectories for steering tasks across representative scenarios, including optics variations, alignment errors and jitter, yielding a synthetic dataset of expert and non-expert behaviour. Second, we learn an uncertainty-aware, Koopman-stabilised world model from this data, in which the nonlinear beam dynamics are lifted into a latent space with approximately linear, spectrally constrained evolution plus a regularised residual term. This structure yields numerically stable long-horizon rollouts and latent-space estimates of epistemic uncertainty.
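As a minimal mathematical sketch of this structure (the notation is ours; the abstract fixes neither the symbols nor the exact spectral constraint): writing $x_t$ for the measured beam state, $u_t$ for the corrector settings, and $\varphi_\theta$ for a learned encoder, the latent dynamics take the form

\[
z_t = \varphi_\theta(x_t), \qquad
z_{t+1} = A z_t + B u_t + r_\theta(z_t, u_t), \qquad
\sigma_{\max}(A) \le 1,
\]

where $A$ and $B$ define the approximately linear Koopman evolution, a spectral constraint such as the bound $\sigma_{\max}(A) \le 1$ prevents long-horizon rollouts from diverging, and $r_\theta$ is the regularised residual capturing dynamics the linear lift misses.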
The resulting surrogate environment enables model-based offline RL: policies are optimised entirely on pre-generated data, while epistemic uncertainty is used to detect distribution shift and enforce soft safety constraints. We benchmark these offline RL policies against a proximal policy optimisation (PPO) agent trained directly in simulation. Results show that policies trained purely offline on the Koopman world model can match online PPO performance without requiring any interaction with the real machine. This demonstrates a safe, reproducible pathway for turning historical accelerator data into effective learning-based control policies.
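One common way to turn such uncertainty estimates into a soft safety constraint (a hypothetical instantiation in the style of uncertainty-penalised model-based offline RL such as MOPO; the abstract does not specify the exact mechanism) is to penalise the surrogate reward:

\[
\tilde{r}(z_t, u_t) = r(z_t, u_t) - \lambda\, u_{\mathrm{epi}}(z_t, u_t),
\]

where $u_{\mathrm{epi}}$ is an epistemic-uncertainty estimate in latent space (e.g. disagreement across an ensemble of latent models), large values of which flag distribution shift, and $\lambda$ trades off return against staying close to the training data distribution.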
| In which format do you intend to submit your paper? | LaTeX |
|---|---|