Most accelerator control systems assume that the effect of an action can be evaluated locally and immediately. Greedy
approaches work in near-linear regimes, and Bayesian Optimisation (BO) is now standard for black-box tuning, but both
are essentially static optimisers: in dynamic tasks with delayed consequences, even adaptive BO remains time-myopic
and lacks explicit temporal credit assignment for system memory and long-range machine
evolution. We investigate three relevant forms of delayed consequences: explicit action latency (field settling delays
response), magnetic hysteresis (output depends on change history), and ballistic amplification (small upstream kicks grow
through nonlinear optics and apertures, causing downstream loss). Using a high-fidelity XSuite model of the AWAKE
electron line, we benchmark a reinforcement learning controller against an inverse-response greedy optimiser and BO.
The learning-based method anticipates delayed effects and avoids the failure regions that trap both baselines, indicating
that delayed-consequence regimes are a key class of accelerator control problems where horizon-aware model-based or
learning-based methods clearly outperform current practice.
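Two of the delayed-consequence effects named above, action latency and hysteresis, can be illustrated with a minimal toy model. This is a hedged sketch for intuition only: `ToyDelayedLine`, its settling rate `alpha`, and its sign-dependent hysteresis offset are illustrative assumptions, not the XSuite AWAKE model or the actual controller used in the paper.

```python
class ToyDelayedLine:
    """Toy 1-D stand-in (illustrative only) for a beamline response with:
    - action latency: the field settles toward the setpoint at rate alpha,
      so the immediate reading understates the eventual effect;
    - hysteresis: a small offset whose sign depends on the direction of the
      most recent change, so the output depends on change history.
    """

    def __init__(self, alpha=0.3, hyst=0.05):
        self.alpha = alpha    # fractional settling per step (latency)
        self.hyst = hyst      # magnitude of the hysteresis offset
        self.field = 0.0      # actual field, which lags the setpoint
        self.offset = 0.0     # history-dependent hysteresis offset

    def step(self, setpoint):
        delta = setpoint - self.field
        self.field += self.alpha * delta      # first-order settling
        if delta > 0:
            self.offset = -self.hyst          # setpoint approached from below
        elif delta < 0:
            self.offset = +self.hyst          # setpoint approached from above
        return self.field + self.offset       # observed (delayed) response


# Latency: one step toward setpoint 1.0 reads far below it.
line_up = ToyDelayedLine()
first_reading = line_up.step(1.0)
for _ in range(49):
    settled_up = line_up.step(1.0)

# Hysteresis: reach the same setpoint 1.0 from above instead.
line_down = ToyDelayedLine()
for _ in range(50):
    line_down.step(2.0)
for _ in range(50):
    settled_down = line_down.step(1.0)
```

A greedy optimiser that scores an action by `first_reading` misjudges its true effect, and `settled_up != settled_down` even though both sequences end at the same setpoint, which is exactly why a purely local, instantaneous evaluation fails in these regimes.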
| Question | Answer |
|---|---|
| In which format do you intend to submit your paper? | LaTeX |
| Preprint marking on your proceeding paper | I wish my paper to be marked as preprint. |