Lost Connections 10/08/25
Hi everyone,
Today (October 8, 2025) at around 14:30 ET, we experienced unexpected system downtime. While investigating, we found that the outage stemmed from a failure on the file server associated with the ada GPU cluster (ada-rstor). To keep the chip hardware available to users, we have temporarily disabled all network connections between the chip cluster and the ada-rstor file server.
We are currently working to understand the underlying issue with the ada-rstor file server, and we hope to make the ada-rstor volumes available again as soon as possible.
We know this action disproportionately affects users of the chip-gpu cluster, and we appreciate users' patience as we navigate this issue.
In the meantime, the chip cluster compute hardware, along with all other volumes, should be available to all users. If you notice any issues using chip that are unrelated to ada-rstor volumes (file paths starting with "/umbc/ada"), please report them via RT with as much information as possible.
Max Breitmeyer
HPC Specialist
Posted: October 8, 2025, 5:58 PM