
Issue with some 2020 GPU Nodes on chip

To all chip-gpu users,

Following up on our previous downtime communication, the majority of the chip cluster is back online and operating normally. However, we are experiencing an ongoing issue with a subset of our 2020 GPU nodes, which remain in the “drng” (draining) or “drain” (drained) state and cannot accept new Slurm jobs.

Our team is currently on-site investigating why some of these machines failed to reboot properly into the upgraded environment while others came back online without issue. Again, no other machines on the cluster are affected.

The updates applied during this downtime were necessary to meet critical security and audit requirements. As we continue to secure our infrastructure, keeping aging hardware compatible with modern software environments becomes increasingly difficult.

As we work to bring these machines back online to meet the needs of researchers, we want to keep two points top of mind: (1) GPU servers generally have a shorter operational lifespan than comparable CPU nodes (e.g., same manufacturer, server grade, and component density), and (2) these 2020 GPU machines are six years old and outside the bounds of official hardware warranty and support.

We understand this extended downtime for these specific resources is frustrating. Our team is on-site and working to resolve the boot issues for the remaining offline nodes, and we will provide another update as soon as we know more.

Thank you for your continued patience.

Note that DoIT Research Computing and Data staff are still working to assess and resolve filesystem performance issues related to RRStor (Ceph); see this posting for details.

As always, remember to submit any issues you notice via a descriptive RT ticket: https://rtforms.umbc.edu/rt_authenticated/doit/DoIT-support.php?auto=Research%20Computing

and check out our documentation here: https://umbc.atlassian.net/wiki/spaces/faq/pages/1082589207/UMBC+HPCF+-+chip

Thanks for reading,
Roy Prouty
Assistant Director for Research Computing
UMBC DoIT

Posted: February 26, 2026, 8:53 AM