New Compute Nodes and Research Storage Systems Deployed
Enhancing UMBC's Research Computing Capabilities
We are excited to announce that DoIT’s Research Computing team has successfully deployed over 60 new CPU and GPU nodes and integrated the high-performance computing environment with the Retriever Research Storage system (R-RStor). These infrastructure enhancements support research activities ranging from artificial intelligence development to atmospheric science analysis.
This investment, totaling over $2 million, modernizes our research computing infrastructure by replacing aging hardware and introducing major updates to the management and security of the cluster. These updates will help researchers meet the security requirements of research grants and improve both access to and the capabilities of UMBC's high-performance computing resources.
Key achievements:
51 new CPU nodes on the chip cluster: 13 high-memory and 38 regular-memory nodes, providing a total of 3,264 CPU cores and 32 TB of RAM.
10 new GPU nodes on the chip cluster: eight L40S nodes and two H100 nodes, adding 320 CPU cores and 36 GPUs (see the GPU check sketch after this list).
Retriever Research Storage System (R-RStor): Added 2.5 petabytes of storage, leveraging the Ceph distributed file system to enhance data mobility for research workflows (see the staging sketch below), as part of the NSF grant awarded to DoIT last year.
Modernized software environment: Transition to Red Hat Enterprise Linux 9 (RHEL9), improving system reliability, security, and compatibility with modern software stacks.
Redundant head nodes: Implemented across clusters to eliminate single points of failure, ensuring a more reliable and secure system.
First HPC Bootcamp: Delivered on November 12, 2024, as part of the NSF SCIPE grant, providing hands-on GPU training for over 20 participants from iHARP, IMET, and UMCES.
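For researchers preparing workloads for the new GPU nodes, the short Python sketch below shows one way to confirm that the GPUs allocated to a job are visible to your software stack. It is a minimal illustration, assuming PyTorch is installed in your environment; cluster-specific module, partition, and queue names are intentionally omitted and should be confirmed with the Research Computing team.

# gpu_check.py: minimal sketch to confirm GPU visibility on a compute node.
# Assumes PyTorch is available in the active environment; cluster-specific
# module and partition names are intentionally omitted.
import torch

def report_gpus() -> None:
    if not torch.cuda.is_available():
        print("No CUDA-capable GPU is visible to this process.")
        return
    count = torch.cuda.device_count()
    print(f"{count} GPU(s) visible:")
    for i in range(count):
        props = torch.cuda.get_device_properties(i)
        # On the new nodes this should report names such as "NVIDIA L40S" or "NVIDIA H100".
        print(f"  [{i}] {props.name}, {props.total_memory / 1024**3:.0f} GiB memory")

if __name__ == "__main__":
    report_gpus()

Running this inside a batch job on one of the new GPU nodes should list the allocated devices; an empty result usually means the job did not request a GPU.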
The project has significantly enhanced UMBC’s capacity to support high-performance computing (HPC), providing a more secure and robust environment for interdisciplinary and inter-institutional collaborations.
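As an illustration of the kind of data mobility R-RStor is intended to support, the sketch below stages an input dataset from research storage to node-local scratch before a compute step and copies results back afterwards. The paths shown are hypothetical placeholders, not actual R-RStor mount points; the supported locations and recommended transfer tools are documented by the Research Computing team.

# stage_data.py: minimal sketch of staging data between research storage and
# node-local scratch. The paths below are hypothetical placeholders, not real
# R-RStor mount points.
import shutil
from pathlib import Path

RSTOR_PROJECT = Path("/path/to/rstor/project")   # hypothetical research storage location
LOCAL_SCRATCH = Path("/tmp/my_job_scratch")      # hypothetical node-local scratch

def stage_in(dataset: str) -> Path:
    """Copy an input dataset from research storage to local scratch."""
    src = RSTOR_PROJECT / dataset
    dst = LOCAL_SCRATCH / dataset
    dst.parent.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copytree(src, dst, dirs_exist_ok=True))

def stage_out(results: str) -> Path:
    """Copy results from local scratch back to research storage."""
    src = LOCAL_SCRATCH / results
    dst = RSTOR_PROJECT / results
    return Path(shutil.copytree(src, dst, dirs_exist_ok=True))

if __name__ == "__main__":
    work_dir = stage_in("example_dataset")
    # ... run the compute step against work_dir here ...
    stage_out("example_results")

Staging to node-local scratch keeps heavy I/O off the shared file system during a run, while research storage remains the durable home for inputs and results.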
Team:
This project would not have been possible without the DoIT Research Computing and Unix teams, particularly Roy Prouty, HPC Specialist; Greg Ballantine, Research Computing Systems Administrator; and Andy Leeds, Coordinator of Research Computing. Staff in Unix and Network Engineering, along with many DoIT student employees, including Max Breitmeyer, Ryan Cather, Phil Henry, Danielle Esposito, and Beamlak Bekele, played a huge role. A special thanks to Vandana Janeja, iHARP Project Lead, and Sai Vikas Amaraneni, iHARP Graduate Student, for their support throughout the process.
UMBC Faculty Contributors played a significant role in developing the initial specifications, testing the new systems, and providing insight along the way, as did the NSF SCIPE Grant Team and the entire UMBC Research Community, including iHARP, IMET, and UMCES.
We would like to extend our gratitude to everyone who contributed to this project's success!
Looking Ahead:
The DoIT Research Computing team will:
Continue to monitor and support the new systems post-deployment.
Address optimization needs as researchers test and rebuild software stacks for the new hardware environment.
Expand programming opportunities with additional HPC Bootcamps to train new and experienced users in maximizing the HPC environment.
Support hiring efforts for the Assistant Director of Research Computing and the remaining system administration positions.
If you have any questions, need support, or encounter any issues with the new compute nodes or the R-RStor system, please submit a ticket so the Research Computing team can assist you.
Unix Students, left to right: Danielle Esposito, Max Breitmeyer, Phil Henry, Beamlak Bekele
Installing 51 new CPU nodes in the bwtech Research Park Data Center
Posted: February 13, 2025, 2:44 PM
