UMBC Email Problems on February 9
Results of the Review of Email Problems on February 9th
Last Wednesday afternoon, February 9th, DoIT experienced a mail system error on one of UMBC's major mail servers used for sending mail. This delayed mail being sent from UMBC, and it may have caused mail destined for UMBC to be deferred by as much as 8 hours.
The problem arose when a network engineer from an outside vendor was troubleshooting our campus firewall management server. He was working on a problem where these firewall management devices were not properly sending mail notifications, which are required by the legislative and USM auditors. The engineer made a configuration error that caused the firewall management server to flood our mail servers with thousands of mail messages. The mail servers are designed to defer mail from a sending host when they see that host flooding them with mail. However, the engineer had also misconfigured the firewall management server's hostname to be the same as that of one of our major mail systems, so our mail servers deferred mail from that legitimate system as well.
We learned of the error when users contacted the help desk. The help desk paged the system administrators responsible for mail, who resolved the problem; however, it took some time to figure out why this unusual situation had occurred and to get the network firewall management server reconfigured.
For those using UMBC email servers, there was about a one-in-four chance that mail being sent or received was deferred. Service was restored about 4 hours after the problem was reported. Students and others using our Google Mail for UMBC were not affected, but some mail sent from Blackboard to class lists was delayed. By early morning on the 10th, all email systems had returned to normal.
DoIT understands that email is a critical campus service. We have taken steps to work with the external vendor and also reworked our own mail processing scripts so that this kind of situation can't happen again. We are very sorry for any inconvenience this caused.
Posted: February 15, 2011, 6:25 AM