Modern technology enables us to accomplish so much, but when there's a disruption, we're reminded of our dependence, and it's painful! Last Thursday and Friday, many users were affected by delays in sending and receiving emails. The systems at Rackspace, one of the leading cloud providers in the world, became overloaded and disrupted services for many users, including a number of Ekaru clients.
We have been tracking the situation closely and have spoken with many of our users. The situation was mostly resolved on Friday. We've provided updates on our website and social media pages, and we're always available by phone for questions. Thank you for your patience and understanding!
None of the leading providers is immune to disruptions and downtime. Office 365 had three disruptions in June, and Amazon Web Services had a major outage in February. In a twist of irony, Down Detector, which monitors web outages, was itself down as a result of the Amazon outage. Although Amazon, Rackspace, and Office 365 are known for excellent uptime, there is always the chance of an exception.
Here is Rackspace's summary of the timeline of events for affected users:
- "We identified an issue with system performance during the week of July 3. While minor, we were concerned that it had the potential to grow worse over time. After discussing the issue with our vendors and consultants, we elected to perform proactive maintenance designed to double our total capacity. The additional capacity would provide plenty of headroom and prevent the issue.
- On the evening of Wednesday, July 12, the capacity was ready, and engineers initiated the process to rebalance users across the new hardware.
- An unknown bug associated with moving users caused high system load across affected systems, and impacted only those users who had been moved.
- During the time of impact, when an affected user tried to access their mailbox, they would have seen access errors, incomplete message listings, or other errors. Mail delivery was also slowed for a time, and this would have affected a broader set of users.
- Email messages were not lost or destroyed, but would have been inaccessible by some users during the time of impact."
More analysis is underway, but for now, the important thing is that performance has been restored.
We know how stressful events like this are for all users, and we are here to help in any way we can. In some ways, though, this is the technology equivalent of being stuck on an airplane at the gate with a mechanical or weather delay: sometimes it's just out of our control! Work was performed around the clock to restore services.