A CEO (who has now moved on to greater things) of a large and popular Web service was once telling me the tale of one of his major, embarrassing outages and mentioned something I now always look out for:
“It takes two things to fail to really cause trouble.” (or something like that)
Y’know: you’re prepared, and when the poop hits the fan, you respond swimmingly. But then something else happens at the same time and it all spirals out of control. I think you all must have stories of such combo-punches.
Well, seems like the folks at Rackspace had a really krummy Monday.* First some stuff went down and then came back. And then a few hours later, a truck took out their power, leading to a second incident related to the power that led to even more servers going down.
Yep, stuff happens.
Rackspace posted a public letter to their customers explaining the outage (it's linked in the article below). Read more below.
Link: Quick, Plug The Internet Back In: Major Rackspace Outage:
Rackspace’s generators kicked in but, as we’ve seen before, lots of other things can then go wrong. In this case, two chillers within the data center failed to start back up, and a number of servers were taken offline to avoid damage from overheating.
*Heh, I got a tip-off on the story from one of my tweeps, @djacobs.