ntang (ntang) wrote,

LJ outage: what really happened?

Version one:

Power outage in the SOMA neighborhood in San Francisco. Despite data centers having multiple levels of redundancy and generally having enough fuel to go "off the grid" for at least a day and still stay up and running, LJ's data center (365 Main) goes offline (also taking out several other sites). Little or no mention is made of the backup generators or local fuel or other things that are supposed to make this sort of outage impossible. Maybe they were eaten by gremlins.

Read about it here:
http://news.livejournal.com/101880.html
http://mashable.com/2007/07/24/san-francisco-power-outage-takes-down-craigslist-technorati-sixapart-more/
http://venturebeat.com/2007/07/24/san-franciscos-tech-sites-down-amid-power-outage/
http://www.techcrunch.com/2007/07/24/key-sites-offline-today-major-sf-power-outage-may-be-to-blame/



Version two:

An employee of 365 Main, the data center that hosts LJ along with several other sites (Craigslist, Yelp, etc.), gets drunk, goes on an angry rampage, and shuts down and/or breaks part of the data center. Over 40 racks go down for hours while the data center recovers.

Read about it here:
http://valleywag.com/tech/breakdowns/a-drunk-employee-kills-all-of-the-websites-you-care-about-282021.php
http://valleywag.com/tech/breakdowns/365-main-outage-causes-aftershocks-in-web-world-282072.php
http://valleywag.com/tech/sightings/-282077.php



Version three: (i.e. what most likely actually happened)

Unfortunately, the story's pretty boring. A local power outage in the SOMA neighborhood caused the backup generators to automatically kick on at 365 Main... or rather, to *try* to kick on. The automated system failed, so the generators had to be turned on manually, 45 minutes after the outage began. Basically, it looks like just a case of technical incompetence.

Another part of the reason it took so long to get everything up and running again was that their security system, i.e. a guard checking in each person individually, couldn't keep up with the number of engineers and admins lined up outside the door. There's a good chance no one at LJ could actually reach the servers for quite a while even after they arrived.

Read about it here:
http://www.datacenterknowledge.com/archives/2007/Jul/24/generator_failures_caused_365_main_outage.html

See a picture of the crowd waiting to get in here:
http://valleywag.com/tech/breaking/angry-mob-gathers-outside-sf-datacenter-282053.php