Unplanned Outage 1/4/2012

Comments

  • Richie Ward

    The outage started at approximately 9:10 pm EST. We are working closely with our operations staff to restore service as quickly as we possibly can. We appreciate your patience and will be in constant communication as more facts come to light.

  • Jovi Jovanelly

    We are still working with our operations group on this issue. We are putting every effort into getting it resolved.

  • Jovi Jovanelly

    We are making some progress on getting the application back up and functional. We still need to address and verify a few other key items, and then I'll give the OK.

  • Jovi Jovanelly

    I apologize for the lag. We were coming back up near 2 am EST and then had another failure. Right now, you guys are back in business. All features and functions should be working properly. We're in the process of doing root cause analysis, but I can promise all of you that we will find the cause of this extended downtime and work to ensure no further interruptions occur.

    Jovi

  • Duncan McCreery

    Following up on this issue: the root cause was a hardware failure in a SAN (Storage Area Network) that serves as physical storage for virtual machines. To protect against similar problems in the future, we worked with our data center to double the SAN capacity. This change was put into place earlier this week without any service disruption or downtime.

    Unrelated to the SAN, there were sporadic performance slow-downs early last week. On Tuesday of this week we rolled out a new configuration and immediately saw increased performance and stability across the application. Since then, we haven't seen any production slow-downs or interruptions.

    Our systems and engineering teams are keeping a close eye on both of these changes and will continue to make improvements moving forward.
