D9 Hosting Downtime
- Wednesday, 21st May, 2008
- 00:00am
As I'm sure you have already noticed there was quite a long outage that affected all sites hosted on D9.
In short, this was down to a massive storm that swept across the Atlanta area last night. A lightning strike hit the datacentre, causing a power outage that brought down all servers, including good ol' Bullseye! (The server that your site is hosted on.)
After a long wait, the server has now finished its file check and we are able to bring everyone back online. Here's a more detailed explanation of what happened, as provided by our datacentre:
--Start Datacentre Statement--
"Severe storm cells came through North Georgia Region this evening. AtlantaNAP experienced an over current fault outage on one of our 2 main feeds. The feed is the original feed that has the most load currently connected to it. The amount of systems connected to the load is the amount of lightning and over current that will try to be passed to the system – i.e. if you don’t have very much load on it - like our new feed is currently only at 1/6th load - then current does not try to flow to it very much. Our first system is currently at 65% load so it tried to absorb much more of the lightning strike than the other one and hence the main breaker going into over current fault.
I have spoken with all of our key electrical engineers associated with the building at this point. According to Georgia Power and our PSSI and Cummins engineers, we likely took a lightning strike to the utility very near the facility, which caused an overcurrent fault on the main incoming breaker on our first set of switchgear. The breaker is designed to trip in the event of this kind of fault to protect the gear (your computers) inside the building from being burned up by the lightning strike.
When this type of fault happens, the computer that is the brains of the switchgear will not start the generators until an engineer verifies where the fault is. This is because a fault inside the wiring plant, such as a main short caused by a damaged feeder wire carrying main current in the building, could also produce this kind of overcurrent.
In that case it would be very dangerous to turn the power back on manually or to force a manual start of the gen sets and push current to the system with a fault remaining. Lives and machinery could be lost.
We dispatched several of our staff to visually inspect for faults (we did not want to turn something on and have it fry everyone's gear). They found none, verified that it was most likely a lightning strike, and manually started the generators to restore power. Unfortunately, the UPS system is only designed to carry that load for 10 minutes, which was not enough time for us to safely verify the fault and do a manual start.
This is apparently a rare event: a direct utility strike this close that does not get dissipated before it hits us. The farther away from your site the strike occurs, the more load and grounding it has to pass through to dissipate before it gets to you.
The good news is we did not burn up any equipment.
Some of you did not lose power because you were connected to the other, lightly loaded incoming feed; it carried too little load to draw enough of the overcurrent to trip its breaker, as it is only 18% loaded at this point.
Some of you lost network connectivity because the downstream feeder switches that your computers are connected to are single power supply units.
We are in the process of examining a facility-wide network upgrade that would move to a newer chassis-based solution throughout the facility. We started looking at this as a way to offer the new service capabilities that many of you have been asking for. It is a costly upgrade and will bring redundancy, but it also brings some pitfalls, since you have more connections into a single chassis. We are still evaluating this and will keep you up to date as to the direction we decide to move.
They have told me that under normal operating conditions there is really nothing we could have done, and that we should simply be glad we had good equipment installed that kept our computers from being fried.
I am thankful that I am not looking at a lot of damaged equipment that could not simply be turned back on; that would be a disaster I do not want to deal with. At this point it seems the new switchgear with overcurrent protection was a good investment."
--End Datacentre Statement--
Please accept our apologies for any inconvenience this outage has caused. Unfortunately, there's not much you can do to protect yourself from freak events like this!
As stated, all sites should now be back up and running as normal. If you do find any errors with any of your sites, please don't hesitate to open a support ticket and we will be happy to help you.
Thanks for reading, and again please accept our apologies.
Regards,
D9 Hosting