Hermes Server Issues
- Tuesday, 16th February, 2021
- 10:04am
The newest updates will be posted at the top of this page.
18th Feb, 06:29 EST: The restoration of websites has completed and sites have been back online for around 1 hour. Any clients using a 3rd party DNS provider or custom (non-D9) nameservers will need to update their A records to:
213.166.86.58
More details on the restoration are due to be emailed to all impacted clients in the next 30 minutes. For clients who manage their own DNS, a quick way to check whether a domain is already pointing at the new IP is sketched below.
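The sketch below is only an illustrative example (Python, standard library only); example.com is a placeholder for your own domain and is not one of our addresses. It simply resolves the domain and compares the result against the new server IP.

    import socket

    # Placeholder domain for illustration only; substitute your own site.
    DOMAIN = "example.com"
    # New server IP from the 18th Feb, 06:29 EST update above.
    NEW_SERVER_IP = "213.166.86.58"

    # Resolve the domain using the DNS resolver your system is configured with.
    resolved_ip = socket.gethostbyname(DOMAIN)

    if resolved_ip == NEW_SERVER_IP:
        print(f"{DOMAIN} already points at the new server ({resolved_ip}).")
    else:
        print(f"{DOMAIN} still resolves to {resolved_ip}; "
              f"update its A record to {NEW_SERVER_IP} with your DNS provider.")

Bear in mind that your local resolver may cache the old record for up to the zone's TTL, so an unchanged result straight after editing the record does not necessarily mean the change failed.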
18th Feb, 03:45 EST: Last night we took the decision to provision a second replacement server in our UK facility, where better quality hardware was available to us at shorter notice, in the hope that the restoration would complete sooner. We are pleased to report that this has made a vast improvement to the restoration times and we hope to have service restored by 6am EST. As the server is in a different facility an IP address change will be involved (for the vast majority of customers no action will be needed), but we thought this would be better than waiting another 24 hours for the original restore to complete.
17th Feb, 12:12 EST: The restoration has now been running for around 12 hours (we got our time zones mixed up in the previous update!) and just over 30% of accounts have been restored to the new hardware. We will continue to monitor the restoration and make any needed tweaks to the settings to ensure everything is restored as quickly as it possibly can be.
17th Feb, 01:55 EST: The restoration of the backups onto a new server has now been running for around 6.5 hours and is around 10% complete. Once all of the data has been restored from the backups, we will move the IP addresses over from the current server to the new server and sites will begin to load.
16th Feb, 16:40 EST: We have just received the logins for the replacement server and will proceed with the setup. We will then start the backup restoration process. We will update this page as and when we have more info.
16th Feb, 11:52 EST: We are still waiting for the datacentre to provision a new server so we can begin to restore backups. We heard from them around 90 minutes ago that the server was undergoing final testing, but as yet it hasn't been delivered to us.
16th Feb, 08:21 EST: Unfortunately, our efforts to fix the file system corruption on the server are proving futile and MySQL remains down. At present our only option is to wait for the datacentre to provision the new hardware for us so that we can restore data from our backups. We have reached out to their management team and asked whether they can speed up the delivery for us, but as yet we haven't received a response. We will update this page as soon as we get more information.
16th Feb, 04:55 EST: Over the past few hours we have observed file system corruption on the server, caused by a bad hard drive, and this is causing services to fail, namely the MySQL service. Other services are up and running without any issues (HTML-driven websites, email, DNS), but MySQL-driven websites will show an error message.
We are doing all we can to restore service, but depending on how severe the file system corruption is we may need to move to new hardware and restore data from our backups.
As it currently stands, this is where we are:
1. We have our senior admins trying to fix the problem and bring MySQL-driven websites back up on the current hardware.
2. We have ordered a replacement server with the datacentre and asked for the setup to be expedited, although they have informed us this could take up to 48 hours because they have fewer staff "on the shop floor" to comply with safe social distancing measures. For us this delay is unacceptable, but we are left with little option other than to wait for them to do the build.
As soon as the new hardware is provisioned we will begin restoring websites from our backups. When the restoration has completed, we will sync any accessible data from the old hardware over to the new server and then migrate IP addresses across.
We appreciate that nobody likes it when their websites are down, least of all us, but as you can imagine we are becoming swamped with support tickets, so we would ask that you check this page instead: any updates we have will be posted here. Submitting multiple tickets about the issue slows down our ability to fix it.
FAQs
======
Q: Do you have an ETA?
A: It is not possible to give an ETA as there are too many unknowns that are currently out of our control, namely how long it takes the datacentre to provision the new hardware and whether (and how quickly) we are able to recover from the file system corruption.
Q: How did this happen?
A: The cause appears to have been a failing drive in the RAID 10 array. We attempted to replace the drive last week, but the datacentre informed us the RAID array wouldn't take the new drive, so we waited for them to get an exact match in stock. They attempted to add the new drive again last night but, as before, the array wouldn't accept it. They then re-added the original failing drive to the server, and this appears to be when the file system corruption took place.
Apologies again for the inconvenience caused, but rest assured we are doing all in our power to get things back online as soon as possible.