...
Time (CET) | Event |
---|---|
15:06 | Foreman triggered the VM removal action. |
15:35 | Dick Visser contacted us about problems with wordpress1.geant.org and filesender-prod.geant.org: the Nagios monitoring that he still runs for these services from the University of Amsterdam had alerted that these systems were not accessible. |
15:41 | Konstantin Lepikhov confirmed that hosts are missing in VMware |
15:44 | Konstantin Lepikhov identified the issue and started investigating on the VMware cluster. |
16:43 | Konstantin Lepikhov contacted Qaiser Ahmed on Slack to confirm whether backups exist. |
16:44 | Dick Visser called Qaiser Ahmed on his mobile phone; no answer. |
16:45 | DevOps confirmed that there are no backups or extra copies on VMware storage |
17:00 | Konstantin Lepikhov called Qaiser Ahmed in Slack, no response. |
17:00 | Dick Visser confirmed that he has backups on a server at the University of Amsterdam (daily backups taken directly by the VMs themselves). |
18:26 | Qaiser Ahmed confirmed on the #devops channel that the whole AMS_UBUNTU folder on the VMware cluster is not backed up and that no data is left. |
18:30 | Dick Visser created new VMs in the VMware cluster and started the restore process. |
20:30 | Dick Visser restored the backup and brought all sites online. |
20:45 | Konstantin Lepikhov made an official announcement on the #it and #general Slack channels about the incident and the resolution. |
21:00 | Dick Visser started restore of filesender-prod.geant.org. |
21:50 | Dick Visser finished the restore of filesender-prod.geant.org, with the exception of user files, which are not backed up because of privacy concerns and because this is a demonstration service. |
Total downtime: 5:39 hours.
Current situation
All data on the server wordpress1.geant.org was restored from the backup taken at midnight on 2018-02-18. This means that everything posted between 00:00 and 2 pm was unrecoverably lost.
...