
...

Time (CET)
15:06

Foreman triggered the VM remove action.
15:35

Dick Visser contacted us about problems with wordpress1.geant.org and filesender-prod.geant.org: the Nagios monitoring that he still runs for these services from the University of Amsterdam had alerted that these systems were not accessible.

15:41

Konstantin Lepikhov confirmed that the hosts were missing in VMware.

15:44

Konstantin Lepikhov identified the issue and started an investigation on the VMware cluster.

16:43

Konstantin Lepikhov contacted Qaiser Ahmed on Slack to confirm whether backups existed.

16:44

Dick Visser called Qaiser Ahmed on his mobile phone; no answer.

16:45

DevOps confirmed that there were no backups or extra copies on the VMware storage.

17:00

Konstantin Lepikhov called Qaiser Ahmed on Slack; no response.

17:00

Dick Visser confirmed that he had backups on a server at the University of Amsterdam (daily backups taken directly by the VMs themselves).

18:26

Qaiser Ahmed confirmed on the #devops channel that the whole folder called AMS_UBUNTU on the VMware cluster was not backed up and that no data was left.

18:30

Dick Visser created new VMs in the VMware cluster and started the restore process.

20:30

Dick Visser restored the backup and brought all sites online.

20:45

Konstantin Lepikhov made an official announcement on the #it and #general Slack channels about the incident and the resolution.

21:00

Dick Visser started the restore of filesender-prod.geant.org.

21:50

Dick Visser finished the restore of filesender-prod.geant.org, with the exception of user files: these are not backed up because of privacy concerns and because this is a demonstration service.

Total downtime: 5 hours 39 minutes.
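The timeline does not state which two entries bound the downtime figure. As a minimal sketch, assuming the downtime is counted from the Foreman remove action at 15:06 to the Slack announcement of the resolution at 20:45 (an assumption that happens to match the stated total), the arithmetic can be checked as follows. The helper name `elapsed` is hypothetical.

```python
from datetime import datetime


def elapsed(start: str, end: str) -> str:
    """Return the time between two same-day HH:MM timestamps as H:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}:{minutes % 60:02d}"


# Assumption: downtime runs from the Foreman remove action (15:06)
# to the announcement of the resolution on Slack (20:45).
print(elapsed("15:06", "20:45"))  # 5:39
```

Counting instead until the filesender-prod.geant.org restore finished at 21:50 would give 6:44 with the same helper.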

Current situation

All data on the server wordpress1.geant.org was restored from the backup taken at midnight on 2018-02-18. This means that everything posted between 00:00 and 14:00 was unrecoverably lost.

...