Incident description
The server that runs all the WordPress sites (wordpress1.geant.org) became unreachable at 12:10:59 CET.
Incident severity: CRITICAL
Data loss: NO
Monitoring alerted: YES
Timeline
Time (CET) | Event |
---|---|
12:10 | Apache server stopped accepting incoming requests |
12:12 | Chris Atherton reported on the #it channel that the site aac-project.eu was not working correctly |
12:21 | Konstantin Lepikhov confirmed the issue with the wordpress1 site on the #devops channel |
12:23 | Dick Visser connected to the VM via the console and confirmed that the network was down (gateway not reachable) |
12:29 | Massimiliano Adamo restarted the network service inside the VM, after which the network came up and everything started working |
12:30 | Konstantin Lepikhov announced that the problem was fixed |
Total downtime: 20 minutes.
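As a cross-check of the figure above, the downtime can be recomputed from the first and last timeline entries. This is only a minimal sketch: the helper name `downtime_minutes` is hypothetical, and the timestamps are taken at the minute resolution used in the timeline, not from any monitoring export.

```python
from datetime import datetime

# Timestamps copied from the timeline (CET, minute resolution).
# The report does not state the date, so only the time of day is parsed;
# both values fall on the same day, which is all the subtraction needs.
FMT = "%H:%M"
outage_start = datetime.strptime("12:10", FMT)  # Apache stopped accepting requests
outage_end = datetime.strptime("12:30", FMT)    # problem announced as fixed


def downtime_minutes(start: datetime, end: datetime) -> int:
    """Return the whole minutes elapsed between two same-day timestamps."""
    return int((end - start).total_seconds() // 60)


print(downtime_minutes(outage_start, outage_end))  # prints 20
```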
Analysis
As part of BAU and the handover of his responsibilities, Dick Visser was migrating a VM from the University of Amsterdam into the GEANT VMware cluster in Frankfurt.
...
The reason for this needs further investigation, because DRS moving VMs around is common practice and should not impact VMs at all.
Logs/screendump
DRS migration: