...
Date | Time | Notes |
---|---|---|
06/03/2020 | 11:21 | First critical alert received. Decision made to monitor how quickly the partition would consume the remaining space. |
06/03/2020 | 17:10 | Alert was reviewed again; the partition was consuming space faster than expected. |
06/03/2020 | 17:30 | Logged on and added a new disk via the VMware UI. Logged onto the server and attempted to extend the existing LVM volume in the usual manner. The server produced errors when the physical volume was created, with a message about a missing UUID, which `pvs` confirmed. Remediation attempts were unsuccessful, and a reboot was requested to confirm whether a device rescan would fix the issue or provide more information. |
06/03/2020 | 19:54 | Emergency ticket raised to perform a reboot at 21:00; approved by the NOC. |
06/03/2020 | 21:00 | Unfortunately the VM did not boot. Multiple `fsck` options were tried without success, so we were forced to restore from backup. |
06/03/2020 | 21:30 | Restore from backup was requested. |
06/03/2020 | 22:25 | After initial issues with the restore, a known-good VM image was restored and booted. |
06/03/2020 | 22:30 | Investigated the large volume of relay logs in /var/lib/mysql. |
06/03/2020 | 22:33 | Logged in to MySQL on vie and reset the `id` default value in the `data_template_data_rra`, `poller_item`, and `data_input_data` tables. |
06/03/2020 | 22:53 | Logged into prod-cacti01-fra-de.geant.net to fix the replication break caused by the restore, which was showing an older binlog entry than expected. |
06/03/2020 | 23:14 | Notified the NOC that this fix would need review on Monday, but that replication was fixed. There was no indication at this point that anything was still broken. |
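
For reference, the LVM extension attempted at 17:30 normally follows the pattern sketched below. The device path `/dev/sdb`, the volume group `vg0`, and the logical volume `lv_data` are placeholder names, not taken from the incident.

```shell
# Rescan the SCSI bus so the newly added VMware disk appears
# (host0 is an assumption; repeat for other hosts if needed)
echo "- - -" > /sys/class/scsi_host/host0/scan

# Create a physical volume on the new disk -- this is the step that
# failed during the incident with the "missing UUID" error
pvcreate /dev/sdb

# List physical volumes; during the incident pvs confirmed the missing UUID
pvs

# Extend the volume group and logical volume, then grow the filesystem
# (vg0 and lv_data are hypothetical names)
vgextend vg0 /dev/sdb
lvextend -l +100%FREE /dev/vg0/lv_data
resize2fs /dev/vg0/lv_data   # use xfs_growfs instead for XFS
```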
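
The report does not record which `fsck` options were attempted at 21:00; a typical escalation on an unbootable ext filesystem looks like the following (the device name is a placeholder):

```shell
# Interactive check first
fsck /dev/sda1

# Answer yes to all repair prompts
fsck -y /dev/sda1

# Force a full e2fsck pass even if the filesystem is marked clean
e2fsck -f -y /dev/sda1

# Last resort: repair using an alternate superblock
# (32768 is a common backup location; mke2fs -n lists the actual ones)
e2fsck -b 32768 /dev/sda1
```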
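
The `id` reset performed at 22:33 was most likely an `AUTO_INCREMENT` adjustment on the affected Cacti tables; the sketch below is an assumption, and the counter value `1000` and database name `cacti` are placeholders:

```shell
# Hypothetical reset of the id counters on the vie Cacti database;
# the table names are from the incident, the value 1000 is a placeholder
mysql -u root -p cacti -e "ALTER TABLE data_template_data_rra AUTO_INCREMENT = 1000;"
mysql -u root -p cacti -e "ALTER TABLE poller_item AUTO_INCREMENT = 1000;"
mysql -u root -p cacti -e "ALTER TABLE data_input_data AUTO_INCREMENT = 1000;"
```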
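
Repairing the replication break at 22:53 would typically mean repointing the replica at the correct binlog coordinates. This is a hedged sketch; the binlog file name and position are placeholders, not the values used during the incident:

```shell
# Stop replication, repoint to the binlog position that matches the
# restored state, and restart (file/position values are placeholders)
mysql -e "STOP SLAVE;"
mysql -e "CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=4;"
mysql -e "START SLAVE;"

# Confirm Slave_IO_Running and Slave_SQL_Running are both "Yes"
mysql -e "SHOW SLAVE STATUS\G"
```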
Cacti runs "unison" to perform a two-way synchronization. Unison stopped working the first time as a consequence of the filesystem corruption, and did not work with the restored system either, because the two VMs were out of sync. We removed the DB created by Unison and started Unison from scratch on both systems, after which the sync started working again.
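
The Unison reset described above amounts to deleting its archive files (the "DB" it keeps of the last synced state) on both hosts and re-running the sync. The profile name `cacti` is a placeholder; the archive file naming is standard Unison behaviour:

```shell
# Unison stores its sync state as "ar*" (archive) and "fp*" (fingerprint)
# files under ~/.unison; removing them forces a from-scratch sync
rm -f ~/.unison/ar* ~/.unison/fp*

# Re-run the sync non-interactively; Unison rebuilds its archives
# (profile name "cacti" is an assumption)
unison cacti -batch
```

Run the removal on both VMs before restarting the sync, otherwise the surviving archive on one side will still disagree with the rebuilt one.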