Incident Management Process (draft)

Establish who the affected users and stakeholders are

A good starting point for this list is the list of stakeholders in the Service Catalogue.

Start with one of the previous Incident Reports as a template: Incidents
Save the new Incident Report here as a new child page
The report should include at least the following basic information:
- Timeline (how and when the incident was identified, when service was restored, etc.)
- Other information
- Optional future mitigations
If the issue is taking a long time to resolve, we must update the users every 3-4 hours. Linda Ness can probably help or advise with this.
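
As an illustration only, the 3-4 hour update cadence above could be tracked with a small helper like the one below. This is a minimal sketch, not part of the agreed process: the function names `update_window` and `is_update_overdue` are hypothetical, and it assumes the window is measured from the time of the last user update.

```python
from datetime import datetime, timedelta

# Assumption: the "every 3-4 hours" rule is read as a window that opens
# 3 hours and closes 4 hours after the previous user update.
MIN_INTERVAL = timedelta(hours=3)
MAX_INTERVAL = timedelta(hours=4)


def update_window(last_update):
    """Return (earliest, latest) time for the next user update."""
    return last_update + MIN_INTERVAL, last_update + MAX_INTERVAL


def is_update_overdue(last_update, now):
    """True if more than 4 hours have passed since the last user update."""
    return now - last_update > MAX_INTERVAL
```

For example, if the last update went out at 09:00, the next one would fall due between 12:00 and 13:00.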

Severity

Data Loss

Service	Start Date	End Date	Severity	Data Loss	Incident Page
DNS	27 Feb 2019	27 Feb 2019	CRITICAL	NO	DNS Outage 2019-02-27
SharePoint	08 Jan 2020	08 Jan 2020	CRITICAL	NO	SharePoint Outage 2020-01-08
SharePoint	17 Jan 2020	17 Jan 2020	MED	NO	RSS Feed in Jobs page Geant.org was down - 17/01/2020
BRIAN	27 Jan 2020	27 Jan 2020	CRITICAL	YES	Brian Outage 2020-01-26
Cacti	06 Mar 2020	10 Mar 2020	CRITICAL	YES	Cacti production incident - 06-03-2020
Cacti	22 Jul 2020	29 Jul 2020	CRITICAL	YES	Cacti Production Instance - July 2020
HAProxy	17 Apr 2021	17 Apr 2021	CRITICAL	NO	Haproxy Outage 2021-03-17
ProxySQL	10 Jul 2021	12 Jul 2021	CRITICAL	YES	ProxySQL Outage 2021-07-12
EMS	12 Mar 2022	14 Mar 2022	CRITICAL	NO	EMS - 2022-03-14 - Service Outage
EMS/DNS	20 Apr 2022	20 Apr 2022	MED	NO	EMS - 2022-04-20 - Service Degradation
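
If the incident log above is ever needed outside the wiki, it might be parsed along the following lines. This is a rough sketch under assumptions: it assumes the table is exported as tab-separated text with the same column headers, the helper names `parse_log` and `duration_days` are hypothetical, and only two rows from the table are included as sample input.

```python
from datetime import datetime

# Sample input: header plus two rows copied from the incident log,
# assumed to be tab-separated.
LOG = """\
Service\tStart Date\tEnd Date\tSeverity\tData Loss\tIncident Page
Cacti\t06 Mar 2020\t10 Mar 2020\tCRITICAL\tYES\tCacti production incident - 06-03-2020
EMS\t12 Mar 2022\t14 Mar 2022\tCRITICAL\tNO\tEMS - 2022-03-14 - Service Outage
"""


def parse_log(text):
    """Turn the tab-separated log into a list of dicts keyed by column header."""
    lines = text.strip().splitlines()
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


def duration_days(row):
    """Number of days between the Start Date and End Date of an incident."""
    fmt = "%d %b %Y"  # e.g. "06 Mar 2020"; assumes English month abbreviations
    start = datetime.strptime(row["Start Date"], fmt)
    end = datetime.strptime(row["End Date"], fmt)
    return (end - start).days


rows = parse_log(LOG)
```

With the sample rows, `duration_days(rows[0])` gives 4 for the March 2020 Cacti incident. An incident that starts and ends on the same date gives 0.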