Incidents

Created by Konstantin Lepikhov, last modified by Ian Galpin on May 05, 2022

Incident Management Process (draft)

Establish who the affected users and stakeholders are

The list of stakeholders in the Service Catalogue is a good starting point for this list.

Communicate information about the incident to the affected users and stakeholders

Do this before taking any other action

The relevant team members should look into the issue

The first priority is to restore service

Create an Incident Report

Start with one of the previous Incident Reports as a template: Incidents
Save the new Incident Report here as a new child page
Basic information:
- Timeline (how and when it was identified, when service was restored, etc.)
- Other information
- Optional future mitigations
If the issue is taking a long time to resolve, we must update the users every 3-4 hours. Linda Ness can probably help or advise with this.
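The update cadence above could be tracked with a small helper. This is only a minimal sketch: the function name `update_status`, the two constants, and the choice to treat 3 hours as the target and 4 hours as the hard limit are assumptions made for illustration, not part of the process.

```python
from datetime import datetime, timedelta

# Assumption: 3 hours is the target and 4 hours is the hard limit,
# mirroring the "every 3-4 hours" guideline.
TARGET_INTERVAL = timedelta(hours=3)
MAX_INTERVAL = timedelta(hours=4)


def update_status(last_update: datetime, now: datetime) -> str:
    """Classify how urgently the next user update is needed."""
    elapsed = now - last_update
    if elapsed >= MAX_INTERVAL:
        return "overdue"
    if elapsed >= TARGET_INTERVAL:
        return "due"
    return "ok"
```

For example, if the last update went out at 09:00, the helper would report "due" from 12:00 and "overdue" from 13:00.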

Index

Severity

CRITICAL Complete service outage
MED Partial service degradation
LOW Virtually no user impact

Data Loss

YES Data has been lost
NO No data was lost

Service	Start Date	End Date	Severity	Data Loss	Incident Page
EMS	12 Mar 2022	14 Mar 2022	CRITICAL	NO	EMS - 2022-03-14 - Service Outage
EMS	20 Apr 2022	20 Apr 2022	MED	NO	EMS - 2022-04-20 - Service Degradation
HAProxy	17 Apr 2022	17 Apr 2022			Haproxy Outage 2021-03-17
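If the index ever needs to be processed by a script, one row could be modelled roughly as below. This is a sketch under assumptions: the names `Severity`, `IncidentEntry` and `parse_row` are hypothetical, the rows are assumed to be tab-separated as in the table above, and blank Severity or Data Loss cells (as in the HAProxy row) are mapped to `None` rather than guessed.

```python
from dataclasses import dataclass
from datetime import date, datetime
from enum import Enum
from typing import Optional


class Severity(Enum):
    # Values mirror the Severity legend of this page.
    CRITICAL = "Complete service outage"
    MED = "Partial service degradation"
    LOW = "Virtually no user impact"


@dataclass
class IncidentEntry:
    service: str
    start: date
    end: date
    severity: Optional[Severity]  # None when the index leaves the cell blank
    data_loss: Optional[bool]     # None when the index leaves the cell blank
    page: str


def parse_row(row: str) -> IncidentEntry:
    """Parse one tab-separated index row (assumed layout: six columns)."""
    service, start, end, severity, data_loss, page = row.split("\t")
    fmt = "%d %b %Y"  # e.g. "12 Mar 2022"
    return IncidentEntry(
        service=service,
        start=datetime.strptime(start, fmt).date(),
        end=datetime.strptime(end, fmt).date(),
        severity=Severity[severity] if severity else None,
        data_loss={"YES": True, "NO": False}.get(data_loss),
        page=page,
    )
```

Keeping blank cells as `None` avoids inventing a severity for incidents that were never classified.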
