Incident Management Process (draft)
Establish who the affected users and stakeholders are
- Use the stakeholder list in the Service Catalogue as a starting point
Inform the affected users and stakeholders about the incident
- Do this before taking any other action
Have the relevant team members investigate the issue
- The first priority is to restore service
Create an Incident Report
- Use one of the previous Incident Reports as a template (see Incidents)
- Save the new Incident Report here, as a new child page
- Include the following sections:
  - Basic information
  - Timeline (how and when the issue was identified, when service was restored, etc.)
  - Other information
  - Future mitigations (optional)
- If resolving the issue takes a long time, update the affected users every 3-4 hours. Linda Ness may be able to help or advise with this.
Index
Severity
- CRITICAL: Complete service outage
- MED: Partial service degradation
- LOW: Virtually no user impact
Data Loss
- YES: Data has been lost
- NO: No data was lost
| Service | Start Date | End Date | Severity | Data Loss | Incident Page |
|---|---|---|---|---|---|
| EMS | | | CRITICAL | NO | EMS - 2022-03-14 - Service Outage |
| EMS | | | MED | NO | EMS - 2022-04-20 - Service Degradation |
| HAProxy | | | | | Haproxy Outage 2021-03-17 |
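If the Index is ever maintained by a script rather than by hand, each row could be modelled as a small typed record, so that only the Severity and Data Loss values defined above can be entered and the columns always come out in the Index's order. The Python below is a hypothetical sketch under that assumption: `Severity`, `DataLoss`, `IndexEntry` and `to_row` are names invented for this illustration, not part of any existing tooling.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Severity(Enum):
    """Severity levels as defined in the Index legend."""
    CRITICAL = "Complete service outage"
    MED = "Partial service degradation"
    LOW = "Virtually no user impact"


class DataLoss(Enum):
    """Data Loss values as defined in the Index legend."""
    YES = "Data has been lost"
    NO = "No data was lost"


@dataclass
class IndexEntry:
    """One row of the Index table (hypothetical helper, not existing tooling)."""
    service: str
    severity: Severity
    data_loss: DataLoss
    incident_page: str
    start_date: Optional[date] = None  # left blank if unknown
    end_date: Optional[date] = None    # left blank while the incident is open

    def __post_init__(self) -> None:
        # Reject an end date that falls before the start date.
        if (self.start_date is not None and self.end_date is not None
                and self.end_date < self.start_date):
            raise ValueError("end_date must not be before start_date")

    def to_row(self) -> str:
        """Render the entry as a pipe-table row in the Index's column order."""
        cells = [
            self.service,
            self.start_date.isoformat() if self.start_date else "",
            self.end_date.isoformat() if self.end_date else "",
            self.severity.name,
            self.data_loss.name,
            self.incident_page,
        ]
        return "| " + " | ".join(cells) + " |"


if __name__ == "__main__":
    entry = IndexEntry("EMS", Severity.MED, DataLoss.NO,
                       "EMS - 2022-04-20 - Service Degradation")
    print(entry.to_row())
    # prints: | EMS |  |  | MED | NO | EMS - 2022-04-20 - Service Degradation |
```

Using enums rather than free text means a value such as a page title cannot end up in the Severity column by accident; the dates are optional because the Index currently leaves them blank.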