Incident Management Process (draft)

Establish who the affected users and stakeholders are

  • A starting point for this list is the list of stakeholders in the Service Catalogue

Communicate information about the incident to the affected users and stakeholders

  • Do this before taking any other action

The relevant team members should look into the issue

  • First priority is to restore service

Create an Incident Report

  • Start with one of the previous Incident Reports as a template: Incidents
  • Save the new Incident Report here as a new child page
  • Add the incident page to the Index table at the bottom of the page
  • Basic information:
    • Timeline (how/when it was identified, when service was restored, etc.)
    • Other information
    • Optional future mitigations
  • If resolving the issue is taking a long time, we must update the users every 3-4 hours. Linda Ness can probably help/advise with this.
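
The update cadence in the last bullet can be tracked with a small helper. This is only a minimal sketch: the names `UPDATE_INTERVAL`, `next_update_due` and `update_overdue` are hypothetical, and it assumes the shorter end of the 3-4 hour guidance is used as the deadline.

```python
from datetime import datetime, timedelta

# Assumption: treat the short end of the "every 3-4 hours" guidance as the deadline.
UPDATE_INTERVAL = timedelta(hours=3)


def next_update_due(last_update: datetime) -> datetime:
    """Return the time by which the next user update should be sent."""
    return last_update + UPDATE_INTERVAL


def update_overdue(last_update: datetime, now: datetime, resolved: bool = False) -> bool:
    """True if the incident is still open and the update interval has elapsed."""
    if resolved:
        # Once service is restored, no further periodic updates are needed.
        return False
    return now - last_update >= UPDATE_INTERVAL
```

For example, if the last update went out at 09:00, `next_update_due` returns 12:00 on the same day, and `update_overdue` is true from 12:00 onwards until the incident is marked resolved.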

Index

Severity

  • Critical (red): Complete service outage
  • Med (yellow): Partial service degradation
  • Low (blue): Virtually no user impact

Data Loss

  • Yes (red): Data has been lost
  • No (blue): No data was lost

App Affected: Cacti - unavailable (July 29, 2020). Cause: IT decommissioned a server that was required by Crowd. Result: Server was brought back up.

| Service | Start Date | End Date | Severity | Data Loss | Incident Page |
|---|---|---|---|---|---|
| WordPress | | | Critical | Yes | Production wordpress site outage 2018-02-13 |
| WordPress | | | Critical | No | Production wordpress site outage 2018-02-22 |
| WordPress | | | Critical | No | Production wordpress site outage 2018-03-25 |
| Dashboard | | | Critical | No | Production Dashboard Outage 2018-06-18 |
| Staff IDP | | | Critical | No | |
| Sympa | | | Critical | No | Production Sympa Service Outage 2018-08-03 |
| Dashboard | | | Critical | Yes | Production Dashboard Outage 2018-07-11 |
| DNS | | | Critical | No | DNS Outage 2019-02-27 |
| SharePoint | | | Critical | No | SharePoint Outage 2019-02-07 |
| Dashboard | | | Critical | No | Production Dashboard Outage 2019-07-16 |
| Dashboard | | | Critical | No | Production Dashboard Outage 2019-07-27 |
| SharePoint | | | Critical | No | SharePoint Outage 2020-01-08 |
| SharePoint | | | Med | No | RSS Feed in Jobs page Geant.org was down - 17/01/2020 |
| BRIAN | | | Critical | Yes | Brian Outage 2020-01-26 |
| Cacti | | | Critical | Yes | Cacti production incident - 06-03-2020 |
| Cacti | | | Critical | Yes | Cacti Production Instance - July 2020 |
| HAProxy | | | Critical | No | Haproxy Outage 2021-03-17 |
| ProxySQL | | | Critical | Critical | ProxySQL Outage 2021-07-12 |
| EMS | | | Critical | No | EMS - 2022-03-14 - Service Outage |
| EMS(DNS) | | | Med | No | EMS - 2022-04-20 - Service Degradation |
| Dashboard | | | Critical | Yes | Production Dashboard - 2022-05-15 - Service Outage |
| PostgreSQL(VMWare) | | | Critical | No | PostgreSQL - 2022-05-30 - Wide-scale Service Outage |
| BRIAN | | | Critical | No | BRIAN - 2022-05-30/31 - Service Outage |
| BRIAN | | | Critical | Yes | BRIAN - 2023-02-26/27 - Service Outage |
| BRIAN | | | Med | No | BRIAN 2023-11-16/17 Data Collection Outage |
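
An Index entry could also be modelled in code, for instance to validate severity and data-loss values before they are added to the page. The following is a minimal sketch only: `Severity` and `IndexEntry` are hypothetical names, and the severity descriptions are copied from the Severity key on this page.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Descriptions taken from the Severity key on this page.
    CRITICAL = "Complete service outage"
    MED = "Partial service degradation"
    LOW = "Virtually no user impact"


@dataclass
class IndexEntry:
    """One row of the incident Index (hypothetical structure)."""
    service: str
    severity: Severity
    data_loss: bool
    incident_page: str

    def summary(self) -> str:
        """Return a one-line, human-readable summary of the entry."""
        loss = "Yes" if self.data_loss else "No"
        level = self.severity.name.title()  # e.g. "CRITICAL" -> "Critical"
        return f"{self.service}: {level} severity, data loss {loss} ({self.incident_page})"
```

Using an enum for the severity means a value outside Critical, Med or Low cannot be recorded by accident.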


All Incident Documents

(Child pages of this page, sorted by creation date, newest first)