...
| Date | Time | Description |
|---|---|---|
| | 21:55:53 | First error in indico.log reporting Redis as unavailable: ConnectionError: Error -2 connecting to master.production-events-redis.service.ha.geant.net:6379. Name or service not known. |
| | 10:42 | First user query about EMS login problem (Slack #general) |
| | 11:14 | Ian Galpin identified the DNS resolution problem |
| | 12:06 | Service degradation incident email sent out to product owner (Steffie Bosman) |
| | 12:12 | Massimiliano Adamo identified a problem with PowerDNS; Consul DNS resolution seemed to work |
| | 12:30 | Massimiliano Adamo resolved the PowerDNS issue by disabling the packetcache config option. The following GitHub issue might explain the problem: https://github.com/PowerDNS/pdns/issues/8160 |
| | 13:01 | Service restored email sent out to product owner (Steffie Bosman) |
...
- Additional monitoring (Sensu checks) will be added
- These checks will verify that specific hostnames resolve
- This is an action item for the DevOps team
- The issue was solved by disabling the packet cache in PowerDNS (the cache is enabled by default): https://docs.powerdns.com/recursor/settings.html#disable-packetcache
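
The hostname-resolution check planned in the action items could look roughly like the sketch below. It is an illustration only, not the check the DevOps team deployed. The function names (`unresolved`, `run_check`, `main`) and the default hostname list are assumptions; the one hostname shown is taken from the error in the timeline. The sketch follows the common Nagios-style convention that Sensu checks use, where exit status 0 means OK and 2 means CRITICAL.

```python
"""Sketch of a Sensu-style check that verifies a set of hostnames resolve."""
import socket
import sys

# Assumption: the list of names to watch. This one comes from the incident log.
DEFAULT_HOSTNAMES = ["master.production-events-redis.service.ha.geant.net"]

OK, CRITICAL = 0, 2  # Nagios-style exit statuses understood by Sensu


def unresolved(hostnames, resolver=socket.getaddrinfo):
    """Return the hostnames for which the resolver raises an error."""
    failed = []
    for name in hostnames:
        try:
            resolver(name, None)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed


def run_check(hostnames, resolver=socket.getaddrinfo):
    """Return (exit_status, message) for the given hostnames."""
    failed = unresolved(hostnames, resolver)
    if failed:
        return CRITICAL, "CRITICAL: cannot resolve " + ", ".join(failed)
    return OK, "OK: all %d hostname(s) resolve" % len(hostnames)


def main(argv):
    """Print the check message and return the exit status."""
    status, message = run_check(argv or DEFAULT_HOSTNAMES)
    print(message)
    return status

# When installed as a check command, the script would finish with:
#     sys.exit(main(sys.argv[1:]))
```

The resolver is passed in as a parameter so the logic can be exercised without real DNS lookups. In production the default `socket.getaddrinfo` goes through the host's configured resolver, which is the path that failed during this incident.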