...
| Date | Time | Description |
|---|---|---|
| | 21:55:53 | First error in indico.log reporting Redis as unavailable: ConnectionError: Error -2 connecting to master.production-events-redis.service.ha.geant.net:6379. Name or service not known. |
| | 10:42 | First user query about an EMS login problem (Slack #general) |
| | 11:14 | Ian Galpin identified the DNS resolution problem |
| | 12:06 | Service degradation incident email sent to the product owner (Steffie Bosman) |
| | 12:12 | Massimiliano Adamo identified a problem with PowerDNS; Consul DNS resolution seemed to work |
| | 12:30 | Massimiliano Adamo resolved the PowerDNS issue by disabling the packetcache config option. The following GitHub issue might explain the problem: https://github.com/PowerDNS/pdns/issues/8160 |
| | 13:01 | Service restored email sent to the product owner (Steffie Bosman) |
...
- Additional monitoring (Sensu checks) will be added
- These will check that specific hostnames resolve
- This is an action item for the DevOps team
- The issue was solved by disabling the packet cache on PowerDNS, which is enabled by default: https://docs.powerdns.com/recursor/settings.html#disable-packetcache
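
The hostname-resolution check proposed in the action items could look roughly like the sketch below. It is an illustration only, not the check the DevOps team deployed: the function name `check_hostnames`, the `DEFAULT_HOSTNAMES` list, and the message format are assumptions. The only hostname used is the Redis master name from the indico.log error. The exit codes follow the Nagios-style convention (0 for OK, 2 for critical) that Sensu checks use.

```python
import socket

# Nagios-style exit codes, which Sensu checks also follow.
OK, CRITICAL = 0, 2

# Assumption: the hostname from the indico.log error; extend as needed.
DEFAULT_HOSTNAMES = ["master.production-events-redis.service.ha.geant.net"]


def check_hostnames(hostnames, resolve=socket.getaddrinfo):
    """Return (exit_code, message) after trying to resolve every hostname.

    `resolve` is injectable so the logic can be exercised without real DNS.
    """
    failures = []
    for name in hostnames:
        try:
            # Port is irrelevant here; only name resolution is being tested.
            resolve(name, None)
        except socket.gaierror as exc:
            # gaierror covers failures such as "Name or service not known".
            failures.append(f"{name}: {exc}")
    if failures:
        return CRITICAL, "CRITICAL - unresolved: " + "; ".join(failures)
    return OK, f"OK - {len(hostnames)} hostname(s) resolved"
```

A check script would call `check_hostnames(DEFAULT_HOSTNAMES)`, print the returned message, and exit with the returned code so that Sensu can raise an alert when resolution fails.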