Purpose of the Document
GÉANT provides a large number of services that are used both by the community and internally. These publicly exposed services have, in most cases, further dependencies on hidden services such as authentication systems and databases. The expectation is that these services are available 100% of the time, but in practice they occasionally fail.
The following describes the structure and facilities that provide service resilience and reliability in GÉANT.
Components
Consul: provides service discovery
Infoblox: DNS for services
Install the Consul agent on all servers which have services to be discovered
Have a quorum of Consul servers
The Consul servers generate a DNS zone file, which is pushed to Infoblox.
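As an illustration of what registering a service with a local Consul agent might involve, the following is a minimal sketch that builds a service definition with an HTTP health check. The service name "dashboard", the port 8080 and the "/health" path are placeholders chosen for this example, not values taken from the GÉANT deployment.

```python
import json


def service_definition(name, port, health_path="/health", interval="10s"):
    """Build a Consul service definition with a simple HTTP health check.

    All argument values are illustrative assumptions; real services
    would supply their own name, port and health endpoint.
    """
    return {
        "service": {
            "name": name,
            "port": port,
            "check": {
                # The local agent polls this URL at the given interval;
                # Consul treats any 2xx response as a passing check.
                "http": f"http://localhost:{port}{health_path}",
                "interval": interval,
            },
        }
    }


if __name__ == "__main__":
    # "dashboard" and 8080 are placeholder values for illustration only.
    print(json.dumps(service_definition("dashboard", 8080), indent=2))
```

The resulting JSON could be placed in the agent's configuration directory on each server that hosts the service, so that every instance registers itself and reports its own health.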
Problems being addressed
A number of customer-facing services have single points of failure. Where redundancy has been employed, manual intervention is required to reach a specific redundant copy: for example, an end user must choose between the 01 and 02 version of a service, or one of these alternatives is reached via a DNS CNAME entry which needs to be updated manually. This is also true of infrastructure services, such as databases, authentication servers or other dependent services.
...
Generally, all service deployment can be done in the context of redundant services providing high availability. The impact on SWD is very low. Services should be deployed such that each instance is essentially the primary in every case.
...
Design goals
Provide an infrastructure which supports automatic service failover.
...
The Consul servers create, update and push a DNS zone file. This should occur as frequently as is reasonable, to minimise service DNS query misses. Perhaps every 5 minutes or more frequently?
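The zone file generation step could look roughly like the sketch below, which renders A records for healthy service instances only. The function name, the input tuple layout, the origin "service.example.org." and the 60-second TTL are all assumptions made for illustration. How the list of instances is obtained from Consul and how the result is pushed to Infoblox are deliberately left out, and SOA and NS records are omitted.

```python
def render_records(origin, instances, ttl=60):
    """Render a zone-file fragment with one A record per healthy instance.

    origin:    zone origin, fully qualified (hypothetical example value below).
    instances: iterable of (service_name, ip_address, healthy) tuples.
    ttl:       record TTL in seconds; kept short so failover is picked up quickly.
    """
    lines = [f"$ORIGIN {origin}", f"$TTL {ttl}"]
    for name, ip, healthy in sorted(instances):
        # Unhealthy instances are left out, so clients never resolve to them.
        if healthy:
            lines.append(f"{name} IN A {ip}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample data using documentation-range addresses, not real hosts.
    sample = [
        ("dashboard", "192.0.2.11", True),
        ("dashboard", "192.0.2.12", False),
        ("auth", "192.0.2.21", True),
    ]
    print(render_records("service.example.org.", sample), end="")
```

Running such a job on a short schedule, together with a low TTL, bounds how long a failed instance can remain resolvable: roughly the regeneration interval plus the TTL.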