The following describes structure and facilities that provide service resilience and reliability in GÉANT.

Components

Consul: provides service discovery

Infoblox: DNS for services

Install the consul agent on all servers which have services to be discovered

Have a quorum of consul servers

The Consul servers generate a DNS zone file, which is pushed to Infoblox.
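As a rough guide to what "a quorum of consul servers" means in practice: Consul servers use a majority-based consensus protocol, so a cluster stays writable only while more than half of its servers are reachable. The sketch below works through that arithmetic. The function names are illustrative only, and the server counts shown are examples, not a sizing decision taken from this page.

```python
def quorum_size(servers):
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def failures_tolerated(servers):
    """How many servers can be lost while a majority still remains."""
    return servers - quorum_size(servers)


# Example cluster sizes (illustrative, not a recommendation from this page).
for n in (3, 5):
    print(f"{n} servers: quorum {quorum_size(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

An even number of servers adds no extra failure tolerance over the odd number below it, which is why odd cluster sizes are the usual choice.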

Problems being addressed

A number of customer-facing services have single points of failure. Where redundancy has been employed, manual intervention is still required to reach a specific redundant copy: either an end-user chooses between the 01 and 02 version of a service, or the service is reached through a DNS CNAME entry that has to be updated manually. The same is true of infrastructure services, such as databases, authentication servers and other dependent services.

Examples:

  • Primary and Backup dashboard: prod-newdboard01 and prod-newdboard02. A CNAME - dashboard-primary - points at either the 01 or 02 instance. 
  • Crowd authentication: prod-crowd and uat-crowd contain identical information, but systems (such as Jira) are configured to use one or the other.

Generally, all service deployment can be done in the context of redundant services providing high availability. The impact to SWD is very low. Services should be deployed such that every instance is essentially the primary.

Goal

Automate service failover

Create a scalable infrastructure: deploy services independent of location. Services should auto-register, be discoverable, and auto-deregister.

Minimise downtime through redundancy

Automate service recovery so that it tolerates hardware failures and outages

Structure

Each server will run the Consul agent and include a config listing the services it runs and how to monitor them (to test whether they are serviceable). This config should be maintained in Puppet.
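To make the per-server config more concrete, here is a minimal sketch that builds a Consul service definition with one HTTP health check and prints it as JSON, which is the kind of file the agent could load. The service name, port and health URL are hypothetical placeholders, and the helper `service_definition` is an illustration only; the real values and the Puppet template that would manage them are not specified on this page.

```python
import json


def service_definition(name, port, health_url, interval="10s", timeout="2s"):
    """Build a Consul service definition with a single HTTP health check."""
    return {
        "service": {
            "name": name,
            "port": port,
            "check": {
                "http": health_url,    # URL the agent polls to test the service
                "interval": interval,  # how often the check runs
                "timeout": timeout,    # how long to wait before failing the check
            },
        }
    }


# Hypothetical values, chosen only for illustration.
definition = service_definition("dashboard", 8080, "http://localhost:8080/health")
print(json.dumps(definition, indent=2))
```

A service whose check fails is marked unhealthy by the agent, which is what allows it to drop out of discovery without manual intervention.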

The Consul servers create, update and push a DNS zone file. This should happen as frequently as is reasonable, to minimise the window in which a DNS query for a service misses or returns a stale answer. Perhaps every 5 minutes or more frequently?
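The zone file generation step might look something like the sketch below, which turns a list of healthy service instances into A-record lines. Everything here is an assumption made for illustration: the function `render_zone_records`, the 300 second TTL, the record layout, and the example names and addresses (taken from the documentation address range). How the list of healthy instances is obtained from Consul, and how the result is pushed to Infoblox, is not described on this page and is left out.

```python
def render_zone_records(instances, ttl=300):
    """Render zone-file A records for (service name, IP address) pairs.

    Duplicates are dropped and the output is sorted so that an unchanged
    set of instances always produces an identical file.
    """
    lines = [
        f"{name}\t{ttl}\tIN\tA\t{address}"
        for name, address in sorted(set(instances))
    ]
    return "".join(line + "\n" for line in lines)


# Hypothetical healthy instances, for illustration only.
healthy = [
    ("dashboard", "192.0.2.11"),
    ("dashboard", "192.0.2.12"),
    ("crowd", "192.0.2.21"),
]
zone_text = render_zone_records(healthy)
print(zone_text, end="")
```

Because the output is deterministic, a push could be skipped whenever the rendered text matches the previous run, which would make a short generation interval cheaper.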

 
