...
Two production dashboard servers exist, but users access a specific dashboard via the DNS CNAME dashboard-primary.geant.net. If the server to which the CNAME refers fails for any reason, an operator must manually change the CNAME so that it points to the other dashboard.
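The manual decision described above can be sketched as a small health check that works out where the CNAME ought to point. This is only an illustration under stated assumptions: the two host names are hypothetical (the real dashboard server names are not given here), a successful TCP connection is assumed to be an adequate health signal, and the DNS update itself is deliberately left out because it depends on the DNS provider in use.

```python
import socket
from typing import Callable, Optional, Sequence

# Hypothetical host names; the real dashboard server names are not given here.
CANDIDATES = ("dashboard01.example.net", "dashboard02.example.net")


def tcp_alive(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def choose_cname_target(
    current: str,
    candidates: Sequence[str] = CANDIDATES,
    is_alive: Callable[[str], bool] = tcp_alive,
) -> Optional[str]:
    """Pick the host the CNAME should point at.

    Keeps the current target while it is healthy (to avoid flapping),
    otherwise returns the first healthy alternative, or None if no
    candidate is healthy.
    """
    if is_alive(current):
        return current
    for host in candidates:
        if host != current and is_alive(host):
            return host
    return None
```

Keeping the current target whenever it is healthy is a design choice: it means the CNAME only changes on failure, not every time the other server happens to be checked first.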
Crowd authentication
Many systems, such as JIRA and Dashboard, depend on the Crowd server for user authentication and access control. The Crowd server has failed occasionally in the past, preventing user access to Dashboard and JIRA. Providing a second Crowd server to fail over to is relatively straightforward (indeed, the uat-crowd server is configured identically to prod-crowd), but failing over to it is still a manual task in the current infrastructure.
A number of other services, such as cacti, poller, compendium, and generally most of the newer servers have redundant copies deployed (as 01 and 02 production servers) but
Two Crowd servers exist (prod-crowd and uat-crowd) and contain identical information, but systems (such as Dashboard, JIRA, and others) are configured to use one or the other for authentication.
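Because both Crowd servers hold identical information, a client could in principle try them in order instead of being tied to one. The sketch below illustrates that idea only; it does not use the Crowd API. The backends are hypothetical callables standing in for "ask prod-crowd" and "ask uat-crowd", and AuthUnavailable is an invented exception used to tell "server unreachable" apart from "credentials rejected".

```python
from typing import Callable, Sequence


class AuthUnavailable(Exception):
    """Raised when an authentication backend cannot be reached."""


def authenticate_with_failover(
    username: str,
    password: str,
    backends: Sequence[Callable[[str, str], bool]],
) -> bool:
    """Return the verdict of the first reachable backend.

    Only an unreachable backend triggers failover. A reachable backend
    that rejects the credentials is authoritative, so a wrong password
    is never retried against the next server.
    """
    for backend in backends:
        try:
            return backend(username, password)
        except AuthUnavailable:
            continue  # this backend is down; try the next one
    raise AuthUnavailable("no authentication backend reachable")
```

Treating a rejection as final is what makes this safe only when the servers really do contain the same user data, as stated above.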
...
Provide a zero-downtime scheduled maintenance framework: in a system that employs redundant services, it should be possible to schedule maintenance so that there is no loss of service and users are unaware that maintenance is ongoing.
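One common way to meet this kind of requirement is rolling maintenance: take one redundant server out of service at a time, and only while a healthy peer can cover for it. The following is a minimal sketch of that approach, not a description of an existing framework. The is_healthy, drain, maintain and restore callables are hypothetical hooks that a real deployment would have to supply.

```python
from typing import Callable, Iterable, List


def rolling_maintenance(
    servers: Iterable[str],
    is_healthy: Callable[[str], bool],
    drain: Callable[[str], None],
    maintain: Callable[[str], None],
    restore: Callable[[str], None],
) -> List[str]:
    """Maintain servers one at a time; return the servers that were maintained.

    A server is only taken out of service while at least one other server
    is healthy, so some instance is always left to serve users.
    """
    pool = list(servers)
    done: List[str] = []
    for server in pool:
        others = [s for s in pool if s != server]
        if not any(is_healthy(s) for s in others):
            raise RuntimeError(f"no healthy peer to cover for {server}")
        drain(server)  # stop sending users to this server
        try:
            maintain(server)
        finally:
            restore(server)  # put it back in service even if maintenance failed
        if not is_healthy(server):
            # Stop here rather than move on and risk taking down the last good server.
            raise RuntimeError(f"{server} unhealthy after maintenance")
        done.append(server)
    return done
```

With a 01/02 pair, this maintains 01 while 02 serves users, checks that 01 is healthy again, and only then moves on to 02.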
...