...
- When the metarefresh config is changed, for example URLs are added/deleted/changed, the Nagios configuration needs to be changed as well, the Nagios process needs to be reloaded, and a forced check of the services needs to be scheduled.
- The name of the service would be the src URL, but this contains lots of 'illegal' characters as far a Nagios is concerned. A first attempt to fix work around this was by stripping the path from the URL. This means that the name would be just the base URL, which makes sense to admin, but does not (yet?) contain illegal characters. However, this turned out to be a problem because metadata is polled from several sources that reside on the same base URL (http://my.org/saml/idp1, http://my.org/saml/idp2, etc). The base URL is the same in this case, which is also not allowed by Nagios. The final work-around was to append a hash of the URL to the base URL. This way it is easy to distinguish what domain/IdP has an issue, and the string will be unique for each URL
- Because the way metarefresh is implemented, the URL state file is being overwritten for each URL, and only at the end will it yield the full status array. This causes a race condition with the Nagios check, it can happen that the Nagios check for a specific URL happens during the metarefresh, when there is no status information yet on that URL. A work-around was implemented by scheduling a simple cp command at the end the end of metarefresh cronjob. This will make sure the state file will always have all the URLs.
Because the logic of metarefresh is used, the same errors can be detected. This comes down to any connection errors, and all HTTP response codes other than 200 (OK) and 304 (Not Modified).
A few basic tests were done to confirm the functionality, such as having a URL return an HTTP 500 error:
Based on the experiences, some recommendations:
- Using the current metarefresh module has it's drawbacks. Notably, the race condition that exists in writing the state file is not a problem for the Conditional-GET state, is a real problem for the Nagios state file, and it not easy to fix. So it will be worth doing separate checks.
- blah