What is Health Monitoring?
Introduction
Similar to your actual health, health monitoring is important to maintain uptime and quality of services. With many platforms boasting “health monitoring,” you may ask: why would I even need this over traditional status monitoring?
The answer is simple: health monitoring can allow for actions to be taken if, for example, an origin has gone offline. Such “actions” can prevent users from noticing, or experiencing downtime. Health checks also permit proactive actions to be taken: an example could be a server whose load (CPU utilization) is rapidly rising — a single health check can detect this trend and move new users to another server (or even deploy more resources on scalable platforms).
Whether a health check detects an origin going offline, or a server's CPU usage rising too quickly, health checks play an important role in load balancing and failover checks.
Types of “Health Checks”
Health monitoring uses a variety of checks to monitor availability to server load:
- Server Agents (webhooks)
With “server agents,” monitoring occurs with a small program configured to run either as a daemon or under a timed CRON job. The program can check many different pieces of information on a server; for example, RAM usage, CPU usage — even disk usage. More advanced monitoring agents can even forward logs and interact with other software on the server to determine whether additional (or occasionally less) hardware is required to maintain a service.
The one disadvantage of such health checks is that they require the use of custom (or 3rd party) software on a server. While this can potentially be mitigated through the use of Docker, or another transparent container solution, it requires the most work to configure and is often chosen only if HTTP/ping requests are insufficient.
- Ping or HTTP requests
With “ping” checks and/or “HTTP requests,” the provider will send out occasional “bumps” to origins to ensure that they are “alive.” This is often configurable as well; with some health monitoring services, “ping” tests can be sent from multiple locations around the world. Others even offer the ability to send custom HTTP requests to origin endpoints to ensure that they are still working.
Either way, this option is the easiest to configure and requires that the CDN, or provider, keep monitoring for any sudden downtime.
Uses for Health Monitoring
As mentioned previously, health monitoring offers a plethora of benefits to webmasters:
- Failover
- When an origin is detected to be offline, it can be removed from rotation and all subsequent requests will be routed to remaining servers.
- Load Balancing
- When an origin is detected to be offline (e.g. the monitoring agent is no longer responding), subsequent requests can be rerouted
- When an agent returns a higher than normal load, or underutilization, servers can be pulled/deployed to meet demand
- If existing servers are able to handle the load, a load balancer can begin to distribute requests to servers under a “list” to ensure a user’s quality of service
Conclusion
Health monitoring plays an important role in maintaining the quality of services for users. Even when services are pushed to their limits, high-quality monitoring will allow for appropriate scaling — furthermore, simpler configurations of health checks (ping, HTTP GET, etc.) can still ensure that users are never routed to a server that is offline or unable to process a request.