ELI5: what is failover?

When something important is on the line, there are usually safety nets to make sure the right things happen.

Skydivers often have backup parachutes, lifts have emergency brakes, and important computer and networking systems have a failover process.

But what exactly is failover, how does it work, and why is it so important? Here’s an ELI5 overview.

What is failover?

Failover is a procedure in which a fault or failure in one system results in the transfer of control to a backup, redundant, or standby system. This backup could be a redundant server, a spare hardware component, or a secondary network, for instance.

In other words, when a component involved in running a system fails, a failover happens: control of the system passes smoothly to a backup component. Typically, the switch happens automatically, without anyone needing to step in.

Failover capability is incredibly important for mission-critical systems, or any system that needs 24/7 availability and high reliability.

A failover is a bit like your spare parachute. It’s there to avert disaster.

How does it work?

Automated failover most commonly works by using a “heartbeat” connection between two systems: the main (primary) system and the failsafe (backup) one.

As long as the backup keeps receiving a regular ‘pulse’, or signal, from the main system, it stays on standby. If the pulse stops, for example because the main system misses several heartbeats in a row, the backup assumes something has gone wrong and takes over.
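To make the heartbeat idea concrete, here is a minimal Python sketch of the check the standby side performs. It is an illustration under stated assumptions, not a production design: the `HeartbeatMonitor` class and its method names are hypothetical, the 3-second timeout is an arbitrary example value, and time is passed in as plain numbers so the logic can be followed without real networking or clocks.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor, conceptually running on the standby (backup) node.

    timeout: seconds of silence after which the primary is presumed failed
    (an illustrative value, not a standard).
    """
    timeout: float = 3.0
    last_beat: float = 0.0
    active: str = "primary"

    def beat(self, now: float) -> None:
        # The primary sends its 'pulse'; remember when we last heard it.
        self.last_beat = now

    def check(self, now: float) -> str:
        # If the pulse has been silent for longer than the timeout,
        # the backup takes over. Once failed over, it stays in charge.
        if self.active == "primary" and now - self.last_beat > self.timeout:
            self.active = "backup"
        return self.active


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=3.0)
    for t in (1.0, 2.0, 3.0):          # primary beats once a second
        monitor.beat(t)
        print(t, monitor.check(t))
    # The primary crashes after t=3.0 and sends no more beats.
    for t in (4.0, 5.0, 6.0, 7.0):
        print(t, monitor.check(t))
```

Running it shows the primary staying in charge through t=6.0 and the backup taking over at t=7.0, the first check at which more than three seconds have passed since the last beat. Real systems usually add safeguards this sketch leaves out, such as making sure the old primary has really stopped, so that both systems never act as the primary at once.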

Some failover systems are intentionally not fully automatic. Instead, they’re what’s called “automated with manual approval”: the failure is detected automatically, but a human must authorise the failover before the backup system can take over. (Once approved, the switchover itself happens automatically.)
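The “automated with manual approval” pattern can be sketched as a small state machine. Again this is an assumption-laden Python illustration rather than any particular product’s behaviour: the `ApprovalGatedFailover` class, its states, and its method names are hypothetical. Detection moves the system into a waiting state; only an explicit human approval triggers the (automatic) switch to the backup.

```python
from enum import Enum


class State(Enum):
    PRIMARY_ACTIVE = "primary active"
    AWAITING_APPROVAL = "awaiting approval"
    BACKUP_ACTIVE = "backup active"


class ApprovalGatedFailover:
    """Hypothetical controller for 'automated with manual approval' failover."""

    def __init__(self) -> None:
        self.state = State.PRIMARY_ACTIVE

    def report_failure(self) -> None:
        # Automated part: the failure is detected, but nothing switches yet.
        if self.state is State.PRIMARY_ACTIVE:
            self.state = State.AWAITING_APPROVAL

    def approve(self, operator: str) -> None:
        # Manual part: a named person authorises the switch...
        if self.state is not State.AWAITING_APPROVAL:
            raise RuntimeError("no failover is waiting for approval")
        print(f"failover approved by {operator}")
        # ...after which the switchover itself runs automatically.
        self._switch_to_backup()

    def _switch_to_backup(self) -> None:
        # Placeholder for the real steps (e.g. redirecting traffic).
        self.state = State.BACKUP_ACTIVE


if __name__ == "__main__":
    ctl = ApprovalGatedFailover()
    ctl.report_failure()
    print(ctl.state.value)
    ctl.approve("on-call engineer")
    print(ctl.state.value)
```

Calling `approve()` when no failure has been reported raises an error, so a stray approval can’t trigger a failover on its own.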

What is failover used for?

Failover is commonly used to back up servers and networks, particularly when they need to be available 24/7.

For instance, a SaaS provider benefits from failover if the server hosting its software goes down. The server has failed, but the software stays available to customers, because the backup server takes over.

But failover procedures are not limited to big systems. You can set up failover on a personal device, too. For instance, a device could have a trigger that protects it if a processor, or even a single battery cell, fails.

Why is failover important?

To some, keeping extra, redundant parts around ‘just in case’ might seem like a waste of resources: an undue financial burden. So, what makes having a failover procedure ready and waiting worth the cost?

The main reason is that failover provides safety and security. It’s an extra defence against unplanned, disruptive downtime and complete system failure. So, it reduces (and in some cases removes altogether) the impact of a failure on your customers and/or users.

In other words, your teams won’t need to manage a catastrophic outage because your primary server failed.

Plus, because your system doesn’t go down, neither do your cyber security precautions. So, failover can also help to protect databases and the information stored in them.

Failover in a nutshell

To sum up, failover is a process, usually automatic, that kicks in when a primary system fails or stops, for whatever reason. It allows a backup system or component to take over crucial tasks and keep important services running smoothly.

So, does your system need a failover procedure set in place?
