Chapter 6. High Availability (HA)
Table of Contents
- 6.1. Overview
- 6.2. Configuration of ThinLinc for HA Operations
- 6.3. Validating HA Operation
- 6.4. Testing Correct Failover Behavior
- 6.5. Recovering from hardware failures
- 6.6. ThinLinc HA and Other Services
  - 6.6.1. User database
  - 6.6.2. Home Directories and other File Resources
  - 6.6.3. Printing
- 6.7. Detailed Instructions on Heartbeat Configuration
This chapter describes how to set up ThinLinc with High Availability (from now on referred to as "HA") for the VSM server. This protects against the single point of failure that the hardware running the VSM server would otherwise be.
This chapter describes the HA implementation available in ThinLinc version 1.4 and later. The HA implementation in ThinLinc 1.2 and 1.3 is deprecated.
The basic principle behind this setup is to have two equivalent machines, both capable of running the VSM server. If one of the machines goes down for some reason, the other machine takes over and serves VSM server requests with little or no interruption of service.
In a standard ThinLinc setup, there is a single point of failure: the machine running the VSM server. If the VSM server is down, no new ThinLinc connections can be made, and reconnections to existing sessions can't be established. Existing connections to VSM agent machines that are still running will, however, continue to work. A ThinLinc cluster of medium size with one machine running as VSM server and three VSM agent machines is illustrated in Figure 6.1.
Here the incoming connections are handled by the VSM server, which distributes the connections to the three VSM agent machines. If the VSM server goes down, no new connections can be made. The VSM server is a single point of failure.
In order to eliminate the single point of failure, we configure the VSM server in an HA configuration where two machines share the responsibility for keeping the service running.
The two machines are in constant contact with each other, each checking if the other one is up and running. If one of the machines goes down for some reason, for example hardware failure, the other machine detects the failure and automatically takes over the service with only a short interruption for the users. No action is needed from the system administrator.
In an HA setup, as illustrated in Figure 6.2, two equivalent machines are used to keep the VSM server running. One of the machines is primary, the other one is secondary. The primary machine normally handles VSM server requests, but if it fails, the secondary machine takes over. When the primary machine comes online again, it resumes the service. That is, in normal operation, it's always the primary machine that's working. The secondary is just on standby, receiving information from the primary about new and deleted sessions and maintaining its own copy of the session database.
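The primary/secondary rule described above can be illustrated with a small model. This is only a sketch of the decision logic, not ThinLinc or Heartbeat code: the `Node` class, the `active_node` function, and the node names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for one of the two VSM server machines."""
    name: str
    alive: bool = True


def active_node(primary: Node, secondary: Node) -> Optional[Node]:
    """Return the node that should currently run the VSM server service.

    The primary is always preferred when it is up, which models the
    automatic switch back to the primary once it comes online again.
    Returns None if both nodes are down.
    """
    if primary.alive:
        return primary
    if secondary.alive:
        return secondary
    return None


# Hypothetical node names, for illustration only.
primary = Node("vsm-primary")
secondary = Node("vsm-secondary")

events = []
events.append(active_node(primary, secondary).name)  # normal operation
primary.alive = False                                 # primary fails
events.append(active_node(primary, secondary).name)  # secondary takes over
primary.alive = True                                  # primary recovers
events.append(active_node(primary, secondary).name)  # primary resumes
print(events)
```

The model leaves out session database replication; it only shows which machine answers VSM server requests at each point in a failure and recovery sequence.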
Both machines have a unique hostname and a unique IP address, but there is also a third IP address that is active only on the node currently responsible for the VSM server service. This is the so-called HA address, the address that clients connect to.
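Because clients only ever use the HA address, a simple way to see the service from the client's point of view is to test whether a TCP connection to that address succeeds. The following is a minimal sketch using only the Python standard library. The function name `is_reachable` is hypothetical, and the host name and port in the usage comment are assumptions that must be replaced with the values used at your site.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    This only checks that something accepts connections on the address;
    it does not verify that the listener is a working VSM server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# Example usage (hypothetical HA host name and port, adjust to your setup):
#   is_reachable("thinlinc-ha.example.com", 9000)
```

Running such a check against the HA address before and after taking one node down shows whether the address has moved to the remaining node.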
The software used by the machines to keep track of the other machine's status is Heartbeat, the industry standard for HA on Linux, also available for Solaris.

