Bug 7312 - support replacing a HA node on a system with sessions
Status: NEW
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server
Version: 1.3.1
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: LowPrio
Assignee: Bugzilla mail exporter
Reported: 2019-02-18 15:42 CET by Pierre Ossman
Modified: 2022-04-26 07:44 CEST
Description Pierre Ossman cendio 2019-02-18 15:42:44 CET
Our current high availability support only handles the case where a node disappears for a while and then comes back unharmed, i.e. we have no method of syncing information that was lost.

A common scenario would be one node being completely destroyed (fire, file system corruption, etc.). The node would then be replaced by a fresh install of ThinLinc. The system should then be able to re-populate this fresh node with data from the surviving node.

As a bonus it would be nice to detect lost sessions as part of a crash and sync those as well (or trigger a complete resync).
Comment 1 Pierre Ossman cendio 2019-02-18 15:44:05 CET
A workaround would be to temporarily stop vsmserver on the surviving node and copy /var/lib/vsm/sessions over to the freshly installed node.

That still risks some noise in the logs, though, as the surviving node will try to push changes that are already in the session database while it works through its HA backlog.
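
A rough sketch of how the workaround above could be scripted, to be run on the surviving node. Only the path /var/lib/vsm/sessions and the service name vsmserver come from this report. Everything else is an assumption made for illustration: that vsmserver is controlled through systemctl, that /var/lib/vsm/sessions is a single regular file, that root can reach the fresh node over SSH, and that scp is an acceptable way to copy. The function names are hypothetical. The sketch does not address whether vsmserver on the fresh node has to be stopped while the file is replaced.

```python
import subprocess

# Path taken from the workaround described in this report.
SESSIONS_PATH = "/var/lib/vsm/sessions"


def workaround_commands(fresh_node):
    """Return the commands to run, in order, on the surviving node.

    fresh_node is the hostname of the freshly installed node.
    """
    return [
        # Assumption: vsmserver is managed by systemd on this host.
        ["systemctl", "stop", "vsmserver"],
        # Assumption: the session database is a single file and root SSH
        # access to the fresh node is available. -p preserves mode and times.
        ["scp", "-p", SESSIONS_PATH, f"root@{fresh_node}:{SESSIONS_PATH}"],
        ["systemctl", "start", "vsmserver"],
    ]


def run_workaround(fresh_node, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in workaround_commands(fresh_node):
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts on the first failing command. Note that
            # vsmserver then stays stopped until it is started manually.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical hostname; prints the commands without running them.
    run_workaround("fresh-node.example.com", dry_run=True)
```

With dry_run left at its default the script only prints the three commands, which makes it easy to review them before running anything against a live system.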
Comment 3 Pierre Ossman cendio 2022-04-26 07:44:32 CEST
We've actually documented this workaround in the TAG since 2005:

https://www.cendio.com/resources/docs/tag/HA-recover.html#recovering-from-catastrophic-failure
