Bug 7660 - Corrupt session database or HA change database gets overwritten preventing manual recovery
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.13.0
Assignee: William Sjöblom
URL:
Keywords: ossman_tester, prosaic
Depends on:
Blocks:
 
Reported: 2021-03-11 16:34 CET by William Sjöblom
Modified: 2024-02-01 14:09 CET

Description William Sjöblom cendio 2021-03-11 16:34:29 CET
The sessions database (/var/lib/vsm/sessions) and HA changes database (/var/lib/vsm/changes) are overwritten if they cannot be read or parsed by vsmserver, which makes manual recovery of these potentially valuable files practically impossible. The consequence is loss of user sessions and HA nodes getting out of sync.

Ideally we want to handle this in a better way, preferably by terminating and letting the system administrator either remove or recover the file(s) manually.

Steps to reproduce:
> # systemctl stop vsmserver
> # echo "this file is not healthy" > /var/lib/vsm/sessions
> # systemctl start vsmserver
> <start a new session>
(the same applies to /var/lib/vsm/changes)

We currently get one of these error logs in /var/log/vsmserver.log in case of the above scenario:
> End of file reading /var/lib/vsm/sessions - was there an error writing sessions file to disk?
> Error unpacking session database /var/lib/vsm/sessions. No session database loaded
> Error decoding session database /var/lib/vsm/sessions. No session database loaded
> End of file reading /var/lib/vsm/changes - was there an error writing HA changes file to disk?
> Error unpacking HA changes database /var/lib/vsm/changes. No HA changes loaded
> Error decoding HA changes database /var/lib/vsm/changes. No HA changes loaded
Comment 1 Pierre Ossman cendio 2021-04-13 16:48:13 CEST
Note that many errors cause a crash of vsmagent instead of starting with an empty database. See bug 5631.
Comment 4 William Sjöblom cendio 2021-07-15 16:35:22 CEST
We currently have three failure modes when loading the HA and session
databases:
1. End of file was reached when parsing the database. This mode can be
   triggered by emptying the database file, for example:
   `> <database file>`
2. The database file cannot be parsed by pickle. This mode can be triggered
   by writing garbage to the database file, for example:
   `echo "ARGHHH!" > <database file>`
3. The database file contains Python 2 byte strings with characters outside the
   ASCII range. This mode can be triggered by writing a pickled non-ASCII string
   to the database file, for example:
   `echo -e "S'\\xf6l'\np0\n." > <database file>`

For these three failure modes we now expect that `vsmserver` terminates without
overwriting the database and provides helpful messages in the logs explaining the
situation.
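
As a rough illustration of the three failure modes, here is a minimal Python 3 sketch that maps each of them to a distinct error instead of falling back to an empty database. This is not the actual vsmserver code: `load_database` and `DatabaseCorruptError` are hypothetical names, and the message texts are only illustrative. It assumes the database is a plain pickle file loaded with the default `encoding="ASCII"`.

```python
import pickle


class DatabaseCorruptError(Exception):
    """Hypothetical error type: the file exists but cannot be loaded."""


def load_database(path):
    """Load a pickled database, refusing to fall back to an empty one."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except EOFError:
        # Mode 1: the file is empty or ends before the pickle is complete.
        raise DatabaseCorruptError("End of file reading %s" % path)
    except pickle.UnpicklingError as e:
        # Mode 2: the content is not valid pickle data.
        raise DatabaseCorruptError("Error unpacking %s (%s)" % (path, e))
    except UnicodeDecodeError as e:
        # Mode 3: a Python 2 byte string with non-ASCII bytes.
        raise DatabaseCorruptError(
            "Error decoding string in %s (%s)" % (path, e))
```

A caller would log the error and exit without ever writing to `path`, so the file is left untouched for manual recovery. Garbage input can also make pickle raise other exception types, which this sketch does not try to cover.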

Note that the HA and sessions databases are separate and default to
these two paths:
- Sessions: `/var/lib/vsm/sessions`
- HA: `/var/lib/vsm/changes`

Also, note that the HA database is only loaded when ThinLinc is running with HA
enabled.
Comment 7 William Sjöblom cendio 2021-07-16 09:32:09 CEST
Tested on Fedora 33 and everything seems to work as expected. Marking as resolved.
Comment 10 Pierre Ossman cendio 2021-07-20 11:15:38 CEST
Tested corrupting the databases in various ways on Ubuntu 20.04.

Screwed up data:

> 2021-07-20 09:03:15 ERROR vsmserver: Error loading session database: Error unpacking /var/lib/vsm/sessions (pickle data was truncated)
> 2021-07-20 09:03:15 ERROR vsmserver: Session database needs manual recovery.
> 2021-07-20 09:03:15 ERROR vsmserver: Exiting

Empty file:

> 2021-07-20 09:05:16 ERROR vsmserver: Error loading session database: End of file reading /var/lib/vsm/sessions
> 2021-07-20 09:05:16 ERROR vsmserver: Session database needs manual recovery.
> 2021-07-20 09:05:16 ERROR vsmserver: Exiting

Bad string data:

> 2021-07-20 09:14:51 ERROR vsmserver: Error loading session database: Error decoding string in /var/lib/vsm/sessions ('ascii' codec can't decode byte 0xe3 in position 5: ordinal not in range(128))
> 2021-07-20 09:14:51 ERROR vsmserver: Session database needs manual recovery.
> 2021-07-20 09:14:51 ERROR vsmserver: Exiting
Comment 11 Pierre Ossman cendio 2021-07-20 12:06:42 CEST
Also tested corrupt HA changes file on the same machine.

Screwed up data:

> 2021-07-20 10:05:29 ERROR vsmserver: Error loading HA changes database: Error unpacking /var/lib/vsm/changes (pickle data was truncated)
> 2021-07-20 10:05:29 ERROR vsmserver: HA changes database needs manual recovery.
> 2021-07-20 10:05:29 ERROR vsmserver: Exiting

Empty file:

> 2021-07-20 10:06:18 ERROR vsmserver: Error loading HA changes database: End of file reading /var/lib/vsm/changes
> 2021-07-20 10:06:18 ERROR vsmserver: HA changes database needs manual recovery.
> 2021-07-20 10:06:18 ERROR vsmserver: Exiting
Comment 12 Alvin 2023-02-14 03:42:20 CET
(In reply to William Sjöblom from comment #0)

Hi, what is the solution for
End of file reading /var/lib/vsm/sessions - was there an error writing sessions file to disk?
when vsmserver fails to load because of this and the /var/lib/vsm/sessions file is empty?
Comment 13 Samuel Mannehed cendio 2024-02-01 14:09:44 CET
Hi Alvin, sorry that your question got forgotten and that we have not given you an answer. This bugzilla isn't the ideal place for discussing how to recover from errors.

I realize that more than a year has passed, so if you're still having problems, please create a post in our community: https://community.thinlinc.com/
