Bug 5489

Summary: Verify all sessions on an agent in one call
Product: ThinLinc
Reporter: Karl Mikaelsson <derfian@cendio.se>
Component: VSM Server
Assignee: Henrik Andersson <hean01@cendio.se>
Status: CLOSED FIXED
QA Contact: Bugzilla mail exporter <bugzilla-qa@cendio.se>
Severity: Normal
Priority: P2
CC: ossman@cendio.se, samuel@cendio.se, thoni56@cendio.se
Version: 2.0.0
Keywords: relnotes
Target Milestone: 4.8.0   
Hardware: PC   
OS: Unknown   
Acceptance Criteria:

Description From cendio 2015-04-08 13:35:24
We've got quite a few problems with the number of session verification calls
that are needed to work with the default 10-minute session_update_delay setting
in clusters with lots of users.

Instead of verifying all sessions with individual calls to agents per session,
we could do better by adding an XMLRPC call/handler that checks all sessions on
an agent at the same time.

This approach could cut the number of calls required during each
session_update_delay interval from the number of sessions to the number of
agents.
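
A minimal sketch of what such an agent-side handler could look like, using
Python's standard xmlrpc.server module. The method name verify_sessions, the
LIVE_SESSIONS set, and the session id format are assumptions made for
illustration only; they are not taken from the ThinLinc code.

```python
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical stand-in for the agent's knowledge of which sessions are alive.
LIVE_SESSIONS = {"alice:10", "bob:11"}


def verify_sessions(session_ids):
    """Check every requested session and answer all of them in one reply.

    Returns a dict mapping session id -> True (alive) or False (not found).
    XML-RPC structs require string keys, so session ids are strings here.
    """
    return {sid: sid in LIVE_SESSIONS for sid in session_ids}


def build_server(host="127.0.0.1", port=0):
    """Create (but do not start) an XML-RPC server exposing verify_sessions."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(verify_sessions, "verify_sessions")
    return server
```

With a handler of this shape, the master would need one request per agent
instead of one per session.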
------- Comment #1 From cendio 2016-08-10 14:06:37 -------
There is consensus on the general design: one job on the server iterates over
the sessions in the session database, groups them per agent, and then asks
each agent to verify its sessions.
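
The design described here can be sketched roughly as follows. This is only an
illustration under assumptions: the session records are modelled as dicts with
hypothetical "id" and "agent" keys, and call_agent is a hypothetical callable
standing in for the per-agent verification call.

```python
from collections import defaultdict


def group_sessions_by_agent(sessions):
    """Group session ids by the agent that hosts them."""
    grouped = defaultdict(list)
    for session in sessions:
        grouped[session["agent"]].append(session["id"])
    return dict(grouped)


def verify_all(sessions, call_agent):
    """Ask each agent once to verify all of its sessions.

    call_agent(agent, session_ids) is assumed to return a dict mapping
    session id -> bool. The merged result covers every session.
    """
    results = {}
    for agent, session_ids in group_sessions_by_agent(sessions).items():
        results.update(call_agent(agent, session_ids))
    return results
```

The number of calls made by verify_all equals the number of distinct agents,
regardless of how many sessions each agent hosts.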
------- Comment #7 From cendio 2016-12-19 14:01:22 -------
The test TestSessionOnRemovedAgent caught a change in behaviour in the new
code; we now verify existing sessions right away after the server starts.
Previously we would do so after a delay.

We need to decide if we want to keep this behaviour or not. One possible
problem could be that we are racing with the agent when starting up and might
mark those sessions as unverified.
------- Comment #9 From cendio 2016-12-19 14:18:08 -------
We've also failed to implement VerifySessionsCall.handle_known_errors(), which
the test TestSessionOnDeadAgent has detected.
------- Comment #45 From cendio 2017-01-19 16:56:18 -------
Works well. We've tested:

 - Periodic check: alive, dead, timeout
 - Reconnect: alive, dead
 - Shadow: alive, dead
 - HA: no scenario found (see bug 6146), we have unit tests though
 - tlwebadm: alive, dead, connect/disconnect
------- Comment #46 From cendio 2017-01-20 10:47:48 -------
Also checked socket usage; it seems to be doing fine with a single port per
agent (or fewer). I had a master/agent pair and 100 sessions on the agent. Only
port 1023 was used on the master.