Bugzilla – Bug 5451
Avoid writing the session database on all session changes
Last modified: 2015-09-30 14:59:11
In large clusters with lots of concurrent users, we have a session file that
can grow by about 2-4 KB per user session. Since this file is pickled to disk
every time a session is created or closed, this can cause an unnecessary
amount of disk I/O when lots of sessions are created or closed.
The upside of this is that we never lose session data in case of a crash
(unless the crash is caused by writing the file, in which case we lose the
data for the event that triggered the write).
We should consider moving to a model where a session change event only marks
the file as "dirty" and a timer takes care of writing the file every N seconds.
This will give us a larger window where we can lose session data but it will
mean that ThinLinc can scale to a larger number of concurrent users.
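A minimal sketch of the dirty-flag-plus-timer model described above, not the actual vsmserver code. The class name DeferredSessionStore, its methods, the interval parameter and the write_count counter are all hypothetical and only serve to illustrate the idea; the temp-file-and-rename step is an extra assumption, not something this bug requires.

```python
import os
import pickle
import tempfile
import threading


class DeferredSessionStore:
    """Hypothetical session store that writes at most once per interval."""

    def __init__(self, path, interval=5.0):
        self.path = path
        self.interval = interval  # the "N seconds" from the proposal
        self.sessions = {}
        self.write_count = 0      # only here to make the effect visible
        self._dirty = False
        self._lock = threading.Lock()
        self._timer = None

    def _mark_dirty(self):
        # Caller holds the lock. Arm a timer only if none is pending.
        self._dirty = True
        if self._timer is None:
            self._timer = threading.Timer(self.interval, self.flush)
            self._timer.daemon = True
            self._timer.start()

    def add_session(self, key, data):
        with self._lock:
            self.sessions[key] = data
            self._mark_dirty()

    def remove_session(self, key):
        with self._lock:
            self.sessions.pop(key, None)
            self._mark_dirty()

    def flush(self):
        """Write the store if it is dirty; also meant to be called on exit."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            if not self._dirty:
                return
            # Assumption: write to a temporary file and rename it into place,
            # so a crash in the middle of a write keeps the previous file.
            directory = os.path.dirname(self.path) or "."
            fd, tmp = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "wb") as f:
                pickle.dump(self.sessions, f)
            os.replace(tmp, self.path)
            self._dirty = False
            self.write_count += 1
```

With this shape, any number of session changes inside one interval costs a single write, and everything changed since the last write is what can be lost in a crash.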
As part of the get_load XMLRPC call, vsmagent appends a list of dead sessions
which vsmserver then can clean up automatically, without having to verify the
session through a proper verify_session call.
vsmserver then loops over the list of dead sessions and calls remove_session
for each dead session. This will trigger a write of the session store for each
reported session. The proposed fix for this bug would make sure that this loop
is no longer a bottleneck.
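To illustrate why the loop is a bottleneck, here is a rough sketch that only counts writes. CountingStore, its write_through flag and cleanup are hypothetical stand-ins, not the real vsmserver session store or its remove_session; the final write in cleanup stands in for the timer firing later.

```python
class CountingStore:
    """Hypothetical stand-in for the session store; counts disk writes."""

    def __init__(self, sessions, write_through):
        self.sessions = dict(sessions)
        self.write_through = write_through
        self.dirty = False
        self.writes = 0

    def write(self):
        # A real store would pickle self.sessions to disk here.
        self.writes += 1
        self.dirty = False

    def remove_session(self, key):
        self.sessions.pop(key, None)
        if self.write_through:
            self.write()       # current behaviour: one write per removal
        else:
            self.dirty = True  # proposed behaviour: only mark as dirty


def cleanup(store, dead_sessions):
    """Remove every reported dead session and return the number of writes."""
    for key in dead_sessions:
        store.remove_session(key)
    if store.dirty:
        store.write()          # stands in for the delayed, timer-driven write
    return store.writes
```

For 50 dead sessions, cleanup returns 50 with write_through=True and 1 with write_through=False.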
Time estimation ranges from "review and commit attached patches" to "review and
solve the problem another way".
With the report for the devmeeting written and presented, patches committed,
broken autotests fixed, new autotests added, and test runs showing that
everything works as intended, I believe I'm done.
The delay seems to work fine. I tested creating hundreds of sessions and then
killing them. I would then either a) wait for load info to update, or b) verify
the sessions by bringing up their details in tlwebadm. In both cases it only
made a single write some time after the sessions were removed.
I could also see that the database was written directly on each new session, so
the delay was only for removed sessions.
Also verified that it writes on exit. Good enough for me.