Bug 4429 - Fix the load balancer
Summary: Fix the load balancer
Status: NEW
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server
Version: 3.4.0
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: MediumPrio
Assignee: Peter Åstrand
URL:
Keywords: focus_loadbalancer
Depends on: 1174 4771 5268
Blocks:
Reported: 2012-10-15 15:46 CEST by Aaron Sowry
Modified: 2023-12-01 14:28 CET

Description Aaron Sowry cendio 2012-10-15 15:46:25 CEST
I'm sure there was already at least one bug for this, but I can't find anything relating to the general problem (although see bugs #2196 and #1174).

Our load balancing algorithm is not great. There are a number of problems:

1) Bogomips is a strange way to measure CPU performance. Throw in things like hyperthreading and it becomes even more problematic.

2) The general algorithm needs to be reviewed. Just because one server can support 4000 more sessions and another can only support 1000 more, doesn't mean that we should never start sessions on the weaker server. This also assumes that our rating figure is meaningful in this regard.

3) The existing_users_weight parameter is backwards, i.e. the higher the value the less each user matters.

4) Load is also affected by I/O, which isn't necessarily relevant to what we're checking.
Comment 2 Aaron Sowry cendio 2012-10-23 16:47:01 CEST
Moving to NearFuture, so that we remember to revisit this after 4.0.0. See issue 13747.
Comment 3 Pierre Ossman cendio 2015-03-18 15:27:50 CET
(In reply to comment #0)
> 1) Bogomips is a strange way to measure CPU performance. Throw in things like
> hyperthreading and it becomes even more problematic.
> 

Bug 4771.

> 4) Load is also affected by I/O, which isn't necessarily relevant to what we're
> checking

As mentioned, bug 1174.
Comment 4 Pierre Ossman cendio 2015-03-18 15:31:36 CET
See also bug 5268. It has a rough prototype for changing the load balancer to simply pick the agent with the fewest users (not sessions, nor ThinLinc users) on it. Note that it needs work, as it doesn't consider varying machine capabilities, lots of logins in a short time, or putting all sessions for a single user on the same agent.

(there is also still the fundamental question of what the basic principle of the load balancer should be)
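
A minimal sketch of the "fewest users" idea, for illustration only. It is not the prototype from bug 5268; the function name and the shape of the input (a mapping from agent name to the set of users logged in on that agent) are assumptions. It deliberately ignores the open points listed above (machine capabilities, login bursts, keeping one user's sessions together).

```python
def pick_agent_fewest_users(users_by_agent):
    """Return the agent with the fewest distinct users on it.

    users_by_agent: dict mapping agent name -> set of usernames currently
    logged in on that agent (hypothetical input; how it is collected is
    out of scope here).
    """
    if not users_by_agent:
        raise ValueError("no agents available")
    # Sort by (user count, agent name) so ties resolve deterministically.
    return min(users_by_agent, key=lambda a: (len(users_by_agent[a]), a))


example = {
    "agent1": {"alice", "bob"},
    "agent2": {"carol"},
    "agent3": {"dave", "erin", "frank"},
}
chosen = pick_agent_fewest_users(example)
```

Here `chosen` is the agent with a single user; a second session for `carol` would not change the count, since users rather than sessions are counted.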
Comment 7 Pierre Ossman cendio 2017-06-12 11:14:13 CEST
We've had some more internal discussion about this, and we've tried to summarise the issues and feedback we've gotten:

 * It's difficult to understand (and configure)
 * It can be overly lopsided if servers differ in (perceived) capacity
 * It doesn't spread risk
 * Some would like their own, arbitrary conditions for selecting agents

Our current system is based on the principle of giving every user as many resources as possible, but it assumes a) that the system measures everything relevant, and b) that the admin knows the resource usage and configures it accordingly.

Changes that could be made:

 * The system tunes itself (addresses b)
 * Equal number of sessions (or users) per agent (addresses a, or balance risk instead of load)
 * Weighted number of sessions per agent (compromise between current model and simpler one)
 * Allow a user script to select the agent (let the customer solve the problem)
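
As a rough illustration of the "weighted number of sessions per agent" option, the sketch below picks the agent whose session count, after adding the new session, is lowest relative to an admin-assigned weight. The function name, the weight mapping and the "projected ratio" rule are all assumptions made for this example, not an existing ThinLinc setting. With every weight set to 1 it degenerates into the "equal number of sessions per agent" option.

```python
def pick_agent_weighted(session_counts, weights):
    """Pick the agent with the lowest projected sessions-per-weight ratio.

    session_counts: dict agent -> current number of sessions (missing = 0)
    weights:        dict agent -> positive relative capacity set by the admin
    Both mappings are hypothetical inputs for this sketch.
    """
    if not weights:
        raise ValueError("no agents configured")
    for agent, weight in weights.items():
        if weight <= 0:
            raise ValueError("weight for %s must be positive" % agent)

    def projected_ratio(agent):
        # Ratio the agent would have if it received the next session.
        return (session_counts.get(agent, 0) + 1) / weights[agent]

    # Iterate in name order so that ties resolve deterministically.
    return min(sorted(weights), key=projected_ratio)


weights = {"big": 4, "small": 1}
first = pick_agent_weighted({"big": 2, "small": 0}, weights)
later = pick_agent_weighted({"big": 4, "small": 0}, weights)
```

The weaker agent still receives sessions once the stronger one has taken its proportional share, which avoids the lopsided behaviour described above.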
Comment 9 Aaron Sowry cendio 2019-08-21 02:20:08 CEST
A point of interest, perhaps: a comment from X2Go's config file about how their load values are calculated.

# The load factor calculation uses this algorithm:
#
#                  ( memAvail/1000 ) * numCPUs * typeCPUs
#    load-factor = -------------------------------------- + 1
#                        loadavg*100 * numSessions
#
# (memAvail in MByte, typeCPUs in MHz, loadavg is (system load *100 + 1) as
# positive integer value)
#
# The higher the load-factor, the more likely that a server will be chosen
# for the next to be allocated X2Go session.

We have also seen evidence that some customers are using cgroups to limit resource usage per user. This may be worth thinking about with regard to load balancing too, as it offers a certain degree of predictability about future resource consumption.
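
For reference, the quoted X2Go formula could be written out as below. This is a literal reading of the config comment, not X2Go's actual code: the function and parameter names are made up, floating-point division is assumed, and clamping the session count to at least 1 (to avoid dividing by zero) is an assumption, since the comment does not say how an idle server is handled.

```python
def x2go_style_load_factor(mem_avail_mb, num_cpus, cpu_mhz,
                           system_load, num_sessions):
    """Load factor as described in the quoted X2Go comment (higher = better)."""
    # "loadavg is (system load * 100 + 1) as positive integer value"
    loadavg = int(system_load * 100 + 1)
    # Assumption: treat an idle server as having one session.
    sessions = max(num_sessions, 1)
    numerator = (mem_avail_mb / 1000) * num_cpus * cpu_mhz
    denominator = loadavg * 100 * sessions
    return numerator / denominator + 1


idle_ish = x2go_style_load_factor(8000, 4, 2000, system_load=0.5, num_sessions=10)
busy = x2go_style_load_factor(8000, 4, 2000, system_load=2.0, num_sessions=10)
```

With identical hardware, the server with the higher system load gets the lower factor and is therefore less likely to be chosen.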
Comment 13 Pierre Ossman cendio 2020-10-09 09:56:20 CEST
Also note bug 284, which may or may not be relevant depending on what happens here.
Comment 17 Peter Wirdemo 2022-01-30 11:33:21 CET
I don't think that one "load balancer strategy" will fit all customers.

You could implement a couple of different strategies and let the customer choose which to use.
Implement a variable loadbalance_strategy in vsmserver.hconf.
This could contain a single strategy or a list of strategies:

loadbalance_strategy=usercount,default

If the "best" agents have the same usercount, use the default strategy to select the best agent among these equal agents.

The loadbalance_strategy could be default (the current version), X2GO, usercount, memusage, or other "standard" ones like round-robin, least-connection, source-ip-hash, load.

As a bonus, I would like to be able to add "custom", where the customer supplies one or more scripts to be executed on the agents.

loadbalance_strategy=custom,default
loadbalance_custom=/usr/local/bin/myloadbalance.pl,/usr/local/bin/myotherbalancer.pl
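
The tie-breaking chain described above (use the first strategy, then fall back to the next one among equally good agents) might look roughly like this. It is only a sketch of the proposal: loadbalance_strategy is not an existing setting, the scorer functions are placeholders, and the "default" scorer here is a stand-in that just reads a load value rather than the current ThinLinc rating.

```python
def parse_strategies(value):
    """Split a 'usercount,default' style value into a list of names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def select_agent(agents, strategy_names, scorers):
    """Narrow the candidates strategy by strategy; lower score is better.

    agents:  list of dicts describing agents (hypothetical shape)
    scorers: dict strategy name -> function(agent) -> comparable score
    """
    candidates = list(agents)
    if not candidates:
        raise ValueError("no agents available")
    for name in strategy_names:
        score = scorers[name]
        best = min(score(agent) for agent in candidates)
        # Keep only the agents that tie for the best score.
        candidates = [agent for agent in candidates if score(agent) == best]
        if len(candidates) == 1:
            break
    # If a tie remains after every strategy, the first candidate wins.
    return candidates[0]


scorers = {
    "usercount": lambda agent: agent["users"],
    "default": lambda agent: agent["load"],  # placeholder, not the real rating
}
agents = [
    {"name": "a", "users": 2, "load": 0.9},
    {"name": "b", "users": 2, "load": 0.1},
    {"name": "c", "users": 5, "load": 0.0},
]
winner = select_agent(agents, parse_strategies("usercount,default"), scorers)
```

Agents a and b tie on user count, so the second strategy decides between them; c is never considered despite its low load.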
