Bug 7531 - Load balancer crash if network error on first poll
Summary: Load balancer crash if network error on first poll
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.12.1
Assignee: Frida Flodin
URL:
Keywords: nikle_tester, relnotes
Depends on:
Blocks:
 
Reported: 2020-07-09 10:56 CEST by Pierre Ossman
Modified: 2021-01-21 12:45 CET
CC List: 2 users

See Also:
Acceptance Criteria:


Attachments

Description Pierre Ossman cendio 2020-07-09 10:56:47 CEST
If there is a network error on the first poll of an agent then we get this crash:

> 2020-07-08 12:03:05 WARNING vsmserver.loadinfo: [Errno 101] ENETUNREACH talking to VSM Agent tl.cendio.se:904 in request for loadinfo. Marking as down.
> 2020-07-08 12:03:05 ERROR vsmserver: Exception in error handler for <thinlinc.vsm.call_getload.GetLoadCall at 0x7fe3b0206e10>: <type 'exceptions.AttributeError'> loadbalancer Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 240, in handle_error
>     O0ooO0Oo00o = self . handle_known_errors ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/call_getload.py", line 35, in handle_known_errors
>     self . parent . loadbalancer . update_loadinfo ( self . url , None )
>   File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 439, in __getattr__
>     raise AttributeError , attr
> AttributeError: loadbalancer

Unfortunately, because of bug 7530, this wedges the agent in a permanently downed state.
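
The traceback suggests that the error handler dereferences self.parent.loadbalancer before the server has created that attribute, and that the __getattr__ in async.py turns the missed lookup into the AttributeError above. Below is a minimal sketch of that pattern together with one possible guard; the names mirror the log (GetLoadCall, handle_known_errors, parent.loadbalancer, the raising __getattr__), but everything else is an illustrative assumption, not ThinLinc's actual code.

    # Sketch of the failure pattern suggested by the traceback.
    class Server:
        # Stand-in for the dispatcher in async.py: unknown attributes
        # raise AttributeError instead of returning None, so touching
        # loadbalancer before it is assigned crashes the error handler.
        def __getattr__(self, attr):
            raise AttributeError(attr)

    class GetLoadCall:
        def __init__(self, parent, url):
            self.parent = parent
            self.url = url

        def handle_known_errors(self):
            # Runs when a poll fails. On the *first* poll the server
            # may not have set up its loadbalancer yet, reproducing
            # "AttributeError: loadbalancer".
            self.parent.loadbalancer.update_loadinfo(self.url, None)

        def handle_known_errors_guarded(self):
            # One possible fix: tolerate the not-yet-initialized case
            # and let a later poll deliver the load information.
            loadbalancer = getattr(self.parent, "loadbalancer", None)
            if loadbalancer is not None:
                loadbalancer.update_loadinfo(self.url, None)

    # Demo: the guarded variant survives a failed first poll.
    call = GetLoadCall(Server(), "agent-url")
    call.handle_known_errors_guarded()  # no crash; agent is retried later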
Comment 1 Pierre Ossman cendio 2020-07-09 11:01:28 CEST
Also see bug 4243, which is similar but not quite as severe.

I suspect this is getting worse because of bug 4290 as we are likely starting before the network is up now.
Comment 5 Frida Flodin cendio 2021-01-14 16:18:53 CET
Fixed now. The tester needs to make sure there are no errors in vsmserver.log when starting vsmserver without network, and that the load update cycle continues as expected.

Also note that it is the fix for bug 7530 that ensures we don't lose the agent forever, even with this error.
Comment 8 Niko Lehto cendio 2021-01-18 10:42:18 CET
Reproduced this issue on 4.12.0 using 'unshare'.
Tested on RHEL8 server with nightly (build 6718).

No errors shown in vsmserver.log and the update cycle continues as expected.
Also, relnotes looks good.
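
For reference, here is a sketch of how such a no-network start can be simulated with unshare(1). Running in a fresh network namespace requires root, and the vsmserver path and log location used here are illustrative assumptions, not confirmed by this report.

    # Hypothetical reproduction helper: launch vsmserver in a new
    # network namespace so the very first agent poll hits a network
    # error. Requires root; paths are illustrative assumptions.
    import subprocess

    subprocess.run(
        ["unshare", "--net", "/opt/thinlinc/sbin/vsmserver"],
        check=True,
    )
    # Then inspect /var/log/vsmserver.log: there should be no
    # tracebacks, and load polling should resume once networking works.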
