Bug 7531 - Load balancer crash if network error on first poll
Summary: Load balancer crash if network error on first poll
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.12.1
Assignee: Frida Flodin
URL:
Keywords: nikle_tester, relnotes
Depends on:
Blocks:
 
Reported: 2020-07-09 10:56 CEST by Pierre Ossman
Modified: 2021-01-21 12:45 CET
CC List: 2 users

See Also:
Acceptance Criteria:


Attachments

Description Pierre Ossman cendio 2020-07-09 10:56:47 CEST
If there is a network error on the first poll of an agent then we get this crash:

> 2020-07-08 12:03:05 WARNING vsmserver.loadinfo: [Errno 101] ENETUNREACH talking to VSM Agent tl.cendio.se:904 in request for loadinfo. Marking as down.
> 2020-07-08 12:03:05 ERROR vsmserver: Exception in error handler for <thinlinc.vsm.call_getload.GetLoadCall at 0x7fe3b0206e10>: <type 'exceptions.AttributeError'> loadbalancer Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 240, in handle_error
>     O0ooO0Oo00o = self . handle_known_errors ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/call_getload.py", line 35, in handle_known_errors
>     self . parent . loadbalancer . update_loadinfo ( self . url , None )
>   File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 439, in __getattr__
>     raise AttributeError , attr
> AttributeError: loadbalancer

Unfortunately, because of bug 7530, this wedges the agent in a permanently downed state.
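
The traceback suggests that the error handler dereferences self.parent.loadbalancer before the server has created that attribute, and that the __getattr__ in async.py turns the missed lookup into the AttributeError above. Below is a minimal sketch of that pattern together with one possible guard; the names mirror the log (GetLoadCall, handle_known_errors, parent.loadbalancer, the raising __getattr__), but everything else is an illustrative assumption, not ThinLinc's actual code.

    # Sketch of the failure pattern suggested by the traceback.
    class Server:
        # Stand-in for the dispatcher in async.py: unknown attributes
        # raise AttributeError instead of returning None, so touching
        # loadbalancer before it is assigned crashes the error handler.
        def __getattr__(self, attr):
            raise AttributeError(attr)

    class GetLoadCall:
        def __init__(self, parent, url):
            self.parent = parent
            self.url = url

        def handle_known_errors(self):
            # Runs when a poll fails. On the *first* poll the server
            # may not have set up its loadbalancer yet, reproducing
            # "AttributeError: loadbalancer".
            self.parent.loadbalancer.update_loadinfo(self.url, None)

        def handle_known_errors_guarded(self):
            # One possible fix: tolerate the not-yet-initialized case
            # and let a later poll deliver the load information.
            loadbalancer = getattr(self.parent, "loadbalancer", None)
            if loadbalancer is not None:
                loadbalancer.update_loadinfo(self.url, None)

    # Demo: the guarded variant survives a failed first poll.
    call = GetLoadCall(Server(), "agent-url")
    call.handle_known_errors_guarded()  # no crash; agent is retried later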
Comment 1 Pierre Ossman cendio 2020-07-09 11:01:28 CEST
Also see bug 4243, which is similar but not quite as severe.

I suspect this is getting worse because of bug 4290 as we are likely starting before the network is up now.
Comment 5 Frida Flodin cendio 2021-01-14 16:18:53 CET
Fixed now. The tester needs to make sure there are no errors in vsmserver.log when starting vsmserver without network, and that the load update cycle continues as expected.

Also note that it is the fix for bug 7530 that ensures we don't lose the agent forever, even with this error.
Comment 8 Niko Lehto cendio 2021-01-18 10:42:18 CET
Reproduced this issue on 4.12.0 using 'unshare'.
Tested on RHEL8 server with nightly (build 6718).

No errors shown in vsmserver.log and the update cycle continues as expected.
Also, relnotes looks good.
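
For reference, here is a sketch of how such a no-network start can be simulated with unshare(1). Running in a fresh network namespace requires root, and the vsmserver path and log location used here are illustrative assumptions, not confirmed by this report.

    # Hypothetical reproduction helper: launch vsmserver in a new
    # network namespace so the very first agent poll hits a network
    # error. Requires root; paths are illustrative assumptions.
    import subprocess

    subprocess.run(
        ["unshare", "--net", "/opt/thinlinc/sbin/vsmserver"],
        check=True,
    )
    # Then inspect /var/log/vsmserver.log: there should be no
    # tracebacks, and load polling should resume once networking works.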
