Bug 2670 - Don't list objects as writable unless neccesary - avoid complex select loop
Summary: Don't list objects as writable unless neccesary - avoid complex select loop
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server (show other bugs)
Version: trunk
Hardware: PC Linux
: P2 Normal
Target Milestone: 4.16.0
Assignee: Pierre Ossman
URL:
Keywords: linma_tester, prosaic
Depends on:
Blocks:
 
Reported: 2008-01-29 19:22 CET by Erik Forsberg
Modified: 2023-11-28 09:31 CET (History)
2 users (show)

See Also:
Acceptance Criteria:


Attachments

Description Erik Forsberg cendio 2008-01-29 19:22:37 CET
With VSM Server rev 14337 and earlier, at high load and with some
bad luck, the following traceback was sometimes triggered:

2008-01-29 11:15:51 ERROR vsmserver.extcmd: Unhandled exception getting list of groups for user13: exceptions.NotImplementedError Unhandled write event on <thinlinc.vsm.extproc.GetUserGroups instance at 0x8e37b6c>:['/opt/thinlinc/libexec/tl-initgroups', u'user13']:35 Traceback (most recent call last):
  File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 115, in iii1i
    obj . handle_write_event ( fd )
  File "/opt/thinlinc/modules/thinlinc/vsm/extproc.py", line 317, in handle_write_event
    raise NotImplementedError ( "Unhandled write event on %s:%s:%d" % ( repr ( self ) , repr ( self . args ) , fd ) )
NotImplementedError: Unhandled write event on <thinlinc.vsm.extproc.GetUserGroups instance at 0x8e37b6c>:['/opt/thinlinc/libexec/tl-initgroups', u'user13']:35
.
2008-01-29 11:15:51 ERROR vsmserver: Killing process with pid 3486
Traceback (most recent call last):
  File "/opt/thinlinc/sbin/vsmserver", line 20, in ?
    VSMServer ( sys . argv )
  File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 141, in __init__
    self . loop ( )
  File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 381, in loop
    self . _select_loop ( oO000O0oo )
  File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 245, in _select_loop
    iii1i ( oO0 , OOOo0OO000 )
  File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 120, in iii1i
    obj . handle_error ( fd )
  File "/opt/thinlinc/modules/thinlinc/vsm/extproc.py", line 351, in handle_error
    self . handle_close ( fd )
  File "/opt/thinlinc/modules/thinlinc/vsm/extproc.py", line 282, in handle_close
    self . handle_exit ( )
  File "/opt/thinlinc/modules/thinlinc/vsm/extproc.py", line 440, in handle_exit
    Ii11Ii = [ int ( i1iiIII ) for i1iiIII in self . stdout_data . strip ( ) . split ( " " ) ]
ValueError: invalid literal for int(): 

The root cause for this problem was that the following could happen:

 1) The list of readable fd's as returned by select.select was iterated over.

 2) A readable object with fd X registered that was also in the
    writable list would find that it was no longer needed, and
    unregister itself.
 
 3) Another readable object would start up a third object, that
    registered its FD, and that fd was by bad luck X.

 4) When the list of writable fd's were iterated over, the wrong
    object would be found when looking up fd X in fd_map, and if that
    object was not equipped with a handle_write function, the
    traceback would occur.

A good long-term solution to this is to make sure that the writable()
function only returns True when absolutely neccesary. As a short term
solution, I've made unregister_fd remove fd's that are unregistered
from the list of writable objects.
Comment 2 Pierre Ossman cendio 2023-09-11 12:31:26 CEST
We don't have our own select loop any more as of bug 7636. It's likely that this issue is resolved.
Comment 3 Pierre Ossman cendio 2023-11-20 13:41:36 CET
This issue is indeed solved now.

The select loop is now handled by Python's asyncio, so the complexity is definitely out of our hands.

The listed scenario is also safe under asyncio as it builds a queue of all callbacks after calling select(). That means any changes in callbacks for different fd:s won't take effect until the next iteration of the loop.

In the scenario listed above, the original write callback will be called¹, not the new one.

¹ Which will likely crash as the fd is now closed. But that's a different issue and isn't necessarily the event loop's fault.
Comment 4 Linn cendio 2023-11-28 09:31:36 CET
When testing, I decided to try and recreate the scenario described in comment 0 but in asyncio. Unfortunately, I did not succeed in recreating it.

However, the code in asyncio looks appropriate for this scenario and we have fully swapped from using our own implementation to using asyncio. Because of this, I believe this bug can be considered done, even without recreating this edge case. Closing.

Note You need to log in before you can comment on or make changes to this bug.