Bug 4950 - impossible to restart web service on crash (stray tlstunnel)
Summary: impossible to restart web service on crash (stray tlstunnel)
Status: CLOSED WORKSFORME
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: Misc
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.12.0
Assignee: Karl Mikaelsson
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-07 09:41 CET by Pierre Ossman
Modified: 2020-06-18 10:16 CEST
CC List: 0 users

See Also:
Acceptance Criteria:


Description Pierre Ossman cendio 2014-01-07 09:41:42 CET
Our web-based services (currently tlwebadm and tlwebaccess) are really two servers, not one. There's the Python process (tlwebadm/tlwebaccess), and there's the tlstunnel process.

The service scripts only track the first one, though. If the Python process crashes, the tlstunnel process is left running. It keeps holding the sockets, so restarting the service is not enough to recover.

One fix would be to make the service script clever enough to track both daemons and kill them as needed on restart. We might also want to consider delegating the starting and stopping of tlstunnel entirely to the service script, rather than having the Python process control it.
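One way a service script could "track both daemons" without knowing about tlstunnel by name is to start the Python daemon as the leader of its own process group and signal the whole group on stop. A minimal sketch, assuming tlstunnel stays in the group it inherits (it would escape if it called setsid() or setpgid() itself); `start_service` and `stop_service` are hypothetical names, not part of ThinLinc:

```python
# Hypothetical sketch; not ThinLinc's actual service code.
import os
import signal
import subprocess


def start_service(argv):
    """Start the daemon as the leader of a new session and process group.

    Helpers it spawns (e.g. tlstunnel) inherit this process group unless
    they explicitly move to another one via setsid()/setpgid().
    """
    proc = subprocess.Popen(argv, start_new_session=True)
    # As session leader, the child's PID is also its process group ID,
    # so this one number is all the service script needs to record.
    return proc


def stop_service(pgid, sig=signal.SIGTERM):
    """Send sig to every process still in the group.

    This also reaches a stray helper after the group leader (the Python
    daemon) has crashed, as long as the helper is still a member.
    Returns False if the group has no members left.
    """
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        return False
    return True
```

On restart, the script would call `stop_service()` with the recorded group ID before starting a new instance. The group ID stays valid only while some member is alive; once the group is empty, the number could in principle be reused, so a real script would want to validate it before signalling.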
Comment 1 Pierre Ossman cendio 2014-01-07 09:44:31 CET
Example failure scenario:

# kill -9 931 (tlwebadm)
# /etc/init.d/tlwebadm status
ThinLinc Web Administration is stopped
# /etc/init.d/tlwebadm restart
Shutting down ThinLinc Web Administration
Starting ThinLinc Web Administration
# tail /var/log/tlwebadm.log
2014-01-07 09:20:10 INFO tlwebadm[931]: ThinLinc Web Administration version 4.1.1post build 4182 starting...
2014-01-07 09:20:10 INFO tlwebadm[931]: ThinLinc Web Administration running as PID 931 on port 1010.
2014-01-07 09:20:10 INFO tlwebadm[933]: ThinLinc TLS Service ready on port 1010.
2014-01-07 09:36:51 INFO tlwebadm[3615]: ThinLinc Web Administration version 4.1.1post build 4182 starting...
2014-01-07 09:36:51 ERROR tlwebadm[3615]: Could not bind to AF_UNIX socket /var/run/tlwebadm.sock. Check that there are no other processes using this socket. Exiting...
Comment 2 Peter Åstrand cendio 2014-01-07 13:05:24 CET
IMHO, the fact that an extra process (tlstunnel) is used should be considered an implementation detail and not "visible" on a higher level, i.e. on the service level. Thus, I believe we should try to fix the bug in the service code instead. For example, the service might need to record the tlstunnel PID in a file on disk, and check for stray tlstunnel processes on startup.
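The PID-file approach suggested above could look roughly like this. It is a sketch under stated assumptions: the path and function names are hypothetical, and it uses `os.kill(pid, 0)` as an existence probe. Because PIDs are reused, a real implementation should also confirm that the process really is tlstunnel (for example via `/proc/<pid>/cmdline` on Linux) before signalling it.

```python
# Hypothetical sketch; file location and function names are illustrative.
import os
import signal


def write_pidfile(path, pid):
    """Record the helper's PID so a later startup can find it."""
    with open(path, "w") as f:
        f.write(f"{pid}\n")


def read_pidfile(path):
    """Return the recorded PID, or None if missing, garbled or not positive."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None
    # 0 and negative values would address process groups in os.kill().
    return pid if pid > 0 else None


def pid_alive(pid):
    """Probe with signal 0: checks existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def kill_stray(path, sig=signal.SIGTERM):
    """Run at service startup, before binding any sockets.

    If the PID file names a live process, signal it. The PID file is
    removed either way. Returns True if a stray process was signalled.
    """
    pid = read_pidfile(path)
    stray = pid is not None and pid_alive(pid)
    if stray:
        os.kill(pid, sig)
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
    return stray
```

After `kill_stray()` returns True, startup would still need to wait for the old process to exit before its sockets are released.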
Comment 3 Pierre Ossman cendio 2014-02-17 14:22:08 CET
We could solve this by keeping a file descriptor open between the Python service and tlstunnel. tlstunnel would then be able to select() on that fd and detect if the Python service has crashed.
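A common way to build such an fd is a "lifeline" pipe: the parent keeps the write end and never writes to it, while the helper watches the read end. When the parent exits for any reason, including `kill -9`, the kernel closes the write end and the helper sees EOF. The sketch below is in Python purely for illustration (the report does not say how tlstunnel is written); passing the fd number as the last argument is a hypothetical convention.

```python
# Hypothetical sketch of a lifeline pipe; names and fd-passing convention are illustrative.
import os
import select
import subprocess


def spawn_helper(argv):
    """Parent (Python service) side: start the helper with a lifeline pipe.

    The helper receives the read end's fd number as its last argument.
    The parent keeps only the write end and never writes to it.
    """
    r, w = os.pipe()  # both ends are non-inheritable by default (PEP 446)
    proc = subprocess.Popen(argv + [str(r)], pass_fds=(r,))
    os.close(r)  # the parent has no use for the read end
    return proc, w


def parent_gone(lifeline_fd, timeout=None):
    """Helper side: wait up to timeout seconds for the parent to go away.

    Once every copy of the write end is closed, the read end becomes
    readable and read() returns b"" (EOF). Returns True on EOF, False if
    the timeout expires first.
    """
    readable, _, _ = select.select([lifeline_fd], [], [], timeout)
    return bool(readable) and os.read(lifeline_fd, 1) == b""
```

In a real helper the lifeline fd would simply be added to the same select() set as its listening sockets, and EOF on it would trigger shutdown. This only works if no other process holds a copy of the write end, which is why it is left non-inheritable.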
Comment 4 Pierre Ossman cendio 2020-06-16 13:11:36 CEST
This got fixed in r31160 for bug 5044. We no longer have a listening tlstunnel process that can get lost.
