Bug 7161 - Can not reconnect to old session upon upgrade RHEL7
Summary: Can not reconnect to old session upon upgrade RHEL7
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Agent
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.9.0
Assignee: Pierre Ossman
URL:
Keywords: derfian_tester, relnotes, samuel_tester
Depends on:
Blocks:
 
Reported: 2018-04-23 18:38 CEST by Henrik Andersson
Modified: 2022-04-22 07:59 CEST

See Also:
Acceptance Criteria:


Description Henrik Andersson cendio 2018-04-23 18:38:45 CEST
To reproduce:

Install 4.8.1 and create a session, then upgrade ThinLinc to 4.9.0 and try to reconnect to the session. The reconnect fails and a new session is created instead.

# tail /var/log/vsmserver.log
2018-04-23 18:16:33 INFO vsmserver.license: Updating license data from disk to memory
2018-04-23 18:16:33 INFO vsmserver.license: License summary: 5 concurrent users. Hard limit of 6 concurrent users. 
2018-04-23 18:16:33 INFO vsmserver.session: Loaded 1 sessions for 1 users from file
2018-04-23 18:16:56 INFO vsmserver.session: Session 127.0.0.1:1 for cendio has terminated. Removing.
2018-04-23 18:16:56 INFO vsmserver.session: User with uid 1000 (cendio) requested a new session
2018-04-23 18:16:57 INFO vsmserver: VSM Agent 127.0.0.1 successfully created a new session for cendio
2018-04-23 18:18:44 INFO vsmserver.session: Session 127.0.0.1:10 for cendio has terminated. Removing.
2018-04-23 18:18:44 INFO vsmserver.session: User with uid 1000 (cendio) requested a new session
2018-04-23 18:18:46 INFO vsmserver: VSM Agent 127.0.0.1 successfully created a new session for cendio
2018-04-23 18:18:46 WARNING vsmserver.session: Failed to get client ip for the session


# tail /var/log/vsmagent.log
2018-04-23 18:13:04 INFO vsmagent: My public hostname is 10.47.253.181
2018-04-23 18:13:57 INFO vsmagent.session: Verified connectivity to newly started Xvnc for cendio
2018-04-23 18:16:24 INFO vsmagent: Got SIGTERM, signaling process to quit
2018-04-23 18:16:24 INFO vsmagent: Terminating. Have a nice day!
2018-04-23 18:16:27 INFO vsmagent: VSM Agent version 4.9.0 build 5758 started
2018-04-23 18:16:27 INFO vsmagent: My public hostname is 10.47.253.181
2018-04-23 18:16:56 WARNING vsmagent.session: Broken session for user cendio, tl-session pid 6729 is not tl-session
2018-04-23 18:16:57 INFO vsmagent.session: Verified connectivity to newly started Xvnc for cendio
2018-04-23 18:18:44 WARNING vsmagent.sessions: Broken session for user cendio, tl-session process 26653 does not exist
2018-04-23 18:18:45 INFO vsmagent.session: Verified connectivity to newly started Xvnc for cendio
Comment 1 Henrik Andersson cendio 2018-04-23 18:39:23 CEST
tl-session with pid 6729 is running:

# ps -ax | grep tl-session
  4841 ?        S      0:00 tl-session: cendio
  6729 ?        S      0:00 tl-session: cendio

# ls -la /proc/6729/exe 
lrwxrwxrwx. 1 root root 0 Apr 23 18:19 /proc/6729/exe -> /opt/thinlinc/libexec/tl-session;5ade066b (deleted)
Comment 2 Henrik Andersson cendio 2018-04-23 18:39:58 CEST
The code in handler_verifysessions.py does not handle the additional string ";5ade066b" which we have not seen before...
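
A minimal sketch of how such a link target could be normalized before it is compared, assuming the extra string is always a semicolon followed by eight hexadecimal digits, as in the sample above. The names normalize_exe_target and is_tl_session are hypothetical and are not taken from handler_verifysessions.py.

```python
import re

# Marker the kernel appends to /proc/<pid>/exe when the file has been unlinked.
DELETED_MARKER = " (deleted)"

# Assumption: the unexpected suffix is ";" plus eight hex digits (";5ade066b").
SUFFIX_RE = re.compile(r";[0-9a-f]{8}$")


def normalize_exe_target(target):
    """Strip the ' (deleted)' marker and the ';xxxxxxxx' suffix, if present."""
    if target.endswith(DELETED_MARKER):
        target = target[:-len(DELETED_MARKER)]
    return SUFFIX_RE.sub("", target)


def is_tl_session(target, expected="/opt/thinlinc/libexec/tl-session"):
    """Return True if the link target refers to the tl-session binary."""
    return normalize_exe_target(target) == expected
```

With this, the link from comment #1 would be accepted, while an unrelated binary would still be rejected.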
Comment 3 Henrik Andersson cendio 2018-04-23 18:42:06 CEST
(In reply to comment #0)
> Reproduce;
> 
> Installing 4.8.1 and create a session, upgrade ThinLinc to 4.9.0 an reconnect
> to the session and it will fail. Creating a new session.
> 

Red Hat Enterprise Linux Server release 7.5 (Maipo)
Comment 4 Karl Mikaelsson cendio 2018-04-24 10:49:53 CEST
(In reply to comment #3)
> (In reply to comment #0)
> > Reproduce;
> > 
> > Installing 4.8.1 and create a session, upgrade ThinLinc to 4.9.0 an reconnect
> > to the session and it will fail. Creating a new session.
> > 
> 
> Red Hat Enterprise Linux Server release 7.5 (Maipo)


Same problem on RHEL 7.4 after upgrading 4.8.0 to 4.9.0.

> # cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.4 (Maipo)

> # readlink /proc/1942/exe
> /opt/thinlinc/libexec/tl-session;5adeeecd (deleted)
Comment 5 Pierre Ossman cendio 2018-04-24 10:52:20 CEST
So that suffix seems to come from rpm:

lib/fsm.c:    rasprintf(&tid, ";%08x", (unsigned)rpmtsGetTid(ts));

However this is used for the new file, not the existing one. So the sequence is:

 1. Unpack tl-session;12345678
 2. mv tl-session;12345678 tl-session

So I'm not sure how we ended up with a running process pointing to that file.
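
The two-step sequence above, sketched in Python purely for illustration. rpm_temp_name and replace_like_rpm are hypothetical helpers; only the ";%08x" format is taken from lib/fsm.c, the rest is an assumption about how the steps fit together.

```python
import os


def rpm_temp_name(path, tid):
    """Build the temporary name, mimicking the ";%08x" format from lib/fsm.c."""
    return "%s;%08x" % (path, tid & 0xFFFFFFFF)


def replace_like_rpm(path, new_contents, tid):
    """Write the new file under a temporary name, then rename it into place."""
    tmp = rpm_temp_name(path, tid)
    with open(tmp, "wb") as f:   # step 1: unpack tl-session;12345678
        f.write(new_contents)
    os.rename(tmp, path)         # step 2: mv tl-session;12345678 tl-session
    return tmp
```

After step 2 nothing named "tl-session;12345678" should remain on disk, which is why a running process pointing at that name is surprising.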
Comment 6 Karl Mikaelsson cendio 2018-04-24 11:11:26 CEST
Looks like a kernel bug in RHEL, present in at least 7.4 and on.

First:
> # cp /usr/bin/python /usr/bin/mypython
> # cp /usr/bin/python /usr/bin/mypython2
> # mypython

In another terminal:

> # pid=$(pidof mypython)
> # readlink /proc/$pid/exe
> /usr/bin/mypython
> # mv /usr/bin/mypython2 /usr/bin/mypython
> # readlink /proc/$pid/exe
> /usr/bin/mypython2 (deleted)

error:     ^^^^^^^^^


On Fedora 27, I get the expected:

> # readlink /proc/$pid/exe
> /usr/bin/mypython
> # mv /usr/bin/mypython2 /usr/bin/mypython
> # readlink /proc/$pid/exe
> /usr/bin/mypython (deleted)
Comment 7 Karl Mikaelsson cendio 2018-04-24 12:09:54 CEST
(In reply to comment #6)
> Looks like a kernel bug in RHEL, present in at least 7.4 and on.

RHEL 7.0 is unaffected.

> # uname -r
> 3.10.0-123.6.3.el7.x86_64

> # cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.0 (Maipo)

> # ./bug7161.sh 
> /proc/10667/exe before moving:
> /tmp/tmp.OIgGHDSqqr/sleep
> /proc/10667/exe after moving:
> /tmp/tmp.OIgGHDSqqr/sleep (deleted)
Comment 8 Karl Mikaelsson cendio 2018-04-24 12:11:56 CEST
(In reply to comment #7)
> > # ./bug7161.sh 

FWIW; bug7161.sh:

> #!/bin/bash
> 
> set -e
> 
> DIR=$(mktemp -d)
> cp /usr/bin/sleep "${DIR}"/sleep
> cp /usr/bin/sleep "${DIR}"/sleep2
> 
> "${DIR}"/sleep 10 &
> PROC=$!
> echo "/proc/${PROC}/exe before moving:"
> readlink /proc/${PROC}/exe
> mv "${DIR}"/sleep2 "${DIR}"/sleep
> echo "/proc/${PROC}/exe after moving:"
> readlink /proc/${PROC}/exe
> rm -rf "${DIR}"
Comment 9 Karl Mikaelsson cendio 2018-04-24 12:27:15 CEST
(In reply to comment #7)
> (In reply to comment #6)
> > Looks like a kernel bug in RHEL, present in at least 7.4 and on.
> 
> RHEL 7.0 is unaffected.

RHEL 7.2 is affected.

> # uname -r
> 3.10.0-327.13.1.el7.x86_64

> # cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.2 (Maipo)

> # ./bug7161.sh 
> /proc/10584/exe before moving:
> /tmp/tmp.vah05hF10P/sleep
> /proc/10584/exe after moving:
> /tmp/tmp.vah05hF10P/sleep2 (deleted)
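
The outputs collected so far can be told apart with a small classifier. This is only a sketch; classify_exe_link is a hypothetical helper and the labels are illustrative.

```python
DELETED_MARKER = " (deleted)"


def classify_exe_link(before, after):
    """Compare /proc/<pid>/exe as read before and after the binary was replaced."""
    if after == before + DELETED_MARKER:
        # Same name plus the deleted marker: the expected behaviour.
        return "unaffected"
    if after.endswith(DELETED_MARKER):
        # Deleted marker present, but the name changed: the behaviour seen on RHEL 7.2+.
        return "affected"
    return "unknown"
```

Fed with the before/after pairs printed by bug7161.sh, it separates the RHEL 7.0 result from the RHEL 7.2 result.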
Comment 14 Karl Mikaelsson cendio 2018-04-26 12:20:08 CEST
RHEL 7.5 x86_64, thinlinc-vsm-4.9.0-5764.x86_64

Sessions are no longer discarded if their /proc/pid/exe link has ";01abcdef (deleted)" suffixes, as created by upgrading ThinLinc with RPM.

HOWEVER:

It's imperative that you restart the vsmagent service as quickly as possible after installing upgraded packages.

If you do not do this in a timely fashion, previously scheduled session verification tasks will run old code in vsmagent that will effectively make any running sessions unreachable if a user disconnects.

Take into account that the administrator _must_ run tl-setup, deal with the configuration changes that happened during the 4.9.0 cycle, wait for the SELinux module, CUPS, printers, etc. This can easily take a minute for an experienced ThinLinc Developer, so it's easy to imagine scenarios where this takes 2-5 minutes or more for Joe Schmoe, system administrator.

As for me, I went for lunch during tl-setup and came back to an upgraded system that had lost sessions.

I think these issues need to be at least discussed and understood before this bug is resolved.
Comment 15 Samuel Mannehed cendio 2018-04-26 13:18:12 CEST
(In reply to comment #14)
> RHEL 7.5 x86_64, thinlinc-vsm-4.9.0-5764.x86_64
> 
> Sessions are no longer discarded if their /proc/pid/exe link has ";01abcdef
> (deleted)" suffixes, as created by upgrading ThinLinc with RPM.
> 
> HOWEVER:
> 
> It's imperative that you restart the vsmagent service as quickly as possible
> after installing upgraded packages.
> 
> If you do not do this in a timely fashion, previously scheduled session
> verification tasks will run old code in vsmagent that will effectively make any
> running sessions unreachable if a user disconnects.
> 
> Take into account that the administrator _must_ run tl-setup, deal with the
> configuration changes that happened during the 4.9.0 cycle, wait for the
> SELinux module, CUPS, printers, etc. This can easily take a minute for an
> experienced ThinLinc Developer, so it's easy to imagine scenarios where this
> takes 2-5 minutes or more for Joe Schmoe, system administrator.
> 
> As for me, I went for lunch during tl-setup and came back to an upgraded system
> that had lost sessions.
> 
> I think these issues needs to be at least discussed and understood before this
> bug is resolved.

--> bug 7163.
Comment 16 Karl Mikaelsson cendio 2018-04-26 15:35:16 CEST
(In reply to comment #14)
> RHEL 7.5 x86_64, thinlinc-vsm-4.9.0-5764.x86_64
> 
> Sessions are no longer discarded if their /proc/pid/exe link has ";01abcdef
> (deleted)" suffixes, as created by upgrading ThinLinc with RPM.

Also works on Debian 9 i386 with thinlinc-vsm-4.9.0-5764.
Comment 17 Karl Mikaelsson cendio 2018-04-26 15:36:52 CEST
(In reply to comment #16)
> (In reply to comment #14)
> > RHEL 7.5 x86_64, thinlinc-vsm-4.9.0-5764.x86_64
> > 
> > Sessions are no longer discarded if their /proc/pid/exe link has ";01abcdef
> > (deleted)" suffixes, as created by upgrading ThinLinc with RPM.
> 
> Also works on Debian 9 i386 with thinlinc-vsm-4.9.0-5764.

To clarify: Debian 9 with the linux-image-4.9.0-6-686-pae kernel does not exhibit the triggering problem. The new code works just as well in this scenario.
Comment 18 Peter Åstrand cendio 2018-05-02 12:27:04 CEST
(In reply to comment #15)
> (In reply to comment #14)
> > RHEL 7.5 x86_64, thinlinc-vsm-4.9.0-5764.x86_64
> > 
> > Sessions are no longer discarded if their /proc/pid/exe link has ";01abcdef
> > (deleted)" suffixes, as created by upgrading ThinLinc with RPM.
> > 
> > HOWEVER:
> > 
> > It's imperative that you restart the vsmagent service as quickly as possible
> > after installing upgraded packages.
> > 
> > If you do not do this in a timely fashion, previously scheduled session
> > verification tasks will run old code in vsmagent that will effectively make any
> > running sessions unreachable if a user disconnects.
> > 
> > Take into account that the administrator _must_ run tl-setup, deal with the
> > configuration changes that happened during the 4.9.0 cycle, wait for the
> > SELinux module, CUPS, printers, etc. This can easily take a minute for an
> > experienced ThinLinc Developer, so it's easy to imagine scenarios where this
> > takes 2-5 minutes or more for Joe Schmoe, system administrator.
> > 
> > As for me, I went for lunch during tl-setup and came back to an upgraded system
> > that had lost sessions.
> > 
> > I think these issues needs to be at least discussed and understood before this
> > bug is resolved.
> 
> --> bug 7163.

I suggest that the change for bug 7163 is reverted and replaced with this solution:

--- vsm/thinlinc-vsm.spec.in    (revision 33232)
+++ vsm/thinlinc-vsm.spec.in    (working copy)
@@ -59,12 +59,9 @@
 rm -rf
 
 %pre
-# Stop services before upgrading
+# Workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1571253
 if [ $1 -gt 1 ] ; then
-    # remember that these are services from the OLD package
-    # even if we have changed the services in the NEW package
-    /opt/thinlinc/libexec/service vsmagent stop
-    /opt/thinlinc/libexec/service vsmserver stop
+    rm -f /opt/thinlinc/libexec/tl-session
 fi
 # Save install time
 mkdir -p /opt/thinlinc/etc/.upgrade.stamp


I've verified that this solution works by using a modified version of "bug7161.sh". A full ThinLinc upgrade test remains.
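
One way such a modified reproduction could look, sketched in Python rather than bash. This is an assumption about the shape of the test, not the script that was actually used, and exe_link_after_replace is a hypothetical helper. It needs Linux with /proc and a sleep binary on PATH.

```python
import os
import shutil
import subprocess
import tempfile


def exe_link_after_replace(unlink_first):
    """Replace a running binary and return (before, after) /proc/<pid>/exe targets."""
    sleep_bin = shutil.which("sleep")
    workdir = tempfile.mkdtemp()
    try:
        old = os.path.join(workdir, "sleep")
        new = os.path.join(workdir, "sleep2")
        shutil.copy(sleep_bin, old)
        shutil.copy(sleep_bin, new)

        proc = subprocess.Popen([old, "10"])
        try:
            link = "/proc/%d/exe" % proc.pid
            before = os.readlink(link)
            if unlink_first:
                # Same idea as the "rm -f" in %pre: drop the old name first.
                os.unlink(old)
            os.rename(new, old)
            after = os.readlink(link)
        finally:
            proc.kill()
            proc.wait()
        return before, after
    finally:
        shutil.rmtree(workdir)


if __name__ == "__main__":
    for unlink_first in (False, True):
        before, after = exe_link_after_replace(unlink_first)
        print("unlink_first=%s: %s -> %s" % (unlink_first, before, after))
```

With unlink_first=True the link is expected to keep the original name, since the old file is already gone when the new one is renamed into place. With unlink_first=False the result depends on the kernel, as shown in comment #6.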
Comment 20 Pierre Ossman cendio 2018-05-02 14:52:38 CEST
Alternative approach committed now. Need to do final (re-)testing, but then we should be done.
Comment 22 Samuel Mannehed cendio 2018-05-03 14:12:44 CEST
Can't see any issues when testing build 5770 on Ubuntu 16.04. Works well and can reconnect to sessions started before an upgrade.
Comment 23 Samuel Mannehed cendio 2018-05-03 14:42:18 CEST
Could reproduce the issue when upgrading from 4.8.0 to 4.8.1 on RHEL7:

2018-05-03 14:34:04 WARNING vsmagent.session: Broken session for user cendio, tl-session pid 19240 is not tl-session

And when upgrading from 4.8.1 to build 5770 I can successfully reconnect to a session started prior to the upgrade. Looks good.
