www.cendio.com

Bug 7161

Summary: Cannot reconnect to old session after upgrade on RHEL 7
Product: ThinLinc    Reporter: Henrik Andersson <hean01@cendio.se>
Component: VSM Agent    Assignee: Pierre Ossman <ossman@cendio.se>
Status: CLOSED FIXED    QA Contact: Bugzilla mail exporter <bugzilla-qa@cendio.se>
Severity: Normal
Priority: P2    CC: astrand@cendio.se, samuel@cendio.se
Version: trunk    Keywords: derfian_tester, relnotes, samuel_tester
Target Milestone: 4.9.0
Hardware: PC
OS: Unknown
Acceptance Criteria:

Description From cendio 2018-04-23 18:38:45
To reproduce:

Install 4.8.1 and create a session, then upgrade ThinLinc to 4.9.0 and reconnect
to the session. The reconnection fails and a new session is created instead.

# tail /var/log/vsmserver.log
2018-04-23 18:16:33 INFO vsmserver.license: Updating license data from disk to memory
2018-04-23 18:16:33 INFO vsmserver.license: License summary: 5 concurrent users. Hard limit of 6 concurrent users.
2018-04-23 18:16:33 INFO vsmserver.session: Loaded 1 sessions for 1 users from file
2018-04-23 18:16:56 INFO vsmserver.session: Session 127.0.0.1:1 for cendio has terminated. Removing.
2018-04-23 18:16:56 INFO vsmserver.session: User with uid 1000 (cendio) requested a new session
2018-04-23 18:16:57 INFO vsmserver: VSM Agent 127.0.0.1 successfully created a new session for cendio
2018-04-23 18:18:44 INFO vsmserver.session: Session 127.0.0.1:10 for cendio has terminated. Removing.
2018-04-23 18:18:44 INFO vsmserver.session: User with uid 1000 (cendio) requested a new session
2018-04-23 18:18:46 INFO vsmserver: VSM Agent 127.0.0.1 successfully created a new session for cendio
2018-04-23 18:18:46 WARNING vsmserver.session: Failed to get client ip for the session


# tail /var/log/vsmagent.log
2018-04-23 18:13:04 INFO vsmagent: My public hostname is 10.47.253.181
2018-04-23 18:13:57 INFO vsmagent.session: Verified connectivity to newly started Xvnc for cendio
2018-04-23 18:16:24 INFO vsmagent: Got SIGTERM, signaling process to quit
2018-04-23 18:16:24 INFO vsmagent: Terminating. Have a nice day!
2018-04-23 18:16:27 INFO vsmagent: VSM Agent version 4.9.0 build 5758 started
2018-04-23 18:16:27 INFO vsmagent: My public hostname is 10.47.253.181
2018-04-23 18:16:56 WARNING vsmagent.session: Broken session for user cendio, tl-session pid 6729 is not tl-session
2018-04-23 18:16:57 INFO vsmagent.session: Verified connectivity to newly started Xvnc for cendio
2018-04-23 18:18:44 WARNING vsmagent.sessions: Broken session for user cendio, tl-session process 26653 does not exist
2018-04-23 18:18:45 INFO vsmagent.session: Verified connectivity to newly started Xvnc for cendio
------- Comment #1 From cendio 2018-04-23 18:39:23 -------
tl-session with pid 6729 is running:

# ps -ax | grep tl-session
  4841 ?        S      0:00 tl-session: cendio
  6729 ?        S      0:00 tl-session: cendio

# ls -la /proc/6729/exe 
lrwxrwxrwx. 1 root root 0 Apr 23 18:19 /proc/6729/exe ->
/opt/thinlinc/libexec/tl-session;5ade066b (deleted)
------- Comment #2 From cendio 2018-04-23 18:39:58 -------
The code in handler_verifysessions.py does not handle the additional suffix
";5ade066b", which we have not seen before.
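Below is a minimal sketch of how a check like this could tolerate the suffix. It is not the code in handler_verifysessions.py. It assumes the expected target is /opt/thinlinc/libexec/tl-session, and that the only extra decorations are rpm's ";%08x" temporary-name suffix (see comment #5) and the kernel's " (deleted)" marker. The names TL_SESSION, normalize_exe_target and is_tl_session are hypothetical.

```python
import re
from typing import Tuple

# Hypothetical constant: the path the agent expects tl-session to run from.
TL_SESSION = "/opt/thinlinc/libexec/tl-session"

# "<path>", optionally followed by rpm's ";%08x" temporary-name suffix and
# then by the " (deleted)" marker the kernel adds for unlinked files.
_EXE_LINK_RE = re.compile(
    r"(?P<path>.+?)(?P<tid>;[0-9a-f]{8})?(?P<deleted> \(deleted\))?")


def normalize_exe_target(link: str) -> Tuple[str, bool]:
    """Split a /proc/PID/exe target into (path, deleted).

    The ";xxxxxxxx" suffix is only stripped when the file is also reported
    as deleted, which is the post-upgrade situation seen in this bug.
    """
    m = _EXE_LINK_RE.fullmatch(link)
    if m is None:  # only happens for an empty string
        return link, False
    path = m.group("path")
    deleted = m.group("deleted") is not None
    if m.group("tid") and not deleted:
        # A live file whose real name contains ';': keep it unchanged.
        path += m.group("tid")
    return path, deleted


def is_tl_session(link: str) -> bool:
    """True if the exe link refers to tl-session, before or after an upgrade."""
    path, _deleted = normalize_exe_target(link)
    return path == TL_SESSION


if __name__ == "__main__":
    # Targets taken from comments #1 and #4.
    for target in ("/opt/thinlinc/libexec/tl-session;5ade066b (deleted)",
                   "/opt/thinlinc/libexec/tl-session;5adeeecd (deleted)",
                   "/opt/thinlinc/libexec/tl-session"):
        print(target, "->", is_tl_session(target))
```

Stripping the suffix only for deleted files is a deliberately conservative choice. A process running a binary whose current name really ends in ";xxxxxxxx" is still not treated as tl-session.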
------- Comment #3 From cendio 2018-04-23 18:42:06 -------
(In reply to comment #0)
> Reproduce:
> 
> Installing 4.8.1 and create a session, upgrade ThinLinc to 4.9.0 and reconnect
> to the session and it will fail. Creating a new session.
> 

Red Hat Enterprise Linux Server release 7.5 (Maipo)
------- Comment #4 From cendio 2018-04-24 10:49:53 -------
(In reply to comment #3)
> Red Hat Enterprise Linux Server release 7.5 (Maipo)


Same problem on RHEL 7.4 after upgrading 4.8.0 to 4.9.0.

> # cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.4 (Maipo)

> # readlink /proc/1942/exe
> /opt/thinlinc/libexec/tl-session;5adeeecd (deleted)
------- Comment #5 From cendio 2018-04-24 10:52:20 -------
So that suffix seems to come from rpm:

lib/fsm.c:    rasprintf(&tid, ";%08x", (unsigned)rpmtsGetTid(ts));

However, that name is used for the new file, not the existing one, so the
sequence is:

 1. Unpack tl-session;12345678
 2. mv tl-session;12345678 tl-session

So I'm not sure how we ended up with a running process pointing to that file.
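The two steps can be sketched for a single file as follows. rpm_temp_name, install_like_rpm and tid_to_datetime are hypothetical helpers, not rpm API. tid_to_datetime rests on an assumption: that rpm's transaction ID is a Unix timestamp. If that holds, 0x5ade066b from comment #1 decodes to 2018-04-23 16:14:35 UTC (18:14:35 CEST). That would be shortly before the vsmagent restart logged at 18:16:24, provided the log uses local CEST time.

```python
import os
import tempfile
from datetime import datetime, timezone


def rpm_temp_name(path: str, tid: int) -> str:
    """Temporary name, in the same ";%08x" format as lib/fsm.c."""
    return "%s;%08x" % (path, tid & 0xFFFFFFFF)  # mirrors the (unsigned) cast


def install_like_rpm(path: str, data: bytes, tid: int) -> None:
    """Steps 1 and 2 above, for a single file."""
    tmp = rpm_temp_name(path, tid)
    with open(tmp, "wb") as f:  # 1. unpack tl-session;12345678
        f.write(data)
    os.rename(tmp, path)  # 2. mv tl-session;12345678 tl-session


def tid_to_datetime(tid: int) -> datetime:
    """Assumption: the transaction ID is seconds since the Unix epoch."""
    return datetime.fromtimestamp(tid, tz=timezone.utc)


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    target = os.path.join(d, "tl-session")
    with open(target, "wb") as f:
        f.write(b"old")
    install_like_rpm(target, b"new", 0x5ade066b)
    print(os.listdir(d))  # only "tl-session" remains
    print(tid_to_datetime(0x5ade066b))  # suffix seen in comment #1
```

After step 2, no directory entry with the ";" name remains. A running process should therefore never legitimately report that name, which is what points at the kernel.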
------- Comment #6 From cendio 2018-04-24 11:11:26 -------
This looks like a kernel bug in RHEL, present in at least 7.4 and later.

First:
> # cp /usr/bin/python /usr/bin/mypython
> # cp /usr/bin/python /usr/bin/mypython2
> # mypython

In another terminal:

> # pid=$(pidof mypython)
> # readlink /proc/$pid/exe
> /usr/bin/mypython
> # mv /usr/bin/mypython2 /usr/bin/mypython
> # readlink /proc/$pid/exe
> /usr/bin/mypython2 (deleted)

error:     ^^^^^^^^^


On Fedora 27, I get the expected result:

> # readlink /proc/$pid/exe
> /usr/bin/mypython
> # mv /usr/bin/mypython2 /usr/bin/mypython
> # readlink /proc/$pid/exe
> /usr/bin/mypython (deleted)
------- Comment #7 From cendio 2018-04-24 12:09:54 -------
(In reply to comment #6)
> Looks like a kernel bug in RHEL, present in at least 7.4 and on.

RHEL 7.0 is unaffected.

> # uname -r
> 3.10.0-123.6.3.el7.x86_64

> # cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.0 (Maipo)

> # ./bug7161.sh 
> /proc/10667/exe before moving:
> /tmp/tmp.OIgGHDSqqr/sleep
> /proc/10667/exe after moving:
> /tmp/tmp.OIgGHDSqqr/sleep (deleted)
------- Comment #8 From cendio 2018-04-24 12:11:56 -------
(In reply to comment #7)
> > # ./bug7161.sh 

FWIW, here is bug7161.sh:

> #!/bin/bash
> 
> set -e
> 
> DIR=$(mktemp -d)
> cp /usr/bin/sleep "${DIR}"/sleep
> cp /usr/bin/sleep "${DIR}"/sleep2
> 
> "${DIR}"/sleep 10 &
> PROC=$!
> echo "/proc/${PROC}/exe before moving:"
> readlink /proc/${PROC}/exe
> mv "${DIR}"/sleep2 "${DIR}"/sleep
> echo "/proc/${PROC}/exe after moving:"
> readlink /proc/${PROC}/exe
> rm -rf "${DIR}"
------- Comment #9 From cendio 2018-04-24 12:27:15 -------
(In reply to comment #7)
> (In reply to comment #6)
> > Looks like a kernel bug in RHEL, present in at least 7.4 and on.
> 
> RHEL 7.0 is unaffected.

RHEL 7.2 is affected.

> # uname -r
> 3.10.0-327.13.1.el7.x86_64

> # cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.2 (Maipo)

> # ./bug7161.sh 
> /proc/10584/exe before moving:
> /tmp/tmp.vah05hF10P/sleep
> /proc/10584/exe after moving:
> /tmp/tmp.vah05hF10P/sleep2 (deleted)
------- Comment #14 From cendio 2018-04-26 12:20:08 -------
RHEL 7.5 x86_64, thinlinc-vsm-4.9.0-5764.x86_64

Sessions are no longer discarded if their /proc/PID/exe link has a ";01abcdef
(deleted)" suffix, as produced by upgrading ThinLinc with RPM.

HOWEVER:

It's imperative that you restart the vsmagent service as quickly as possible
after installing upgraded packages.

If you do not do this in a timely fashion, previously scheduled session
verification tasks will run old code in vsmagent that will effectively make any
running sessions unreachable if a user disconnects.

Take into account that the administrator _must_ run tl-setup, deal with the
configuration changes that happened during the 4.9.0 cycle, wait for the
SELinux module, CUPS, printers, etc. This can easily take a minute for an
experienced ThinLinc developer, so it's easy to imagine scenarios where this
takes 2-5 minutes or more for Joe Schmoe, system administrator.

As for me, I went for lunch during tl-setup and came back to an upgraded system
that had lost sessions.

I think these issues need to at least be discussed and understood before this
bug is resolved.
------- Comment #15 From cendio 2018-04-26 13:18:12 -------
(In reply to comment #14)
> HOWEVER:
> 
> It's imperative that you restart the vsmagent service as quickly as possible
> after installing upgraded packages.
> 
> If you do not do this in a timely fashion, previously scheduled session
> verification tasks will run old code in vsmagent that will effectively make any
> running sessions unreachable if a user disconnects.

--> bug 7163.
------- Comment #16 From cendio 2018-04-26 15:35:16 -------
(In reply to comment #14)
> RHEL 7.5 x86_64, thinlinc-vsm-4.9.0-5764.x86_64
> 
> Sessions are no longer discarded if their /proc/pid/exe link has ";01abcdef
> (deleted)" suffixes, as created by upgrading ThinLinc with RPM.

Also works on Debian 9 i386 with thinlinc-vsm-4.9.0-5764.
------- Comment #17 From cendio 2018-04-26 15:36:52 -------
(In reply to comment #16)
> (In reply to comment #14)
> > RHEL 7.5 x86_64, thinlinc-vsm-4.9.0-5764.x86_64
> > 
> > Sessions are no longer discarded if their /proc/pid/exe link has ";01abcdef
> > (deleted)" suffixes, as created by upgrading ThinLinc with RPM.
> 
> Also works on Debian 9 i386 with thinlinc-vsm-4.9.0-5764.

To clarify: Debian 9 with the linux-image-4.9.0-6-686-pae kernel does not
exhibit the triggering problem. The new code works just as well in this
scenario.
------- Comment #18 From cendio 2018-05-02 12:27:04 -------
(In reply to comment #15)
> (In reply to comment #14)
> --> bug 7163.

I suggest reverting the change for bug 7163 and replacing it with this solution:

--- vsm/thinlinc-vsm.spec.in    (revision 33232)
+++ vsm/thinlinc-vsm.spec.in    (working copy)
@@ -59,12 +59,9 @@
 rm -rf

 %pre
-# Stop services before upgrading
+# Workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1571253
 if [ $1 -gt 1 ] ; then
-    # remember that these are services from the OLD package
-    # even if we have changed the services in the NEW package
-    /opt/thinlinc/libexec/service vsmagent stop
-    /opt/thinlinc/libexec/service vsmserver stop
+    rm -f /opt/thinlinc/libexec/tl-session
 fi
 # Save install time
 mkdir -p /opt/thinlinc/etc/.upgrade.stamp


I've verified that this solution works by using a modified version of
"bug7161.sh". A full ThinLinc upgrade test remains.
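A rough way to exercise the proposed %pre step without a full upgrade is sketched below. An open file descriptor stands in for the running tl-session binary, so no exec is needed. That substitution is an assumption: it supposes /proc/self/fd/N is resolved through the same dentry as /proc/PID/exe, and this sketch does not verify that. This is not the modified bug7161.sh used above, which is not shown. The function exe_link_after_replace is hypothetical and Linux-only.

```python
import os
import tempfile
from typing import Tuple


def exe_link_after_replace(workaround: bool) -> Tuple[str, str]:
    """Simulate an RPM upgrade of a file that is still in use.

    Returns (original_path, link_seen_by_the_old_file_after_upgrade).
    """
    d = os.path.realpath(tempfile.mkdtemp())  # avoid /tmp symlink mismatches
    path = os.path.join(d, "tl-session")
    tmp = path + ";5ade066b"  # rpm-style temporary name
    with open(path, "w") as f:
        f.write("old")
    fd = os.open(path, os.O_RDONLY)  # stands in for the running old binary
    try:
        if workaround:
            os.unlink(path)  # proposed %pre: rm -f .../tl-session
        with open(tmp, "w") as f:  # rpm unpacks tl-session;5ade066b ...
            f.write("new")
        os.rename(tmp, path)  # ... and renames it into place
        return path, os.readlink("/proc/self/fd/%d" % fd)
    finally:
        os.close(fd)
        for name in os.listdir(d):
            os.unlink(os.path.join(d, name))
        os.rmdir(d)


if __name__ == "__main__":
    for flag in (False, True):
        print("workaround=%s:" % flag, exe_link_after_replace(flag)[1])
```

On an unaffected kernel, both calls should report the original name followed by " (deleted)". Going by comments #6 to #9, an affected RHEL 7 kernel would be expected to show the ";5ade066b" name only when workaround=False. Removing the old file first means the later rename no longer touches the dentry the old process refers to.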
------- Comment #20 From cendio 2018-05-02 14:52:38 -------
Alternative approach committed now. Need to do final (re-)testing, but then we
should be done.
------- Comment #22 From cendio 2018-05-03 14:12:44 -------
I can't see any issues when testing build 5770 on Ubuntu 16.04. It works well,
and I can reconnect to sessions started before an upgrade.
------- Comment #23 From cendio 2018-05-03 14:42:18 -------
Could reproduce the issue when upgrading from 4.8.0 to 4.8.1 on RHEL7:

2018-05-03 14:34:04 WARNING vsmagent.session: Broken session for user cendio, tl-session pid 19240 is not tl-session

When upgrading from 4.8.1 to build 5770, I can successfully reconnect to a
session started prior to the upgrade. Looks good.