Bug 7650 - SSH crashes with Kerberos authentication on Windows
Summary: SSH crashes with Kerberos authentication on Windows
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: Client
Version: 4.12.1
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.13.0
Assignee: Samuel Mannehed
Keywords: nikle_tester, relnotes
Reported: 2021-02-22 13:06 CET by Martin Östlund
Modified: 2021-03-23 14:25 CET

Description Martin Östlund cendio 2021-02-22 13:06:34 CET
With the 4.12.1 Windows client, we've had reports from users that the client shows the following error
| Connection Error
| ! Couldn't setup secure tunnel to Thinlinc server.
|(Couldn't establish SSH tunnel, SSH terminated.

when trying to connect with Kerberos authentication.

The impact for the customer is that their users are not able to log in with Kerberos authentication. They would either have to downgrade their client or disable Kerberos authentication and authenticate by some other means.


Steps to reproduce:

Windows 10 with the 4.12.1 client and the option Security -> Kerberos Ticket checked.

This Windows machine was joined to an internal Windows Active Directory domain, and the logon was made with a user from that domain.


From command line, invoke ssh.exe with -vvv flags to get more debug output.

C:\Users\martin\Downloads\tl-nightly-clients\tl-4.12.1post-client-windows-x64>ssh.exe -vvv -N -o GlobalKnownHostsFile=nul -o UserKnownHostsFile=nul -o UpdateHostKeys=yes -o PasswordAuthentication=no -o ChallengeResponseAuthentication=no -o KbdInteractiveAuthentication=no -o PubkeyAuthentication=no -o GSSAPIAuthentication=yes -o CheckHostIP=no -o NumberOfPasswordPrompts=1 martin@lab-188.lkpg.cendio.se -p 22 thinlinc-login master

This will produce (at least) the following output on all occasions


AUTH SUCCESS
debug1: Authentication succeeded (gssapi-with-mic).
Authenticated to lab-188.lkpg.cendio.se ([10.48.2.188]:22).
debug1: channel 0: new [client-session]
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug3: send packet: type 90
CONNECTED
debug1: Requesting no-more-sessions@openssh.com
debug3: send packet: type 80
debug1: Entering interactive session.
debug1: pledge: filesystem full
debug3: sigaction(24): Invalid argument
debug3: sigaction(24): Invalid argument
debug3: sigaction(25): Invalid argument
debug3: sigaction(25): Invalid argument
debug3: sigaction(32): Invalid argument
debug3: receive packet: type 80
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
debug3: client_input_hostkeys: received RSA key SHA256:Cc7IfoU2l6GNZreonaVX62m0N6fmFj+xj0+6TuYLc8A

Then it just hangs there for a few seconds before crashing silently. The crash can be found in the Windows Event Log, with crash codes from ssh.exe.

Out of 100 tries with the above ssh.exe command, I got this far 59 times. In the other 41 tries it got a bit further with the connection, but it always failed when exchanging host keys.

Only on 8 occasions out of 100 did it manage to work successfully, with the end of the output being

THINLINC-LOGIN: HELLO 4.12.1
debug2: channel 0: written 29 to efd 5
debug2: channel 0: rcvd ext data 33
THINLINC-LOGIN: CONNECTED MASTER

I believe this might be related to https://www.cendio.com/bugzilla/show_bug.cgi?id=7536

The change for bug 7536 was introduced in 4.12.1 and adds the argument
-o UpdateHostKeys=yes to the ssh.exe command line.

Using Kerberos authentication adds the argument
-o GSSAPIAuthentication=yes to the ssh.exe command line.

I have not been able to reproduce this with the Linux client.
I have not tested this with the OS X client.
I have not tested Kerberos authentication on a Windows machine that is not joined to an AD domain.
Comment 3 Samuel Mannehed cendio 2021-03-12 17:06:07 CET
Things work fine if you run the ssh.exe that is included in the 4.12.1 client WITHOUT "-o UpdateHostKeys=yes". This verifies that the change made in bug 7536 is indeed the cause of this bug.

Things also work fine if you use a 32-bit build of ssh.exe.

The error code 0xC0000374 that Windows lists in Event Viewer for this problem means "Heap Corruption":

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
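
When ssh.exe is started from a script, the same code can be checked programmatically instead of looking it up in Event Viewer. This is a minimal sketch under the assumption that the crash shows up as the process exit code, which this report does not confirm; the function name is made up for illustration. The mask is there because some tools report the exit code as a negative signed 32-bit number.

```python
# Value quoted from Event Viewer in this report ("Heap Corruption").
STATUS_HEAP_CORRUPTION = 0xC0000374


def is_heap_corruption(returncode):
    """Return True if a process exit code equals 0xC0000374.

    Masking to 32 bits makes the check work whether the code is reported
    as unsigned (3221226356) or as a signed 32-bit value (-1073740940).
    """
    return (returncode & 0xFFFFFFFF) == STATUS_HEAP_CORRUPTION
```

For example, `is_heap_corruption(proc.returncode)` could be called on the result of a `subprocess.run(...)` call that started ssh.exe.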
Comment 4 Samuel Mannehed cendio 2021-03-15 09:36:17 CET
In the few cases where you manage to connect successfully (~10% of the tries for both master and agent = ~1% in total) you will still encounter this crash eventually.
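
The ~1% figure is the product of the two ~10% success rates. A small sketch of that arithmetic, assuming the master and agent connections fail independently of each other (an assumption made for the estimate, not something measured here):

```python
def combined_success_rate(p_master, p_agent):
    """Chance that both connections succeed, assuming they are independent."""
    return p_master * p_agent


# ~10% for the master connection times ~10% for the agent connection.
print(f"{combined_success_rate(0.10, 0.10):.0%}")  # prints 1%
```
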

In my test it took 20 minutes and then tlclient (4.12.1post build 6773) crashed with this error:

> read: An existing connection was forcibly closed by the remote host. (10054)

And the same Heap Corruption error was logged from ssh.exe in Event Viewer.
Comment 5 Samuel Mannehed cendio 2021-03-15 09:39:41 CET
It's also worth noting that when starting ssh.exe manually from the command line, I still get the crash when the Kerberos authentication fails.

The command I used was:

ssh_64.exe -vvv -N -o GlobalKnownHostsFile=nul -o UserKnownHostsFile=nul -o UpdateHostKeys=yes -o PasswordAuthentication=yes -o ChallengeResponseAuthentication=yes -o KbdInteractiveAuthentication=yes -o PubkeyAuthentication=no -o GSSAPIAuthentication=yes -o CheckHostIP=no -o NumberOfPasswordPrompts=1 cendio@lab-188.lkpg.cendio.se -p 22 thinlinc-login master

This was while being logged in locally on Windows with a different user, and thus the cendio user I attempted to connect with did not have any Kerberos tickets available. SSH falls back to password authentication but then still crashes in the same manner as before.
Comment 6 Samuel Mannehed cendio 2021-03-15 14:47:40 CET
Tested tlclient from 4.12.0 and 4.9.0 as well. We get the ssh.exe Heap Corruption crash in both after ~30 minutes of connection with Kerberos auth on Windows. That means we've had this issue for a long while now.
Comment 13 Samuel Mannehed cendio 2021-03-19 18:28:57 CET
This should be fixed now. I have tested both ssh.exe itself and a new build of the ThinLinc Client. The login works fine with Kerberos and 'UpdateHostKeys=yes', and I've had a fixed tlclient connected uninterrupted for ~8 hours now.
Comment 17 Niko Lehto cendio 2021-03-23 14:25:10 CET
Reproduced in Windows 10 with client version 4.12.1, using the command described in comment #0.

This works with build 1895 and the release notes look good! Also verified that normal Kerberos authentication still works in both Windows 10 and Fedora 33.
