Bug 5814

Summary: Poor performance on high latency networks in HTML client
Product: ThinLinc Reporter: Pierre Ossman <ossman>
Component: Web AccessAssignee: Samuel Mannehed <samuel>
Status: CLOSED FIXED    
Severity: Normal CC: astrand, thoni56
Priority: P2 Keywords: astrand_tester, relnotes
Version: pre-1.0   
Target Milestone: 4.7.0   
Hardware: PC   
OS: Unknown   
Acceptance Criteria:
Bug Depends on:    
Bug Blocks: 4606    

Description Pierre Ossman cendio 2016-03-04 10:01:12 CET
noVNC lacks the fence and continuous update extensions necessary for decent performance on high latency networks.
Comment 1 Samuel Mannehed cendio 2016-03-11 17:09:46 CET
Some related discussions upstream:

https://github.com/kanaka/noVNC/issues/221
https://github.com/kanaka/noVNC/issues/563
Comment 2 Pierre Ossman cendio 2016-06-01 14:11:50 CEST
Nagle might also be an issue here, which we've overlooked. It can easily cause latency jitter that can confuse the system.

We can easily disable it in tlwebaccess (and noVNC can do the same in their proxy), but the browser side is another matter. We'll need to investigate what the browsers are doing and if it can be configured.
Comment 3 Pierre Ossman cendio 2016-06-01 14:31:53 CEST
There is an experiment here that indicates that at least Chrome leaves Nagle on:

https://github.com/valsteen/socketio-nagle-experiment

No information about a fix except to spam the connection with more data. I also couldn't find anything in the WebSockets documentation.

However this thread claims that Chrome disables Nagle:

https://bugs.chromium.org/p/chromium/issues/detail?id=450258

as does this, and also claims that Firefox also disables it:

https://bugzilla.mozilla.org/show_bug.cgi?id=542401
Comment 6 Samuel Mannehed cendio 2016-06-09 17:48:54 CEST
https://github.com/kanaka/noVNC/pull/623
Comment 15 Pierre Ossman cendio 2016-06-27 14:44:33 CEST
Works well now.

Tester should verify that we get a nice frame rate even with high latency (100+ ms), and no excessive buffering (latency much higher than the wire latency).

Note that noVNC requires a shitload of CPU, so large screen updates are not recommended.
Comment 16 Peter Åstrand cendio 2016-06-29 14:17:50 CEST
Code review:

It's a little bit unfortunate that the diff between our trunk and our vendor/current is very big, since it contains the continious updates patch (without markers). An alternative could have been an upstream feature branch (following master, not to be merged with master) that we could follow instead, but in any case this is temporary state (assuming https://github.com/kanaka/noVNC/pull/623 will be merged). 

Also see bug 5459, but since we have a separate bug for that problem, it does not cause this bug to be reopened. 

Moving on, I can see that the implementation in r31514 contains 3 FIXMEs. It is not clear to me what the consequences are; can these be safely ignored, do we need to fix them now, or do we need bugs for them?:

noVNC/include/rfb.js:
>+        _handle_server_fence_msg: function() {
>+            if (this._sock.rQwait("ServerFence header", 8, 1)) { return false; }
>+            this._sock.rQskipBytes(3); // Padding
>+            var flags = this._sock.rQshift32();
>+            var length = this._sock.rQshift8();
>+
>+            if (this._sock.rQwait("ServerFence payload", length, 9)) { return false; }
>+            var payload = this._sock.rQshiftStr(length); // FIXME: 64 bytes max


>+            // Filter out unsupported flags
>+            // FIXME: support syncNext
>+            flags &= (1<<0) | (1<<1);


>+                case 150: // EndOfContinuousUpdates
>+                    var first = !(this._supportsContinuousUpdates);
>+                    this._supportsContinuousUpdates = true;
>+                    this._enabledContinuousUpdates = false;
>+                    if (first) {
>+                        this._enabledContinuousUpdates = true;
>+                        this._updateContinuousUpdates();
>+                        Util.Info("Enabling continuous updates.");
>+                    } else {
>+                        // FIXME: may need to send a framebufferupdaterequest here
>+                    }
>+                    return true;

Reopening to get some input.
Comment 17 Peter Åstrand cendio 2016-06-30 09:43:15 CEST
I'd like to add a reference to bug 2566, which was used for implementing fence and continuous update extensions for the native client. 

I've done some performance tests now. I've hijacked usdemo.thinlinc.com. While using a production server for testing has drawbacks, it also has the advantage of being a real system with a real geographic distance to it. The bandwidth for upload and download has been tested with scp and is fine:

tl-4.6.0-clients.zip      100%  229MB   5.7MB/s   00:40    

tl-4.6.0-clients.zip      100%  229MB   9.2MB/s   00:25    

Ping time is steady at 130 ms. 

At first, I ran both the TL server and HTML5 client at Cendio, and only forwarding the traffic to usdemo and back, using two SSH tunnels:

sudo ssh root@usdemo.thinlinc.com -L scilla:301:127.0.0.1:301 -R 301:127.0.0.1:300 -v -N

The HTML5 client was modified to connecto to port 301 instead of 300. With this test, it was difficult to see any improvement. I guess this is because the tunnels does not disable Nagle, and interferes with the buffering etc? 

I've also tested to only update websocket.py/tlwebaccess - to write in larger chunks and to disable Nagle - but still run noVNC without support for fence/CU. I've verified that this does not give as good performance as with fence/CU. 

More test data, with format borrowed from bug 2566:

Test 1: TL 4.6.0 - before fence/CU
firefox/youtube      4
menus                6 
move terminal        4  
Totem video          5
oowriter             6  
evince               5


Test 2: TL 4.6.0post - with fence/CU
firefox/youtube      6
menus                8 
move terminal        6  Note: better FPS, but also a rubberband effect  
Totem video          7
oowriter             7        
evince               5
Comment 18 Pierre Ossman cendio 2016-06-30 15:45:36 CEST
(In reply to comment #16)
> 
> Moving on, I can see that the implementation in r31514 contains 3 FIXMEs. It is
> not clear to me what the consequences are; can these be safely ignored, do we
> need to fix them now, or do we need bugs for them?:
> 

All of them can be ignored for now. Rationale below:

> noVNC/include/rfb.js:
> >+        _handle_server_fence_msg: function() {
> >+            if (this._sock.rQwait("ServerFence header", 8, 1)) { return false; }
> >+            this._sock.rQskipBytes(3); // Padding
> >+            var flags = this._sock.rQshift32();
> >+            var length = this._sock.rQshift8();
> >+
> >+            if (this._sock.rQwait("ServerFence payload", length, 9)) { return false; }
> >+            var payload = this._sock.rQshiftStr(length); // FIXME: 64 bytes max
> 

The protocol limits the payload to 64 bytes. Right now we allow more than that. The FIXME is to enforce the limit to encourage protocol adherence.

> 
> >+            // Filter out unsupported flags
> >+            // FIXME: support syncNext
> >+            flags &= (1<<0) | (1<<1);
> 

We implemented the Fence flags necessary to get the desired functionality. SyncNext is another defined flag, that we did not implement right now but could be interesting in the future. It is not required and TigerVNC's viewer hasn't implemented it either.

> 
> >+                case 150: // EndOfContinuousUpdates
> >+                    var first = !(this._supportsContinuousUpdates);
> >+                    this._supportsContinuousUpdates = true;
> >+                    this._enabledContinuousUpdates = false;
> >+                    if (first) {
> >+                        this._enabledContinuousUpdates = true;
> >+                        this._updateContinuousUpdates();
> >+                        Util.Info("Enabling continuous updates.");
> >+                    } else {
> >+                        // FIXME: may need to send a framebufferupdaterequest here
> >+                    }
> >+                    return true;
> 

We theorised that it may be necessary to send another request there in the event that continuous updates is turned on and then turned off again. However nothing in the code exposes such functionality right now so we didn't feel it was worthwhile to explore the (possible) issue further.
Comment 19 Peter Åstrand cendio 2016-07-04 09:01:56 CEST
(In reply to comment #18)
> (In reply to comment #16)
> > 
> > Moving on, I can see that the implementation in r31514 contains 3 FIXMEs. It is
> > not clear to me what the consequences are; can these be safely ignored, do we
> > need to fix them now, or do we need bugs for them?:
> > 
> 
> All of them can be ignored for now. Rationale below:

Ok then, closing.
Comment 20 Pierre Ossman cendio 2016-09-21 11:00:28 CEST
We're seeing large lag under some conditions when continuous updates is turned on. Need to have another look.
Comment 21 Pierre Ossman cendio 2016-09-21 11:12:54 CEST
The problem is that timers are used as a form of "yield" in noVNC. However timers have a minimum delay of 4 ms in HTML (when used repeatedly). noVNC uses a timer after each message, limiting the processing to 250 messages per second.

However we get at least three message per update, meaning a limit of 83 updates per second. This can easily be exceeded for small updates. At that point we start getting a backlog in noVNC that keeps increasing. Xvnc will try to back off, but fail since this isn't a bandwidth issue.
Comment 23 Pierre Ossman cendio 2016-09-21 16:45:08 CEST
The big issue is fixed now, but I've also discovered something related. It is however not a regression so I'll be adding a new bug for that.

Tester should make sure the mouse isn't laggy on Internet Explorer. Best seen with a solid colour background.
Comment 24 Thomas Nilefalk cendio 2016-09-22 14:38:54 CEST
Tested with IE11 on Windows 10 Embedded, no significant lag is visible
- radical difference to build 5226
- no adverse effects using Chrome on MacOS...

Closing.