Bug 5814 - Poor performance on high latency networks in HTML client
Status: CLOSED FIXED
Product: ThinLinc
Component: Web Access
Version: pre-1.0
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.7.0
Reported: 2016-03-04 10:01
Modified: 2016-09-23 10:10


Description From cendio 2016-03-04 10:01:12
noVNC lacks the fence and continuous update extensions necessary for decent
performance on high latency networks.
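
For context, continuous updates are negotiated via the EnableContinuousUpdates
client message (type 150) from the TigerVNC extensions. A minimal sketch of
encoding that message, assuming a hypothetical helper that returns the bytes
to send:

// Hypothetical encoder for the EnableContinuousUpdates client message
// (type 150): an enable flag plus the screen area, big-endian per RFB.
function encodeEnableContinuousUpdates(enable, x, y, w, h) {
    var buf = new Uint8Array(10);
    var view = new DataView(buf.buffer);
    view.setUint8(0, 150);              // message type
    view.setUint8(1, enable ? 1 : 0);   // enable flag
    view.setUint16(2, x);               // DataView writes big-endian by default
    view.setUint16(4, y);
    view.setUint16(6, w);
    view.setUint16(8, h);
    return buf;
}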
------- Comment #1 From cendio 2016-03-11 17:09:46 -------
Some related discussions upstream:

https://github.com/kanaka/noVNC/issues/221
https://github.com/kanaka/noVNC/issues/563
------- Comment #2 From cendio 2016-06-01 14:11:50 -------
Nagle's algorithm might also be an issue here, something we've overlooked. It
can easily cause latency jitter that confuses the system.

We can easily disable it in tlwebaccess (and noVNC can do the same in their
proxy), but the browser side is another matter. We'll need to investigate what
the browsers are doing and whether it can be configured.
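
On the proxy side, disabling Nagle amounts to setting TCP_NODELAY on both TCP
sockets. A minimal Node.js sketch of a plain relay, purely illustrative
(tlwebaccess itself is Python, and the port numbers here are made up):

var net = require('net');

var server = net.createServer(function (client) {
    client.setNoDelay(true);                  // TCP_NODELAY: flush small writes immediately
    var backend = net.connect(5901, '127.0.0.1');
    backend.setNoDelay(true);
    client.pipe(backend);                     // relay traffic in both directions
    backend.pipe(client);
});

server.listen(5900);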
------- Comment #3 From cendio 2016-06-01 14:31:53 -------
There is an experiment here that indicates that at least Chrome leaves Nagle
on:

https://github.com/valsteen/socketio-nagle-experiment

There is no information about a fix, other than the workaround of spamming the
connection with more data. I also couldn't find anything in the WebSockets
documentation.

However, this thread claims that Chrome disables Nagle:

https://bugs.chromium.org/p/chromium/issues/detail?id=450258

as does this one, which claims that Firefox also disables it:

https://bugzilla.mozilla.org/show_bug.cgi?id=542401
------- Comment #6 From cendio 2016-06-09 17:48:54 -------
https://github.com/kanaka/noVNC/pull/623
------- Comment #15 From cendio 2016-06-27 14:44:33 -------
Works well now.

Tester should verify that we get a nice frame rate even with high latency (100+
ms), and no excessive buffering (latency much higher than the wire latency).

Note that noVNC requires a lot of CPU, so large screen updates are not
recommended.
------- Comment #16 From cendio 2016-06-29 14:17:50 -------
Code review:

It's a little bit unfortunate that the diff between our trunk and our
vendor/current is very big, since it contains the continuous updates patch
(without markers). An alternative could have been an upstream feature branch
(following master, not to be merged with master) that we could follow instead,
but in any case this is a temporary state (assuming
https://github.com/kanaka/noVNC/pull/623 will be merged). 

Also see bug 5459, but since we have a separate bug for that problem, it does
not cause this bug to be reopened. 

Moving on, I can see that the implementation in r31514 contains three FIXMEs.
It is not clear to me what the consequences are: can these be safely ignored,
do we need to fix them now, or do we need bugs for them?

noVNC/include/rfb.js:
>+        _handle_server_fence_msg: function() {
>+            if (this._sock.rQwait("ServerFence header", 8, 1)) { return false; }
>+            this._sock.rQskipBytes(3); // Padding
>+            var flags = this._sock.rQshift32();
>+            var length = this._sock.rQshift8();
>+
>+            if (this._sock.rQwait("ServerFence payload", length, 9)) { return false; }
>+            var payload = this._sock.rQshiftStr(length); // FIXME: 64 bytes max


>+            // Filter out unsupported flags
>+            // FIXME: support syncNext
>+            flags &= (1<<0) | (1<<1);


>+                case 150: // EndOfContinuousUpdates
>+                    var first = !(this._supportsContinuousUpdates);
>+                    this._supportsContinuousUpdates = true;
>+                    this._enabledContinuousUpdates = false;
>+                    if (first) {
>+                        this._enabledContinuousUpdates = true;
>+                        this._updateContinuousUpdates();
>+                        Util.Info("Enabling continuous updates.");
>+                    } else {
>+                        // FIXME: may need to send a framebufferupdaterequest here
>+                    }
>+                    return true;

Reopening to get some input.
------- Comment #17 From cendio 2016-06-30 09:43:15 -------
I'd like to add a reference to bug 2566, which was used for implementing fence
and continuous update extensions for the native client. 

I've done some performance tests now. I've hijacked usdemo.thinlinc.com. While
using a production server for testing has drawbacks, it also has the advantage
of being a real system with a real geographic distance to it. The bandwidth for
upload and download has been tested with scp and is fine:

tl-4.6.0-clients.zip      100%  229MB   5.7MB/s   00:40    

tl-4.6.0-clients.zip      100%  229MB   9.2MB/s   00:25    

Ping time is steady at 130 ms. 

At first, I ran both the TL server and the HTML5 client at Cendio, only
forwarding the traffic to usdemo and back using two SSH tunnels:

sudo ssh root@usdemo.thinlinc.com -L scilla:301:127.0.0.1:301 -R 301:127.0.0.1:300 -v -N

The HTML5 client was modified to connect to port 301 instead of 300. With this
test, it was difficult to see any improvement. I guess this is because the
tunnels do not disable Nagle and interfere with the buffering, etc.? 

I've also tested updating only websocket.py/tlwebaccess, to write in larger
chunks and to disable Nagle, while still running noVNC without fence/CU
support. I've verified that this does not give as good performance as fence/CU
does. 

More test data, with format borrowed from bug 2566:

Test 1: TL 4.6.0 - before fence/CU
firefox/youtube      4
menus                6 
move terminal        4  
Totem video          5
oowriter             6  
evince               5


Test 2: TL 4.6.0post - with fence/CU
firefox/youtube      6
menus                8 
move terminal        6  Note: better FPS, but also a rubberband effect  
Totem video          7
oowriter             7        
evince               5
------- Comment #18 From cendio 2016-06-30 15:45:36 -------
(In reply to comment #16)
> 
> Moving on, I can see that the implementation in r31514 contains three FIXMEs.
> It is not clear to me what the consequences are: can these be safely ignored,
> do we need to fix them now, or do we need bugs for them?
> 

All of them can be ignored for now. Rationale below:

> noVNC/include/rfb.js:
> >+        _handle_server_fence_msg: function() {
> >+            if (this._sock.rQwait("ServerFence header", 8, 1)) { return false; }
> >+            this._sock.rQskipBytes(3); // Padding
> >+            var flags = this._sock.rQshift32();
> >+            var length = this._sock.rQshift8();
> >+
> >+            if (this._sock.rQwait("ServerFence payload", length, 9)) { return false; }
> >+            var payload = this._sock.rQshiftStr(length); // FIXME: 64 bytes max
> 

The protocol limits the payload to 64 bytes. Right now we allow more than that.
The FIXME is to enforce the limit to encourage protocol adherence.
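
A minimal sketch of what enforcing the limit could look like, inserted after
the length byte is read, reusing the variable names from the quoted rfb.js
code and assuming noVNC's _fail() error helper:

if (length > 64) {
    // The protocol limits the fence payload to 64 bytes
    return this._fail("ServerFence payload too long: " + length + " bytes (max 64)");
}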

> 
> >+            // Filter out unsupported flags
> >+            // FIXME: support syncNext
> >+            flags &= (1<<0) | (1<<1);
> 

We implemented the Fence flags necessary to get the desired functionality.
SyncNext is another defined flag that we did not implement now, but it could be
interesting in the future. It is not required, and TigerVNC's viewer hasn't
implemented it either.
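
For reference, the full set of fence flags defined by the TigerVNC extension,
written here as hypothetical named constants (the quoted code keeps only the
first two):

var FENCE_FLAG_BLOCK_BEFORE = 1 << 0;      // implemented
var FENCE_FLAG_BLOCK_AFTER  = 1 << 1;      // implemented
var FENCE_FLAG_SYNC_NEXT    = 1 << 2;      // the FIXME above; not implemented
var FENCE_FLAG_REQUEST      = 0x80000000;  // bit 31; note 1 << 31 is negative in JS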

> 
> >+                case 150: // EndOfContinuousUpdates
> >+                    var first = !(this._supportsContinuousUpdates);
> >+                    this._supportsContinuousUpdates = true;
> >+                    this._enabledContinuousUpdates = false;
> >+                    if (first) {
> >+                        this._enabledContinuousUpdates = true;
> >+                        this._updateContinuousUpdates();
> >+                        Util.Info("Enabling continuous updates.");
> >+                    } else {
> >+                        // FIXME: may need to send a framebufferupdaterequest here
> >+                    }
> >+                    return true;
> 

We theorised that it may be necessary to send another request there in the
event that continuous updates is turned on and then turned off again. However,
nothing in the code exposes such functionality right now, so we didn't feel it
was worthwhile to explore the (possible) issue further.
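
For completeness, such a request would be an ordinary FramebufferUpdateRequest
client message (type 3). A hypothetical encoder, mirroring the sketch given
earlier for EnableContinuousUpdates:

// Hypothetical encoder for FramebufferUpdateRequest (type 3):
// incremental flag plus the requested area, big-endian per RFB.
function encodeFramebufferUpdateRequest(incremental, x, y, w, h) {
    var buf = new Uint8Array(10);
    var view = new DataView(buf.buffer);
    view.setUint8(0, 3);                    // message type
    view.setUint8(1, incremental ? 1 : 0);  // incremental flag
    view.setUint16(2, x);
    view.setUint16(4, y);
    view.setUint16(6, w);
    view.setUint16(8, h);
    return buf;
}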
------- Comment #19 From cendio 2016-07-04 09:01:56 -------
(In reply to comment #18)
> (In reply to comment #16)
> > 
> > Moving on, I can see that the implementation in r31514 contains three FIXMEs.
> > It is not clear to me what the consequences are: can these be safely ignored,
> > do we need to fix them now, or do we need bugs for them?
> > 
> 
> All of them can be ignored for now. Rationale below:

Ok then, closing.
------- Comment #20 From cendio 2016-09-21 11:00:28 -------
We're seeing large lag under some conditions when continuous updates is turned
on. Need to have another look.
------- Comment #21 From cendio 2016-09-21 11:12:54 -------
The problem is that timers are used as a form of "yield" in noVNC. However,
timers have a minimum delay of 4 ms in HTML (when used repeatedly). noVNC uses
a timer after each message, limiting processing to 250 messages per second.

However, we get at least three messages per update, meaning a limit of about 83
updates per second. This can easily be exceeded for small updates. At that
point we start getting a backlog in noVNC that keeps increasing. Xvnc will try
to back off, but fails since this isn't a bandwidth issue.
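
For illustration, a common workaround is to yield via a MessageChannel instead
of a timer, since the HTML spec only clamps nested timers to the 4 ms minimum,
not posted messages. A minimal sketch, not noVNC's actual fix:

// setTimeout(fn, 0) is clamped to >= 4 ms once timers nest, capping a
// per-message yield at ~250 wakeups per second. Posting a message to
// ourselves schedules a task without that clamp.
var channel = new MessageChannel();
var pending = [];
channel.port1.onmessage = function () {
    pending.shift()();   // run the oldest queued callback
};

function yieldThen(callback) {
    pending.push(callback);
    channel.port2.postMessage(null);
}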
------- Comment #23 From cendio 2016-09-21 16:45:08 -------
The big issue is fixed now, but I've also discovered something related. It is,
however, not a regression, so I'll be adding a new bug for that.

Tester should make sure the mouse isn't laggy on Internet Explorer. Best seen
with a solid colour background.
------- Comment #24 From cendio 2016-09-22 14:38:54 -------
Tested with IE11 on Windows 10 Embedded; no significant lag is visible:
- a radical difference compared to build 5226
- no adverse effects using Chrome on macOS

Closing.