www.cendio.com
Bug 5106 - improve VNC performance (latency, bandwidth, CPU usage, quality, ...)
Status: NEW
Product: ThinLinc
Component: VNC
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: ProductCouncil
Depends on: 45 1215 1814 2566 2926 2928 2930 2931 3009 3893 4020 4333 4734 4735 4915 4956 4982 5026 5242 5618 5648 5719 5748 5812 7139
Reported: 2014-04-22 12:45 by
Modified: 2018-11-19 13:31

Description From cendio 2014-04-22 12:45:40
This is a tracker bug to document ideas and plans for how we improve
performance related aspects of the VNC/RFB portion in ThinLinc. This includes:

 - Bandwidth usage

 - Latency sensitivity

 - CPU usage

 - Image quality, perceived or otherwise.
------- Comment #1 From cendio 2014-04-22 13:00:32 -------
Historical things:

 - Evaluate ZRLE (bug 1215)

 - Evaluate deferred updates (bug 2931)

 - Evaluate Comparing Update Tracker (bug 4020)

 - Handle high latency networks (bug 2566)

 - SIMD accelerated JPEG (bug 2926)

 - Optimise JPEG entropy coding (bug 3009)

 - Double buffering (bug 2930)
------- Comment #2 From cendio 2014-04-22 14:01:10 -------
Future plans:

- Clean up and move magic out of the Tight encoder (bug 4915 and bug 5026)

- Record sessions:

Needed so we can properly evaluate parts of the system with real-world data.
Such recordings need to losslessly record every change, with timing
information.
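A minimal sketch of what such a recording could look like, assuming each change is stored as a timestamped rectangle of raw, uncompressed pixels. The record layout and the names UpdateRecord, write_record, read_records and roundtrip are hypothetical, not an existing ThinLinc format:

```python
import io
import struct
from dataclasses import dataclass
from typing import BinaryIO, Iterator, List

# Hypothetical record header: timestamp in microseconds, rect x, y, w, h
# (16-bit, as in RFB), and the length of the raw pixel payload that follows.
HEADER = struct.Struct("<QHHHHI")

@dataclass
class UpdateRecord:
    timestamp_us: int
    x: int
    y: int
    w: int
    h: int
    pixels: bytes  # raw pixel data, stored losslessly

def write_record(out: BinaryIO, rec: UpdateRecord) -> None:
    out.write(HEADER.pack(rec.timestamp_us, rec.x, rec.y, rec.w, rec.h,
                          len(rec.pixels)))
    out.write(rec.pixels)

def read_records(inp: BinaryIO) -> Iterator[UpdateRecord]:
    while True:
        header = inp.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # end of recording
        ts, x, y, w, h, length = HEADER.unpack(header)
        yield UpdateRecord(ts, x, y, w, h, inp.read(length))

def roundtrip(records: List[UpdateRecord]) -> List[UpdateRecord]:
    """Write records to an in-memory file and read them back."""
    buf = io.BytesIO()
    for rec in records:
        write_record(buf, rec)
    buf.seek(0)
    return list(read_records(buf))
```

Replaying the records with their original timestamps would then let different encoders be compared on identical input.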

- Evaluate existing components more thoroughly/systematically

Are we getting a good trade-off between CPU and bandwidth for these:

 * Comparing Update Tracker

 * Deferred updates

 * Solid area detection

 * Sub-rect analysis

- Evaluate CODECs

Write a framework that gives us reliable numbers for the trade-off between CPU,
bandwidth and quality for our CODECs. We need to consider the effects that the
compression and quality settings have, as well as what type of data might fit
each CODEC best.

A suitable metric for quality might be SSIM.
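As a starting point, a simplified SSIM can be computed over a whole greyscale image treated as a single window. This is a sketch only: the standard metric computes SSIM over small sliding (usually Gaussian-weighted) windows and averages the results, and global_ssim is a hypothetical helper name. The constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2 are the usual SSIM defaults:

```python
from typing import Sequence

def global_ssim(x: Sequence[float], y: Sequence[float],
                dynamic_range: float = 255.0) -> float:
    """SSIM over the whole image as one window (simplified: the standard
    metric averages SSIM over small sliding windows instead)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Luminance/contrast/structure terms combined into the usual formula.
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

Comparing the decoded output of each CODEC against the lossless original with something like this would give the quality axis of the trade-off.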

- Evaluate sub-rect classes

Connected to the above, we need to investigate whether we are classifying
sub-rects in a way that makes sense for real-world data, and that allows each
CODEC to play to its strengths.

We should also look at the optimal size for sub-rects. Perhaps we should use a
fixed grid of, e.g., 64x64 pixels? That could also simplify update tracking and
save CPU time there, at the cost of sending a bit more data.
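A sketch of how update tracking could work on such a fixed grid, assuming 64x64 tiles: a damaged rectangle simply maps to the set of tiles it touches, which could be kept as a bitmap instead of a general region. The names TILE and tiles_for_rect are hypothetical:

```python
from typing import List, Tuple

TILE = 64  # hypothetical fixed tile size from the idea above

def tiles_for_rect(x: int, y: int, w: int, h: int,
                   tile: int = TILE) -> List[Tuple[int, int]]:
    """Return (column, row) of every grid tile touched by the rectangle."""
    if w <= 0 or h <= 0:
        return []
    first_col, last_col = x // tile, (x + w - 1) // tile
    first_row, last_row = y // tile, (y + h - 1) // tile
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

For example, a 10x10 change at (60, 60) straddles four tiles, so 4 x 64 x 64 = 16384 pixels would be sent instead of 100. That is the "bit more data" cost mentioned above.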

- Server side compression selection

The server is in a much better position than the client to select which CODEC
to use and how much compression to apply.

- Automatic lossless refresh (bug 2928)

- Partial updates (bug 4735)

Allows us to multiplex other packets when we have a large update and limited
bandwidth. Primarily, it allows us to send more congestion control packets,
which normally stop working as the update size approaches and exceeds the
bandwidth-delay product (BDP). A protocol extension will be needed to fix this.
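For reference, the BDP is simply bandwidth times round-trip time. The sketch below computes it and splits an encoded update into bounded chunks, so that other messages could in principle be sent between them. The chunking assumes the hypothetical protocol extension mentioned above; bdp_bytes and split_update are illustrative names only:

```python
from typing import Iterator

def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return bandwidth_bits_per_s * rtt_ms // (8 * 1000)

def split_update(payload: bytes, max_chunk: int) -> Iterator[bytes]:
    """Split an encoded update into chunks of at most max_chunk bytes so that
    other messages (e.g. congestion control pings) can be sent in between."""
    for start in range(0, len(payload), max_chunk):
        yield payload[start:start + max_chunk]
```

With 10 Mbit/s and a 50 ms RTT, bdp_bytes(10_000_000, 50) gives 62500 bytes, so any single update larger than about 61 KiB currently blocks congestion control packets for at least a full round trip.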

- Socket buffer (bug 4734)

Currently the server hangs if the outgoing socket buffer fills up. We should
have a memory buffer for when this happens, to avoid disrupting timing-sensitive
things (like the congestion control).
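A minimal sketch of such a buffer, assuming a non-blocking socket whose send() either accepts some bytes or raises BlockingIOError (which is how Python's socket.send behaves after setblocking(False)). OutputBuffer is a hypothetical name, not the existing server code:

```python
from collections import deque
from typing import Callable, Deque

class OutputBuffer:
    """Queue outgoing data in memory instead of blocking when the socket is
    full. `send` behaves like socket.send on a non-blocking socket: it returns
    the number of bytes accepted or raises BlockingIOError."""

    def __init__(self, send: Callable[[bytes], int]) -> None:
        self._send = send
        self._pending: Deque[bytes] = deque()

    def pending_bytes(self) -> int:
        return sum(len(chunk) for chunk in self._pending)

    def write(self, data: bytes) -> None:
        self._pending.append(data)
        self.flush()

    def flush(self) -> None:
        """Send as much queued data as the socket accepts; never blocks."""
        while self._pending:
            chunk = self._pending[0]
            try:
                sent = self._send(chunk)
            except BlockingIOError:
                return  # socket full: keep everything queued for later
            if sent < len(chunk):
                self._pending[0] = chunk[sent:]
                return  # partial write: the socket is full for now
            self._pending.popleft()
```

With a real socket this would be used as OutputBuffer(sock.send) after sock.setblocking(False), calling flush() again whenever select() reports the socket as writable.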

- High level compression improvements:

 * Track multiple copy operations (limited to one right now)

 * More aggressive search for solid rects.

 * Reduce colours before encoding

 * Image caching (bug 1814)

 * Multi-threaded encoding (OpenMP?)
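For the image caching idea, one possible shape is an LRU cache of recently sent tiles keyed by a hash of their pixels. This is a sketch under the assumption that server and client keep identical caches, so that a hit could be sent as a short cache reference instead of re-encoded pixels; TileCache and lookup_or_insert are hypothetical names and no such protocol message exists today:

```python
import hashlib
from collections import OrderedDict
from typing import Optional

class TileCache:
    """LRU cache of recently sent tiles, keyed by a hash of their pixels."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._entries: "OrderedDict[bytes, int]" = OrderedDict()
        self._next_id = 0

    def lookup_or_insert(self, pixels: bytes) -> Optional[int]:
        """Return the cache id on a hit; on a miss, insert and return None."""
        key = hashlib.sha256(pixels).digest()
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self._entries[key] = self._next_id
        self._next_id += 1
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return None
```

Because the client must evict in exactly the same order, the cache size and eviction policy would have to be part of any protocol extension.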

- New CODECs:

 * Video CODECs (H.263/H.264/H.265, VP7/VP8/VP9, Theora, Dirac, ...)

 * PNG

 * Delta compared to previous frame

 * JPEG encoding separate from Tight, to encourage use
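A simple form of "delta compared to previous frame" is to XOR the new frame with the previous one before deflating it: unchanged pixels become zero bytes, which compress very well. This is a sketch of that one technique only (function names are hypothetical), assuming both sides still hold the previous frame:

```python
import zlib

def encode_delta(previous: bytes, current: bytes, level: int = 6) -> bytes:
    """XOR the new frame with the previous one, then deflate the result."""
    if len(previous) != len(current):
        raise ValueError("frames must have the same size")
    delta = bytes(a ^ b for a, b in zip(previous, current))
    return zlib.compress(delta, level)

def decode_delta(previous: bytes, encoded: bytes) -> bytes:
    """Inflate the delta and XOR it back onto the previous frame."""
    delta = zlib.decompress(encoded)
    return bytes(a ^ b for a, b in zip(previous, delta))
```
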

- Hardware acceleration (GPU, more SIMD) (bug 4982)

- Tweak deflateInit

zlib's deflateInit2() accepts hints about what kind of data to expect (e.g. the
strategy parameter), and we might not be using them fully today.
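The effect of the strategy hint can be checked quickly through Python's zlib binding, whose compressobj() exposes the same level and strategy parameters as deflateInit2(). A sketch; compare_strategies is a hypothetical helper, and level 2 is used because that is what the native client and the server use by default:

```python
import zlib
from typing import Dict

# Strategies exposed by Python's zlib binding (mirroring deflateInit2()).
STRATEGIES = {
    "default": zlib.Z_DEFAULT_STRATEGY,
    "filtered": zlib.Z_FILTERED,
    "huffman_only": zlib.Z_HUFFMAN_ONLY,
}
if hasattr(zlib, "Z_RLE"):  # requires zlib >= 1.2.0.1
    STRATEGIES["rle"] = zlib.Z_RLE

def compare_strategies(data: bytes, level: int = 2) -> Dict[str, int]:
    """Compressed size of `data` for each deflate strategy at one level."""
    sizes = {}
    for name, strategy in STRATEGIES.items():
        comp = zlib.compressobj(level=level, strategy=strategy)
        sizes[name] = len(comp.compress(data) + comp.flush())
    return sizes
```

Running it on recorded update data, rather than synthetic input, would show whether e.g. Z_RLE suits typical desktop content.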

- Dithering (bug 3893)

- Lossless hint

The server might apply several lossy transformations to the data, not just
JPEG, so we need a general way for the client to say that it wants
pixel-perfect data at all times.

- Indicate stalled/slow connection (bug 4956)
------- Comment #3 From cendio 2014-07-10 13:10:04 -------
DRC has done some tests with the x264 codec:

http://www.turbovnc.org/About/H264
------- Comment #4 From cendio 2014-12-03 15:04:04 -------
* Check the overhead of the JPEG JFIF header. There is a lot of data there,
which might make JPEG a bad choice for small blocks. Perhaps there are
pre-defined quantisation and Huffman tables that can be used in such cases?
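A back-of-the-envelope estimate of that overhead, assuming a baseline 3-component JPEG with the standard Annex K tables, each DQT/DHT table in its own marker segment, and a JFIF APP0 marker without thumbnail. These are assumptions about what the encoder emits, not measurements of Tight:

```python
def segment(payload_len: int) -> int:
    """Size of a marker segment: 2-byte marker + 2-byte length + payload."""
    return 2 + 2 + payload_len

def jpeg_header_overhead() -> int:
    """Bytes of non-scan data in a baseline 3-component JPEG stream."""
    soi = eoi = 2
    app0_jfif = segment(14)               # "JFIF\0", version, units, densities, thumb size
    dqt = 2 * segment(1 + 64)             # luma + chroma 8-bit quantisation tables
    sof0 = segment(6 + 3 * 3)             # frame header for 3 components
    dc_tables = 2 * segment(1 + 16 + 12)  # standard DC tables: 12 symbols each
    ac_tables = 2 * segment(1 + 16 + 162) # standard AC tables: 162 symbols each
    sos = segment(4 + 2 * 3)              # scan header for 3 components
    return soi + app0_jfif + dqt + sof0 + dc_tables + ac_tables + sos + eoi
```

This comes to 625 bytes, of which the Huffman tables alone are 432. For comparison, a 16x16 block of 32-bit pixels is only 1024 bytes uncompressed.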
------- Comment #5 From cendio 2014-12-14 10:21:25 -------
http://bellard.org/bpg/
------- Comment #6 From cendio 2015-01-22 14:27:59 -------
- Finish SIMD on ARM (it apparently still lacks significant portions)
------- Comment #7 From cendio 2015-02-12 14:18:48 -------
 - The palette code doesn't mask away irrelevant bits. This means that a single
colour might get multiple entries, reducing the efficiency of the compression.
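A sketch of the fix: derive a mask from the RFB pixel format's max and shift values and apply it before looking up the palette, so that e.g. a 32bpp/depth 24 pixel with garbage in the padding byte maps to the same entry. The helper names are hypothetical:

```python
from typing import Dict, List

def colour_mask(red_max: int, green_max: int, blue_max: int,
                red_shift: int, green_shift: int, blue_shift: int) -> int:
    """Bits of a pixel value that actually carry colour information."""
    return (red_max << red_shift) | (green_max << green_shift) | (blue_max << blue_shift)

def build_palette(pixels: List[int], mask: int,
                  max_colours: int = 256) -> Dict[int, int]:
    """Map each distinct (masked) colour to a palette index.
    Returns an empty dict if there are too many colours for a palette."""
    palette: Dict[int, int] = {}
    for p in pixels:
        colour = p & mask  # drop padding/irrelevant bits
        if colour not in palette:
            if len(palette) == max_colours:
                return {}
            palette[colour] = len(palette)
    return palette
```
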
------- Comment #8 From cendio 2015-02-12 15:16:42 -------
 - Look at using ORC from GStreamer. It can probably optimise and generalise
many heavy loops. Might also be useful in libjpeg-turbo.
------- Comment #9 From cendio 2015-04-24 09:36:52 -------
 - Look at Google's Snappy as an alternative to zlib based encodings.
------- Comment #10 From cendio 2015-07-30 10:22:47 -------
(In reply to comment #6)
> - Finish SIMD on ARM (it apparently still lacks significant portions)

The important bits have been fixed now. There are still some missing pieces,
but nothing that is normally used. Coverage for SIMD optimisation in
libjpeg-turbo can be seen here:

http://www.libjpeg-turbo.org/About/SIMDCoverage
------- Comment #11 From cendio 2015-09-23 12:42:05 -------
More Google compression algorithms to look at:

 - Zopfli: https://github.com/google/zopfli

 - Brotli:
http://google-opensource.blogspot.se/2015/09/introducing-brotli-new-compression.html
------- Comment #12 From cendio 2015-10-05 11:46:59 -------
 - FLIF (http://flif.info/)
------- Comment #13 From cendio 2015-12-01 08:58:58 -------
Both Intel and Cloudflare have implemented optimised versions of zlib that are
drop-in replacements:

https://github.com/jtkukunas/zlib
https://github.com/cloudflare/zlib

Benchmarks:

https://www.snellman.net/blog/archive/2014-08-04-comparison-of-intel-and-cloudflare-zlib-patches.html
https://blog.cloudflare.com/cloudflare-fights-cancer/
------- Comment #14 From cendio 2015-12-03 14:36:39 -------
 - The HTML client uses compression level 9 even though we've seen little
benefit above level 3. The native client and the server use 2 by default.
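Whether level 9 buys anything could be re-checked on captured update data with a quick measurement of size and compression time per level. A sketch using Python's zlib binding; level_tradeoff is a hypothetical helper and the input data is up to whoever runs it:

```python
import time
import zlib
from typing import Dict, Iterable, Tuple

def level_tradeoff(data: bytes,
                   levels: Iterable[int] = range(1, 10)) -> Dict[int, Tuple[int, float]]:
    """Compressed size (bytes) and compression time (seconds) per zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        results[level] = (len(compressed), time.perf_counter() - start)
    return results
```

Timings for a single call are noisy, so in practice each level would be run several times and over many different updates.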
------- Comment #15 From cendio 2015-12-11 13:08:10 -------
We might want to see if pixman has some routines that can be used to speed up
parts of the pipeline.
------- Comment #16 From cendio 2016-01-19 12:50:42 -------
More benchmarking of modern compression routines:

http://richg42.blogspot.se/2016/01/zlib-in-serious-danger-of-becoming.html
------- Comment #17 From cendio 2016-07-07 15:25:25 -------
Apple joins the fray with their own compression algorithm:

https://github.com/lzfse/lzfse
------- Comment #20 From cendio 2016-11-08 16:40:19 -------
(In reply to comment #2)
> 
> Are we getting a good trade off for CPU/bandwidth for these:
> 
>  * Comparing Update Tracker
> 

I did a quick test running YouTube under various desktop environments and
measured how much compression the CUT achieved:

 GNOME 3: 1:1.6
 KDE, Xfce (without composite), MATE (with composite): 1:5

So it seems to do a lot for this use case in most environments. However, GNOME
3 maintained a much lower frame rate, which could also be a large factor.
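For anyone repeating the measurement, the core idea of a comparing update tracker can be sketched as: split the damaged area into tiles, compare each tile against the previous framebuffer, and keep only the tiles that really changed. This is a simplified illustration with a hypothetical 16-pixel tile size; the ratio here is area-based, which may differ from how the numbers above were measured:

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, w, h

def compare_tiles(old: List[List[int]], new: List[List[int]],
                  rect: Rect, tile: int = 16) -> List[Rect]:
    """Split the damaged rect into tiles and keep only those whose pixels
    actually differ between the old and new framebuffer."""
    x0, y0, w, h = rect
    changed = []
    for ty in range(y0, y0 + h, tile):
        for tx in range(x0, x0 + w, tile):
            tw = min(tile, x0 + w - tx)
            th = min(tile, y0 + h - ty)
            if any(old[y][tx:tx + tw] != new[y][tx:tx + tw]
                   for y in range(ty, ty + th)):
                changed.append((tx, ty, tw, th))
    return changed

def reduction_ratio(rect: Rect, changed: List[Rect]) -> float:
    """Damaged area divided by the area that actually has to be sent."""
    damaged = rect[2] * rect[3]
    sent = sum(w * h for _, _, w, h in changed)
    return damaged / sent if sent else float("inf")
```
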
------- Comment #21 From cendio 2017-03-26 11:51:39 -------
> Guetzli is a JPEG encoder that aims for excellent compression density at high visual quality.
> Guetzli-generated images are typically 20-30% smaller than images of equivalent quality 
> generated by libjpeg. Guetzli generates only sequential (nonprogressive) JPEGs due to faster 
> decompression speeds they offer.

https://arstechnica.com/information-technology/2017/03/google-jpeg-guetzli-encoder-file-size/

https://github.com/google/guetzli/
------- Comment #22 From cendio 2017-11-13 12:04:36 -------
And another compression format:

https://github.com/inikep/lizard
------- Comment #23 From cendio 2017-12-05 14:06:24 -------
(Bug 4333 downgraded from a separate bug to an idea in this comment:)

Most SIMD instructions require specific memory alignment to work optimally.
As discovered in bug 4328, we're not always getting this right. It could be
worth investigating whether we can improve alignment and thereby improve
performance.
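One common piece of this is padding each framebuffer row so that every row starts on an aligned boundary, assuming the buffer itself is allocated aligned. A sketch of the arithmetic; the 32-byte default matches AVX2 registers (SSE2 and NEON use 16), and the helper names are hypothetical:

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (value + alignment - 1) & ~(alignment - 1)

def aligned_stride(width: int, bytes_per_pixel: int, alignment: int = 32) -> int:
    """Row stride in bytes such that every row starts on an aligned
    boundary, provided the buffer itself starts aligned."""
    return align_up(width * bytes_per_pixel, alignment)
```

For example, a 1366 pixel wide 32-bit framebuffer has 5464-byte rows, which aligned_stride() pads to 5472 bytes so that every row stays 32-byte aligned.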
------- Comment #25 From cendio 2018-11-19 13:31:17 -------
Microsoft developed a new lossless CODEC as part of RemoteFX which could be
interesting to have in VNC as well. Some overview here:

https://msdn.microsoft.com/en-us/library/ff635792.aspx