Bug 5106 - improve VNC performance (latency, bandwidth, CPU usage, quality, ...)
Status: NEW
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VNC
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: MediumPrio
Assignee: Pierre Ossman
Depends on: 1814 3455 3893 4734 4956 4982 5648 45 1215 2566 2926 2928 2930 2931 3009 4020 4333 4735 4915 5026 5242 5618 5719 5748 5812 7139 7463
Reported: 2014-04-22 12:45 CEST by Pierre Ossman
Modified: 2020-02-18 12:40 CET

Description Pierre Ossman cendio 2014-04-22 12:45:40 CEST
This is a tracker bug to document ideas and plans for how we improve performance related aspects of the VNC/RFB portion in ThinLinc. This includes:

 - Bandwidth usage

 - Latency sensitivity

 - CPU usage

 - Image quality, perceived or otherwise.
Comment 1 Pierre Ossman cendio 2014-04-22 13:00:32 CEST
Historical things:

 - Evaluate ZRLE (bug 1215)

 - Evaluate deferred updates (bug 2931)

 - Evaluate Comparing Update Tracker (bug 4020)

 - Handle high latency networks (bug 2566)

 - SIMD accelerated JPEG (bug 2926)

 - Optimise JPEG entropy coding (bug 3009)

 - Double buffering (bug 2930)
Comment 2 Pierre Ossman cendio 2014-04-22 14:01:10 CEST
Future plans:

- Clean up and move magic out of the Tight encoder (bug 4915 and bug 5026)

- Record sessions:

Needed so we can properly evaluate parts of the system with real world data. Such recordings need to losslessly record every change, with timing information.

- Evaluate existing components more thoroughly/systematically

Are we getting a good trade off for CPU/bandwidth for these:

 * Comparing Update Tracker

 * Deferred updates

 * Solid area detection

 * Sub-rect analysis

- Evaluate CODECs

Write some framework that allows us to get reliable numbers for what the trade off is between CPU, bandwidth and quality for our CODECs. We need to consider the effects the compression and quality settings have, as well as what type of data might fit each CODEC best.

A suitable metric for quality might be SSIM.
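
For reference, a minimal sketch of what an SSIM measurement could look like. It computes one global SSIM value over two greyscale images given as flat sequences, whereas the usual formulation averages SSIM over small local windows. The name `ssim_global` and the flat-list input are assumptions made for illustration; 0.01 and 0.03 are the commonly used stabilising constants.

```python
def ssim_global(a, b, max_value=255):
    """Single-window SSIM of two equally sized greyscale pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Population variance and covariance of the two images.
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    # Small constants that keep the ratio stable for flat images.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    numerator = (2 * mean_a * mean_b + c1) * (2 * cov + c2)
    denominator = (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator
```

Identical images score 1.0 (up to rounding), and the score drops as the decoded image drifts from the original.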

- Evaluate sub-rect classes

Connected to the above, we need to investigate if we are classifying sub-rects in a way that makes sense with regard to real world data, and allows each CODEC to play to its strengths.

We also should look at the optimal size for sub rects. Perhaps we should have a fixed grid of e.g. 64x64 pixels? That could also simplify update tracking and save CPU time there, at the cost of sending a bit more data.
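
A rough sketch of the fixed grid idea, assuming 64x64 tiles aligned to the framebuffer origin. The function name `tiles_for_rect` is hypothetical, and the tiles are not clipped to the framebuffer size.

```python
def tiles_for_rect(x, y, w, h, tile=64):
    """Return (x, y, w, h) of every grid tile touched by a changed rectangle."""
    if w <= 0 or h <= 0:
        return []
    first_col, last_col = x // tile, (x + w - 1) // tile
    first_row, last_row = y // tile, (y + h - 1) // tile
    return [(col * tile, row * tile, tile, tile)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

A 10x10 change at (60, 10) straddles a tile boundary and so marks two full tiles as dirty, which illustrates the extra data this approach would send.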

- Server side compression selection

The server is in a much better position than the client to select which CODEC to use and how much compression to apply.

- Automatic lossless refresh (bug 2928)

- Partial updates (bug 4735)

Allows us to multiplex other packets when we have a large update and limited bandwidth. Primarily it allows us to send out more congestion control packets; congestion control normally stops working as the update size approaches and exceeds the BDP (bandwidth-delay product). A protocol extension will be needed to fix this.
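
To illustrate the numbers involved, a sketch that computes the bandwidth-delay product and splits one large update into parts with room for a probe after each. The message names `update-part` and `probe` are invented for this example; no such protocol extension is defined here.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes, using integer arithmetic."""
    return bandwidth_bits_per_s * rtt_ms // 8000


def multiplex(payload, max_part):
    """Yield the update in parts, with a probe slot after each part."""
    for offset in range(0, len(payload), max_part):
        yield ("update-part", payload[offset:offset + max_part])
        # Hypothetical slot where a congestion control packet could go.
        yield ("probe", b"")
```

With 10 Mbit/s and 50 ms round trip time the BDP is 62500 bytes, so a 150000 byte update sent as one block would leave no opportunity to measure the link while it is in flight.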

- Socket buffer (bug 4734)

Currently we'll hang the server if we fill the outgoing socket. We should have a memory buffer for when this happens to avoid screwing up timing sensitive things (like the congestion control).
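
A minimal sketch of the idea using a non-blocking socket and an in-memory queue. The class name `BufferedSender` is hypothetical, and a real server would call `flush()` when the socket is reported writable rather than polling.

```python
import socket


class BufferedSender:
    """Queue outgoing data in memory instead of blocking on a full socket."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.pending = bytearray()

    def send(self, data):
        """Queue data and try to push as much as the socket accepts."""
        self.pending += data
        return self.flush()

    def flush(self):
        """Return True when everything queued has been handed to the kernel."""
        while self.pending:
            try:
                sent = self.sock.send(self.pending)
            except BlockingIOError:
                # Socket buffer is full; keep the rest for later.
                return False
            del self.pending[:sent]
        return True
```

The caller never blocks, so timers for things like congestion control can keep running while `pending` drains.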

- High level compression improvements:

 * Track multiple copy operations (limited to one right now)

 * More aggressive search for solid rects.

 * Reduce colours before encoding

 * Image caching (bug 1814)

 * Multi threaded encoding (OpenMP?)

- New CODECs:

 * Video CODECs (H.263/264/265, VP7/VP8/VP9, Theora, Dirac, ...)

 * PNG

 * Delta compared to previous frame

 * JPEG encoding separate from Tight, to encourage use

- Hardware acceleration (GPU, more SIMD) (bug 4982)

- Tweak deflateInit

You can give it hints about what kind of data to expect, and we might not be using that fully today.
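
In zlib the data hint is the `strategy` argument, which is taken by `deflateInit2()` rather than plain `deflateInit()`. A small sketch, using Python's `zlib` bindings, of how the different strategies could be compared on a sample of our data. The helper name `compressed_sizes` is an assumption for illustration.

```python
import zlib

STRATEGIES = {
    "default": zlib.Z_DEFAULT_STRATEGY,
    "filtered": zlib.Z_FILTERED,
    "huffman_only": zlib.Z_HUFFMAN_ONLY,
    "rle": zlib.Z_RLE,
}


def compressed_sizes(data, level=2):
    """Compress data once per strategy and return the output sizes."""
    sizes = {}
    for name, strategy in STRATEGIES.items():
        comp = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS,
                                zlib.DEF_MEM_LEVEL, strategy)
        sizes[name] = len(comp.compress(data) + comp.flush())
    return sizes
```

Which strategy wins depends entirely on the input, so this would have to be run on recorded session data to say anything useful.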

- Dithering (bug 3893)

- Lossless hint

The server might do lots of lossy stuff to the data, not just JPEG, so we need a general way for the client to say that it wants pixel perfect data at all times.

- Indicate stalled/slow connection (bug 4956)
Comment 3 Pierre Ossman cendio 2014-07-10 13:10:04 CEST
DRC has done some tests with the x264 codec:

http://www.turbovnc.org/About/H264
Comment 4 Pierre Ossman cendio 2014-12-03 15:04:04 CET
* Check the overhead of the JPEG JFIF header. There is a lot of data there which might make JPEG a bad choice for small blocks. Perhaps there are pre-defined quantisation and Huffman tables that can be used in such cases?
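
A back-of-the-envelope estimate of that overhead, assuming a baseline JPEG with three components, two 8-bit quantisation tables, the four typical Huffman tables from Annex K of the JPEG standard (12 symbols for DC, 162 for AC), and each table in its own segment. The function name is hypothetical, and an actual encoder may lay the segments out differently.

```python
def jpeg_header_bytes(components=3, quant_tables=2):
    """Estimate the fixed, non-image bytes in a baseline JFIF file."""
    soi = 2                                   # start of image marker
    app0_jfif = 2 + 16                        # marker + JFIF segment
    dqt = quant_tables * (2 + 2 + 1 + 64)     # marker, length, id, 64 values
    sof0 = 2 + 8 + 3 * components             # frame header
    dc_table = 2 + 2 + 1 + 16 + 12            # marker, length, id, counts, symbols
    ac_table = 2 + 2 + 1 + 16 + 162
    dht = 2 * dc_table + 2 * ac_table         # luma and chroma, DC and AC
    sos = 2 + 6 + 2 * components              # scan header
    eoi = 2                                   # end of image marker
    total = soi + app0_jfif + dqt + sof0 + dht + sos + eoi
    return {"tables": dqt + dht, "total": total}
```

Under these assumptions the tables account for most of the fixed cost, which is comparable to a raw 16x16 block of 24-bit pixels (768 bytes).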
Comment 5 Aaron Sowry cendio 2014-12-14 10:21:25 CET
http://bellard.org/bpg/
Comment 6 Pierre Ossman cendio 2015-01-22 14:27:59 CET
- Finish SIMD on ARM (it apparently still lacks significant portions)
Comment 7 Pierre Ossman cendio 2015-02-12 14:18:48 CET
 - The palette code doesn't mask away irrelevant bits. This means that a single colour might get multiple entries and reduce the efficiency of the compression.
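
A sketch of the problem and the fix, assuming 32-bit pixels where only 24 bits carry colour. The mask is derived from the max and shift fields of the RFB pixel format; the function names are made up for this example.

```python
def significant_mask(red_max, green_max, blue_max,
                     red_shift, green_shift, blue_shift):
    """Bits of a pixel value that actually encode colour."""
    return ((red_max << red_shift) |
            (green_max << green_shift) |
            (blue_max << blue_shift))


def build_palette(pixels, mask):
    """Collect distinct colours in first-seen order, ignoring masked-out bits."""
    palette = []
    seen = set()
    for pixel in pixels:
        colour = pixel & mask
        if colour not in seen:
            seen.add(colour)
            palette.append(colour)
    return palette
```

Without the mask, two pixels that differ only in the padding byte end up as separate palette entries.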
Comment 8 Pierre Ossman cendio 2015-02-12 15:16:42 CET
 - Look at using ORC from GStreamer. It can probably optimise and generalise many heavy loops. Might also be useful in libjpeg-turbo.
Comment 9 Pierre Ossman cendio 2015-04-24 09:36:52 CEST
 - Look at Google's Snappy as an alternative to zlib based encodings.
Comment 10 Pierre Ossman cendio 2015-07-30 10:22:47 CEST
(In reply to comment #6)
> - Finish SIMD on ARM (it apparently still lacks significant portions)

The important bits have been fixed now. There are still some missing pieces, but nothing that is normally used. Coverage for SIMD optimisation in libjpeg-turbo can be seen here:

http://www.libjpeg-turbo.org/About/SIMDCoverage
Comment 11 Pierre Ossman cendio 2015-09-23 12:42:05 CEST
More Google compression algorithms to look at:

 - Zopfli: https://github.com/google/zopfli

 - Brotli: http://google-opensource.blogspot.se/2015/09/introducing-brotli-new-compression.html
Comment 12 Pierre Ossman cendio 2015-10-05 11:46:59 CEST
 - FLIF (http://flif.info/)
Comment 13 Pierre Ossman cendio 2015-12-01 08:58:58 CET
Both Intel and Cloudflare have implemented optimised versions of zlib that are a drop in replacement:

https://github.com/jtkukunas/zlib
https://github.com/cloudflare/zlib

Benchmarks:

https://www.snellman.net/blog/archive/2014-08-04-comparison-of-intel-and-cloudflare-zlib-patches.html
https://blog.cloudflare.com/cloudflare-fights-cancer/
Comment 14 Pierre Ossman cendio 2015-12-03 14:36:39 CET
 - The HTML client uses compression level 9 even though we've seen little benefit above level 3. The native client and the server use 2 by default.
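
A quick way to put numbers on this, using Python's `zlib` bindings. The helper name `measure_levels` is an assumption, and the timings are only meaningful when run on representative data.

```python
import time
import zlib


def measure_levels(data, levels=(2, 3, 9)):
    """Return {level: (compressed_size, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results[level] = (len(packed), elapsed)
    return results
```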
Comment 15 Pierre Ossman cendio 2015-12-11 13:08:10 CET
We might want to see if pixman has some routines that can be used to speed up parts of the pipeline.
Comment 16 Pierre Ossman cendio 2016-01-19 12:50:42 CET
More benchmarking of modern compression routines:

http://richg42.blogspot.se/2016/01/zlib-in-serious-danger-of-becoming.html
Comment 17 Pierre Ossman cendio 2016-07-07 15:25:25 CEST
Apple joins the fray with their own compression algorithm:

https://github.com/lzfse/lzfse
Comment 20 Pierre Ossman cendio 2016-11-08 16:40:19 CET
(In reply to comment #2)
> 
> Are we getting a good trade off for CPU/bandwidth for these:
> 
>  * Comparing Update Tracker
> 

I did a quick test running YouTube under various desktop environments and measured how much compression the CUT achieved:

 Gnome 3: 1:1.6
 KDE, Xfce (without composite), MATE (with composite): 1:5

So it seems to do a lot for this use case for most environments. However Gnome 3 maintained a much lower frame rate, which could also be a large factor.
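
For readers unfamiliar with the CUT, a heavily simplified sketch of the principle: rectangles reported as changed are compared against the previous frame, and only those that really differ are kept. Frames are lists of byte rows with one byte per pixel; the function names are hypothetical and this is not the actual implementation.

```python
def really_changed(old, new, rects):
    """Keep only the claimed rectangles whose pixels differ between frames."""
    kept = []
    for (x, y, w, h) in rects:
        if any(old[row][x:x + w] != new[row][x:x + w]
               for row in range(y, y + h)):
            kept.append((x, y, w, h))
    return kept


def area(rects):
    """Total number of pixels covered by a list of rectangles."""
    return sum(w * h for _, _, w, h in rects)
```

The ratio `area(claimed) / area(kept)` corresponds to the kind of figure quoted above.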
Comment 21 Karl Mikaelsson cendio 2017-03-26 11:51:39 CEST
> Guetzli is a JPEG encoder that aims for excellent compression density at high visual quality.
> Guetzli-generated images are typically 20-30% smaller than images of equivalent quality 
> generated by libjpeg. Guetzli generates only sequential (nonprogressive) JPEGs due to faster 
> decompression speeds they offer.

https://arstechnica.com/information-technology/2017/03/google-jpeg-guetzli-encoder-file-size/

https://github.com/google/guetzli/
Comment 22 Pierre Ossman cendio 2017-11-13 12:04:36 CET
And another compression format:

https://github.com/inikep/lizard
Comment 23 Pierre Ossman cendio 2017-12-05 14:06:24 CET
(bug 4333 degraded from own bug to just an idea comment:)

Most SIMD instructions require a specific memory alignment to work optimally.
As discovered on bug 4328, we're not always doing this properly. It could be
worth investigating if we can improve alignment and thereby improve
performance.
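
The arithmetic involved is small. A sketch, assuming power-of-two alignments, of rounding a row stride up so that each row starts on an aligned boundary. The helper names are hypothetical, and the 16 byte figure below is only an example.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (value + alignment - 1) & ~(alignment - 1)


def is_aligned(address, alignment):
    """True if address is a multiple of alignment."""
    return address % alignment == 0
```

A row of 100 pixels at 3 bytes each is 300 bytes; padding the stride to 304 keeps every row 16 byte aligned, provided the buffer itself starts aligned.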
Comment 25 Pierre Ossman cendio 2018-11-19 13:31:17 CET
Microsoft developed a new lossless CODEC as part of RemoteFX which could be interesting to have in VNC as well. Some overview here:

https://msdn.microsoft.com/en-us/library/ff635792.aspx
