Bug 5648 - multi-threaded VNC rect encoding (Xvnc side)
Status: NEW
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VNC
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: MediumPrio
Assignee: Pierre Ossman
Blocks: performance
Reported: 2015-09-23 11:00 CEST by Peter Åstrand
Modified: 2015-12-23 15:29 CET

Description Peter Åstrand cendio 2015-09-23 11:00:35 CEST
We have bug 5618 for supporting multi-threaded decoding on the client side. This bug is for multi-threaded encoding on the server side. TurboVNC has this:

http://www.turbovnc.org/About/TigerVNC

TurboVNC:
Multi-threaded Tight encoding
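
For illustration only, here is a minimal Python sketch of the general idea: split a framebuffer update into rects and encode them on a pool of worker threads. This is not TigerVNC or TurboVNC code. The helper names (split_rows, encode_rect, encode_update) are hypothetical, zlib stands in for a real rect encoder, and each rect is assumed to be compressed independently of the others.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_rows(framebuffer: bytes, stride: int, rows_per_rect: int) -> list:
    """Cut a framebuffer into horizontal bands of rows_per_rect rows each."""
    band = stride * rows_per_rect
    return [framebuffer[i:i + band] for i in range(0, len(framebuffer), band)]


def encode_rect(rect: bytes) -> bytes:
    # Stand-in encoder (assumption): every rect gets its own zlib stream,
    # so rects can be encoded in any order without shared state.
    return zlib.compress(rect, 6)


def encode_update(framebuffer: bytes, stride: int,
                  rows_per_rect: int, workers: int) -> list:
    """Encode all rects of one update on a pool of worker threads."""
    rects = split_rows(framebuffer, stride, rows_per_rect)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so the encoded rects
        # can be written to the client in the same order as before.
        return list(pool.map(encode_rect, rects))


if __name__ == "__main__":
    fb = bytes(range(256)) * 64          # 64 rows of 256 bytes
    encoded = encode_update(fb, stride=256, rows_per_rect=16, workers=4)
    print(len(encoded), "rects encoded")
```

The output is the same regardless of the number of workers, which makes it easy to compare a single-threaded run against a multi-threaded one.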
Comment 1 Pierre Ossman cendio 2015-12-08 16:56:34 CET
This was a lot easier to do by copying a lot of the work done for bug 5618. I was able to get this up and running in a few hours:

https://github.com/TigerVNC/tigervnc/tree/master/tests/results/multicore

The results are more mixed here, however. My i7 and one of our Xeon servers show improvements of around 50%, but an Opteron server is regressing by around 5% whilst burning through 50% more CPU. I need to investigate further what is happening.
Comment 2 Pierre Ossman cendio 2015-12-09 09:12:53 CET
Wrong URL. This is the proper one:

https://github.com/CendioOssman/tigervnc/tree/multicore
Comment 3 Pierre Ossman cendio 2015-12-22 16:08:22 CET
Urgh. This is turning out to be extremely complex to measure. The good news is that it seems like it is a win on all systems. But it is very difficult to get good numbers stating so.

 a) perf is broken on RHEL 6 (which the Opteron machine runs). It fails to count threads in many cases, giving absurdly low values.

 b) I am having serious doubts that rusage/task_clock is being counted correctly. It is much higher in the multi-core cases, but no other measurement is. So it seems like it is not actually doing anything and some kind of idle time is being included in that figure. IOW the CPU should be available for other things. Looking at cycles and instructions is probably better, but a) is causing issues there.

 c) The tests have problems ramping up the CPU speed. This is the primary reason the Opteron looks so bad in the tests. Forcing maximum speed makes the multi-core tests surpass the single-core ones every time. So it seems like we keep ending up on cores that are clocked down, and it takes a while for them to ramp up, whilst in the single-core case we stay on the same core and get it up to a nice, fast speed. This explains why the Opteron is having so many problems: it is a 32-core machine, and it is very likely that we end up on unused cores there.
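
As a rough illustration of points b) and c), the Python sketch below compares wall-clock time against the CPU time reported by getrusage for the same workload, and optionally restricts the process to a few CPUs so that the threads keep landing on the same cores. It is not the benchmark used for the numbers above. The helper names (cpu_seconds, measure, pin_to_allowed_cpus, workload) are hypothetical, zlib stands in for the encoder, and pinning is only one possible way to probe the frequency ramp-up effect (the tests above forced maximum speed instead).

```python
import os
import resource
import time
import zlib
from concurrent.futures import ThreadPoolExecutor


def cpu_seconds() -> float:
    """User + system CPU time of this process, summed over all threads."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime + ru.ru_stime


def measure(work, *args):
    """Run work(*args); return (result, wall seconds, CPU seconds)."""
    wall0, cpu0 = time.perf_counter(), cpu_seconds()
    result = work(*args)
    return result, time.perf_counter() - wall0, cpu_seconds() - cpu0


def pin_to_allowed_cpus(count: int) -> set:
    """Restrict the calling thread (and threads it starts later) to the
    first `count` CPUs it may already use. Linux-only; returns an empty
    set where sched_setaffinity is unavailable."""
    if not hasattr(os, "sched_setaffinity"):
        return set()
    allowed = sorted(os.sched_getaffinity(0))
    chosen = set(allowed[:max(1, count)])
    os.sched_setaffinity(0, chosen)
    return chosen


def workload(rects: list, workers: int) -> list:
    # Stand-in for rect encoding (assumption): compress each rect on a pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, rects))


if __name__ == "__main__":
    rects = [bytes(range(256)) * 4096] * 32
    pin_to_allowed_cpus(4)
    for workers in (1, 4):
        _, wall, cpu = measure(workload, rects, workers)
        print(f"workers={workers} wall={wall:.3f}s cpu={cpu:.3f}s")
```

If the CPU figure grows much faster than the wall-clock figure shrinks, that is the kind of discrepancy described in b). Comparing runs with and without pinning gives a hint as to whether cold, clocked-down cores are involved.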
Comment 4 Pierre Ossman cendio 2015-12-23 15:29:21 CET
I restructured the queueing a bit to avoid stalls and it is better now, but not completely fixed. It is however at the point where there are no regressions compared to the old, single-core code.

The github branch has been updated with the new code.
