Bug 5415 - performance issue on macOS
Summary: performance issue on macOS
Status: NEW
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VNC
Version: pre-1.0
Hardware: Mac macOS
Importance: P2 Normal
Target Milestone: MediumPrio
Assignee: Pierre Ossman
URL:
Keywords:
Duplicates: 4551
Depends on: 6153 7047
Blocks:
Reported: 2015-02-02 10:42 CET by Pierre Ossman
Modified: 2018-05-02 16:52 CEST (History)
CC List: 3 users

See Also:
Acceptance Criteria:


Attachments
Profiler trace (185.25 KB, application/octet-stream)
2015-02-02 19:55 CET, Igor Izyumin
Reference code (6.40 KB, text/plain)
2016-12-29 11:24 CET, Pierre Ossman
Instruments run for old code (284.26 KB, image/png)
2017-01-04 16:34 CET, Pierre Ossman
Instruments run for new code (270.90 KB, image/png)
2017-01-04 16:34 CET, Pierre Ossman

Description Pierre Ossman cendio 2015-02-02 10:42:31 CET
We've gotten a report that the OS X client is having performance issues on the latest version of OS X. Retina (hidpi) mode was also active, but it is uncertain if this is a contributing factor.

These links reference some changes in OS X that can cause performance issues:

http://1014.org/index.php?article=516

http://www.juce.com/forum/topic/redraw-slowness-138-vs-128

We should investigate this and see to what extent we are affected and if we can fix it.
Comment 2 Igor Izyumin 2015-02-02 19:54:18 CET
The performance issue is most obvious with apps that do high-framerate screen updates.  In our case, this is a CAD tool that draws its own mouse cursor.  On OS X, there is a large amount of lag that makes the tool annoying to use.  This does not occur on the Windows client when it runs in a VM on the same hardware.

The same problem occurs on a non-Retina MacBook Pro running Yosemite, so perhaps it is not related to high DPI.  On the Retina machine, the slowdown also occurs on an external monitor, and does not improve much when the window is shrunk.

Investigation with the Instruments profiler on the vncviewer process (which is using most of the CPU) while the slow redraw is happening with our CAD tool shows that an inordinate amount of CPU time is spent inside some kind of color format conversion routine (img_colormatch_read) in the system graphics library.  I have attached the profiler trace.
Comment 3 Igor Izyumin 2015-02-02 19:55:20 CET
Created attachment 597
Profiler trace
Comment 4 Igor Izyumin 2015-02-10 03:15:43 CET
I've done some more investigating by replacing the ThinLinc VNC viewer binary with the open-source TigerVNC one.  Fixing the color conversion issue (per the code snippet in the first link) improved drawing performance, but didn't really help the slow cursor motion, which turns out to have a different cause.  Unlike Windows and Linux, OS X interpolates the mouse cursor and so generates an unusually high number of pointer events.  When a program on the remote end updates the screen in response to each mouse event, the events pile up in the event queue and cause significant lag.  The vncviewer program has a command-line option to rate-limit mouse events (PointerEventInterval), but it is disabled by default.  Changing the default to something more reasonable (e.g. 30 ms) completely eliminated this problem and made our CAD program work beautifully.  It would be nice if this were a configurable option in the client GUI, since it might be useful to increase this interval on network links with high latency.
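
The rate-limiting idea described above can be sketched roughly as follows. This is a minimal illustration, not TigerVNC's actual implementation: `PointerThrottle`, `on_event`, `on_timer`, and `simulate` are hypothetical names, timestamps are passed in explicitly to keep the behavior deterministic, and the sketch assumes that button changes should never be delayed and that the newest position replaces any pending one.

```python
from typing import Callable, List, Optional, Tuple

PointerEvent = Tuple[int, int, int]  # (x, y, button_mask)


class PointerThrottle:
    """Forward at most one motion event per interval, keeping the newest."""

    def __init__(self, interval_ms: float,
                 send: Callable[[PointerEvent], None]) -> None:
        self.interval_ms = interval_ms  # 0 disables rate limiting
        self.send = send
        self.last_sent_ms: Optional[float] = None
        self.last_buttons = 0
        self.pending: Optional[PointerEvent] = None

    def on_event(self, now_ms: float, x: int, y: int, buttons: int) -> None:
        event = (x, y, buttons)
        interval_elapsed = (self.last_sent_ms is None
                            or now_ms - self.last_sent_ms >= self.interval_ms)
        # Assumption: clicks and releases are forwarded immediately.
        if (self.interval_ms <= 0 or buttons != self.last_buttons
                or interval_elapsed):
            self._send_now(now_ms, event)
        else:
            # Coalesce: only the most recent position is remembered.
            self.pending = event

    def on_timer(self, now_ms: float) -> None:
        """Flush a pending motion event once the interval has elapsed."""
        if (self.pending is not None
                and now_ms - self.last_sent_ms >= self.interval_ms):
            self._send_now(now_ms, self.pending)

    def _send_now(self, now_ms: float, event: PointerEvent) -> None:
        self.pending = None
        self.last_sent_ms = now_ms
        self.last_buttons = event[2]
        self.send(event)


def simulate(interval_ms: float) -> List[PointerEvent]:
    """Feed 50 motion events spaced 2 ms apart, then one timer tick."""
    sent: List[PointerEvent] = []
    throttle = PointerThrottle(interval_ms, sent.append)
    for t in range(0, 100, 2):
        throttle.on_event(t, x=t, y=0, buttons=0)
    throttle.on_timer(120)
    return sent
```

With this synthetic input, `simulate(30)` forwards 5 events (the last one carrying the final pointer position), while `simulate(0)` forwards all 50.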
Comment 5 Pierre Ossman cendio 2015-06-18 16:24:48 CEST
I'm seeing rather high CPU usage on our 10.10 machine in the lab, so we should be able to investigate this issue on that machine.
Comment 6 Pierre Ossman cendio 2015-08-27 12:16:18 CEST
I tried to do a quick test to fix this, but I couldn't see any improvements in CPU usage. Could you share the changes you did to vncviewer?
Comment 7 Pierre Ossman cendio 2015-08-27 13:20:28 CEST
I had a look in Instruments, and there is indeed some change in what gets called. The color space conversion is gone, but it is replaced by other conversion routines. I've tried various formats without any real reduction in CPU usage. So right now we don't know how to efficiently put data on the screen on OS X. We need to investigate a lot more.
Comment 8 Pierre Ossman cendio 2015-08-27 13:55:27 CEST
So I did a test with the color space conversion fixed (using CMGetSystemProfile(), though we should probably use CGDisplayCopyColorSpace()), with kCGImageAlphaNoneSkipFirst, and with CGContextSetBlendMode() set to kCGBlendModeCopy. I think this is as good as it gets, since the CPU consumption is now in the function CGBlt_copyBytes(). Unfortunately, the CPU consumption there is as bad as it was before any changes.

So maybe we need to re-evaluate the entire architecture of using a CGBitmap.
Comment 9 Igor Izyumin 2015-08-27 20:00:52 CEST
Pierre, I don't know if you saw my other comment, but I tracked down the slowness we were experiencing to the PointerEventInterval being set to zero by default; changing it to 25 ms fixed the lag problem, and the color space conversion seems to have had nothing to do with it after all.

You should consider changing the default value for PointerEventInterval to something other than zero on OS X, since mouse movement seems to generate a lot more events than on Windows or Linux, and remote programs that redraw in response to mouse movement can cause a lot of lag.  Maybe it could even be dynamically adjusted based on the round-trip delay to prevent mouse events from piling up in the queue.
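
One way the dynamic adjustment suggested above might look, as a rough sketch only. Everything here is an assumption made for illustration: the function names, the policy of allowing roughly one motion event per round trip, the floor and ceiling values, and the smoothing factor (borrowed from the usual TCP-style exponentially weighted average). None of it comes from vncviewer.

```python
from typing import Sequence


def smoothed_rtt_ms(samples_ms: Sequence[float], alpha: float = 0.125) -> float:
    """Exponentially weighted moving average of round-trip samples."""
    if not samples_ms:
        raise ValueError("need at least one RTT sample")
    srtt = samples_ms[0]
    for sample in samples_ms[1:]:
        srtt = (1 - alpha) * srtt + alpha * sample
    return srtt


def pointer_interval_ms(rtt_ms: float,
                        floor_ms: float = 10.0,
                        ceiling_ms: float = 100.0) -> float:
    """Hypothetical policy: one motion event per round trip, clamped.

    The floor keeps fast links from flooding the server; the ceiling
    keeps the pointer usable on very slow links.
    """
    return max(floor_ms, min(ceiling_ms, rtt_ms))
```

For example, a measured round trip of 25 ms would give a 25 ms interval, a 2 ms LAN round trip would be raised to the 10 ms floor, and a 500 ms link would be capped at 100 ms.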
Comment 10 Pierre Ossman cendio 2016-12-19 16:03:15 CET
Perhaps a clue:

http://robert.ocallahan.org/2010/05/cglayer-performance-trap-with-isflipped_03.html

We don't seem to fiddle with "isFlipped", but we do modify the transformation matrix.
Comment 11 Pierre Ossman cendio 2016-12-22 14:35:29 CET
I've managed to find a few issues at least:

 a) FLTK does something when it sets up drawing that slows things down considerably. There are some changes on trunk for this, but they don't seem to have much effect in practice. Bypassing FLTK and drawing right away gives a massive performance boost.

 b) Despite what the documentation says, CGColorSpaceCreateDeviceRGB() does still invoke color space conversion. Using CGDisplayCopyColorSpace(kCGDirectMainDisplay) saves us quite a few cycles.

 c) The frame rate is capped to the display rate. So CGContextDrawImage() can block. We're not wasting any CPU, but we're also not able to spend CPU time on something else. Will have to see if there is a way to do things asynchronously.
Comment 12 Pierre Ossman cendio 2016-12-29 11:20:49 CET
(In reply to comment #11)
> I've managed to find a few issues at least:
> 
>  a) FLTK does something when it sets up drawing that slows things down
> considerably. There are some changes on trunk for this, but they don't seem to
> have much effect in practice. Bypassing FLTK and drawing right away gives a
> massive performance boost.
> 

It turns out that this is caused by the transformation matrix that FLTK sets up. Cancelling that matrix allows the normal paths to draw at full speed.
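
To illustrate what "cancelling" a transformation matrix means mathematically, here is a small sketch that concatenates an affine transform with its own inverse and ends up with the identity. The tuple layout mirrors CGAffineTransform (a, b, c, d, tx, ty); the function names and the example flip matrix are assumptions for illustration and are not taken from FLTK or from the actual fix. In Core Graphics terms, the equivalent would presumably involve CGContextGetCTM(), CGAffineTransformInvert() and CGContextConcatCTM().

```python
from typing import Tuple

# (a, b, c, d, tx, ty), same layout as CGAffineTransform:
#   x' = a*x + c*y + tx
#   y' = b*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]

IDENTITY: Affine = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def concat(first: Affine, second: Affine) -> Affine:
    """Return the transform that applies `first`, then `second`."""
    a1, b1, c1, d1, tx1, ty1 = first
    a2, b2, c2, d2, tx2, ty2 = second
    return (
        a2 * a1 + c2 * b1,
        b2 * a1 + d2 * b1,
        a2 * c1 + c2 * d1,
        b2 * c1 + d2 * d1,
        a2 * tx1 + c2 * ty1 + tx2,
        b2 * tx1 + d2 * ty1 + ty2,
    )


def invert(t: Affine) -> Affine:
    """Return the inverse of an affine transform."""
    a, b, c, d, tx, ty = t
    det = a * d - b * c
    if det == 0:
        raise ValueError("transform is not invertible")
    return (d / det, -b / det, -c / det, a / det,
            (c * ty - d * tx) / det, (b * tx - a * ty) / det)


def cancel(current: Affine) -> Affine:
    """Matrix in effect after concatenating the inverse of `current`."""
    return concat(current, invert(current))


# Hypothetical example: a vertical flip for a 600-pixel-high view.
FLIP: Affine = (1.0, 0.0, 0.0, -1.0, 0.0, 600.0)
```

Here `cancel(FLIP)` compares equal to `IDENTITY`, which is the state in which, according to the comment above, the normal drawing paths run at full speed.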

>  c) The frame rate is capped to the display rate. So CGContextDrawImage() can
> block. We're not wasting any CPU, but we're also not able to spend CPU time on
> something else. Will have to see if there is a way to do things asynchronously.

This has been fundamental to how macOS does things since 10.4. It was possible to disable this until 10.10, but no longer.

The proper fix for this is to redesign FLTK so that it only draws things when requested by the system. I.e. when drawRect: is called.
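
The update model described above, where changes only mark an area as dirty and actual drawing happens when the system asks for it, could be sketched like this. `DamageTracker`, `mark_dirty`, and `on_draw_request` are hypothetical names; in Cocoa the corresponding pieces would be marking the view with setNeedsDisplayInRect: and drawing from drawRect:. Using a single bounding box for the dirty area is a simplifying assumption.

```python
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2/y2 exclusive


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


class DamageTracker:
    """Accumulate framebuffer changes; draw only on request."""

    def __init__(self) -> None:
        self.dirty: Optional[Rect] = None
        self.draw_calls = 0

    def mark_dirty(self, rect: Rect) -> None:
        # Called for every framebuffer update. It never draws, so it
        # never blocks waiting for the display refresh.
        self.dirty = rect if self.dirty is None else union(self.dirty, rect)

    def on_draw_request(self) -> Optional[Rect]:
        # Called by the windowing system, at most once per refresh.
        rect, self.dirty = self.dirty, None
        if rect is not None:
            self.draw_calls += 1  # a real client would blit `rect` here
        return rect
```

With this structure, any number of updates arriving between two refreshes results in a single draw of their combined area, and the time that would otherwise be spent blocked in drawing stays available for decoding.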
Comment 13 Pierre Ossman cendio 2016-12-29 11:21:36 CET
I also found that a CGImage with a simple data provider was faster than a CGBitmapContext.
Comment 14 Pierre Ossman cendio 2016-12-29 11:24:10 CET
Created attachment 766
Reference code

A pure Cocoa reference test program that gives a baseline for the performance we can achieve.
Comment 15 Pierre Ossman cendio 2017-01-04 16:26:08 CET
*** Bug 4551 has been marked as a duplicate of this bug. ***
Comment 16 Pierre Ossman cendio 2017-01-04 16:33:40 CET
The easy stuff is now upstream:

https://github.com/TigerVNC/tigervnc/commit/41a0c151c554a37fbffece8ef36848ed47fd17d3

Together with a performance measuring tool:

https://github.com/TigerVNC/tigervnc/commit/38a1c70260f3457977f073cc1535a542877e8671

However, the tool shows the same result before and after the fix:

> Full window update:
> 
> Rendering time: 15.6056 ms/frame
> Rendering rate: 50.3942 Mpixels/s
> 
> Partial window update:
> 
> Rendering time: 15.6887 ms/frame
> Rendering rate: 12.7137 Mpixels/s

The problem is that we're being throttled to 60 Hz (~16 ms) by the system because of FLTK's update model. So we'll have to fix that to get really sane numbers here.

However, running Instruments before and after shows that we are spending far less CPU time now. The tests run for 20 seconds, and we see 11 seconds vs 2 seconds of active CPU usage. Screenshots of this are attached.
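
The figures above can be cross-checked with a little arithmetic. The helper names below are made up for this sketch; the inputs are the numbers quoted in this comment.

```python
def frames_per_second(ms_per_frame: float) -> float:
    """Frame rate implied by a per-frame rendering time."""
    return 1000.0 / ms_per_frame


def pixels_per_frame(mpixels_per_s: float, ms_per_frame: float) -> float:
    """Pixels drawn per frame, from the rate and the per-frame time."""
    return mpixels_per_s * 1e6 * ms_per_frame / 1000.0


def cpu_fraction(active_s: float, wall_s: float) -> float:
    """Share of the test run spent actively using the CPU."""
    return active_s / wall_s


full_fps = frames_per_second(15.6056)
partial_fps = frames_per_second(15.6887)
full_pixels = pixels_per_frame(50.3942, 15.6056)
partial_pixels = pixels_per_frame(12.7137, 15.6887)
old_cpu = cpu_fraction(11, 20)
new_cpu = cpu_fraction(2, 20)
```

Both tests come out at roughly 64 frames per second even though the partial update touches about a quarter as many pixels per frame (about 199,000 versus about 786,000), which is what one would expect if the limit is the display refresh rather than the drawing cost. The full-update figure is consistent with a 1024x768 area, although that size is an inference and is not stated here. The CPU numbers correspond to about 55% active time before the fix and 10% after.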
Comment 17 Pierre Ossman cendio 2017-01-04 16:34:31 CET
Created attachment 767
Instruments run for old code
Comment 18 Pierre Ossman cendio 2017-01-04 16:34:49 CET
Created attachment 768
Instruments run for new code
