We've gotten a report that the OS X client is having performance issues on the latest version of OS X. Retina (HiDPI) mode was also active, but it is uncertain whether this is a contributing factor.
These links reference some changes in OS X that can cause performance issues:
We should investigate this and see to what extent we are affected and if we can fix it.
The performance issue is most obvious with apps that do high-framerate screen updates. In our case, this is a CAD tool that draws its own mouse cursor. On OS X, there is a large amount of lag that makes the tool annoying to use. This does not occur on the Windows client when it runs in a VM on the same hardware.
This same problem occurs on a non-Retina MacBook Pro running Yosemite, so perhaps it is not related to high DPI. On the Retina machine, the slowdown also occurs on an external monitor, and does not improve much when the window is shrunk.
Investigation with the Instruments profiler on the vncviewer process (which is using most of the CPU) while the slow redraw is happening with our CAD tool shows that an inordinate amount of CPU time is spent inside some kind of color format conversion routine (img_colormatch_read) in the system graphics library. I have attached the profiler trace.
Created attachment 597 [details]
I've done some more investigating by replacing the ThinLinc VNC viewer binary with the open-source TigerVNC one. Fixing the color conversion issue (per the code snippet in the first link) improved drawing performance, but didn't really help the slow cursor motion. That turns out to be a separate issue: unlike Windows and Linux, OS X interpolates mouse motion, generating an unusually high number of pointer events. When a program on the remote end updates the screen in response to each mouse event, events pile up in the event queue and cause significant lag. The vncviewer program has a command-line option to rate-limit mouse events (PointerEventInterval), but it is disabled by default. Changing the default to something more reasonable (e.g. 30 ms) completely eliminated this problem and made our CAD program work beautifully. It would be nice if this were a configurable option in the client GUI, since it might be useful to increase this interval on network links with high latency.
I'm seeing rather high CPU usage on our 10.10 machine in the lab, so we should be able to investigate this issue on that machine.
I tried to do a quick test to fix this, but I couldn't see any improvements in CPU usage. Could you share the changes you made to vncviewer?
I had a look in Instruments, and there is indeed some change in what gets called. The color space conversion is gone, but it has been replaced by other conversion routines. I've tried various formats without any real reduction in CPU usage. So right now we don't know how to efficiently put pixel data on the screen on OS X. We need to investigate a lot more.
So I did a test with the color space conversion fixed (using CMGetSystemProfile(), though we should probably use CGDisplayCopyColorSpace()), with kCGImageAlphaNoneSkipFirst, and with the blend mode set to kCGBlendModeCopy via CGContextSetBlendMode(). At this point I think it is as good as it gets, as the CPU time is now spent in CGBlt_copyBytes(). Unfortunately the CPU consumption there is as bad as it was before any changes.
So maybe we need to re-evaluate the entire architecture of using a CGBitmap.
Pierre, I don't know if you saw my other comment, but I tracked down the slowness we were experiencing to the PointerEventInterval being set to zero by default; changing it to 25 ms fixed the lag problem, and the color space conversion seems to have had nothing to do with it after all.
You should consider changing the default value for PointerEventInterval to something other than zero on OS X, since mouse movement seems to generate a lot more events than on Windows or Linux, and remote programs that redraw in response to mouse movement can cause a lot of lag. Maybe it could even be dynamically adjusted based on the round-trip delay to prevent mouse events from piling up in the queue.
Perhaps a clue:
We don't seem to fiddle with "isFlipped", but we do modify the transformation matrix.
I've managed to find a few issues at least:
a) FLTK does something when it sets up drawing that slows things down considerably. There are some changes on trunk for this, but they don't seem to have much effect in practice. Bypassing FLTK and drawing right away gives a massive performance boost.
b) Despite what the documentation says, CGColorSpaceCreateDeviceRGB() does still invoke color space conversion. Using CGDisplayCopyColorSpace(kCGDirectMainDisplay) saves us quite a few cycles.
c) The frame rate is capped to the display rate. So CGContextDrawImage() can block. We're not wasting any CPU, but we're also not able to spend CPU time on something else. Will have to see if there is a way to do things asynchronously.
(In reply to comment #11)
> I've managed to find a few issues at least:
> a) FLTK does something when it sets up drawing that slows things down
> considerably. There are some changes on trunk for this, but they don't seem to
> have much effect in practice. Bypassing FLTK and drawing right away gives a
> massive performance boost.
Turns out that this is caused by the transformation matrix that FLTK sets up. Cancelling that matrix allows the normal paths to draw at full speed.
> c) The frame rate is capped to the display rate. So CGContextDrawImage() can
> block. We're not wasting any CPU, but we're also not able to spend CPU time on
> something else. Will have to see if there is a way to do things asynchronously.
This is fundamental to how macOS does things since 10.4. It was possible to disable this until 10.10, but no longer.
The proper fix for this is to redesign FLTK so that it only draws things when requested by the system. I.e. when drawRect: is called.
I also found that a CGImage with a simple data provider was faster than a CGBitmapContext.
Created attachment 766 [details]
Reference pure Cocoa test program that gives a baseline for the performance we can achieve.
*** Bug 4551 has been marked as a duplicate of this bug. ***
The easy stuff is now upstream:
Together with a performance measuring tool:
However the tool shows the same result before and after the fix:
> Full window update:
> Rendering time: 15.6056 ms/frame
> Rendering rate: 50.3942 Mpixels/s
> Partial window update:
> Rendering time: 15.6887 ms/frame
> Rendering rate: 12.7137 Mpixels/s
The problem is that we're being throttled to 60 Hz (~16 ms) by the system because of FLTK's update model. So we'll have to fix that to get really sane numbers here.
However, running Instruments before and after shows that we are spending far less CPU now. The tests execute for 20 seconds, and we see 11 seconds vs. 2 seconds of active CPU usage. Screenshots of this are attached.
Created attachment 767 [details]
Instruments run for old code
Created attachment 768 [details]
Instruments run for new code