Bug 5748 - upgrade pixman to get new performance enhancements
Summary: upgrade pixman to get new performance enhancements
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: Build system
Version: pre-1.0
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.6.0
Assignee: Pierre Ossman
URL:
Keywords: relnotes, samuel_tester
Depends on:
Blocks: performance
Reported: 2015-12-11 13:01 CET by Pierre Ossman
Modified: 2016-04-12 12:24 CEST

Description Pierre Ossman cendio 2015-12-11 13:01:21 CET
I was playing around with bug 5648 and noticed that most of the time was spent not on encoding but in a function called bits_image_fetch_bilinear_affine_pad_x8r8g8b8(), which is part of pixman.

Our pixman is a few years behind, so I tried upgrading it to the latest version, and there were massive improvements. The relevant function fell from 50% of the CPU usage to 15%. Encoding is now back on top as the main CPU bottleneck.

There were no problems playing YouTube in fullscreen at 1080p with both this upgrade and bug 5648 in place, whereas before there were noticeable frame drops.
Comment 1 Pierre Ossman cendio 2015-12-11 13:05:17 CET
For reference, the test case was Firefox 42.0 on my Fedora 23 workstation (i7-3770), playing YouTube in full screen using Firefox's video player.
Comment 2 Pierre Ossman cendio 2015-12-30 23:04:42 CET
I did some benchmarking using this tool I found here:

http://cgit.freedesktop.org/~aplattner/xrenderbenchmark/

Unfortunately it showed no significant changes between the old and new code. Since we saw noticeable improvements with Firefox, we must conclude that this test is insufficient.

So I found this nice little tirade on why all synthetic benchmarks suck:

https://cworth.org/intel/performance_measurement/

And it also points to a tracing tool in cairo that can replay graphical operations from real application use. So let's see what that gives us.
Comment 3 Pierre Ossman cendio 2016-01-04 15:19:05 CET
Urgh. That didn't really show much either. I need to make sure I'm not doing the tests incorrectly. I could also try getting a trace from the Firefox usage where we saw the improvement.
Comment 4 Pierre Ossman cendio 2016-01-04 15:43:12 CET
No dice. Firefox's rendering of video does not show up in cairo traces.
Comment 5 Pierre Ossman cendio 2016-01-05 14:27:57 CET
Did some more digging using perf and gdb.

Firefox is doing two CPU-heavy operations: upscaling the video to the target size, and compositing it into the browser's offscreen pixmap.

The second of these is handled by sse2_composite_src_x888_8888, which was already present in the old version of pixman.

The first step however was only partially accelerated in the old pixman, and Firefox was using things in a way that was not accelerated. The new modes that have been added are bilinear scaling with repeat modes active. The existing code could only handle scaling with no repeat active.

There has also been some acceleration for adding a constant to all pixels of a buffer, and bilinear scaling with a simple mask.



So the quick summary is that many more forms of scaling are now faster. I will try to get a test of exactly how much faster.
Comment 7 Pierre Ossman cendio 2016-01-05 15:33:10 CET
(In reply to comment #5)
>
> The first step however was only partially accelerated in the old pixman, and
> Firefox was using things in a way that was not accelerated. The new modes that
> have been added are bilinear scaling with repeat modes active. The existing
> code could only handle scaling with no repeat active.
> 

Only partially correct. The old code handled different repeat modes as well. What it didn't handle was format conversion from x888 to 8888. So it's becoming more and more of a corner case (although Firefox is a pretty common use case).
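
To make the combination concrete, here is a minimal pure-Python sketch of what "bilinear scaling with pad repeat plus x888 to 8888 conversion" means per pixel. It is only an illustration, not pixman's code: the helper names are hypothetical, pixel centres are assumed to sit at integer coordinates, and the rounding is simplified.

```python
import math

def x888_to_8888(pixel):
    # x8r8g8b8 has an unused top byte; as a8r8g8b8 the pixel is fully opaque.
    return (pixel & 0x00FFFFFF) | 0xFF000000

def fetch_pad(image, width, height, x, y):
    # PAD repeat: coordinates outside the image clamp to the nearest edge pixel.
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    return x888_to_8888(image[y * width + x])

def lerp_channels(p, q, t):
    # Interpolate each 8-bit channel of two 32-bit pixels, with t in [0, 1].
    out = 0
    for shift in (24, 16, 8, 0):
        a = (p >> shift) & 0xFF
        b = (q >> shift) & 0xFF
        out |= int(a + (b - a) * t + 0.5) << shift  # round half up
    return out

def sample_bilinear_pad(image, width, height, fx, fy):
    # Blend the four source pixels surrounding the position (fx, fy).
    x0, y0 = math.floor(fx), math.floor(fy)
    tx, ty = fx - x0, fy - y0
    top = lerp_channels(fetch_pad(image, width, height, x0, y0),
                        fetch_pad(image, width, height, x0 + 1, y0), tx)
    bottom = lerp_channels(fetch_pad(image, width, height, x0, y0 + 1),
                           fetch_pad(image, width, height, x0 + 1, y0 + 1), tx)
    return lerp_channels(top, bottom, ty)
```

Doing this for every destination pixel in a generic per-pixel path is the kind of work that shows up as a hot spot in perf when no specialised fast path matches.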

I've modified xrenderbenchmark to replicate these conditions. With the old pixman I get:

> $ DISPLAY=:2 ./xrenderbenchmark -ops SRC -tests filter -time 20 -argb
> xrenderbenchmark version 1.0.2-agp1
> X Server from: The X.Org Foundation, Release: 11400000
> 	Xrender version: 0.11
> ---------------------------------------------
> Test: Src
> 		 Transformation/Bilinear filter...................96600 frames in 20.002 seconds = 4829.511 FPS

And perf shows this function being used: bits_image_fetch_bilinear_affine_pad_x8r8g8b8


An upgraded pixman shows:

> $ DISPLAY=:2 ./xrenderbenchmark -ops SRC -tests filter -time 20 -argb
> xrenderbenchmark version 1.0.2-agp1
> X Server from: The X.Org Foundation, Release: 11400000
> 	Xrender version: 0.11
> ---------------------------------------------
> Test: Src
> 		 Transformation/Bilinear filter...................595650 frames in 20.0009 seconds = 29781.218 FPS

And perf now shows this function instead: fast_composite_scaled_bilinear_sse2_x888_8888_pad_SRC

So about a 500% increase (roughly six times the frame rate). Not shabby. :)
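
For anyone who wants to check the arithmetic, the figure follows directly from the two FPS values reported by xrenderbenchmark above. A small sketch (the variable names are just for illustration):

```python
# FPS values copied from the two xrenderbenchmark runs in this comment.
old_fps = 4829.511    # old pixman
new_fps = 29781.218   # upgraded pixman

speedup = new_fps / old_fps                  # how many times faster
increase_percent = (speedup - 1.0) * 100.0   # relative increase over the old value

print(f"speedup: {speedup:.2f}x, increase: {increase_percent:.0f}%")
```

The ratio comes out at a little over six, which matches the rounded figure of about 500%.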
Comment 8 Pierre Ossman cendio 2016-01-05 15:34:24 CET
There might also be more, smaller improvements in pixman. There have been 208 commits since our last update.
Comment 9 Samuel Mannehed cendio 2016-01-08 17:05:55 CET
I can't find any regressions. I have tested build 4996 on Fedora 23 using a variety of media players and browsers in the session. I have also compared the performance with the old code when playing a video in Firefox and verified the performance improvements. Closing.
