Currently a VNC screen update is sent all in one go. This is problematic when we have large updates as it messes with the congestion control. The congestion control relies on "ping" packets between the client and server. If there is too large delay between these packages, it will revert back to 1 update per RTT.
We should introduce a mechanism to split up framebuffer updates over several messages. That way we could interleave other commands between the pieces.
One easy way to solve this is to introduce a new "end of frame" pseudo-encoding. The client would then look for such a rect instead of triggering on the FBU message.
One slight implementation problem that needs to be solved is the timestamps of the interleaved fence messages. They will most likely all be the same, even thought they will be gradually pushed onto the pipe, giving an artificially inflated RTT.
We probably also need to solve bug 4734 for this to have the desired effect.
Thinking about this a bit more, splitting up the updates is probably not the answer, or at least not the whole answer.
In a "normal" congestion control system, you get lots and lots of feedback. TCP gets more or less an ACK every 1500 bytes. I don't think we can split things up that much without getting clobbered by the overhead.
A better alternative is to improve the congestion control handling:
Instead of doing the whole fire-and-forget we do now, we instead keep track of every congestion marker we send out. When we get one back we can look at the next expected marker (that's still on the wire), compare the time stamps and data counters and make a qualified guess as to how much data has already been consumed. That way we only need two markers per RTT.
Another upside of this is that we don't need to fill the markers with a lot of data, reducing their overhead to just a few bytes.
An improved version of the congestion control system is available here:
Works well for the obvious cases, but needs more testing to make sure there aren't any corner cases that misbehave.
There's a PR now which will hopefully get some testing:
Created attachment 782 [details]
Measurement for current code
Test run with the existing code and dump of relevant stats:
- Throttled to 1 Mbps and 1500 ms RTT (BDP 183 KiB)
- Started glxgears in fullscreen over 1024x768
- Paused it a long period (trigger close of congestion window)
- Paused it a short period
The columns show our computed congestion window and the computed in flight data, and the same values as reported by the underlying socket.
Created attachment 783 [details]
Measurement for new code
Same measurement with the suggested changes.
Created attachment 784 [details]
Visualise the data using:
$ gnuplot -c plot.gp before.csv
The above files show the technical details, and how the internal calculations are better.
But from a user point of view the differences are not that noticeable. When the latency is low things are the same as before (as it isn't congested). If the bandwidth is insufficient then we also get the same behaviour (the old code relied on the fallback hack though). A small win that we can get the same without hacks though.
However in the case there is sufficient bandwidth but high latency, then the new code shows an improvement. It is able to reach the maximum bandwidth much faster now. Mostly you see this when some kind of animation starts, but you can also see it in the middle of videos if there is a prolonged "fade to black". At that point the frame rate drops again, but the new code picks up speed much faster.
> * Performance should be equal, or better, compared to an old server for all network conditions
I could not find a scenario where it was worse.
> * Frame rate (and bandwidth) should reach the maximum faster compared to the old server (mostly noticeable with high latency)
I need 100+ ms to see this, but it does indeed happen. It becomes more noticeable the higher the latency is.