Bug 5902 - audio sometimes refuses to work
Status: NEW
Product: ThinLinc
Component: Sound
Version: pre-1.0
Platform: PC Unknown
Importance: P2 Normal
Target Milestone: MediumPrio
Reported: 2016-05-19 16:01
Modified: 2016-06-08 12:21

Description From cendio 2016-05-19 16:01:02
I have been seeing an intermittent problem for some time where audio refuses
to work in a session. It always happens just as audio should start to play,
and only on my Fedora 23 workstation. Sometimes it resolves itself, and
sometimes I have to kill the session. No errors in any log.

The problem is also related somehow to the client's PulseAudio server, as
bypassing the session server does not help. Local sound on the workstation
still works though.

Unfortunately it has been very difficult to debug as it only happens about
once every two weeks. That changed this week, when I found a test case that
can reproduce it:

 - Ubuntu 14.04
 - Unity
 - SuperTuxKart

This seems to trigger the bug nine times out of ten. I have not tried to
recreate the system from scratch to see how fragile the setup is.
------- Comment #1 From cendio 2016-05-19 16:01:50 -------
Discussion started on the upstream mailing list:

https://lists.freedesktop.org/archives/pulseaudio-discuss/2016-May/026240.html
------- Comment #2 From cendio 2016-05-19 16:24:58 -------
Problem identified. It is caused by a reduction in latency (buffer size) and
all related parameters. The scenario is this:

 1. Large latency, large buffer, large target fill, large minimum request.
Silence in queue (i.e. buffer is full).

 2. Buffer drains slightly, making it fall below target fill. The shortfall is
however still below the minimum request, so nothing is sent to the client.

 3. The client requests a reduced latency, buffer is reduced, target fill is
reduced, minimum request is reduced. The buffer now greatly exceeds target fill
as it was almost up to the previous target fill level. This means that the
server will not be asking the client for more data for a while.

 4. Some time later we've drained most of the excess and are almost back down
to the target fill level. However the data requested in 2 is sufficiently large
that we never fall back down below target fill. Hence we never start requesting
more data. And we already decided in 2 not to send a request for the first
portion.

So the fundamental problem here is that requesting data from the client can be
triggered not only by the buffer emptying, but also by parameters changing. And
specifically, changes to the minimum request size are not handled properly.

In theory this can be caused by any program that triggers a massive reduction
in buffer latency.
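
The four steps above can be sketched as a toy model. This is not PulseAudio's
code: the class ToyStream, its fields, the byte counts, and the
recheck_on_change flag are all made up for illustration. The model assumes the
server only sends a request at the moment the unrequested shortfall crosses
the minimum request from below, which is one plausible reading of the
description above. The re-check on parameter change is a hypothetical remedy,
not necessarily what the upstream patches do.

```python
class ToyStream:
    """Toy server-side playback buffer (illustrative, not PulseAudio code)."""

    def __init__(self, tlength, minreq, recheck_on_change=False):
        self.tlength = tlength    # target fill level, in bytes
        self.minreq = minreq      # smallest request worth sending
        self.length = tlength     # buffer starts full (of silence)
        self.pending = 0          # shortfall counted but not yet requested
        self.recheck_on_change = recheck_on_change
        self.requests = []        # sizes of the requests actually sent

    def _send(self):
        # Request everything pending; the toy client answers immediately.
        self.requests.append(self.pending)
        self.length += self.pending
        self.pending = 0

    def _account(self):
        missing = self.tlength - self.length - self.pending
        if missing <= 0:
            return
        before = self.pending
        self.pending += missing
        # Assumed trigger: only when pending crosses minreq from below.
        if before < self.minreq <= self.pending:
            self._send()

    def drain(self, nbytes):
        # Playback consumes data from the buffer.
        self.length = max(0, self.length - nbytes)
        self._account()

    def set_params(self, tlength, minreq):
        # The client asks for new buffer attributes.
        self.tlength, self.minreq = tlength, minreq
        self._account()
        # Hypothetical remedy: re-evaluate pending against the new minreq.
        if self.recheck_on_change and self.pending >= self.minreq:
            self._send()


def run(recheck_on_change):
    s = ToyStream(tlength=8000, minreq=1000,
                  recheck_on_change=recheck_on_change)    # step 1
    s.drain(300)                             # step 2: 300 < minreq, no request
    s.set_params(tlength=800, minreq=100)    # step 3: latency reduced
    for _ in range(100):                     # step 4: playback continues
        s.drain(100)
    return s


if __name__ == "__main__":
    for flag in (False, True):
        s = run(flag)
        print(flag, len(s.requests), s.length)
```

In this model the run without the re-check never sends a request and the
buffer runs dry, because the 300 bytes left over from step 2 already exceed the
new minimum request and so never cross it. With the re-check, requests resume
as soon as the parameters change.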
------- Comment #3 From cendio 2016-05-19 16:49:05 -------
Sent suggested patches to upstream:

https://lists.freedesktop.org/archives/pulseaudio-discuss/2016-May/026248.html

However this only fixes the problem long term as the bug is in the system's
PulseAudio, not ours. It's not obvious if we can do a workaround until then.
------- Comment #4 From cendio 2016-05-19 16:58:04 -------
The fix seems to provoke some glitches in the audio though. Not sure if it
means the patch is bad, or if it simply exposes bugs in the tunnel module. I
can see some chatter about buffer sizes in the log, but no underruns.
------- Comment #5 From cendio 2016-05-20 11:14:46 -------
I turned up logging on the other two servers (system and session), and
unfortunately nothing is logged by those either when the sound is crackling.

A large glitch was however noticed by the system server, which promptly
increased its minimum latency to 4.0 ms. However our tunnel module fought back
a bit and it took a few turns until it got the latency up high enough.

There is definitely more that can be done here, but I'm moving it to a separate
bug. Opened bug 5903 for improving the latency handling.

The initial crackling is still a mystery though. Perhaps we should just start
at a few ms minimum rather than zero?