Bug 5589 - upgrade PulseAudio
Summary: upgrade PulseAudio
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: Sound
Version: pre-1.0
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.5.0
Assignee: Pierre Ossman
URL:
Keywords: prosaic, samuel_tester
Depends on:
Blocks: 4194
Reported: 2015-07-01 16:13 CEST by Pierre Ossman
Modified: 2020-01-16 14:00 CET

See Also:
Acceptance Criteria:


Attachments
Debug log file from freeze of thinlinc session. (308.09 KB, text/x-log)
2015-08-11 15:13 CEST, Henrik Andersson
tlclient.log from eLux RL where audio doesn't work (6.82 KB, text/x-log)
2015-08-21 15:26 CEST, Samuel Mannehed

Description Pierre Ossman cendio 2015-07-01 16:13:51 CEST
We're currently on 4.0 and the latest version is 6.0. We should upgrade to get any bug fixes and improvements, and to stay up to date with what's happening.
Comment 1 Pierre Ossman cendio 2015-07-06 13:46:58 CEST
We've now upgraded to 6.0 and switched over to the new tunnel modules. Something is very off though. Using Fedora 22 as both the server and client, and without bug 4194, I'm getting either a complete stall or choppy audio when trying to use YouTube. I'm also getting insane errors in the log:

2015-07-06T13:44:44: pulseaudio[E]: W: [pulseaudio] protocol-native.c: Client sent non-aligned memblock: index 0, length 16364, frame size: 8
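
For reference, the arithmetic behind that warning: 16364 is not a multiple of the 8-byte frame size (16364 = 2045 * 8 + 4), so the block ends in the middle of a frame. A quick check in Python using only the numbers from the log line; the helper name is made up for illustration and is not a PulseAudio function:

```python
def is_frame_aligned(length: int, frame_size: int) -> bool:
    """Return True if a block of `length` bytes holds a whole number of frames."""
    return length % frame_size == 0


# Values taken from the log line above.
length, frame_size = 16364, 8

print(is_frame_aligned(length, frame_size))  # False
print(length % frame_size)                   # 4 (bytes of a trailing partial frame)
```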
Comment 2 Pierre Ossman cendio 2015-07-06 16:18:33 CEST
I found fixes for this upstream. I'm seeing a few more issues though:

 - No logging of buffer handling in the new modules. Would like at least changes to buffer size, and underrun/overflow

 - No volume control in the new modules

 - Latency is rather low. This is normally a good thing, but I'm seeing a lot of underruns on at least the armhf board (with F22) until it forces a sufficient increase in the buffers. Hopefully we can fix this without screwing up the low latency on other systems.
Comment 3 Pierre Ossman cendio 2015-07-06 16:21:18 CEST
(In reply to comment #1)
> We've now upgraded to 6.0 and switched over to the new tunnel modules.
> Something is very off though. Using Fedora 22 as both the server and client,
> and without bug 4194, I'm getting either complete stall or choppy audio when
> trying to use youtube. I'm also getting insane errors in the log:
> 
> 2015-07-06T13:44:44: pulseaudio[E]: W: [pulseaudio] protocol-native.c: Client
> sent non-aligned memblock: index 0, length 16364, frame size: 8

Fixed in r30576. See upstream:

https://bugs.freedesktop.org/show_bug.cgi?id=88452
Comment 4 Pierre Ossman cendio 2015-07-07 09:07:31 CEST
(In reply to comment #2)
>  - Latency is rather low. This is normally a good thing, but I'm seeing a lot
> of underruns on at least the armhf board (with F22) until it forces a
> sufficient increase in the buffers. Hopefully we can fix this without screwing
> up the low latency on other systems.

Not even aplay and paplay work reliably on that setup so I'm going to dismiss that one as a platform issue rather than a ThinLinc bug, even if it happens to be worse now than with 4.4.0. I'll try another distribution and see if it's more sane.
Comment 5 Pierre Ossman cendio 2015-07-07 09:42:22 CEST
We also lost X11 support when we moved to the new modules. This time it's a bit more tricky to solve as the handling is in the common libpulse, and not just the tunnel modules. I think we'll have to do some dlopen() magic.
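
A rough illustration of the dlopen() idea, written in Python via ctypes rather than C: load the X11 library at run time and carry on without X11 support if it isn't there. The helper name and the choice of "X11" as the library to probe are assumptions for illustration only; this is not necessarily how the actual fix is structured.

```python
import ctypes
import ctypes.util
from typing import Optional


def load_optional(name: str) -> Optional[ctypes.CDLL]:
    """Try to load a shared library at run time; return None if unavailable."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)  # uses dlopen() on Linux
    except OSError:
        return None


# Hypothetical usage: only enable the X11 code path if the library loads.
x11 = load_optional("X11")
if x11 is None:
    print("libX11 not available, skipping X11 property lookup")
else:
    print("libX11 loaded, X11 properties can be read")
```

The point of the pattern is that the binary has no hard link-time dependency on the library, so it still starts on systems where the library is missing.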
Comment 6 Pierre Ossman cendio 2015-07-07 11:25:06 CEST
(In reply to comment #4)
> (In reply to comment #2)
> >  - Latency is rather low. This is normally a good thing, but I'm seeing a lot
> > of underruns on at least the armhf board (with F22) until it forces a
> > sufficient increase in the buffers. Hopefully we can fix this without screwing
> > up the low latency on other systems.
> 
> Not even aplay and paplay work reliably on that setup so I'm going to dismiss
> that one as a platform issue rather than a ThinLinc bug, even if it happens to
> be worse now than with 4.4.0. I'll try another distribution and see if it's
> more sane.

Works well with the same board but with Xubuntu 13.10 instead.
Comment 7 Pierre Ossman cendio 2015-07-07 13:07:25 CEST
(In reply to comment #2)
> 
>  - No logging of buffer handling in the new modules. Would like at least
> changes to buffer size, and underrun/overflow
> 

r30578. Also sent upstream:

https://bugs.freedesktop.org/show_bug.cgi?id=91257
Comment 8 Pierre Ossman cendio 2015-07-09 12:34:47 CEST
(In reply to comment #2)
> 
>  - No volume control in the new modules
> 

Fixed in r30581 and sent upstream:
https://bugs.freedesktop.org/show_bug.cgi?id=91280
Comment 9 Pierre Ossman cendio 2015-07-09 14:53:22 CEST
(In reply to comment #5)
> We also lost X11 support when we moved to the new modules. This time it's a bit
> more tricky to solve as the handling is in the common libpulse, and not just
> the tunnel modules. I think we'll have to do some dlopen() magic.

Fixed in r30583.
Comment 10 Pierre Ossman cendio 2015-07-09 15:12:58 CEST
Should be all done now.

Tester needs to verify playback and recording on every arch. Preferably test against an older server in order to avoid influences by bug 4194.

On Linux you also need to test the four modes (auto, pulse, alsa, oss). One platform is sufficient for this.

Volume control needs to be tested on at least one platform. Remember to test both playback and recording. Also remember to test both client to server and server to client. Note that gnome-control-center is buggy when handling recording streams. pactl can be used for more low level control.

Reading of X11 properties needs to be tested on Linux. E.g. screwing up the PULSE_SERVER root window property and seeing that this causes us to fail to connect to the local pulseaudio server.
Comment 11 Pierre Ossman cendio 2015-07-16 16:53:33 CEST
It's not working well on the ARM board, even with Xubuntu. It gets into a buffer underrun state that it has severe difficulties getting out of. Most easily triggered together with bug 4194.

The problem seems to be that our new PulseAudio is aiming for a much lower latency. When playing youtube in Firefox I can see that the tunnel sink is configuring itself for a latency of just 25 ms. Our old client used a fixed latency of 250 ms. And I guess the board is simply too weak to keep up with the 25 ms latency.

Switching over to ALSA also shows the issue, but that module is smart enough to compensate. It quickly detects that the configured latency is too low and increases it (without glitches even). It stabilises slightly under 200 ms.

I see a few options here:

 - Revert behaviour back to a static 250 ms. Not nice as it punishes all decent platforms.

 - Conditionally force a lower bound on weak platforms. Not sure how to identify them though. All ARM?

 - Make the tunnel module more like the ALSA one and increase the latency as it detects problems.
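
To make the last option concrete, here is a minimal sketch of "start low, back off on underrun". The 25 ms and 250 ms figures are the ones mentioned above; the class name, the doubling factor and the use of 250 ms as an upper bound are assumptions for illustration, not what the ALSA module or any patch actually does.

```python
class AdaptiveLatency:
    """Start with a low latency and raise it each time an underrun is seen."""

    def __init__(self, initial_ms: float = 25.0, max_ms: float = 250.0,
                 factor: float = 2.0) -> None:
        self.latency_ms = initial_ms
        self.max_ms = max_ms      # assumed cap: the old fixed latency
        self.factor = factor      # assumed growth factor

    def on_underrun(self) -> float:
        """Increase the latency after an underrun, never exceeding max_ms."""
        self.latency_ms = min(self.latency_ms * self.factor, self.max_ms)
        return self.latency_ms


lat = AdaptiveLatency()
for _ in range(5):
    print(lat.on_underrun())  # 50.0, 100.0, 200.0, 250.0, 250.0
```

Fast machines that never underrun stay at the low starting latency, while weak ones settle at whatever value stops the underruns.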
Comment 12 Pierre Ossman cendio 2015-07-17 10:49:24 CEST
(In reply to comment #11)
> 
>  - Make the tunnel module more like the ALSA one and increase the latency as it
> detects problems.

A very basic version of this has been implemented in r30610 and sent upstream:

https://bugs.freedesktop.org/show_bug.cgi?id=91370
Comment 13 Henrik Andersson cendio 2015-08-11 15:13:36 CEST
Created attachment 635
Debug log file from freeze of thinlinc session.

Testing client build 4843 on an Fc22 x86_64 workstation against ThinLinc 4.4.0 on eudemo.thinlinc.com, the following problems occurred:

1. Played a youtube video for >15 minutes and the pulseaudio server died on the client with the following assert:

  2015-08-11T14:16:17: pulseaudio[E]: E: [tunnel-sink] queue.c: Assertion '!e->next' failed at pulsecore/queue.c:104, function pa_queue_pop(). Aborting.

2. Retried to reproduce the above issue but failed. Video played fine after 40 minutes. Then I held down the tab key in a terminal to produce several bells, and then metacity hung. Killing the client-side pulseaudio server released the freeze in the ThinLinc session. This time I also ran the ThinLinc client with debug output, see attached file.
Comment 14 Henrik Andersson cendio 2015-08-11 15:25:12 CEST
(In reply to comment #13)

> 2. Retried to reproduce the above issue but failed. Video played fine after 40
> minutes. Then I held down the tab key in a terminal to produce several bells,
> and then metacity hung. Killing the client-side pulseaudio server released the
> freeze in the ThinLinc session. This time I also ran the ThinLinc client with
> debug output, see attached file.

This also triggers another bug:

The pulseaudio server on the client is running at 100% CPU, and closing tlclient, which should clean up hung processes, won't kill the pulseaudio server process. Killing the pulseaudio server from a terminal using SIGKILL works as expected.
Comment 15 Henrik Andersson cendio 2015-08-11 15:41:05 CEST
(In reply to comment #13)

> 
> 2. Retried to reproduce the above issue but failed. Video played fine after 40
> minutes. Then I held down the tab key in a terminal to produce several bells,
> and then metacity hung. Killing the client-side pulseaudio server released the
> freeze in the ThinLinc session. This time I also ran the ThinLinc client with
> debug output, see attached file.

This is not reproducible using the nightly build of the client on Windows.
Comment 16 Henrik Andersson cendio 2015-08-11 15:45:21 CEST
(In reply to comment #13)

> 2. Retried to reproduce the above issue but failed. Video played fine after 40
> minutes. Then I held down the tab key in a terminal to produce several bells,
> and then metacity hung. Killing the client-side pulseaudio server released the
> freeze in the ThinLinc session. This time I also ran the ThinLinc client with
> debug output, see attached file.

Not reproducible on the same Fc22 workstation with the 4.4.0 x86_64 client.
Comment 17 Pierre Ossman cendio 2015-08-12 08:36:05 CEST
(In reply to comment #14)
> 
> This also triggers another bug:
> 
> The pulseaudio server on the client is running at 100% CPU, and closing
> tlclient, which should clean up hung processes, won't kill the pulseaudio
> server process. Killing the pulseaudio server from a terminal using SIGKILL
> works as expected.


Bug 5606.
Comment 18 Pierre Ossman cendio 2015-08-12 10:44:46 CEST
(In reply to comment #13)
> Created an attachment (id=635)
> Debug log file from freeze of thinlinc session.
> 
> Testing client build 4843 on an Fc22 x86_64 workstation against ThinLinc 4.4.0
> on eudemo.thinlinc.com, the following problems occurred:
> 
> 1. Played a youtube video for >15 minutes and the pulseaudio server died on the
> client with the following assert:
> 
>   2015-08-11T14:16:17: pulseaudio[E]: E: [tunnel-sink] queue.c: Assertion
> '!e->next' failed at pulsecore/queue.c:104, function pa_queue_pop(). Aborting.
> 
> 2. Retried to reproduce the above issue but failed. Video played fine after 40
> minutes. Then I held down the tab key in a terminal to produce several bells,
> and then metacity hung. Killing the client-side pulseaudio server released the
> freeze in the ThinLinc session. This time I also ran the ThinLinc client with
> debug output, see attached file.

Both are probably the same bug. The mainloop was accessed from two threads, which isn't supported. Fixed in r30654.

The bug was in the volume handling, so it is primarily that which needs re-testing.
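
For anyone unfamiliar with this class of bug, the usual cure is to make sure only one thread ever runs the mainloop's work, with other threads posting requests to it. The sketch below shows that general pattern in Python with a queue; the names (`calls`, `set_volume`, `worker`) are invented for illustration, and this is not a description of the change in r30654.

```python
import queue
import threading

calls = queue.Queue()      # thread-safe channel into the mainloop thread
state = {"volume": 100}    # data owned exclusively by the mainloop thread


def set_volume(value: int) -> None:
    """Must only ever run on the mainloop thread."""
    state["volume"] = value


def worker() -> None:
    """Other threads never touch the state; they only post requests."""
    calls.put(lambda: set_volume(42))
    calls.put(None)  # sentinel: ask the mainloop to stop


def mainloop() -> None:
    """Single thread that executes every posted request in order."""
    while True:
        call = calls.get()
        if call is None:
            break
        call()


t = threading.Thread(target=worker)
t.start()
mainloop()
t.join()
print(state["volume"])  # 42
```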
Comment 19 Henrik Andersson cendio 2015-08-13 08:59:04 CEST
Commit r30547 upgrades dependencies for the new version of pulseaudio. However, building libjson, which is one component of the update, fails. Seems like a missing build dependency on autoconf.

The log is from the arm build, but all archs fail the same way:

  + /opt/cendio-build/bin/cbrun armhf make -j4
  (CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh   /home/hean01/Development/cenbuild/repo/rpmbuild/BUILD/json-c-0.12/missing autoheader)
  /home/hean01/Development/cenbuild/repo/rpmbuild/BUILD/json-c-0.12/missing: line   81: autoheader: command not found
  WARNING: 'autoheader' is missing on your system.
           You should only need it if you modified 'acconfig.h' or
           'configure.ac' or m4 files included by 'configure.ac'.
           The 'autoheader' program is part of the GNU Autoconf package:
           <http://www.gnu.org/software/autoconf/>
           It also requires GNU m4 and Perl in order to run:
           <http://www.gnu.org/software/m4/>
           <http://www.perl.org/>
  make: *** [config.h.in] Error 127
Comment 20 Pierre Ossman cendio 2015-08-13 10:36:21 CEST
Fixed in r30662.
Comment 21 Henrik Andersson cendio 2015-08-13 12:04:07 CEST
(In reply to comment #20)
> Fixed in r30662.

Works as expected.
Comment 22 Samuel Mannehed cendio 2015-08-21 15:22:39 CEST
Using 4.4.0 server on Fedora 20 and a client from nightly build:

I have tested playback and recording with success using the client on Windows 10, OSX 10.10 and Fedora 22 (pulseaudio backend).

When I got to testing on eLux RL (the S900 terminal) neither playback nor recording worked. Looking at tlclient.log it seems like when it doesn't detect PulseAudio it doesn't go further and just gives up on audio.
Comment 23 Samuel Mannehed cendio 2015-08-21 15:26:08 CEST
Created attachment 636
tlclient.log from eLux RL where audio doesn't work
Comment 24 Samuel Mannehed cendio 2015-08-21 15:31:19 CEST
(In reply to comment #22)
> When I got to testing on eLux RL (the S900 terminal) neither playback nor
> recording worked. Looking at tlclient.log it seems like when it doesn't detect
> PulseAudio it doesn't go further and just gives up on audio.

I also verified that there is no problem when using a 4.4.0 client on eLux RL.
Comment 25 Pierre Ossman cendio 2015-08-21 15:45:40 CEST
(In reply to comment #22)
> When I got to testing on eLux RL (the S900 terminal) neither playback nor
> recording worked. Looking at tlclient.log it seems like when it doesn't detect
> PulseAudio it doesn't go further and just gives up on audio.

Urgh. Our simplistic approach of auto detecting the audio system doesn't work well with the new tunnel modules. They behave a bit too asynchronously. I think we'll have to go with a more complex detection logic. :/
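
As a sketch of what detection that tolerates asynchronous backends could look like: give each probe a bounded amount of time to report back, and fall through to the next backend if it fails or times out. Everything here is hypothetical (the function names, the 1 second timeout, the probe callables and the pulse/alsa/oss order); it is not a description of the method that ended up in the client.

```python
import threading
from typing import Callable, Dict, Optional


def probe_async(check: Callable[[], bool], timeout_s: float) -> bool:
    """Run check() in a thread and wait up to timeout_s for a verdict."""
    result = {"ok": False}
    done = threading.Event()

    def run() -> None:
        try:
            result["ok"] = check()
        finally:
            done.set()

    threading.Thread(target=run, daemon=True).start()
    # A probe that has not answered in time counts as a failure.
    return done.wait(timeout_s) and result["ok"]


def detect(backends: Dict[str, Callable[[], bool]],
           timeout_s: float = 1.0) -> Optional[str]:
    """Return the first backend whose probe succeeds, or None."""
    for name, check in backends.items():
        if probe_async(check, timeout_s):
            return name
    return None


# Stand-in probes: pretend PulseAudio is missing but ALSA is present.
backends = {"pulse": lambda: False, "alsa": lambda: True, "oss": lambda: True}
print(detect(backends))  # alsa
```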
Comment 26 Pierre Ossman cendio 2015-08-21 16:28:31 CEST
New method in r30695.
Comment 27 Samuel Mannehed cendio 2015-08-28 11:27:46 CEST
(In reply to comment #10)
> Tester needs to verify playback and recording on every arch. Preferably test
> against an older server in order to avoid influences by bug 4194.

I have verified playback and recording using a ThinLinc 4.4.0 server and audacity on a Fedora 20 machine. I have tested the following clients:

- tlclient build 4865 (rpm) on Fedora 22
- tlclient build 4865 (exe) on Windows 10
- tlclient build 4871 (UC_RP) on eLux RP 4.8.0
- tlclient build 4871 (armhf) on Xubuntu 15.04
- tlclient build 4871 (UC_ARM) on eLux RT 3.0.0

> On Linux you also need to test the four modes (auto, pulse, alsa, oss). One
> platform is sufficient for this.

Verified auto, pulse and alsa on eLux RP 4.8.0 with client build 4871. I can't find any machine to test oss on, so I'm skipping that.

> Volume control needs to be tested on at least one platform. Remember to test
> both playback and recording. Also remember to test both client to server and
> server to client. Note that gnome-control-center is buggy when handling
> recording streams. pactl can be used for more low level control.

Verified on Fedora 22 with client build 4865.

> Reading of X11 properties needs to be tested on Linux. E.g. screwing up the
> PULSE_SERVER root window property and seeing that this causes us to fail to
> connect to the local pulseaudio server.

Verified on Fedora 22 with client build 4865. I used the following command to change the property:

# xprop -root -f PULSE_SERVER 8s -set PULSE_SERVER tcp:example.com:4713

After that neither playback nor recording worked in tlclient. Running the client with -d 5 gave the following information in the log:

2015-08-28T10:55:11: pulseaudio[E]: D: [tunnel-sink] context.c: Trying to connect to tcp:example.com:4713...
2015-08-28T10:55:11: pulseaudio[E]: D: [tunnel-source] context.c: Trying to connect to tcp:example.com:4713...
2015-08-28T10:55:11: pulseaudio[E]: D: [tunnel-sink] module-tunnel-sink-new.c: Context failed: Connection refused.
2015-08-28T10:55:11: pulseaudio[E]: D: [tunnel-source] module-tunnel-source-new.c: Context failed with err Connection refused.
