Pulseaudio lock of pulsecookie file is pessimal on NFS

Bug #817269 reported by Thomas Bushnell, BSG
This bug affects 6 people
Affects: pulseaudio (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Pulseaudio uses a cookie file (normally ~/.pulse-cookie). This file is manipulated by the code in src/pulsecore/authkey.c. Currently it does this to read the key:
1) Open the file
2) Acquire a write-lock for the file.
3) Read the file.
4) If the file is not a good cookie, generate a new cookie and write it to the file.
5) Close the file.

If more than one process opens the cookie at once, there is contention at step (2), which is rarely necessary: normally the cookie already exists and only needs to be read. Instead, step (2) should acquire only a read-lock, and step (4) should promote that to a write-lock before writing the file.

This is particularly bad when the .pulse-cookie file is on NFS. In that case, the lock contention forces a thirty-second backoff on whichever process gets there second.

I am happy to work with y'all on engineering a patch for this. It obviously requires a change to src/pulsecore/authkey.c, and a corresponding one to src/pulsecore/core-util.c, where the actual lock syscall occurs.
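The read-lock-then-promote scheme proposed above can be sketched in bash rather than in the C of authkey.c. This is a hypothetical illustration, not the attached patch: load_cookie and cookie_is_valid are made-up names, a "good" cookie is taken to be any file of exactly 256 bytes (an assumed size), and flock(1) from util-linux is assumed to be installed. Note that flock(1) takes flock(2) locks rather than the fcntl() locks used in core-util.c, and promoting a shared flock lock to an exclusive one is not atomic, so the sketch re-checks the cookie after promotion.

```shell
#!/bin/bash
# Hypothetical sketch: take a shared lock for reading, and promote it to an
# exclusive lock only when the cookie has to be (re)written.
# Assumes flock(1) from util-linux and GNU stat/head.

COOKIE_LEN=256   # assumed cookie size, for illustration only

# A "good" cookie here is simply a regular file of exactly COOKIE_LEN bytes.
cookie_is_valid() {
  [ -f "$1" ] && [ "$(stat -c %s "$1")" -eq "$COOKIE_LEN" ]
}

load_cookie() {
  local file=$1
  exec 9>>"$file"               # 1) open (creating if needed) on fd 9
  flock -s 9                    # 2) shared lock: concurrent readers don't block
  if ! cookie_is_valid "$file"; then   # 3) read/check the cookie
    flock -x 9                  # 4) promote to an exclusive lock...
    # ...and re-check: flock(2) drops the shared lock before taking the
    # exclusive one, so another process may have written the cookie meanwhile.
    if ! cookie_is_valid "$file"; then
      head -c "$COOKIE_LEN" /dev/urandom > "$file"
    fi
  fi
  exec 9>&-                     # 5) close fd 9, which releases the lock
}
```

For example, `load_cookie ~/.pulse-cookie`. In the common case the cookie is already valid, so every caller holds only a shared lock and nobody waits. A C version built on fcntl() would need more care: two processes that both hold a read lock and both try to upgrade it can deadlock, which the kernel may report as EDEADLK.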

Changed in pulseaudio (Ubuntu):
status: New → Confirmed
Revision history for this message
Thomas Bushnell, BSG (tbushnell) wrote :

The following script (if ~ is on NFS) will demonstrate the problem. The two pactl processes both try to acquire the lock on the file simultaneously, and whichever one loses will take thirty seconds before it wins, because of the needless contention (and the unfortunate facts of how lockd in NFS works).

#!/bin/bash

NUM_PROCS=2

procs=()
for i in $(seq 1 $NUM_PROCS); do
  pactl list > /dev/null &
  procs[$i]=$!
done
for i in $(seq 1 $NUM_PROCS); do
  wait ${procs[$i]}
done
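To make the stall visible, each concurrent run can be timed. The variant below is an illustrative addition, not part of the original report; time_concurrent is a hypothetical helper, and `date +%s` gives only one-second resolution, which is enough to spot a thirty-second wait.

```shell
#!/bin/bash
# Run N copies of a command at the same time and report how long each took.
# Usage: time_concurrent N command [args...]
time_concurrent() {
  local n=$1
  shift
  local pids=() i
  for i in $(seq 1 "$n"); do
    (
      start=$(date +%s)
      "$@" > /dev/null
      rc=$?
      echo "run $i: exit $rc after $(( $(date +%s) - start ))s"
    ) &
    pids+=("$!")
  done
  wait "${pids[@]}"   # wait for every background run to finish
}
```

For example, `time_concurrent 2 pactl list`. According to the report, with ~ on NFS one run should finish almost at once and the other only after about thirty seconds.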

Revision history for this message
Thomas Bushnell, BSG (tbushnell) wrote :

The following patch solves the problem.

Revision history for this message
David Henningsson (diwic) wrote :

Hi Thomas and thanks for the patch!
I don't know much about NFS myself, but I've sent it to PulseAudio upstream and am awaiting their comments on the issue.

http://lists.freedesktop.org/archives/pulseaudio-discuss/2011-August/011036.html

Revision history for this message
David Henningsson (diwic) wrote :

Hi Thomas,

There were some questions about your patch upstream, could you please comment on this message?

http://lists.freedesktop.org/archives/pulseaudio-discuss/2011-August/011039.html

Changed in pulseaudio (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Thomas Bushnell, BSG (tbushnell) wrote :

Thank you, David. I've joined that list and added my thoughts.

Daniel T Chen (crimsun)
Changed in pulseaudio (Ubuntu):
status: Incomplete → Triaged
Revision history for this message
Thomas Bushnell, BSG (tbushnell) wrote :

I think this may have been caused by a local configuration change. We will not know for certain for a few weeks. I would like to leave the bug alone until I can confirm this newer understanding, at which point I'll close it.

There isn't any need to spend energy on it right now.

Revision history for this message
James (purpleidea) wrote :

Dear others,

I believe I am experiencing this issue or something similar. I have:

* an NFS Server
* CentOS 6.2 clients mounting /home/ from the NFS server.
* I've been able to trigger the problem consistently using the bash snippet above.
* Symptom: the NFS server gets hosed, and all other clients hang because of a denial of service (I think).

On Server (repeatedly...):
Mar 28 16:15:50 nfsserver kernel: [17297451.712162] rpcbind: server ws-1 not responding, timed out
Mar 28 16:16:50 nfsserver kernel: [17297511.617176] lockd: server ws-1 not responding, timed out
Mar 28 16:16:50 nfsserver kernel: [17297511.617188] lockd: couldn't create RPC handle for ws-1

On Client (repeatedly):
Mar 28 16:12:10 workstation kernel: Shorewall:net2fw:DROP:IN=eth0 OUT= MAC=00:30:48:b3:30:1e:00:30:48:9d:7f:ed:08:00 SRC=172.16.1.21 DST=172.16.1.141 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=26398 DF PROTO=TCP SPT=47553 DPT=111 WINDOW=5840 RES=0x00 SYN URGP=0

This means that, for some reason, the SERVER is trying to reach the CLIENT on port 111 (and is being denied). This should never actually happen, and I noticed it in the logs.

In any case, any help to resolve this would be appreciated. Thank you.

Revision history for this message
Thomas Bushnell BSG (tb-becket) wrote :

In our case, the problem was traced to lockd and portmap. Specifically, without access to lockd, NFS can only do the 30 second fallback method of retrying locks. We had restricted portmap to listen only on localhost, which is why lockd was unable to function properly. Once we unrestricted portmap appropriately, the problem was solved.
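A quick way to check for the misconfiguration described here is to ask each machine's portmapper, from the other side of the NFS mount, whether lockd (the NLM service, RPC program 100021, usually listed as nlockmgr) is registered. The following is a hedged sketch using rpcinfo(8); check_nlm is a hypothetical helper, and ws-1 and nfsserver are the host names from the logs above. Checking both directions matters because, in those logs, the server's lockd is also trying to reach the client.

```shell
#!/bin/bash
# Report whether HOST's portmapper answers and whether lockd (NLM, RPC
# program 100021) is registered with it. Requires rpcinfo(8).
check_nlm() {
  local host=$1 out
  # 'rpcinfo -p' queries the portmapper on port 111 of the given host.
  if ! out=$(rpcinfo -p "$host" 2>&1); then
    echo "$host: portmapper unreachable: $out"
    return 1
  fi
  # Match on the program number so a missing /etc/rpc name doesn't matter.
  if ! printf '%s\n' "$out" | awk '$1 == 100021 { found = 1 } END { exit !found }'; then
    echo "$host: nlockmgr (100021) not registered"
    return 1
  fi
  echo "$host: portmapper and nlockmgr OK"
}
```

Run `check_nlm ws-1` on the server and `check_nlm nfsserver` on the client. A "portmapper unreachable" result for the client would be consistent with a portmap restricted to localhost, or with the firewall DROP on port 111 shown earlier.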

Changed in pulseaudio (Ubuntu):
status: Triaged → Invalid
Revision history for this message
James (purpleidea) wrote :

FWIW: I've noticed that by changing /etc/pulse/client.conf and setting:

cookie-file = /tmp/foobar-pulse-cookie

I can now run the above bash script without breaking anything, and it also returns right away. However, since multiple users can log on to the same machine, it would make sense to have client.conf allow something like:

cookie-file = /tmp/$USER-pulse-cookie

to prevent collisions from two users. Maybe it does, and I just don't know about this?
Also, these are workarounds, and it would be great to solve this problem too :)
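Rather than relying on $USER being expanded inside client.conf, one hedged option is the PULSE_COOKIE environment variable, which the pulse-client.conf(5) man page documents as overriding cookie-file. The snippet below is a sketch for a login script (the path /etc/profile.d/pulse-cookie.sh is a hypothetical choice). It avoids a predictable name such as /tmp/$USER-pulse-cookie, which another user could create first.

```shell
# Hypothetical /etc/profile.d/pulse-cookie.sh, sourced by login shells.
# Keeps the PulseAudio cookie off NFS and gives each user a private file.
# Assumes the PulseAudio client honours $PULSE_COOKIE (see pulse-client.conf(5)).
if [ -z "$PULSE_COOKIE" ]; then
  if [ -n "$XDG_RUNTIME_DIR" ] && [ -d "$XDG_RUNTIME_DIR" ]; then
    # Per-user runtime directory (mode 0700) where the system provides one.
    PULSE_COOKIE="$XDG_RUNTIME_DIR/pulse-cookie"
  else
    # Fallback: a private directory in /tmp, used only if this user owns it
    # and it is not a symlink planted by someone else.
    pulse_dir="/tmp/pulse-cookie-$(id -u)"
    mkdir -m 700 "$pulse_dir" 2>/dev/null
    if [ -d "$pulse_dir" ] && [ ! -L "$pulse_dir" ] && [ -O "$pulse_dir" ]; then
      PULSE_COOKIE="$pulse_dir/cookie"
    fi
    unset pulse_dir
  fi
  # Export only if a safe path was chosen; otherwise keep the default.
  if [ -n "$PULSE_COOKIE" ]; then
    export PULSE_COOKIE
  fi
fi
```

The PulseAudio daemon must see the same value as its clients. That should hold when it is started from the same login session (for example, when a client in that session autospawns it); a daemon started some other way would keep using its own cookie path.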

Thanks!

Revision history for this message
James (purpleidea) wrote :

@thomas: The NFS server I'm using is an existing one that my predecessor set up. Can you comment on how you fixed portmap? I assume you mean on the server side?

The server is listening on all interfaces and without any firewall blocking those ports...

James (purpleidea)
Changed in pulseaudio (Ubuntu):
status: Invalid → Incomplete
Revision history for this message
David Henningsson (diwic) wrote :

@James, in the long term one should maybe move the cookie file into $XDG_CONFIG_HOME. I'm no FHS expert, but it seems reasonable.

Revision history for this message
James (purpleidea) wrote :

@David, can the /etc/pulse/client.conf file take an argument like:

cookie-file = /tmp/$USER-pulse-cookie

How can I move the cookie file so that more than one user can log in simultaneously?

@anyone
I am also having this locking issue. How does one fix it? Thank you.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for pulseaudio (Ubuntu) because there has been no activity for 60 days.]

Changed in pulseaudio (Ubuntu):
status: Incomplete → Expired