pulse audio hangs in kernel call

Bug #1880835 reported by csita
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Expired
Undecided
Unassigned

Bug Description

there is no audio device accessible any more and pulse-audio process can not be terminated
in kernel log there is such a output:

This makes the complete audio unstable in the laptop.

I think it is started to happen when I installed Nvidia driver and there is an external monitor attached with DP audio device.

[ 364.150644] INFO: task pulseaudio:2183 blocked for more than 120 seconds.
[ 364.150646] Tainted: P OE 5.4.0-31-generic #35-Ubuntu
[ 364.150647] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 364.150647] pulseaudio D 0 2183 2175 0x00004120
[ 364.150649] Call Trace:
[ 364.150653] __schedule+0x2e3/0x740
[ 364.150655] ? dma_pte_clear_level+0x142/0x190
[ 364.150655] schedule+0x42/0xb0
[ 364.150656] schedule_timeout+0x203/0x2f0
[ 364.150657] wait_for_completion+0xb1/0x120
[ 364.150659] ? wake_up_q+0x70/0x70
[ 364.150663] fw_run_transaction+0xdc/0x110 [firewire_core]
[ 364.150665] ? fw_core_handle_bus_reset+0x440/0x440 [firewire_core]
[ 364.150666] ? free_iova_fast+0x18/0x30
[ 364.150667] ? close_transaction+0x110/0x110 [firewire_core]
[ 364.150668] ? intel_unmap+0xd9/0x1c0
[ 364.150670] ? transmit_phy_packet_callback+0x20/0x20 [firewire_core]
[ 364.150671] snd_fw_transaction+0xbc/0x110 [snd_firewire_lib]
[ 364.150673] ? avc_general_get_plug_info+0x100/0x100 [snd_firewire_lib]
[ 364.150674] pcr_modify+0x65/0x100 [snd_firewire_lib]
[ 364.150675] cmp_connection_break+0x37/0x50 [snd_firewire_lib]
[ 364.150677] snd_oxfw_stream_start_duplex+0x1ee/0x2e0 [snd_oxfw]
[ 364.150678] ? wait_woken+0x80/0x80
[ 364.150679] pcm_capture_prepare+0x26/0x50 [snd_oxfw]
[ 364.150682] snd_pcm_do_prepare+0x1e/0x30 [snd_pcm]
[ 364.150684] snd_pcm_action_single+0x3c/0x80 [snd_pcm]
[ 364.150686] snd_pcm_action_nonatomic+0x6e/0x80 [snd_pcm]
[ 364.150687] snd_pcm_prepare+0x6a/0x90 [snd_pcm]
[ 364.150689] snd_pcm_common_ioctl+0x28c/0xc00 [snd_pcm]
[ 364.150691] snd_pcm_ioctl+0x27/0x40 [snd_pcm]
[ 364.150692] do_vfs_ioctl+0x407/0x670
[ 364.150694] ? __secure_computing+0x42/0xe0
[ 364.150694] ksys_ioctl+0x67/0x90
[ 364.150695] __x64_sys_ioctl+0x1a/0x20
[ 364.150697] do_syscall_64+0x57/0x190
[ 364.150698] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 364.150699] RIP: 0033:0x7f0bbfa1a37b
[ 364.150701] Code: Bad RIP value.
[ 364.150701] RSP: 002b:00007ffed915d308 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 364.150702] RAX: ffffffffffffffda RBX: 00005607c7b63de0 RCX: 00007f0bbfa1a37b
[ 364.150703] RDX: 0000000000000000 RSI: 0000000000004140 RDI: 000000000000006e
[ 364.150703] RBP: 00005607c7b63bb0 R08: 0000000000000000 R09: 0000000000001000
[ 364.150703] R10: 0000000000000004 R11: 0000000000000202 R12: 00005607c7b63de0
[ 364.150704] R13: 00007ffed915d6f0 R14: 0000000000000000 R15: 00007ffed915d480

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-31-generic 5.4.0-31.35
ProcVersionSignature: Ubuntu 5.4.0-31.35-generic 5.4.34
Uname: Linux 5.4.0-31-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu27
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Wed May 27 10:07:13 2020
InstallationDate: Installed on 2020-02-22 (94 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Alpha amd64 (20200220)
MachineType: Dell Inc. Precision 7540
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-31-generic root=UUID=7a5890f6-2724-480a-8fa7-778d07cd929d ro quiet splash vt.handoff=7
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: Daemon not responding.
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-31-generic N/A
 linux-backports-modules-5.4.0-31-generic N/A
 linux-firmware 1.187
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/12/2020
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.8.2
dmi.board.name: 0T2FXT
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.8.2:bd03/12/2020:svnDellInc.:pnPrecision7540:pvr:rvnDellInc.:rn0T2FXT:rvrA00:cvnDellInc.:ct10:cvr:
dmi.product.family: Precision
dmi.product.name: Precision 7540
dmi.product.sku: 0926
dmi.sys.vendor: Dell Inc.

Revision history for this message
csita (sz-richard) wrote :
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
csita (sz-richard) wrote :

After debugging a bit. I narrowed down the problem.

It is related to the FireWire800 auido adapter (Tascam FireOne) the device goes in to a non working state. It seems that the the control part of the communication works. But The SoundCard is in a dead lock. And it hangs the kernel diver communication.

The the problem solved by power cycle the Tascan FireOne device and reboot the ubuntu laptop. After the boot then the audio works again and pulseaudio does not hang any more.

So it seems the the problem caused by the device communication misbehavior on the filerwire port.

I think the kernel driver shall be examined to able to cope with the errors from the device side.

Since this breaks done the complete audio subsystem and the machine can not even rebooted.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Please test latest mainline kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7-rc7/

There are several fixes for oxfw which is the driver for FireWire One.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
csita (sz-richard) wrote :

Thanks I try it out and come back with the result.

Thanks for the prompt response

Revision history for this message
csita (sz-richard) wrote :
Download full text (7.5 KiB)

OK I have started to test it now.

My cernel is now:
$ uname -a
Linux DEM-RSZABO-U20 5.7.0-050700rc7-generic #202005242331 SMP Sun May 24 23:33:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

with self compiled and patched nvida driver (the latest driver kernel module with the mainline kernel does not compile without a patch see (https://gitlab.com/snippets/1965550) ):

So Now I run Nvidia driver and external monitor over the thunderbolt 3 where also the FireWire interface is running.
Nvidia
$ nvidia-smi -a

==============NVSMI LOG==============

Timestamp : Wed May 27 19:57:17 2020
Driver Version : 440.82
CUDA Version : 10.2

Attached GPUs : 1
GPU 00000000:01:00.0
    Product Name : Quadro T2000
    Product Brand : Quadro
    Display Mode : Enabled
    Display Active : Enabled
    Persistence Mode : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : N/A
    GPU UUID : GPU-e970a3f0-ccb4-aca2-c0bb-c63d35d10fdc
    Minor Number : 0
    VBIOS Version : 90.17.1A.00.A3
    MultiGPU Board : No
    Board ID : 0x100
    GPU Part Number : N/A
    Inforom Version
        Image Version : G001.0000.02.04
        OEM Object : 1.1
        ECC Object : N/A
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GPU Virtualization Mode
        Virtualization Mode : None
        Host VGPU Mode : N/A
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x01
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x1FB810DE
        Bus Id : 00000000:01:00.0
        Sub System Id : 0x09261028
        GPU Link Info
            PCIe Generation
                Max : 3
                Current : 1
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 0
        Replay Number Rollovers : 0
        Tx Throughput : 1000 KB/s
        Rx Throughput : 6000 KB/s
    Fan Speed : N/A
    Performance State : P8
    Clocks Throttle Reasons
        Idle : Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown...

Read more...

Revision history for this message
csita (sz-richard) wrote :

Now The system seems working just fine.

I have to run a long test to see if the problem occurs again

Revision history for this message
csita (sz-richard) wrote :
Download full text (7.0 KiB)

I have been using it for a while now and I manage to reproduce the problem with the mainline kernel :
uname ouput:
Linux DEM-RSZABO-U20 5.7.0-050700rc7-generic #202005242331 SMP Sun May 24 23:33:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

dmesg output:

[ 172.040868] firewire_ohci 0000:27:00.0: enabling device (0000 -> 0002)
[ 172.045220] usb 7-1.4.2: New USB device found, idVendor=0b0e, idProduct=1113, bcdDevice= 3.04
[ 172.045221] usb 7-1.4.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 172.045222] usb 7-1.4.2: Product: Jabra Engage 75
[ 172.045223] usb 7-1.4.2: Manufacturer: GN Audio A/S
[ 172.045224] usb 7-1.4.2: SerialNumber: 033DCCC2D80F
[ 172.098317] firewire_ohci 0000:27:00.0: isochronous cycle inconsistent
[ 172.098355] firewire_ohci 0000:27:00.0: added OHCI v1.10 device as card 1, 8 IR + 8 IT contexts, quirks 0x0, physUB
[ 172.216832] input: GN Audio A/S Jabra Engage 75 as /devices/pci0000:00/0000:00:1c.0/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:02.0/0000:0b:00.0/usb7/7-1/7-1.4/7-1.4.2/7-1.4.2:1.3/0003:0B0E:1113.0013/input/input52
[ 172.274621] jabra 0003:0B0E:1113.0013: input,hiddev2,hidraw7: USB HID v1.11 Device [GN Audio A/S Jabra Engage 75] on usb-0000:0b:00.0-1.4.2/input3
[ 172.518897] igb 0000:0c:00.0 ens1: igb: ens1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 172.618312] firewire_core 0000:27:00.0: created device fw0: GUID 000a2702007351d1, S800
[ 172.625488] firewire_core 0000:27:00.0: created device fw1: GUID 00022e0200000007, S400
[ 172.625489] firewire_core 0000:27:00.0: phy config: new root=ffc0, gap_count=5
[ 172.626660] IPv6: ADDRCONF(NETDEV_CHANGE): ens1: link becomes ready
[ 172.739267] thunderbolt 0-1: new device found, vendor=0x5a device=0xde21
[ 172.739268] thunderbolt 0-1: Other World Computing Thunderbolt 3 Dock
[ 173.131591] thunderbolt 0-101: new device found, vendor=0x1e device=0x1
[ 173.131592] thunderbolt 0-101: LG Electronics 34UM95
[ 173.571435] thunderbolt 0-10101: new device found, vendor=0x1 device=0x8004
[ 173.571436] thunderbolt 0-10101: Apple, Inc. Thunderbolt to FireWire Adapter
[ 178.864484] ACPI: button: The lid device is not compliant to SW_LID.
[ 182.938764] logitech-hidpp-device 0003:046D:406D.000E: HID++ 4.5 device connected.
[ 183.180829] logitech-hidpp-device 0003:046D:406D.000E: multiplier = 8
[ 484.390178] INFO: task pulseaudio:2213 blocked for more than 120 seconds.
[ 484.390181] Tainted: P OE 5.7.0-050700rc7-generic #202005242331
[ 484.390182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.390182] pulseaudio D 0 2213 2190 0x00004120
[ 484.390184] Call Trace:
[ 484.390188] __schedule+0x2dd/0x780
[ 484.390190] ? dma_pte_clear_level+0x148/0x1a0
[ 484.390191] schedule+0x55/0xc0
[ 484.390192] schedule_timeout+0x207/0x2f0
[ 484.390194] ? at_context_transmit+0x66/0x1a0 [firewire_ohci]
[ 484.390195] wait_for_completion+0x94/0x100
[ 484.390198] fw_run_transaction+0xdc/0x110 [firewire_core]
[ 484.390200] ? fw_core_handle_bus_reset+0x470/0x470 [firewire_core]
[ 484.390202] ? free_iova_fast+0x18/0x30
[ 484.390204] ? close_transaction+0x110/0x110 [firewir...

Read more...

Revision history for this message
max w beckett (maxywb) wrote :
Download full text (3.3 KiB)

My system has hit the same bug and I do not use firewire. I have not upgraded to mainline kernel. The power cycling work arounds above do not mitigate the issue for me.

 $ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04 LTS"

 $ uname -a
Linux hostname 5.4.0-39-generic #43-Ubuntu SMP Fri Jun 19 10:28:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

 $ pactl info
Connection failure: Connection refused
pa_context_connect() failed: Connection refused

 $ ps aux | grep pulse | grep -v grep
username 12895 17.0 0.0 100788 10908 ? D<s 09:38 0:00 /usr/bin/pulseaudio --daemonize=no --log-target=journal

 $ sudo cat /proc/12895/stack
[<0>] usb_start_wait_urb+0x8f/0x180
[<0>] usb_control_msg+0xe3/0x140
[<0>] snd_usb_ctl_msg+0xe3/0x190 [snd_usb_audio]
[<0>] snd_usb_init_sample_rate+0x235/0x320 [snd_usb_audio]
[<0>] snd_usb_pcm_prepare+0x159/0x460 [snd_usb_audio]
[<0>] snd_pcm_do_prepare+0x1e/0x30 [snd_pcm]
[<0>] snd_pcm_action_single+0x3c/0x80 [snd_pcm]
[<0>] snd_pcm_action_nonatomic+0x6e/0x80 [snd_pcm]
[<0>] snd_pcm_prepare+0x6a/0x90 [snd_pcm]
[<0>] snd_pcm_common_ioctl+0x28c/0xc00 [snd_pcm]
[<0>] snd_pcm_ioctl+0x27/0x40 [snd_pcm]
[<0>] do_vfs_ioctl+0x407/0x670
[<0>] ksys_ioctl+0x67/0x90
[<0>] __x64_sys_ioctl+0x1a/0x20
[<0>] do_syscall_64+0x57/0x190
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

The stack frames line up w/ the above.

$ journalctl -f | grep pulse # a selection of the output
Jun 27 09:38:50 hostname systemd[2373]: pulseaudio.service: State 'stop-sigterm' timed out. Killing.
Jun 27 09:38:50 hostname systemd[2373]: pulseaudio.service: Killing process 12140 (pulseaudio) with signal SIGKILL.

Jun 27 09:38:56 hostname systemd[2373]: pulseaudio.service: Main process exited, code=killed, status=9/KILL
Jun 27 09:38:56 hostname systemd[2373]: pulseaudio.service: Failed with result 'timeout'.
Jun 27 09:38:56 hostname systemd[2373]: pulseaudio.service: Scheduled restart job, restart counter is at 5.
Jun 27 09:38:56 hostname pulseaudio[12895]: Stale PID file, overwriting.
Jun 27 09:40:26 hostname systemd[2373]: pulseaudio.service: start operation timed out. Terminating.
Jun 27 09:41:56 hostname systemd[2373]: pulseaudio.service: State 'stop-sigterm' timed out. Killing.
Jun 27 09:41:56 hostname systemd[2373]: pulseaudio.service: Killing process 12895 (pulseaudio) with signal SIGKILL.
Jun 27 09:42:02 hostname systemd[2373]: pulseaudio.service: Main process exited, code=killed, status=9/KILL
Jun 27 09:42:02 hostname systemd[2373]: pulseaudio.service: Failed with result 'timeout'.
Jun 27 09:42:02 hostname systemd[2373]: pulseaudio.service: Scheduled restart job, restart counter is at 6.
Jun 27 09:42:03 hostname pulseaudio[13050]: Stale PID file, overwriting.
Jun 27 09:43:33 hostname systemd[2373]: pulseaudio.service: start operation timed out. Terminating.
Jun 27 09:45:03 hostname systemd[2373]: pulseaudio.service: State 'stop-sigterm' timed out. Killing.
Jun 27 09:45:03 hostname systemd[2373]: pulseaudio.service: Killing process 13050 (pulseaudio) with signal SIGKILL.
Jun 27 09:45:10 hostname systemd[2373]: pulseaudio.service: Main process exited, code=killed, status=9/...

Read more...

Revision history for this message
max w beckett (maxywb) wrote :

Taking after csita I unplugged/replugged all my external audio devices and restarted pulseaudio and alsa-mixer. I found the below error in journalctl:

Jun 27 09:15:37 hostname pulseaudio[3735]: Failed to load module "module-alsa-card" (argument: "device_id="2" name="usb-N-SIA_KJ-USB_HEASET-00" card_name="alsa_card.usb-N-SIA_KJ-USB_HEASET-00" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes use_ucm=yes avoid_resampling=no card_properties="module-udev-detect.discovered=1""): initialization failed.

This is a usb headset. When unplugging and leaving unplugged after reboot everything works. The failure to load the the module for the USB headset was for some reason leaving pulseaudio blocked on IO in a D state.

I use audio via HDMI output from my Nvidia card. I have Nvidia driver 440 installed on my system, but I do not use the Nvidia driver. I still use nouveau.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

csita,

Please see if disabling Secureboot helps.Secureboot and Preboot DMA protection go hand in hand on Dell systems.

This is an IOMMU bug.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

max w beckett,

What you are seeing is a completely different bug. Please file a separate one.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Mario,

It's Precision 7540 and dma_pte_clear_level()...

Revision history for this message
Mario Limonciello (superm1) wrote :

KH: Agree it looks like IOMMU bug, this needs to get raised up to the Intel IOMMU guys. They are tracking something specific to i915 in ICL and TGL, but this is not as new.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Right. Let's wait for the original bug reporter's reply.

We don't have any firewire audio device to test with...

Revision history for this message
csita (sz-richard) wrote :

Just to be sure that I understood the conversion correctly ?.

The problem is related to the memory management unit handling in the kernel.

And it happens with me because of the MMU has extra configuration due to the secure boot.

So to prove it I should run the official kernel 5.4 and disable secure boot and see if the problem happens ?.

Did I got it right ?.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Yes that's correct.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.