Bluetoothd fills 100% CPU when bluetooth keyboard goes in idle

Bug #1717796 reported by marco.pallotta
26
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Bluez Utilities
Confirmed
Medium
The Ubuntu Power Consumption Project
Medium
Unassigned
bluez (Ubuntu)
Medium
Unassigned

Bug Description

When bluetooth keyboard goes in idle bluetoothd fills 100% CPU. At this point you have to press any key to reconnect the device and let bluetoothd to return to work normally.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: bluez 5.37-0ubuntu5.1
ProcVersionSignature: Ubuntu 4.4.0-93.116-generic 4.4.79
Uname: Linux 4.4.0-93-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: amd64
Date: Sun Sep 17 19:02:40 2017
ExecutablePath: /usr/lib/bluetooth/bluetoothd
InstallationDate: Installed on 2016-08-06 (406 days ago)
InstallationMedia: Ubuntu 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.1)
InterestingModules: rfcomm bnep btusb bluetooth
MachineType: GIGABYTE GB-BXBT-2807
ProcEnviron:
 LANG=it_IT.UTF-8
 PATH=(custom, no user)
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-93-generic.efi.signed root=UUID=8931ead4-8d43-4aef-8946-c7845c63c7d3 ro quiet splash vt.handoff=7
SourcePackage: bluez
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/22/2015
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F8
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: MZBAYAP-00
dmi.board.vendor: GIGABYTE
dmi.board.version: 1.x
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF8:bd10/22/2015:svnGIGABYTE:pnGB-BXBT-2807:pvr1.x:rvnGIGABYTE:rnMZBAYAP-00:rvr1.x:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: GB-BXBT-2807
dmi.product.version: 1.x
dmi.sys.vendor: GIGABYTE
hciconfig:
 hci0: Type: BR/EDR Bus: USB
  BD Address: DC:85:DE:FD:2F:BC ACL MTU: 820:8 SCO MTU: 255:16
  UP RUNNING PSCAN ISCAN
  RX bytes:528225 acl:32183 sco:0 events:1134 errors:0
  TX bytes:33849 acl:338 sco:0 commands:388 errors:1

Revision history for this message
marco.pallotta (marco-pallotta) wrote :
Revision history for this message
marco.pallotta (marco-pallotta) wrote :

This is my kernel.log when bluetoothd goes at 100%:

"
Sep 17 18:41:43 GB-BXBT-2807 kernel: [ 7738.885123] hid-generic 0005:0A5C:8502.000B: unknown main item tag 0x0
Sep 17 18:41:43 GB-BXBT-2807 kernel: [ 7738.885475] input: Bluetooth Keyboard as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2:1.0/bluetooth/hci0/hci0:2/0005:0A5C:8502.000B/input/input17
Sep 17 18:41:43 GB-BXBT-2807 kernel: [ 7738.886183] hid-generic 0005:0A5C:8502.000B: input,hidraw1: BLUETOOTH HID v0.01 Keyboard [Bluetooth Keyboard] on dc:85:de:fd:2f:bc
"

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

We need some stack information to help us determine the cause of the CPU spikes.

1. Download the attached dstack script to your home directory.
2. Log in to a virtual terminal (Ctrl+Alt+F1), and switch back (Ctrl+Alt+F7)
3. When the spike happens, switch to Ctrl+Alt+F1 and run this:
    sudo -s
    sh ./dstack bluetoothd >> stacks.txt
    sh ./dstack bluetoothd >> stacks.txt
    sh ./dstack bluetoothd >> stacks.txt
    sh ./dstack bluetoothd >> stacks.txt
    sh ./dstack bluetoothd >> stacks.txt
   (yes, five times)
4. Attach the resulting stacks.txt to this bug.

Changed in bluez (Ubuntu):
status: New → Incomplete
Revision history for this message
marco.pallotta (marco-pallotta) wrote :

In attach the requested trace output.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Thanks. Unfortunately it appears bluetoothd is a single-threaded program spinning its g_main_loop without actually doing any heavy work on each iteration. This makes it hard to identify the cause of the problem without running it through a profiler.

Changed in bluez (Ubuntu):
status: Incomplete → New
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in bluez (Ubuntu):
status: New → Confirmed
Changed in ubuntu-power-consumption:
importance: Undecided → Medium
Changed in bluez (Ubuntu):
importance: Undecided → Medium
Changed in ubuntu-power-consumption:
status: New → Confirmed
Revision history for this message
johndow2001 (johndow2001) wrote :

Dear Ubuntu dev Team

DON'T MAKE "UPGRADES" WHICH CHANGE PERMISSIONS!!!

It is a permission problem with my home folder and pulseaudio.

I've had 2 issues:
 - bluetoothd operating at 100% of one CPU core
 - no way to connect ANY bt device

If happened after one of your "upgrades", so please test it MORE THOROUGHLY next time. Yes, I'm certain it happened after an upgrade cause I made backup 2 days prior to this extremely frustrating issue and no software other than "ubuntu upgrade" was installed in the meantime.

After having researched internet twice and having already made my mind to switch to another linux distro if the solution does't come up till the end of the day - I came across this:

Run the following command to fix the $HOME directory permissions for the current $USER:

sudo chown -R $USER:$USER $HOME/
I used http://www.linuxquestions.org/questions/linux-software-2/problem-with-pulse-audio-834041/ and http://ubuntuforums.org/showthread.php?p=6208727 as sources.

The solution for me was to fix the permissions using the following command: sudo chown -R lionel:lionel /home/lionel and then using pavucontrol to unmute the output.

The gnome indicator is still greyed but music is back, and that's a good thing.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Can anyone experiencing this bug please report it to the BlueZ developers here:

  https://bugzilla.kernel.org/enter_bug.cgi?product=Drivers

with Component = Bluetooth, and then tell us the new bug ID?

Changed in ubuntu-power-consumption:
status: Confirmed → Incomplete
Changed in bluez (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for The Ubuntu Power Consumption Project because there has been no activity for 60 days.]

Changed in ubuntu-power-consumption:
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for bluez (Ubuntu) because there has been no activity for 60 days.]

Changed in bluez (Ubuntu):
status: Incomplete → Expired
Revision history for this message
In , steve (steve-linux-kernel-bugs) wrote :

Many bluetooth HID keyboard profile devices disconnect on idle to save power, I have two such devices:

Logitech Y-X5A77 keyboard

Sony PS3 bluetooth remote control

When the device disconnects on idle the g_io_channels get disconnected

bluetoothd[3167]: profiles/input/device.c:ctrl_watch_cb() Device 00:07:61:F6:C9:7D disconnected
bluetoothd[3167]: profiles/input/device.c:intr_watch_cb() Device 00:07:61:F6:C9:7D disconnected

But, something isn't closed properly (possibly g_io_add_watch) since bluetoothd then spins at 100% CPU in g_main_loop_run() until the device reconnects, at which point every works normally again until the device next idles.

This bug has been present for at least 2 years, I'm certain it's much longer:

https://bugs.launchpad.net/ubuntu/+source/bluez/+bug/1717796

Revision history for this message
In , steve (steve-linux-kernel-bugs) wrote :

I've tried adding g_source_remove()s for ctrl_watch and intr_watch before shutting down the io channels but it doesn't make any difference.

Revision history for this message
In , steve (steve-linux-kernel-bugs) wrote :

strace results in continuous repeating:

poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}, {fd=19, events=POLLIN}, {fd=20, events=POLLIN}, {fd=21, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}, {fd=42, events=POLLIN}, {fd=43, events=POLLIN}, {fd=44, events=POLLIN}, {fd=45, events=POLLIN}, {fd=46, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}, {fd=49, events=POLLIN}, {fd=50, events=POLLIN}, {fd=51, events=POLLIN}, {fd=52, events=POLLIN}, {fd=53, events=POLLIN}, {fd=55, events=POLLOUT}], 28, -1) = 1 ([{fd=55, revents=POLLNVAL}])

Revision history for this message
In , steve (steve-linux-kernel-bugs) wrote :

Ahhh...

I was already aware that on first connect the HID input device wasn't created, but I hadn't realised it's part of the same bug.

What's happening is that on initial connect:

 dev->sec_watch = g_io_add_watch(idev->intr_io, G_IO_OUT,
                                                        encrypt_notify, idev);

creates the notify callback. But it never triggers. On second connection the callback gets triggered, but the connect code gets run again so another refcount is added. This means the intr channel never gets closed which means it's stuck on the last event.

Revision history for this message
In , steve (steve-linux-kernel-bugs) wrote :

I flipped the G_IO_OUT to G_IO_IN and it started working on first connect. I don't know why it was G_IO_OUT?

Revision history for this message
In , steve (steve-linux-kernel-bugs) wrote :

Created attachment 283925
Patch to fix high CPU usage with HID keyboard and other devices

Revision history for this message
In , luiz.dentz (luiz.dentz-linux-kernel-bugs) wrote :

Ive sent a fix upstream:

https://<email address hidden>/T/#u

Let me know if that works.

Revision history for this message
In , steve (steve-linux-kernel-bugs) wrote :

(In reply to Luiz Von Dentz from comment #6)
> Ive sent a fix upstream:
>
> https://lore.kernel.org/linux-bluetooth/20190724110151.4258-1-luiz.
> <email address hidden>/T/#u
>
> Let me know if that works.

That will prevent the channel getting left dangling, but it doesn't address the issue of the callback not happening on initial connect, which is AFAICT what results in that part of the bug for me.

Is it really supposed to be waiting for G_IO_OUT? It seems it only gets triggered for me when a second connection is attempted on the dangling channel, presumably during handshake..?

Revision history for this message
In , steve (steve-linux-kernel-bugs) wrote :

Luiz, did you see (In reply to Luiz Von Dentz from comment #6)
> Ive sent a fix upstream:
>
> https://lore.kernel.org/linux-bluetooth/20190724110151.4258-1-luiz.
> <email address hidden>/T/#u
>
> Let me know if that works.

Luiz, did you see the patch I attached? Perhaps it's better to remove the old watcher first than not add another if it exists, but it shouldn't happen anyway. The explicit remove should happen to catch the case where the connection is broken before being fully initialised but the callback should get triggered once data starts to flow over the encrypted channel.

In my tests, allowing the channel to fully close by removing the sec_watch prevents the keyboard from ever connecting, this is because the callback never gets triggered. I believe the bug is caused by the refcount sec_watch creates not being automatically removed because the callback is left pending, it doesn't get removed on channel shutdown, but has the side-effect of making it work with the channel being already open when the device reconnects because it triggers the callback.

What made it work every time is changing sec_watch to wait for G_IO_IN instead of G_IO_OUT, but presumably it is G_IO_OUT for a reason?

Revision history for this message
In , luiz.dentz (luiz.dentz-linux-kernel-bugs) wrote :

Im not sure we even need the watch in the first place, the kernel should block any in/out until the connection is encrypted (BT_SK_SUSPEND) so it might be possible to get rid of sec_watch altogether.

Revision history for this message
In , steve (steve-linux-kernel-bugs) wrote :

(In reply to Luiz Von Dentz from comment #9)
> Im not sure we even need the watch in the first place, the kernel should
> block any in/out until the connection is encrypted (BT_SK_SUSPEND) so it
> might be possible to get rid of sec_watch altogether.

Going through the history it originally worked that way, I'm not sure why it was changed..?

Revision history for this message
In , luiz.dentz (luiz.dentz-linux-kernel-bugs) wrote :

(In reply to Steven Newbury from comment #10)
> (In reply to Luiz Von Dentz from comment #9)
> > Im not sure we even need the watch in the first place, the kernel should
> > block any in/out until the connection is encrypted (BT_SK_SUSPEND) so it
> > might be possible to get rid of sec_watch altogether.
>
> Going through the history it originally worked that way, I'm not sure why it
> was changed..?

Well lets try to remove it then and see if that fix the problem.

Revision history for this message
In , steve (steve-linux-kernel-bugs) wrote :

(In reply to Luiz Von Dentz from comment #11)
> (In reply to Steven Newbury from comment #10)
> > (In reply to Luiz Von Dentz from comment #9)
> > > Im not sure we even need the watch in the first place, the kernel should
> > > block any in/out until the connection is encrypted (BT_SK_SUSPEND) so it
> > > might be possible to get rid of sec_watch altogether.
> >
> > Going through the history it originally worked that way, I'm not sure why
> it
> > was changed..?
>
> Well lets try to remove it then and see if that fix the problem.

That works. I'll attach a patch on here.

Revision history for this message
In , steve (steve-linux-kernel-bugs) wrote :

Created attachment 284073
Remove sec_watch, instead attempt immediate connection

Revision history for this message
In , luiz.dentz (luiz.dentz-linux-kernel-bugs) wrote :

(In reply to Steven Newbury from comment #13)
> Created attachment 284073 [details]
> Remove sec_watch, instead attempt immediate connection

Could you please send as a patch to linux-bluetooth?

tags: added: eoan
Changed in bluez (Ubuntu):
status: Expired → Confirmed
Changed in ubuntu-power-consumption:
status: Expired → Confirmed
Changed in bluez:
importance: Unknown → Medium
status: Unknown → Confirmed
Revision history for this message
Isaac Cohen (icohen2000) wrote :

Hi,

I'm not sure if this is the right place to post this. But basically, I applied the patch that Steve created but the issue appears to persist. Did I apply it properly? Here's what I did: I edited bluez/profiles/input/device.c and then ran './configure', 'make' and 'make install'.

Revision history for this message
Isaac Cohen (icohen2000) wrote :

Actually, my apologies. I rebooted after reinstalled the revised version of bluez and it all works fine now. Thank you so much!

-Isaac

tags: removed: eoan
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.