Comment 37 for bug 1144322

In , Blc+gentoo (blc+gentoo) wrote :

Created attachment 351696
WARN/stack dump/oops in the bluetooth rfcomm code introduced in raw kernel 3.8.x

This is an *UPSTREAM* bug. This comment collects what is currently known about it.

A bug introduced upstream by the bluetooth developers in 3.8.x, and still present in 3.9.x, crashes the machine with an oops when an rfcomm link is disconnected while a tty is still attached to it. This is clearly not expected behavior. In 3.10-rc5 the behavior changed, but the bug still exists.

The original steps to trigger the bug are described at:

http://forums.gentoo.org/viewtopic-t-961421-highlight-.html

In brief: set up any bluetooth rfcomm connection, then tear down the bluetooth connection (/etc/init.d/bluetooth stop, rfcomm release, or use blueman to disconnect Dial Up Networking/Serial). (I believe that unplugging the bluetooth USB adapter will also trigger this issue, but I'd call that abnormal use.) The kernel then writes over an unrelated kernel structure, corrupting memory and causing other subsystems to oops.

Since Gentoo does not appear to have bluetooth set up for networkmanager, users should not be affected unless they use rfcomm directly to communicate with a bluetooth serial device, for example through minicom, or through pppd to access a bluetooth modem. I hit the bug because I carry a patch file under /etc/portage/patches/net-misc/networkmanager that allows bluetooth rfcomm links.

As far as I can tell, and judging from reports and tests upstream, the cause is probably that bluetooth rfcomm does not follow standard tty procedures: when the bluetooth link is torn down, it pulls the port out from under connected applications without first cleaning up the tty. A patch that exposes the bad rfcomm behavior was posted on LKML on 2013 May 15; it also keeps the machine from hanging or crashing by stopping the memory corruption. It does not fix the problem. It merely instruments it (and keeps other subsystems from dying, which could otherwise cause data loss).

The patch that Peter Hurley wrote was:

diff --git a/drivers/tty/tty_port.c b/drivers/tty/tty_port.c
index 6d9e0b2..a4f4fa9 100644
--- a/drivers/tty/tty_port.c
+++ b/drivers/tty/tty_port.c
@@ -140,6 +140,10 @@ EXPORT_SYMBOL(tty_port_destroy);
 static void tty_port_destructor(struct kref *kref)
 {
 	struct tty_port *port = container_of(kref, struct tty_port, kref);
+
+	/* check if last port ref was dropped before tty release */
+	if (WARN_ON(port->itty))
+		return;
 	if (port->xmit_buf)
 		free_page((unsigned long)port->xmit_buf);
 	tty_port_destroy(port);

Attached are the warnings and errors generated when I disable rfcomm from blueman with the above patch applied, showing the correct trace. Without the patch, the corruption tends to make other functions report incorrect information, and it usually crashes or hangs the machine completely shortly after disconnection.
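
For anyone trying to picture what the guard in the patch does, here is a rough userspace model in plain C. It is not kernel code and not part of the patch: struct model_port, struct model_tty, model_port_put() and run_teardown() are names made up for this illustration, and a plain integer stands in for the kref. Only the check on the itty back-pointer mirrors the patch. The point of the sketch is that when the last port reference is dropped while a tty is still attached, the destructor warns and leaves the memory alone instead of freeing something that is still in use.

```c
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-ins for illustration; NOT the kernel's definitions. */
struct model_tty {
    int id;
};

struct model_port {
    int refcount;            /* stands in for the kernel's struct kref */
    struct model_tty *itty;  /* back-pointer, non-NULL while a tty is attached */
    char *xmit_buf;          /* stands in for the transmit buffer page */
    int freed;               /* set by the destructor so callers can inspect it */
};

static void model_port_destructor(struct model_port *port)
{
    /* Same idea as the patch: last port ref dropped before tty release. */
    if (port->itty) {
        fprintf(stderr, "WARN: last port ref dropped, tty %d still attached\n",
                port->itty->id);
        return; /* leak on purpose rather than free memory still in use */
    }
    free(port->xmit_buf);
    port->xmit_buf = NULL;
    port->freed = 1;
}

static void model_port_put(struct model_port *port)
{
    if (--port->refcount == 0)
        model_port_destructor(port);
}

/*
 * Run one teardown. If release_tty_first is non-zero the tty is detached
 * before the last reference is dropped (the orderly case); otherwise the
 * reference is dropped while the tty is still attached (the case the
 * patch warns about). Returns 1 if the port's resources were freed.
 */
static int run_teardown(int release_tty_first)
{
    struct model_tty tty = { .id = 1 };
    struct model_port port = {
        .refcount = 1,
        .itty = &tty,
        .xmit_buf = malloc(64),
        .freed = 0,
    };
    int freed;

    if (release_tty_first)
        port.itty = NULL;

    model_port_put(&port);
    freed = port.freed;

    /* Tidy up the model's deliberate leak; free(NULL) is a no-op. */
    free(port.xmit_buf);
    return freed;
}
```

Calling run_teardown(1) returns 1 because the tty was detached first and the destructor runs normally. Calling run_teardown(0) prints the warning to stderr and returns 0, which corresponds to the instrumented case: the problem is reported but nothing that is still in use gets freed.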