Comment 27 for bug 1455376

Robert Schlabbach (robert-s-t) wrote :

I took a different route and tried to find out what is going wrong with the current code. I installed the linux-source package and built kernel 3.19.3 with added debug messages in drivers/usb/core/hub.c to see what is happening. I found the following chain of calls:

1. hub_port_reset()
2. > hub_port_wait_reset()
3. > > hub_port_status() returns port status 0x02C0 (USB_SS_PORT_LS_SS_INACTIVE)
4. > > hub_port_warm_reset_required() returns TRUE due to link_state == USB_SS_PORT_LS_SS_INACTIVE
5. > hub_port_wait_reset() returns -ENOTCONN due to the TRUE result from hub_port_warm_reset_required()
6. > hub_port_finish_reset(... *status = -ENOTCONN)
7. > > usb_set_device_state(udev, USB_STATE_NOTATTACHED)
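To illustrate steps 3 and 4, here is a minimal user-space sketch of how the observed port status 0x02C0 decodes to the SS.Inactive link state. The macro values are as I recall them from include/uapi/linux/usb/ch11.h and should be checked against your tree. warm_reset_required() and check_observed_status() are hypothetical names of mine; the model covers only the link-state comparison and omits any other checks the real hub_port_warm_reset_required() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, quoted from memory of include/uapi/linux/usb/ch11.h. */
#define PORT_STAT_LINK_STATE   0x01e0  /* USB_PORT_STAT_LINK_STATE mask */
#define SS_PORT_LS_SS_INACTIVE 0x00c0  /* USB_SS_PORT_LS_SS_INACTIVE */
#define SS_PORT_LS_COMP_MOD    0x0140  /* USB_SS_PORT_LS_COMP_MOD */

/*
 * Simplified model (hypothetical helper, not kernel code): returns true
 * when the link-state field of the port status is one of the two states
 * that call for a warm reset.
 */
bool warm_reset_required(uint16_t portstatus)
{
    uint16_t link_state = portstatus & PORT_STAT_LINK_STATE;

    return link_state == SS_PORT_LS_SS_INACTIVE ||
           link_state == SS_PORT_LS_COMP_MOD;
}

/* The status value seen in the debug output above. */
void check_observed_status(void)
{
    /* Masking 0x02C0 leaves exactly the SS.Inactive link state. */
    assert((0x02C0 & PORT_STAT_LINK_STATE) == SS_PORT_LS_SS_INACTIVE);
    assert(warm_reset_required(0x02C0));
}
```

Under these assumed values, the remaining bit 0x0200 in 0x02C0 lies outside the link-state mask, so it does not affect the comparison.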

And that's the problem: USB device state NOTATTACHED is a dead end as usb_set_device_state() does not seem to allow ever leaving that state:

 if (udev->state == USB_STATE_NOTATTACHED) {
  ; /* do nothing */
 }

That seems bogus to me. A "warm reset" sounds like it is supposed to be recoverable, but a port which ever runs into this state cannot recover since the NOTATTACHED software state is a deathtrap.

I tried altering the end of hub_port_finish_reset() to:

  if (udev)
   usb_set_device_state(udev, *status
     ? USB_STATE_ATTACHED // FIX: was USB_STATE_NOTATTACHED
     : USB_STATE_DEFAULT);

and that fixes the problem. The port is warm reset, the USB device is set to a working state and the boot continues without problems.
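The difference between the two variants can be modelled in user space. This is only a sketch under my own assumptions, not kernel code: the enum, struct dev, set_state(), finish_reset() and compare_variants() are hypothetical stand-ins, and the model keeps just the two behaviours described above (NOTATTACHED is never left, and a non-zero reset status selects the failure state).

```c
#include <assert.h>

/* Hypothetical stand-ins for the three USB device states involved. */
enum dev_state { ST_NOTATTACHED, ST_ATTACHED, ST_DEFAULT };

struct dev {
    enum dev_state state;
};

#define ERR_NOTCONN (-107) /* stands in for -ENOTCONN */

/* Models usb_set_device_state(): once NOTATTACHED, do nothing. */
void set_state(struct dev *d, enum dev_state new_state)
{
    if (d->state == ST_NOTATTACHED)
        return; /* dead end: the state is never left again */
    d->state = new_state;
}

/*
 * Models the tail of hub_port_finish_reset(): a failed reset moves the
 * device to failure_state, a successful one to the default state.
 */
void finish_reset(struct dev *d, int status, enum dev_state failure_state)
{
    set_state(d, status ? failure_state : ST_DEFAULT);
}

/* A failed reset followed by a successful retry, for both variants. */
void compare_variants(void)
{
    struct dev orig = { ST_ATTACHED };
    struct dev patched = { ST_ATTACHED };

    /* Original code: failure selects NOTATTACHED, the retry cannot recover. */
    finish_reset(&orig, ERR_NOTCONN, ST_NOTATTACHED);
    finish_reset(&orig, 0, ST_NOTATTACHED);
    assert(orig.state == ST_NOTATTACHED);

    /* Modified code: failure selects ATTACHED, the retry reaches DEFAULT. */
    finish_reset(&patched, ERR_NOTCONN, ST_ATTACHED);
    finish_reset(&patched, 0, ST_ATTACHED);
    assert(patched.state == ST_DEFAULT);
}
```

In this model the only thing that changes between the two runs is the state chosen on failure, which is the one-line change shown above.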

However:

1. I'm not sure if this is the right place to fix the problem. Maybe there are scenarios in which the "dead end" is needed. Then it would be a better solution to change the status in a place where it is known that a warm reset will be tried...

2. I cannot find any substantial difference in drivers/usb/core/hub.c between kernel sources 3.17.8 and 3.18-rc1 which would explain why this problem occurs so frequently with the latter, but not with the former. So this may not be the root cause of my problem, but at least it's a way to fix it, and from what I can tell, this is erroneous behavior which requires fixing anyway.

Now, how do I bring this issue to the attention of the experts on this code, so we can discuss whether my findings are right and how best to fix it...?