I took a different route and tried to find out what is going wrong with the current code. I loaded the linux-source package and built versions of kernel 3.19.3 with added debug messages in drivers/usb/core/hub.c to see what is happening. I found the following chain of calls:
1. hub_port_reset()
2. > hub_port_wait_reset()
3. > > hub_port_status() returns port status 0x02C0 (USB_SS_PORT_LS_SS_INACTIVE)
4. > > hub_port_warm_reset_required() returns TRUE due to link_state == USB_SS_PORT_LS_SS_INACTIVE
5. > hub_port_wait_reset() returns -ENOTCONN due to the TRUE result from hub_port_warm_reset_required()
6. > hub_port_finish_reset(... *status = -ENOTCONN)
7. > > usb_set_device_state(udev, USB_STATE_NOTATTACHED)
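To show why a port status of 0x02C0 leads to the TRUE result in step 4, here is a minimal sketch of the link-state decoding. The constant values are what I believe include/uapi/linux/usb/ch11.h defines; the helper names link_state_of() and warm_reset_required_sketch() are made up for this illustration, and the check is reduced to the SS.Inactive case described above, so it is not the kernel's actual function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as I understand them from include/uapi/linux/usb/ch11.h (assumption). */
#define USB_PORT_STAT_LINK_STATE    0x01e0  /* mask for the link-state field */
#define USB_SS_PORT_STAT_POWER      0x0200  /* SuperSpeed port power bit */
#define USB_SS_PORT_LS_SS_INACTIVE  0x00c0  /* link state SS.Inactive */

/* Sanity check: the link-state value must fit inside the mask. */
static_assert((USB_SS_PORT_LS_SS_INACTIVE & ~USB_PORT_STAT_LINK_STATE) == 0,
              "link state value lies outside the link-state mask");

/* Hypothetical helper: extract the link-state field from a port status word. */
static uint16_t link_state_of(uint16_t portstatus)
{
    return portstatus & USB_PORT_STAT_LINK_STATE;
}

/*
 * Simplified stand-in for hub_port_warm_reset_required(): it only models
 * the SS.Inactive condition observed in the debug output.
 */
static bool warm_reset_required_sketch(uint16_t portstatus)
{
    return link_state_of(portstatus) == USB_SS_PORT_LS_SS_INACTIVE;
}
```

With the observed status, 0x02C0 & 0x01e0 gives 0x00C0, so the port is powered (0x0200) and its link is in SS.Inactive, which is what makes the sketch return true.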
And that's the problem: USB device state NOTATTACHED is a dead end as usb_set_device_state() does not seem to allow ever leaving that state:
    if (udev->state == USB_STATE_NOTATTACHED) {
        ;   /* do nothing */
    }
That seems bogus to me. A "warm reset" sounds like it is supposed to be recoverable, but a port which ever runs into this state cannot recover since the NOTATTACHED software state is a deathtrap.
I tried altering the end of hub_port_finish_reset() to:
    if (udev)
        usb_set_device_state(udev, *status
                ? USB_STATE_ATTACHED   // FIX: was USB_STATE_NOTATTACHED
                : USB_STATE_DEFAULT);
and that fixes the problem. The port is warm reset, the USB device is set to a working state and the boot continues without problems.
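To make the difference concrete, here is a small user-space model of the behavior as I understand it. It is only a sketch: the types and functions (dev_sketch, set_state_sketch(), finish_reset_sketch()) are hypothetical stand-ins, not kernel code, and the model ignores everything usb_set_device_state() does besides the NOTATTACHED guard.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced set of device states, just enough for this illustration. */
enum dev_state_sketch { ST_NOTATTACHED, ST_ATTACHED, ST_DEFAULT };

struct dev_sketch {
    enum dev_state_sketch state;
};

/* Models the guard: once NOTATTACHED, every later state change is ignored. */
static void set_state_sketch(struct dev_sketch *dev,
                             enum dev_state_sketch new_state)
{
    assert(dev != NULL);
    if (dev->state == ST_NOTATTACHED) {
        ;   /* do nothing */
    } else {
        dev->state = new_state;
    }
}

/*
 * Models the end of hub_port_finish_reset(): a non-zero status is an error.
 * patched == 0 mimics the original code, patched != 0 mimics my change.
 */
static void finish_reset_sketch(struct dev_sketch *dev, int status, int patched)
{
    enum dev_state_sketch on_error = patched ? ST_ATTACHED : ST_NOTATTACHED;

    set_state_sketch(dev, status ? on_error : ST_DEFAULT);
}
```

In this model, a failed reset followed by a successful one leaves the unpatched device stuck in ST_NOTATTACHED, while the patched variant ends up in ST_DEFAULT after the second reset.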
However:
1. I'm not sure if this is the right place to fix the problem. Maybe there are scenarios in which the "dead end" is needed. Then it would be a better solution to change the status in a place where it is known that a warm reset will be tried...
2. I cannot find any substantial difference in drivers/usb/core/hub.c between kernel sources 3.17.8 and 3.18-rc1 which would explain why this problem occurs so frequently with the latter, but not with the former. So this may not be the root cause of my problem, but at least it's a way to fix it, and from what I can tell, this is erroneous behavior which requires fixing anyway.
Now, how do I bring this issue to the attention of the experts on this code, so we can discuss whether my findings are right and how best to fix it?