Comment 106 for bug 1730924

verdre (verdre-linux-kernel-bugs) wrote:

I've been looking into this for the last few days, and it turns out the issue is definitely caused by PCIe link power management. Using the kernel's CONFIG_PCIEASPM_DEBUG config option, I was able to track the issue down to the ASPM L1.2 power-saving state (setting link_state of the parent PCI bridge to 15 or 111 enables everything except L1.2 [1]).
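For anyone wondering where 15 and 111 come from: they are bitmasks over the ASPM state flags defined in aspm.c [1]. Here is a rough sketch of the arithmetic in Python. The bit values below are my reading of the ASPM_STATE_* defines at the time of writing and should be treated as an assumption (check them against your own kernel tree); the helper names link_state_mask and decode are made up for illustration.

```python
# Bit values assumed from the ASPM_STATE_* defines in drivers/pci/pcie/aspm.c
# at the time of writing; they may differ in other kernel versions.
ASPM_STATE_BITS = {
    "L0S_UP": 0x01,      # upstream direction L0s
    "L0S_DW": 0x02,      # downstream direction L0s
    "L1": 0x04,          # L1
    "L1_1": 0x08,        # ASPM L1.1
    "L1_2": 0x10,        # ASPM L1.2 (the state that triggers the timeouts)
    "L1_1_PCIPM": 0x20,  # PCI-PM L1.1
    "L1_2_PCIPM": 0x40,  # PCI-PM L1.2
}


def link_state_mask(exclude=()):
    """Hypothetical helper: OR together every state bit not in `exclude`."""
    mask = 0
    for name, bit in ASPM_STATE_BITS.items():
        if name not in exclude:
            mask |= bit
    return mask


def decode(mask):
    """Hypothetical helper: list the state names that are set in `mask`."""
    return [name for name, bit in ASPM_STATE_BITS.items() if mask & bit]


# Everything except ASPM L1.2, including the PCI-PM substates.
with_pcipm = link_state_mask(exclude={"L1_2"})

# Everything except ASPM L1.2, and without the PCI-PM substates.
without_pcipm = link_state_mask(exclude={"L1_2", "L1_1_PCIPM", "L1_2_PCIPM"})

print(with_pcipm, decode(with_pcipm))
print(without_pcipm, decode(without_pcipm))
```

Under those assumed bit values the two masks work out to 111 and 15, i.e. the values mentioned above.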

I'm not sure how to continue at this point: since PCIe ASPM is implemented in hardware, debugging it would require special equipment. I really hope someone at Marvell steps up and has a look at this. It's also possible that L1.2 is disabled on Windows, but I haven't yet found tools to read the L1 substates from the PCI configuration on Windows.

Another interesting fact is that the random command timeouts I'm looking into here only seem to happen on some Surface devices using the 8897 chip via PCI, namely the Pro 4, Pro 2017, Pro 6, and Laptop 2 (see comments in [2]), but not on the Surface Book 2 (according to owners of the device). There are also multiple reports of the same issue on completely different devices to be found on the internet ([3], [4]), but those never got any attention from Marvell developers.

[1] https://github.com/torvalds/linux/blob/master/drivers/pci/pcie/aspm.c#L28
[2] https://github.com/jakeday/linux-surface/issues/163
[3] https://www.spinics.net/lists/linux-wireless/msg159776.html
[4] https://www.spinics.net/lists/linux-wireless/msg175943.html