Background:
When handling multifunction devices in zPCI we take the
UID of the PCI function with function number 0
(that always exists according to the PCI spec)
as domain number.
Therefore when hot plugging functions with function
number larger than 0 before function 0, we need
to hold these in standby before creating the
domain and bus.
This has been tested during feature development
using a patched QEMU and with DPM but never in Classic
Mode.
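The hold-in-standby rule described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: struct mf_model and mf_model_plug() are hypothetical names and not the structures or functions used in arch/s390/pci. Functions with a function number other than 0 are recorded but not scanned until function 0 supplies the UID that becomes the domain number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MF_MAX_FN 8 /* a PCI device has at most 8 functions */

/* Hypothetical, simplified model of one multifunction device. */
struct mf_model {
    bool     bus_created;        /* false until function 0 is known */
    uint32_t domain;             /* valid only once bus_created is true */
    bool     present[MF_MAX_FN]; /* functions plugged so far */
    bool     scanned[MF_MAX_FN]; /* functions already visible on the bus */
};

/*
 * Plug function number fn with the given UID.
 * Returns how many functions became visible on the bus by this call.
 */
static int mf_model_plug(struct mf_model *m, unsigned int fn, uint32_t uid)
{
    int newly_scanned = 0;

    assert(fn < MF_MAX_FN);
    m->present[fn] = true;

    /* Only function 0 can create the bus; its UID becomes the domain. */
    if (fn == 0 && !m->bus_created) {
        m->domain = uid;
        m->bus_created = true;
    }

    /* Without a bus the function is only held, nothing is scanned. */
    if (!m->bus_created)
        return 0;

    /* Once the bus exists, scan everything that was held back. */
    for (unsigned int i = 0; i < MF_MAX_FN; i++) {
        if (m->present[i] && !m->scanned[i]) {
            m->scanned[i] = true;
            newly_scanned++;
        }
    }
    return newly_scanned;
}
```

In this model, plugging function 1 first returns 0 and leaves bus_created false; plugging function 0 afterwards creates the bus and makes both functions visible in one go.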
Reproduction:
This issue was introduced with the topology-aware PCI
enumeration code, so test with a Linux kernel that supports
that feature, e.g. upstream, Devel Driver etc.
On a Classic Mode machine with a multi-function device,
hot plug ("Reassign I/O Path") only the FID of the
second port to the LPAR.
Symptom:
After this, any additional hotplug, and even just
deconfiguring a PCI device, will hang. A hotplug
makes the entire Linux instance unresponsive.
Analysis:
The problem occurs in Classic Mode but not with
previous testing as the LPAR hypervisor does
hot plug/Reassign I/O Path as a two step process:
1. zPCI event with PEC 0x0302 to plug the zPCI function in Standby
2. zPCI event with PEC 0x0301 to configure the zPCI function
For the first event we create the zdev in clp_add_pci_device()
in Standby which is all fine so far.
The problem then occurs in step 2 as we then find
the existing zdev and try to configure it.
This however does not work as the PCI bus
is not yet created (as we still don't know the UID of
function 0 that will become its domain).
The bus pointer zdev->zbus->bus is thus still
NULL but will be accessed by common code, which
inevitably results in disaster, including
the above-mentioned hang and (possibly) the below
RCU stall:
[ 689.724703] rcu: INFO: rcu_sched self-detected stall on CPU
[ 689.724712] rcu: 16-....: (42004 ticks this GP) idle=6ee/1/0x4000000000000002 softirq=1234/1234 fqs=14001
[ 689.724742] (t=42006 jiffies g=89 q=3770)
[ 689.724743] Task dump for CPU 16:
[ 689.724745] task:kmcheck state:R running task stack: 0 pid: 205 ppid: 2 flags:0x00000004
[ 689.724747] Call Trace:
[ 689.724757] [<0000000ccde0b5c4>] show_stack+0x8c/0xd8
[ 689.724762] [<0000000ccd0dabc4>] sched_show_task.part.0+0xe4/0x110
[ 689.724764] [<0000000ccde0ea5e>] rcu_dump_cpu_stacks+0xde/0x120
[ 689.724767] [<0000000ccd1465c6>] print_cpu_stall+0x266/0x330
[ 689.724768] [<0000000ccd14a428>] rcu_sched_clock_irq+0x618/0x670
[ 689.724771] [<0000000ccd15cd7a>] update_process_times+0xba/0xf0
[ 689.724775] [<0000000ccd1766fa>] tick_sched_timer+0x9a/0x220
[ 689.724777] [<0000000ccd15d962>] __hrtimer_run_queues+0x182/0x3a0
[ 689.724779] [<0000000ccd1602f8>] hrtimer_interrupt+0x138/0x450
[ 689.724782] [<0000000ccd0451c0>] do_IRQ+0x90/0xa0
[ 689.724784] [<0000000ccde2be96>] ext_int_handler+0x17e/0x184
[ 689.724790] [<0000000ccd9f373e>] pci_get_slot+0x5e/0xa0
[ 689.724794] [<0000000ccd9dc182>] pci_scan_single_device+0x32/0x2a0
[ 689.724797] [<0000000ccd0868f2>] __zpci_event_availability+0x192/0x360
[ 689.724800] [<0000000ccdd40c16>] chsc_process_crw+0x2e6/0x300
[ 689.724802] [<0000000ccdd4b088>] crw_collect_info+0x2b8/0x320
[ 689.724804] [<0000000ccd0caf3a>] kthread+0x14a/0x170
[ 689.724805] [<0000000ccde2b814>] ret_from_fork+0x24/0x2c
The fix is very simple, we check zdev->zbus->bus
for being NULL and in that case bail from the
case 0x0301 before calling the PCI common code
pci_scan_single_device() with the NULL pointer.
The only subtlety is that we still need to
do the zpci_enable_device() because the
code in arch/s390/pci/pci_bus.c assumes
that it can immediately do a scan of
all devfn != 0 PCI functions once
PCI function 0 is found.
It thereby mimics what happens
when we only find the FID for a function with
devfn != 0 in the CLP List PCI Functions.
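The two-step event handling and the NULL check can be illustrated with a minimal user-space model. It is a sketch, not the kernel code: all model_* names are hypothetical stand-ins, the scan helper only mimics the fact that common code dereferences the bus pointer, and the ordering (enable first, then bail if there is no bus) follows the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_STANDBY, MODEL_CONFIGURED };

struct model_bus  { int scanned; };           /* stand-in for the PCI bus */
struct model_zbus { struct model_bus *bus; }; /* NULL until function 0 is known */
struct model_zdev {
    struct model_zbus *zbus;
    enum model_state   state;
    bool               enabled;
};

/* Stand-in for the common-code scan: dereferences bus unconditionally. */
static void model_scan_single_device(struct model_bus *bus)
{
    bus->scanned++;
}

/*
 * Handle one availability event for an existing zdev.
 * Returns true if the function was scanned onto the bus.
 */
static bool model_handle_event(struct model_zdev *zdev, unsigned int pec)
{
    assert(zdev != NULL && zdev->zbus != NULL);

    switch (pec) {
    case 0x0302: /* step 1: function is plugged in Standby */
        zdev->state = MODEL_STANDBY;
        return false;
    case 0x0301: /* step 2: function is configured */
        zdev->state = MODEL_CONFIGURED;
        /* Enable even without a bus, so a later scan can pick it up. */
        zdev->enabled = true;
        /* The fix: no bus yet, so do not call into the scan code. */
        if (zdev->zbus->bus == NULL)
            return false;
        model_scan_single_device(zdev->zbus->bus);
        return true;
    default:
        return false;
    }
}
```

Without the NULL check, the 0x0301 case in this model would pass a NULL bus to model_scan_single_device(), which corresponds to the failing path through pci_scan_single_device() in the trace above.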
This is implemented in the following upstream
commit:
0b2ca2c7d0c9e2731d01b6c862375d44a7e13923 s390/pci: fix hot-plug of PCI function missing bus
It is included in v5.10-rc3 and has been tagged for
stable > v5.8 i.e. all upstream versions with
the PCI enumeration changes.
Also it carries the appropriate Fixes tag.
I have verified that it cherry-picks cleanly
on current focal master-next and expect
it to cleanly cherry-pick on newer Ubuntu
Kernels too.