17.10 fails to boot on POWER9 DD2.0 with Deep stop states
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
The Ubuntu-power-systems project |
Fix Released
|
High
|
Canonical Kernel Team | ||
linux (Ubuntu) |
Fix Released
|
High
|
Seth Forshee | ||
Artful |
Fix Released
|
High
|
Seth Forshee |
Bug Description
== Comment: #0 - Ranjal G. Shenoy
On Boston DD2.0 system, where deep stop states such as stop4 are enabled, the 17.10 kernel Ubuntu-4.12.0-12.13 fails to boot.
It requires the following upstream fixes to be backported.
1) commit 5f221c3ca13d ("powerpc/
2) commit ec4867355244 ("powerpc/
3) commit cb0be7ec0307 ("powerpc/
4) commit 1e1601b38e6e ("powerpc/
5) commit 22c6663dc69a ("powerpc/
6) commit f9122ee4f558 ("cpuidle-powernv: Allow Deep stop states that don't stop time")
7) commit 785a12afdb4a ("powerpc/
8) commit e1c1cfed5432 ("powerpc/powernv: Save/Restore additional SPRs for stop4 cpuidle")
9) commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug")
10) https:/
Of these 1-7 are in Linux Kernel 4.13. 8 and 9 are in powerpc/linux.git -next branch. and 10) is posted upstream which fixes 9).
These patches have been backported on top of Ubuntu-4.12.0-12.13 and tested on Boston where they are working as expected.
== Comment: #1 - Ranjal G. Shenoy
The lower 8 bits of core_idle_state_ptr tracks the number of non-idle
threads in the core. This is supposed to be initialized to bit-map
corresponding to the threads_per_core. However, currently it is
initialized to PNV_CORE_
POWER8 which has 8 threads per core, but not for POWER9 which has 4
threads per core.
As a result, on POWER9, core_idle_state_ptr gets initialized to
0xFF. In case when all the threads of the core are idle, the bits
corresponding tracking the idle-threads are non-zero. As a result, the
idle entry/exit code fails to save/restore per-core hypervisor state
since it assumes that there are threads in the cores which are still
active.
Fix this by correctly initializing the lower bits of the
core_
Cherry-picked from commit 5f221c3ca13d ("powerpc/
Correctly initialize core_idle_
== Comment: #2 - Ranjal G. Shenoy
On POWER8, in case of
- nap: both timebase and hypervisor state is retained.
- fast-sleep: timebase is lost. But the hypervisor state is retained.
- winkle: timebase and hypervisor state is lost.
Hence, the current code for handling exit from a idle state assumes
that if the timebase value is retained, then so is the hypervisor
state. Thus, the current code doesn't restore per-core hypervisor
state in such cases.
But that is no longer the case on POWER9 where we do have stop states
in which timebase value is retained, but the hypervisor state is
lost. So we have to ensure that the per-core hypervisor state gets
restored in such cases.
Fix this by ensuring that even in the case when timebase is retained,
we explicitly check if we are waking up from a deep stop that loses
per-core hypervisor state (indicated by cr4 being eq or gt), and if
this is the case, we restore the per-core hypervisor state.
Cherry-picked from commit ec4867355244 ("powerpc/
Decouple Timebase restore & Per-core SPRs restore")
== Comment: #3 - Ranjal G. Shenoy
On wakeup from a deep stop state which is supposed to lose the
hypervisor state, we don't restore the LPCR to the old value but set
it to a "sane" value via cur_cpu_
The problem is that the "sane" value doesn't include UPRT and the HR
bits which are required to run correctly in Radix mode.
Fix this on POWER9 onwards by restoring the LPCR value whatever it was
before executing the stop instruction.
Cherry-picked from commit cb0be7ec0307 ("powerpc/
LPCR on wakeup from deep-stop")
== Comment: #4 - Ranjal G. Shenoy
Some of the SPR values (HID0, MSR, SPRG0) don't change during the run
time of a booted kernel, once they have been initialized.
The contents of these SPRs are lost when the CPUs enter deep stop
states. So instead saving and restoring SPRs from the kernel, use the
stop-api provided by the firmware by which the firmware can restore
the contents of these SPRs to their initialized values after wakeup
from a deep stop state.
Apart from these, program the PSSCR value to that of the deepest stop
state via the stop-api. This will be used to indicate to the
underlying firmware as to what stop state to put the threads that have
been woken up by a special-wakeup.
And while we are at programming SPRs via stop-api, ensure that HID1,
HID4 and HID5 registers which are only available on POWER8 are not
requested to be restored by the firware on POWER9.
Cherry-picked from commit 1e1601b38e6e ("powerpc/
SPRs for deep idle states via stop API.")
== Comment: #5 - Ranjal G. Shenoy
On Power9 DD1 due to a hardware bug the Power-Saving Level Status
field (PLS) of the PSSCR for a thread waking up from a deep state can
under-report if some other thread in the core is in a shallow stop
state. The scenario in which this can manifest is as follows:
1) All the threads of the core are in deep stop.
2) One of the threads is woken up. The PLS for this thread will
correctly reflect that it is waking up from deep stop.
3) The thread that has woken up now executes a shallow stop.
4) When some other thread in the core is woken, its PLS will reflect
the shallow stop state.
Thus, the subsequent thread for which the PLS is under-reporting the
wakeup state will not restore the hypervisor resources.
Hence, on DD1 systems, use the Requested Level (RL) field as a
workaround to restore the contents of the hypervisor resources on the
wakeup from the stop state.
Cherry-picked from commit 22c6663dc69a ("powerpc/
Requested Level for restoring state on P9 DD1")
== Comment: #6 - Ranjal G. Shenoy
The current code in the cpuidle-powernv intialization only allows deep
stop states (indicated by OPAL_PM_
(indicated by OPAL_PM_
POWER8 time where deep states used to lose the timebase. However, on
POWER9, we do have stop states that are deep (they lose hypervisor
state) but retain the timebase.
Fix the initialization code in the cpuidle-powernv driver to allow
such deep states.
Further, there is a bug in cpuidle-powernv driver with
CONFIG_
even if a platform idle state which loses time base was not added to
the cpuidle table.
Fix this by ensuring that the nr_idle_states variable gets incremented
only when the platform idle state was added to the cpuidle table.
Cherry-picked from commit f9122ee4f558 ("cpuidle-powernv: Allow Deep
stop states that don't stop time")
== Comment: #7 - Ranjal G. Shenoy
Currently, we use the opal call opal_slw_set_reg() to inform the
Sleep-Winkle Engine (SLW) to restore the contents of some of the
Hypervisor state on wakeup from deep idle states that lose full
hypervisor context (characterized by the flag
OPAL_
However, the current code has a bug in that if opal_slw_set_reg()
fails, we don't disable the use of these deep states (winkle on
POWER8, stop4 onwards on POWER9).
This patch fixes this bug by ensuring that if programing the
sleep-winkle engine to restore the hypervisor states in
pnv_
clearing the OPAL_PM_
supported_
using winkle for CPU-Hotplug, and POWER9 will put the offlined CPUs to
the default stop state when available.
Further, we ensure in the initialization of the cpuidle-powernv driver
to only include those states whose flags are present in
supported_
states when they have been disabled due to stop-api failure.
Fixes: 1e1601b38e6 ("powerpc/
states via stop API.")
Cherry-picked from commit 785a12afdb4a ("powerpc/
LOSE_
== Comment: #8 - Ranjal G. Shenoy
The stop4 idle state on POWER9 is a deep idle state which loses
hypervisor resources, but whose latency is low enough that it can be
exposed via cpuidle.
Until now, the deep idle states which lose hypervisor resources (eg:
winkle) were only exposed via CPU-Hotplug. Hence currently on wakeup
from such states, barring a few SPRs which need to be restored to
their older value, rest of the SPRS are reinitialized to their values
corresponding to that at boot time.
When stop4 is used in the context of cpuidle, we want these additional
SPRs to be restored to their older value, to ensure that the context
on the CPU coming back from idle is same as it was before going idle.
In this patch, we define a SPR save area in PACA (since we have used
up the volatile register space in the stack) and on POWER9, we restore
SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA, SPRN_MMCR1,
SPRN_MMCR2 to the values they had before entering stop.
Cherry-picked from commit e1c1cfed5432 ("powerpc/powernv: Save/Restore
additional SPRs for stop4 cpuidle")
== Comment: #9 - Ranjal G. Shenoy
Currently we use the stop-api provided by the firmware to program the
SLW engine to restore the values of hypervisor resources that get lost
on deeper idle states (such as winkle). Since the deep states were
only used for CPU-Hotplug on POWER8 systems, we would program the LPCR
to have the PECE1 bit since Hotplugged CPUs shouldn't be spuriously
woken up by decrementer.
On POWER9, some of the deep platform idle states such as stop4 can be
used in cpuidle as well. In this case, we want the CPU in stop4 to be
woken up by the decrementer when some timer on the CPU expires.
In this patch, we program the stop-api for LPCR with PECE1
bit cleared only when we are offlining the CPU and set it
back once the CPU is online.
Cherry-picked from commit 24be85a23d1f ("powerpc/powernv: Clear PECE1
in LPCR via stop-api only on Hotplug")
== Comment: #10 - Ranjal G. Shenoy
commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
stop-api only on Hotplug") clears the PECE1 bit of the LPCR via
stop-api during CPU-Hotplug to prevent wakeup due to a decrementer on
an offlined CPU which is in a deep stop state.
In the case where the stop-api support is found to be lacking, the
commit 785a12afdb4a ("powerpc/
states when stop-api fails") disables deep states that lose hypervisor
context. Thus in this case, the offlined CPU will be put to some
shallow idle state.
However, we currently unconditionally clear the PECE1 in LPCR via
stop-api during CPU-Hotplug even when deep states are disabled due to
stop-api failure.
Fix this by clearing PECE1 of LPCR via stop-api during CPU-Hotplug
*only* when the offlined CPU will be put to a deep state that loses
hypervisor context.
Fixes: commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
stop-api only on Hotplug")
upstream reference: https:/
CVE References
Changed in ubuntu-power-systems: | |
importance: | Undecided → High |
assignee: | nobody → Canonical Kernel Team (canonical-kernel-team) |
Changed in linux (Ubuntu): | |
importance: | Undecided → High |
status: | New → Triaged |
Changed in ubuntu-power-systems: | |
status: | New → Fix Committed |
Changed in ubuntu-power-systems: | |
status: | Fix Committed → Fix Released |
tags: | added: cscc |
Default Comment by Bridge