kdump fails when crash is triggered after DLPAR cpu add operation
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
The Ubuntu-power-systems project | Fix Released | High | Canonical Kernel Team | |
makedumpfile (Ubuntu) | Fix Released | Undecided | Thadeu Lima de Souza Cascardo | |
Xenial | Fix Released | Undecided | Unassigned | |
Bionic | Fix Released | Undecided | Thadeu Lima de Souza Cascardo | |
Cosmic | Won't Fix | Undecided | Unassigned | |
Disco | Won't Fix | Undecided | Unassigned | |
Eoan | Fix Released | Undecided | Thadeu Lima de Souza Cascardo | |
Focal | Fix Released | Undecided | Thadeu Lima de Souza Cascardo | |
Bug Description
[Impact]
After a CPU add/hotplug operation on Power systems, kdump will fail after a crash. The kdump kernel needs to be reloaded after a CPU add/hotplug.
[Test case]
Do CPU add/hotplug, trigger a crash, and check for a successful kdump.
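To confirm that the hot add actually took effect before triggering the crash, the set of online CPUs can be compared before and after the operation. The following is only a minimal sketch: it assumes the standard Linux sysfs files `/sys/devices/system/cpu/online` and `/sys/kernel/kexec_crash_loaded`, and the helper names are made up for illustration. Note that `kexec_crash_loaded` only says that some crash kernel is loaded, not that it was reloaded after the CPU add.

```python
from pathlib import Path

# Standard Linux sysfs locations; reading them only makes sense on a live system.
ONLINE_CPUS = Path("/sys/devices/system/cpu/online")
CRASH_LOADED = Path("/sys/kernel/kexec_crash_loaded")


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def newly_online(before, after):
    """Return the CPUs listed in 'after' but not in 'before', sorted."""
    return sorted(parse_cpu_list(after) - parse_cpu_list(before))


def crash_kernel_loaded(path=CRASH_LOADED):
    """True when the kernel reports that a crash (kdump) kernel is loaded."""
    return path.read_text().strip() == "1"
```

On a test machine, one would read `ONLINE_CPUS.read_text()` before and after the CPU add, pass both strings to `newly_online()`, and check `crash_kernel_loaded()` before triggering the crash.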
[Regression potential]
Multiple reloads caused by multiple sequential CPU adds may cause spurious log results, and systemd may fail to properly reload the kdump kernel. This has been handled by resetting the failure counter when doing such reloads.
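The reload sequence described above could look roughly like the sketch below. This is an assumption, not the packaged fix: it supposes that the failure counter is cleared with `systemctl reset-failed` and that the service is then restarted with `systemctl try-restart`. The unit name `kdump-tools.service` is inferred from the kdump-tools service mentioned in this report, and the function names are hypothetical.

```python
import subprocess

# Unit name inferred from the report; adjust if the service is named differently.
UNIT = "kdump-tools.service"


def reload_commands(unit=UNIT):
    """Commands for one reload: clear systemd's failure state, then restart if running."""
    return [
        ["systemctl", "reset-failed", unit],
        ["systemctl", "try-restart", unit],
    ]


def reload_kdump(unit=UNIT, run=subprocess.run):
    """Run the reload sequence; 'run' is injectable so the sequence can be dry-run."""
    results = []
    for cmd in reload_commands(unit):
        # check=False: a non-zero exit from one step should not abort the other.
        results.append(run(cmd, check=False))
    return results
```

Clearing the failure state first means that a burst of CPU online events, each causing one reload, is less likely to trip systemd's start rate limit for the unit.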
== Comment: #0 - Hari Krishna Bathini - 2019-05-10 05:55:40 ==
---Problem Description---
kdump fails when crash is triggered after CPU add operation.
Machine Type = na
---System Hang---
Crashed in early boot process of kdump kernel after crash
Had to issue system reset from HMC to reclaim
---Steps to Reproduce---
1. Configure kdump.
2. Add cpu from HMC.
3. Trigger crash.
4. Machine hangs after crash as below:
---
[169250.213166] IPI complete
[169250.234331] kexec: Starting switchover sequence.
I'm in purgatory
---uname output---
na
---Debugger---
A debugger is not configured
== Comment: #1 - Hari Krishna Bathini - 2019-05-10 05:56:46 ==
The problem is that the kexec udev rule that restarts the kdump-tools service
when a core is added is not being triggered. As a result, the kdump kernel uses
the old device tree (DT) that kexec created before the core was added. So, when
the system crashes on a thread from the added core(s), the kdump kernel fails to
get the 'boot_cpuid' and eventually fails to boot.
== Comment: #2 - Hari Krishna Bathini - 2019-05-10 06:02:27 ==
The udev rule for CPU add is not triggered because ppc64 does not
emit an add/remove event when a CPU is hot added/removed. It only emits
an online/offline event to user space when a CPU is hot added/removed.
So, the udev rules below are never triggered when needed:
SUBSYSTEM=="cpu", ACTION=="add", PROGRAM=
SUBSYSTEM=="cpu", ACTION=="remove", PROGRAM=
Also, given how CPU hot add and remove are handled on ppc64, a udev trigger
to reload kdump after a CPU is hot removed is NOT necessary. So, fix the CPU
hot add case by updating the udev rule to match the online event, and drop the
udev rule meant for CPU hot remove from the kdump udev rules file:
SUBSYSTEM=="cpu", ACTION=="online", PROGRAM=
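A quick way to audit a rules file for this problem is to list the `ACTION` values of its CPU rules and flag those that ppc64 never sends. The sketch below is illustrative only: the function names are hypothetical, the parser handles just simple `KEY=="value"` matches, and the set of actions ppc64 delivers is taken from the comment above rather than from kernel documentation.

```python
import re

# Matches simple key=="value" pairs such as SUBSYSTEM=="cpu".
MATCH_KEY = re.compile(r'(\w+)=="([^"]*)"')

# Per the comment above: the only CPU hotplug actions ppc64 sends to user space.
PPC64_CPU_ACTIONS = {"online", "offline"}


def cpu_rule_actions(rules_text):
    """Return the ACTION value of every rule line that matches SUBSYSTEM=="cpu"."""
    actions = []
    for line in rules_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keys = dict(MATCH_KEY.findall(line))
        if keys.get("SUBSYSTEM") == "cpu" and "ACTION" in keys:
            actions.append(keys["ACTION"])
    return actions


def never_fired_on_ppc64(rules_text):
    """Return the CPU rule actions that would never be triggered on ppc64."""
    return [a for a in cpu_rule_actions(rules_text) if a not in PPC64_CPU_ACTIONS]
```

Run against the old rules, this reports `add` and `remove` as rules that never fire. Run against the updated `online` rule, it reports nothing.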
tags: | added: architecture-ppc64le bugnameltc-177551 severity-high targetmilestone-inin--- |
Changed in ubuntu: | |
assignee: | nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) |
affects: | ubuntu → kexec-tools (Ubuntu) |
Changed in ubuntu-power-systems: | |
assignee: | nobody → Canonical Kernel Team (canonical-kernel-team) |
importance: | Undecided → High |
Changed in ubuntu-power-systems: | |
status: | New → Triaged |
tags: | added: powervm |
Changed in ubuntu-power-systems: | |
status: | Confirmed → Incomplete |
Changed in ubuntu-power-systems: | |
status: | Incomplete → Triaged |
Changed in makedumpfile (Ubuntu Disco): | |
assignee: | nobody → Thadeu Lima de Souza Cascardo (cascardo) |
status: | New → In Progress |
Changed in makedumpfile (Ubuntu Bionic): | |
assignee: | nobody → Thadeu Lima de Souza Cascardo (cascardo) |
status: | New → In Progress |
Changed in makedumpfile (Ubuntu Cosmic): | |
status: | New → Won't Fix |
Changed in ubuntu-power-systems: | |
status: | Triaged → In Progress |
tags: | added: verification-failed verification-failed-disco removed: verification-needed verification-needed-disco |
no longer affects: | kexec-tools (Ubuntu Eoan) |
no longer affects: | kexec-tools (Ubuntu Disco) |
no longer affects: | kexec-tools (Ubuntu Cosmic) |
no longer affects: | kexec-tools (Ubuntu Bionic) |
no longer affects: | kexec-tools (Ubuntu Xenial) |
no longer affects: | kexec-tools (Ubuntu) |
Changed in makedumpfile (Ubuntu Eoan): | |
status: | Fix Released → In Progress |
Changed in makedumpfile (Ubuntu Disco): | |
status: | Fix Committed → In Progress |
Changed in makedumpfile (Ubuntu Bionic): | |
status: | Won't Fix → In Progress |
Changed in makedumpfile (Ubuntu Disco): | |
status: | Won't Fix → In Progress |
tags: | added: verification-done-bionic removed: verification-needed-bionic |
Changed in ubuntu-power-systems: | |
status: | In Progress → Fix Committed |
tags: | added: targetmilestone-inin18043 removed: targetmilestone-inin--- |
Changed in ubuntu-power-systems: | |
status: | Fix Committed → Fix Released |
I will start working on an upload to eoan by next week. I should have something for you to test early in the week.