kdump fails when crash is triggered after DLPAR cpu add operation

Bug #1828596 reported by bugproxy
This bug affects 1 person
Affects                           Status        Importance  Assigned to                    Milestone
The Ubuntu-power-systems project  Fix Released  High        Canonical Kernel Team
makedumpfile (Ubuntu)             Fix Released  Undecided   Thadeu Lima de Souza Cascardo
Xenial                            Fix Released  Undecided   Unassigned
Bionic                            Fix Released  Undecided   Thadeu Lima de Souza Cascardo
Cosmic                            Won't Fix     Undecided   Unassigned
Disco                             Won't Fix     Undecided   Unassigned
Eoan                              Fix Released  Undecided   Thadeu Lima de Souza Cascardo
Focal                             Fix Released  Undecided   Thadeu Lima de Souza Cascardo

Bug Description

[Impact]
After a CPU add/hotplug operation on Power systems, kdump will fail after a crash. The kdump kernel needs to be reloaded after a CPU add/hotplug.

[Test case]
Do CPU add/hotplug, trigger a crash, and check for a successful kdump.

[Regression potential]
Multiple reloads caused by multiple sequential CPU adds may cause spurious log results, and systemd may fail to properly reload the kdump kernel. This has been handled by resetting the failure counter when doing such reloads.

== Comment: #0 - Hari Krishna Bathini - 2019-05-10 05:55:40 ==
---Problem Description---
kdump fails when crash is triggered after CPU add operation.

Machine Type = na

---System Hang---
 Crashed in early boot process of kdump kernel after crash

Had to issue system reset from HMC to reclaim

---Steps to Reproduce---
1. Configure kdump.
2. Add a CPU from the HMC.
3. Trigger a crash.
4. The machine hangs after the crash, as shown below:

---
[169250.213166] IPI complete
[169250.234331] kexec: Starting switchover sequence.
I'm in purgatory
                             --- STUCK HERE ---

---uname output---
na

---Debugger---
A debugger is not configured

== Comment: #1 - Hari Krishna Bathini - 2019-05-10 05:56:46 ==
The problem is that the kexec udev rule that restarts the kdump-tools service
when a core is added is not being triggered. As a result, the KDump kernel uses
the old DT (device tree) that kexec created before the core was added. So, when
the system crashes on a thread from the added core(s), the KDump kernel fails
to get the 'boot_cpuid' and eventually fails to boot.

== Comment: #2 - Hari Krishna Bathini - 2019-05-10 06:02:27 ==
The udev rule for CPU add is not triggered because ppc64 does not
emit add/remove events when a CPU is hot added/removed. It only emits
online/offline events to user space when a CPU is hot added/removed.

So, the below udev rules are never triggered when needed:

SUBSYSTEM=="cpu", ACTION=="add", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="remove", PROGRAM="/bin/systemctl try-restart kdump-tools.service"

Also, with how CPU hot add & remove are handled in ppc64, a udev trigger
to reload kdump after CPU is hot removed is NOT necessary. So, fix the CPU
hot add case by updating the udev rule and drop the udev rule meant for CPU
hot remove in the kdump udev rules file:

SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
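One way to confirm which events a given machine actually emits is to watch the cpu subsystem with udevadm while the hot-add is performed. The following is only a sketch: count_cpu_actions is a hypothetical helper name, and it assumes the ACTION= property lines that udevadm monitor prints when --property is given.

```shell
# Hypothetical helper: tally uevents by ACTION.
# Reads `udevadm monitor --property` style output on stdin and prints
# one "<action> <count>" line per action seen.
count_cpu_actions() {
    awk -F= '/^ACTION=/ { n[$2]++ } END { for (a in n) print a, n[a] }' | sort
}

# Example use on a live system (perform the CPU add within the 60 seconds):
#   timeout 60 udevadm monitor --kernel --property --subsystem-match=cpu | count_cpu_actions
```

On a system that behaves as described above, the tally would be expected to list online (and offline) actions but no add or remove actions for the cpu subsystem.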

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-177551 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kexec-tools (Ubuntu)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
Thadeu Lima de Souza Cascardo (cascardo) wrote :

I will start working on an upload to eoan by next week. I should have something for you to test early in the week.

Changed in kexec-tools (Ubuntu):
status: New → Invalid
Changed in makedumpfile (Ubuntu):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in ubuntu-power-systems:
status: New → Triaged
Frank Heimes (fheimes)
tags: added: powervm
Thadeu Lima de Souza Cascardo (cascardo) wrote :

At my ppa, there is a version with the change. Can you please test? The package is available for bionic, cosmic, disco and eoan.

ppa:cascardo/kdump2

Andrew Cloke (andrew-cloke) wrote :

Marking as "incomplete" while awaiting test results from Thadeu's PPA kernel.

Changed in ubuntu-power-systems:
status: Triaged → Incomplete
Changed in makedumpfile (Ubuntu):
status: New → Incomplete
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-05-21 06:16 EDT-------
Cascardo, the udev rules (/lib/udev/rules.d/50-kdump-tools.rules) should have been:

SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"

but the package has:

SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="add", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="remove", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"

Can we get that sorted?

Thanks
Hari
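The comparison above (expected rules versus the rules shipped in the package) can be automated with a small filter. This is a sketch only: extra_rules is a hypothetical helper name, and both inputs are assumed to be plain text files with one udev rule per line.

```shell
# Hypothetical helper: print the rules present in the packaged file ($2)
# that do not appear verbatim in the expected list ($1).
# -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file.
extra_rules() {
    grep -Fxv -f "$1" "$2"
}

# Example use, with the three expected rules saved in ./expected.rules:
#   extra_rules ./expected.rules /lib/udev/rules.d/50-kdump-tools.rules
```

Note that comment lines and blank lines in the packaged file would also be reported, since they are not in the expected list.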

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, Hari.

So, as you said, other architectures will use add/remove instead of online, and we want to support them too. Any reason not to do it that you are thinking of?

Thanks.
Cascardo.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-22 02:33 EDT-------
(In reply to comment #11)
> Hi, Hari.
>
> So, as you said, other architectures will use add/remove instead of online,
> and we want to support them too. Any reason not to do it that you are
> thinking of?

These rules have no effect on ppc64, as ADD/REMOVE events are not emitted
for the CPU subsystem as of today. So, they don't have any impact and can be ignored.
But I thought these rules were there by accident and that the entries would be put
under arch flags to avoid them for ppc64.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-22 07:16 EDT-------
(In reply to comment #12)
[...]
> But I thought these rules were there by accident and the entries would be put
> under arch flags to avoid them for ppc64..

If that is too much to ask, I am fine with the current change.
The change works as expected.

Thanks
Hari

Andrew Cloke (andrew-cloke) wrote :

Based on the last comment, it looks like IBM's testing was successful and this patch is ready for SRU.
Thanks.

Changed in ubuntu-power-systems:
status: Incomplete → Confirmed
Changed in makedumpfile (Ubuntu):
status: Incomplete → Confirmed
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This is now in eoan-proposed. Please verify. I will start the backport process when it hits eoan.

Thanks.
Cascardo.

Changed in makedumpfile (Ubuntu Eoan):
status: Confirmed → Fix Committed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-06-24 07:49 EDT-------
Thanks for the change. With it, try-restart is being triggered for
kdump-tools service after CPU add operation but systemd reported
failure with below logs:

Jun 24 06:47:06 ubuntu systemd[1]: Stopped Kernel crash dump capture service.
Jun 24 06:47:06 ubuntu systemd[1]: Starting Kernel crash dump capture service...
Jun 24 06:47:06 ubuntu kdump-tools[2023]: Starting kdump-tools: * Creating symlink /var/lib/kdump/vmlinuz
Jun 24 06:47:06 ubuntu kdump-tools[2023]: * Creating symlink /var/lib/kdump/initrd.img
Jun 24 06:47:06 ubuntu kdump-tools[2023]: Modified cmdline:BOOT_IMAGE=/vmlinux-5.0.0-17-generic root=/dev/mapper/ubuntu--vg-root ro systemd.unit=kdump-tools-dump.service maxcpus=1 irqpo
Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Main process exited, code=killed, status=15/TERM
Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Failed with result 'signal'.
Jun 24 06:47:06 ubuntu systemd[1]: Stopped Kernel crash dump capture service.
Jun 24 06:47:06 ubuntu systemd[1]: Starting Kernel crash dump capture service...
Jun 24 06:47:06 ubuntu kdump-tools[2071]: Starting kdump-tools: * Creating symlink /var/lib/kdump/vmlinuz
Jun 24 06:47:06 ubuntu kdump-tools[2071]: * Creating symlink /var/lib/kdump/initrd.img
Jun 24 06:47:06 ubuntu kdump-tools[2071]: Modified cmdline:BOOT_IMAGE=/vmlinux-5.0.0-17-generic root=/dev/mapper/ubuntu--vg-root ro systemd.unit=kdump-tools-dump.service maxcpus=1 irqpo
Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Main process exited, code=killed, status=15/TERM
Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Failed with result 'signal'.
Jun 24 06:47:06 ubuntu systemd[1]: Stopped Kernel crash dump capture service.
Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Start request repeated too quickly.
Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Failed with result 'signal'.
Jun 24 06:47:06 ubuntu systemd[1]: Failed to start Kernel crash dump capture service.

---
Looks like a ratelimit issue with systemd. Is there some systemd option to work around it?

I am running the commands below on a PowerVM machine:

# drmgr -c cpu -r -q 1 (to remove a core)
# drmgr -c cpu -a -q 1 (to add it back -> this triggers 8 CPU online udev events as SMT is 8)

To conclude, the udev rule alone is not sufficient. We need a way to handle
multiple requests arriving at once.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.5-1ubuntu2

---------------
makedumpfile (1:1.6.5-1ubuntu2) eoan; urgency=medium

  [ Thadeu Lima de Souza Cascardo ]
  * Use maxcpus instead of nr_cpus on ppc64el. (LP: #1828597)
  * Reload kdump when CPU is brought online. (LP: #1828596)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Fri, 14 Jun 2019 10:58:40 -0300

Changed in makedumpfile (Ubuntu Eoan):
status: Fix Committed → Fix Released
Thadeu Lima de Souza Cascardo (cascardo) wrote :
description: updated
Thadeu Lima de Souza Cascardo (cascardo) wrote : Re: [Bug 1828596] Comment bridged from LTC Bugzilla

On Mon, Jun 24, 2019 at 11:59:48AM -0000, bugproxy wrote:
> ------- Comment From <email address hidden> 2019-06-24 07:49 EDT-------
> Thanks for the change. With it, try-restart is being triggered for
> kdump-tools service after CPU add operation but systemd reported
> failure with below logs:
>
> Jun 24 06:47:06 ubuntu systemd[1]: Stopped Kernel crash dump capture service.
> Jun 24 06:47:06 ubuntu systemd[1]: Starting Kernel crash dump capture service...
> Jun 24 06:47:06 ubuntu kdump-tools[2023]: Starting kdump-tools: * Creating symlink /var/lib/kdump/vmlinuz
> Jun 24 06:47:06 ubuntu kdump-tools[2023]: * Creating symlink /var/lib/kdump/initrd.img
> Jun 24 06:47:06 ubuntu kdump-tools[2023]: Modified cmdline:BOOT_IMAGE=/vmlinux-5.0.0-17-generic root=/dev/mapper/ubuntu--vg-root ro systemd.unit=kdump-tools-dump.service maxcpus=1 irqpo
> Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Main process exited, code=killed, status=15/TERM
> Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Failed with result 'signal'.
> Jun 24 06:47:06 ubuntu systemd[1]: Stopped Kernel crash dump capture service.
> Jun 24 06:47:06 ubuntu systemd[1]: Starting Kernel crash dump capture service...
> Jun 24 06:47:06 ubuntu kdump-tools[2071]: Starting kdump-tools: * Creating symlink /var/lib/kdump/vmlinuz
> Jun 24 06:47:06 ubuntu kdump-tools[2071]: * Creating symlink /var/lib/kdump/initrd.img
> Jun 24 06:47:06 ubuntu kdump-tools[2071]: Modified cmdline:BOOT_IMAGE=/vmlinux-5.0.0-17-generic root=/dev/mapper/ubuntu--vg-root ro systemd.unit=kdump-tools-dump.service maxcpus=1 irqpo
> Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Main process exited, code=killed, status=15/TERM
> Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Failed with result 'signal'.
> Jun 24 06:47:06 ubuntu systemd[1]: Stopped Kernel crash dump capture service.
> Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Start request repeated too quickly.
> Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Failed with result 'signal'.
> Jun 24 06:47:06 ubuntu systemd[1]: Failed to start Kernel crash dump capture service.
>
> ---
> Looks like a ratelimit issue with systemd. Is there some systemd option to work around it?
>
> I am running the below command on a PowerVM machine:
>
> # drmgr -c cpu -r -q 1 (to remove a core)
> # drmgr -c cpu -a -q 1 (to add it back -> this triggers 8 CPU online udev events as SMT is 8)
>
> To conclude, udev rule alone is not sufficient. Need a way to address the multiple
> requests at once..

There are these systemd options, which default to a burst limit of 5 starts
within an interval of 10s.

       StartLimitIntervalSec=interval, StartLimitBurst=burst

Another option, which I prefer, however, is to reset the start rate limit
counter by running systemctl reset-failed kdump-tools.service from the udev rule.

Can you try that?

Thanks.
Cascardo.
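The reset-failed approach suggested above could look roughly like the helper below, which a udev rule would call instead of invoking try-restart directly. This is a sketch under assumptions, not the change that was eventually uploaded: the function name and the overridable SYSTEMCTL variable are hypothetical and exist only so the sequence can be dry-run.

```shell
# Overridable so the sketch can be exercised without touching systemd.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Hypothetical helper a hotplug udev rule could run.
kdump_hotplug_restart() {
    # Clear the failed state and the start rate limit counter first...
    "$SYSTEMCTL" reset-failed kdump-tools.service
    # ...then restart the service only if it is already running.
    "$SYSTEMCTL" try-restart kdump-tools.service
}
```

Resetting the counter only on hotplug events leaves the start rate limit in place for every other failure mode of the service.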

Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: Confirmed → Incomplete
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-07-15 06:36 EDT-------
Cascardo, I did not tinker with the other options, but disabling the ratelimit helped:

"StartLimitInterval=0"

"systemctl reset-failed kdump-tools.service" seems like a good option but
may not be needed if the ratelimit is disabled.
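For anyone wanting to repeat this experiment, the setting can be applied through a unit drop-in. A minimal sketch, assuming systemd 230 or later, where the option is spelled StartLimitIntervalSec= and lives in the [Unit] section (StartLimitInterval= is the older spelling). The helper name and the drop-in file name are hypothetical, and this is meant for testing only:

```shell
# Hypothetical helper: write a drop-in that disables start rate limiting
# for kdump-tools.service. $1 is the filesystem root ("" for the live system).
write_no_start_limit_dropin() {
    dir="$1/etc/systemd/system/kdump-tools.service.d"
    mkdir -p "$dir" &&
    printf '[Unit]\nStartLimitIntervalSec=0\n' > "$dir/10-no-start-limit.conf"
}

# After writing to the live system, make systemd pick it up with:
#   systemctl daemon-reload
```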

Thanks
Hari

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Disabling the ratelimit in general would break other failure modes, so I would rather just reset-failed when calling try-restart because of the hotplug events.

Can you try the package in ppa:cascardo/kdump2? Packages for eoan, disco and bionic available.

Thanks.
Cascardo.

description: updated
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-07-25 05:43 EDT-------
(In reply to comment #27)
> Disabling the ratelimit in general would break other failure modes, so I
> would rather just reset-failed when calling try-restart because of the
> hotplug events.
>
> Can you try the package in ppa:cascardo/kdump2? Packages for eoan, disco and
> bionic available.

Cascardo, is the fix package you are proposing still here? I see the below
package version:

ii kdump-tools 1:1.6.5-1ubuntu2~18.04.1

which doesn't seem to invoke "systemctl reset-failed kdump-tools" anywhere.
I was trying this out on bionic with the 5.0.0-17-generic kernel, and the issue is still reproducible.

Guilherme G. Piccoli (gpiccoli) wrote :

Hi Hari, did you manage to try the package in https://launchpad.net/~cascardo/+archive/ubuntu/kdump2?

I've downloaded the kdump-tools deb package for ppc64 from the above PPA, and could check that it contains the udev rule:
SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"

I understand from the latest comments that the above rule is the fix for this LP bug, correct?
Can you manually download the package from the above PPA, install it and verify that /lib/udev/rules.d/50-kdump-tools.rules contains the fixing rule?

In case it has that and still fails your testing, then we need to understand why the fix is not working.
Thanks,

Guilherme
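The check described in this comment can also be done without installing the package. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the rules file is stored in the .deb under the member path ./lib/udev/rules.d/50-kdump-tools.rules.

```shell
# Print the kdump-tools udev rules shipped inside a .deb ($1) without
# installing it. The member path is an assumption about the package layout.
deb_kdump_rules() {
    dpkg-deb --fsys-tarfile "$1" | tar -xOf - ./lib/udev/rules.d/50-kdump-tools.rules
}

# Succeed if the rules read from stdin contain the fixed string $1.
rules_mention() {
    grep -qF -- "$1"
}

# Example use (the .deb file name is a placeholder):
#   deb_kdump_rules ./kdump-tools.deb | rules_mention 'ACTION=="online"' && echo present
```

The same filter can be pointed at any other string of interest, for example reset-failed.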

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-07-26 06:48 EDT-------
Guilherme, the initial fix (udev rule) is still available. But while testing I observed a failure
due to systemd ratelimiting. I proposed disabling the ratelimit but, IIUC, Cascardo
preferred an approach that does not involve disabling the systemd ratelimit
and provided an updated package that solves the ratelimiting differently.
My recent comment is that the PPA has no updated package, just the initial fix.
Hope that clears it up.

Guilherme G. Piccoli (gpiccoli) wrote :

Hi Hari, thanks for clarifying! Now I understand. It seems we need to wait for Cascardo's input, to see whether he has already implemented the systemd reset-failed approach or not.

Cheers,

Guilherme

Thadeu Lima de Souza Cascardo (cascardo) wrote :

It was implemented, but the upload did not build on that ppa, because I used different versions. I am still catching up after vacation time, so will post some updates as soon as I have them.

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Incomplete → Triaged
Eric Desrochers (slashd)
Changed in makedumpfile (Ubuntu Disco):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
status: New → In Progress
Changed in makedumpfile (Ubuntu Bionic):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
status: New → In Progress
Changed in makedumpfile (Ubuntu Cosmic):
status: New → Won't Fix
Thadeu Lima de Souza Cascardo (cascardo) wrote :
Andy Whitcroft (apw) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.5-1ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Disco):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-disco
Andy Whitcroft (apw) wrote :

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.5-1ubuntu1~18.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
bugproxy (bugproxy) wrote : kdump tools service is not restarting with the latest change to udev rules

------- Comment on attachment From <email address hidden> 2019-08-29 08:34 EDT-------

udev rules are not triggering kdump-tools service restart after hot adding
CPU or hot adding/removing memory with kdump-tools package version
1.6.5-1ubuntu1~18.04.2

tags: added: verification-failed-bionic
removed: verification-needed-bionic
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (makedumpfile/1:1.6.5-1ubuntu1.1)

All autopkgtests for the newly accepted makedumpfile (1:1.6.5-1ubuntu1.1) for disco have finished running.
The following regressions have been reported in tests triggered by the package:

makedumpfile/1:1.6.5-1ubuntu1.1 (s390x, ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/disco/update_excuses.html#makedumpfile

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

bugproxy (bugproxy) wrote : kdump tools service is not restarting with the latest change to udev rules (Disco)

------- Comment (attachment only) From <email address hidden> 2019-08-30 02:48 EDT-------

bugproxy (bugproxy)
tags: added: verification-failed verification-failed-disco
removed: verification-needed verification-needed-disco
Mathew Hodson (mhodson)
no longer affects: kexec-tools (Ubuntu Eoan)
no longer affects: kexec-tools (Ubuntu Disco)
no longer affects: kexec-tools (Ubuntu Cosmic)
no longer affects: kexec-tools (Ubuntu Bionic)
no longer affects: kexec-tools (Ubuntu Xenial)
no longer affects: kexec-tools (Ubuntu)
Steve Langasek (vorlon) wrote : Proposed package removed from archive

The version of makedumpfile in the proposed pocket of Bionic that was purported to fix this bug report has been removed because one or more bugs that were to be fixed by the upload have failed verification and been in this state for more than 10 days.

Changed in makedumpfile (Ubuntu Bionic):
status: Fix Committed → Won't Fix
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, can you try the package in ppa:cascardo/ppa ?

Thanks.
Cascardo.

Thadeu Lima de Souza Cascardo (cascardo) wrote :

There is some interaction between the systemd MemoryDenyWriteExecute=yes setting on udevd and how grep has been built (possibly pointing at the toolchain), so the new solution on the ppa isn't working on bionic.

We will work on this bug, and see how this behaves on disco and eoan. In case either of those is fine, we will ask IBM to test it there, while we move forward with this systemd/toolchain interaction bug.

Thanks.
Cascardo.

Thadeu Lima de Souza Cascardo (cascardo) wrote :

https://bugs.launchpad.net/ubuntu/+source/grep/+bug/1844524

This should be fixed by rebuilding grep. I have uploaded grep to my ppa, so if you install kdump-tools and grep from ppa:cascardo/ppa, you will be able to test this.

Can you please do it, so we reduce the risk of the next upload being removed from -proposed again?

Thanks.
Cascardo.

Changed in makedumpfile (Ubuntu Eoan):
status: Fix Released → In Progress
Changed in makedumpfile (Ubuntu Disco):
status: Fix Committed → In Progress
Changed in makedumpfile (Ubuntu Bionic):
status: Won't Fix → In Progress
Thadeu Lima de Souza Cascardo (cascardo) wrote :

A fixed grep is already in bionic-updates. Either that or the one on my ppa must be installed in order for the makedumpfile version in my ppa to work. I will wait for testing feedback before I get this fix uploaded to eoan, disco and bionic.

Thanks.
Cascardo.

Guilherme G. Piccoli (gpiccoli) wrote :

Hi Hari, did you have a chance to re-test using the latest version Cascardo pointed in his last comment?

We wait on your testing to be sure all is working now, and we can re-upload the package to -proposed pocket.
Thanks in advance,

Guilherme

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-09-23 13:28 EDT-------
Sorry about the delay. Observed that kdump/fadump is loaded even when
kdump-tools service is disabled. Not desirable, I guess. Probably need to
check if kdump-tools service is active before trying a reload?

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, Hari.

makedumpfile 1:1.6.5-1ubuntu1~18.04.2+cascardo2 on ppa:cascardo/ppa uses a try-reload instead. Can you test it, please?

Thanks.
Cascardo.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-09-25 02:41 EDT-------
(In reply to comment #47)
> Hi, Hari.
>
> makedumpfile 1:1.6.5-1ubuntu1~18.04.2+cascardo2 on ppa:cascardo/ppa uses a
> try-reload instead. Can you test it, please?

Cascardo, try-reload does not consider the fadump case (supported on powerpc).
For the fadump case, we need to check whether "/sys/kernel/fadump_registered" is `1`
before proceeding with unload/load.

A suggestion I have is to check for "systemctl is-active kdump-tools" and run
"kdump-config reload" if it returns true, instead of "kdump-config try-reload"
as that should cover for both kdump and fadump cases.

Also, shouldn't we account for races when multiple udev events are triggered
simultaneously by using locks or such?
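The suggestion above (reload only when the service is active, and serialize concurrent udev events) can be sketched as follows. This is not the code that went into the package: the function name, the lock file path, and the overridable SYSTEMCTL and KDUMP_CONFIG variables are all hypothetical, and flock(1) from util-linux is assumed to be available.

```shell
# Overridable so the sketch can be dry-run; the lock path is an assumption.
LOCKFILE="${LOCKFILE:-/run/kdump-hotplug.lock}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
KDUMP_CONFIG="${KDUMP_CONFIG:-kdump-config}"

# Hypothetical helper for hotplug udev rules.
hotplug_reload() {
    (
        # Take an exclusive lock on fd 9 so that simultaneous events
        # run their reloads one at a time instead of racing.
        flock 9 || exit 1
        # Reload only when the service is active, so a disabled or
        # stopped kdump-tools is left alone.
        if "$SYSTEMCTL" is-active --quiet kdump-tools.service; then
            "$KDUMP_CONFIG" reload
        fi
    ) 9>"$LOCKFILE"
}
```

With SMT 8, a single core add would queue eight such calls behind the lock. Coalescing them into one reload would need extra logic that this sketch does not attempt.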

Thadeu Lima de Souza Cascardo (cascardo) wrote :

New version on ppa:cascardo/ppa for bionic. That should handle fadump and lock in the case of try-reload/condreload.

Hari, can you give it a try?

Thanks.
Cascardo.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-10-25 06:29 EDT-------
(In reply to comment #49)
> New version on ppa:cascardo/ppa for bionic. That should handle fadump and
> lock in the case of try-reload/condreload.
>
> Hari, can you give it a try?

Thanks, Cascardo. That works well for the FADump case.
But it fails for the KDump case. With the below change
on top of the kdump-tools package you shared, things work
as expected for the KDump case too:

---
diff --git a/usr/sbin/kdump-config.orig b/usr/sbin/kdump-config
index 08fe301..fd5e469 100755
--- a/usr/sbin/kdump-config.orig
+++ b/usr/sbin/kdump-config
@@ -923,7 +923,7 @@ reload()
 condreload()
 {
-	local $sys_loaded="$sys_kexec_crash"
+	local sys_loaded="$sys_kexec_crash"
 	if [ "$DUMP_MODE" == "fadump" ] ; then
 		check_fadump_support
 		sys_loaded="$sys_fadump_registered"
---

Thanks
Hari
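The one-character fix in the diff above matters because the shell expands $sys_loaded before local runs. A minimal sketch of the difference, using stand-in function names and a stand-in value for sys_kexec_crash (in kdump-config the value reflects kernel state):

```shell
sys_kexec_crash=1   # stand-in value for illustration only

condreload_buggy() {
    # "$sys_loaded" is expanded first; if it is unset, the command becomes
    # `local ="1"`, which is not a valid declaration, so sys_loaded never
    # receives the value.
    local $sys_loaded="$sys_kexec_crash"
    printf '%s\n' "${sys_loaded:-<unset>}"
}

condreload_fixed() {
    # Without the "$", this declares a local variable named sys_loaded.
    local sys_loaded="$sys_kexec_crash"
    printf '%s\n' "${sys_loaded:-<unset>}"
}
```

Calling condreload_fixed prints 1. What condreload_buggy does depends on the shell (an error message, possibly an abort), but in no case does sys_loaded end up holding the kexec state, which is consistent with the KDump case failing while the FADump branch, which assigns the variable separately, kept working.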

Thadeu Lima de Souza Cascardo (cascardo) wrote :
Changed in makedumpfile (Ubuntu Disco):
status: In Progress → Won't Fix
Changed in makedumpfile (Ubuntu Xenial):
status: New → In Progress
Thadeu Lima de Souza Cascardo (cascardo) wrote :
Dan Streetman (ddstreet)
Changed in makedumpfile (Ubuntu Disco):
status: Won't Fix → In Progress
Guilherme G. Piccoli (gpiccoli) wrote :

Focal was uploaded but it's failing in ppc64el again (see LP #1851663).
I've submitted a merge request[0] in Britney to skip this test once more.

Cheers,

Guilherme

[0] https://code.launchpad.net/~gpiccoli/britney/hints-ubuntu/+merge/377160

Guilherme G. Piccoli (gpiccoli) wrote :

A recent issue on makedumpfile was reported and investigated on LP#1857616. Given its relevance, it makes sense to respin the debdiffs. So, I've reworked the debdiffs for Bionic and Eoan, which should replace the ones above. Disco is not affected.
Since those packages are not in -proposed, the respin is on top of the same version, to replace the old upload.

Focal, on the other hand, has been in -proposed for some time, so the respin is on top of the -proposed version (which I expect to be released to -updates soon), with a version increment.
Thanks,

Guilherme

Guilherme G. Piccoli (gpiccoli) wrote :
Guilherme G. Piccoli (gpiccoli) wrote :

I'm re-attaching the untouched disco debdiff, just for organization/completeness.

Guilherme G. Piccoli (gpiccoli) wrote :
Guilherme G. Piccoli (gpiccoli) wrote :

I'm re-attaching the untouched Xenial debdiff, just for organization/completeness.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.6-4ubuntu1

---------------
makedumpfile (1:1.6.6-4ubuntu1) focal; urgency=medium

  [ Thadeu Lima de Souza Cascardo ]
  * Merge from Debian unstable. Remaining changes:
    - Bump amd64 crashkernel from 384M-:128M to 512M-:192M.
  * Use reset_devices as a cmdline parameter. (LP: #1800566)
  * Use kdump-config reload after cpu or memory hotplug. (LP: #1828596)

  [ Guilherme G. Piccoli ]
  * Add a systemd-resolved service dependency in order kdump-tools is able
    to resolve DNS when in kdump boot. (LP: #1856323)

makedumpfile (1:1.6.6-4) unstable; urgency=medium

  * Let the kernel decide the crashkernel offset for ppc64el (LP: #1741860)
  * kdump-config: implement try-reload
  * udev: hotplug: use try-reload
  * Set Rules-Requires-Root to no

makedumpfile (1:1.6.6-3) unstable; urgency=medium

  * Add a reload command.
  * Use kdump-config reload after cpu or memory hotplug.
  * Use reset_devices as a cmdline parameter.

 -- Thadeu Lima de Souza Cascardo <email address hidden> Wed, 18 Dec 2019 14:38:51 -0300

Changed in makedumpfile (Ubuntu Focal):
status: In Progress → Fix Released
Mauricio Faria de Oliveira (mfo) wrote :

Hi Guilherme, Thadeu,

The respun packages have been uploaded to Bionic and Eoan.

Xenial didn't change, and is still in the upload queue (which
I'll check on next week.)

Disco didn't change, and has been removed from the upload queue.
(After discussions with other people, it turns out uploading to
Disco is not necessary given the short span until its EOL date.)

cheers,
Mauricio

Changed in makedumpfile (Ubuntu Disco):
status: In Progress → Won't Fix
assignee: Thadeu Lima de Souza Cascardo (cascardo) → nobody
Andy Whitcroft (apw) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.6-2ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Eoan):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-eoan
removed: verification-failed
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (makedumpfile/1:1.6.6-2ubuntu2)

All autopkgtests for the newly accepted makedumpfile (1:1.6.6-2ubuntu2) for eoan have finished running.
The following regressions have been reported in tests triggered by the package:

makedumpfile/1:1.6.6-2ubuntu2 (i386, ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/eoan/update_excuses.html#makedumpfile

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Andy Whitcroft (apw) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.5-1ubuntu1~18.04.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
removed: verification-failed-bionic
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (makedumpfile/1:1.6.5-1ubuntu1~18.04.4)

All autopkgtests for the newly accepted makedumpfile (1:1.6.5-1ubuntu1~18.04.4) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

makedumpfile/1:1.6.5-1ubuntu1~18.04.4 (ppc64el, s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#makedumpfile

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

bugproxy (bugproxy)
tags: added: verification-done-bionic
removed: verification-needed-bionic
Guilherme G. Piccoli (gpiccoli) wrote :

Thanks for the verification! Can you please verify the Eoan version too?
Cheers,

Guilherme

Andy Whitcroft (apw) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.3-2~16.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-01-15 15:34 EDT-------
Verified successfully on eoan with kdump-tools
version 1:1.6.6-2ubuntu2

------- Comment From <email address hidden> 2020-01-15 15:36 EDT-------
Verified successfully on xenial with kdump-tools version
1:1.6.3-2~16.04.2

------- Comment From <email address hidden> 2020-01-15 15:38 EDT-------
Verified successfully on bionic with kdump-tools version
1:1.6.5-1ubuntu1~18.04.4

tags: added: verification-done-eoan verification-done-xenial
removed: verification-needed-eoan verification-needed-xenial
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-01-15 15:44 EDT-------
With EOL for disco this month, marking as verification done
as this is verified successfully on bionic, eoan & xenial.

tags: added: verification-done
removed: verification-needed
Frank Heimes (fheimes) wrote :

Yes, that's fine - disco was already set to Won't Fix on the Launchpad bug side, since it will reach its EOL on January 23rd.

Mauricio Faria de Oliveira (mfo) wrote :

Hello Hari,

Glad to see you around on bugs again. :)

Could you please confirm whether the fix for this bug also addresses LP bug 1655280?
(i.e., the verification for xenial would be equivalent/also validate that other bug.)

Thanks,
Mauricio

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-01-16 11:02 EDT-------
(In reply to comment #78)
> Hello Hari,
>
> Glad to see you around on bugs again. :)

Thanks, Mauricio :)

> Could you please confirm whether the fix for this bug also addresses LP bug
> 1655280?
> (i.e., the verification for xenial would be equivalent/also validate that
> other bug.)

Yes, it does. I have updated that bug.

bugproxy (bugproxy)
tags: added: targetmilestone-inin18043
removed: targetmilestone-inin---
Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for makedumpfile has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.6-2ubuntu2

---------------
makedumpfile (1:1.6.6-2ubuntu2) eoan; urgency=medium

  [ Thadeu Lima de Souza Cascardo ]
  * Fixes for DLPAR cpu add operation (LP: #1828596)
    - d/kdump-config.in: Add a reload command.
    - d/kdump-config.in: implement try-reload.
    - d/50-kdump-tools.rules: Use kdump-config reload after cpu or memory hotplug
    - d/50-kdump-tools.rules: use try-reload instead.
  * d/rules: Use reset_devices as a cmdline parameter. (LP: #1800566)

  [ Guilherme G. Piccoli ]
  * d/kdump-tools-dump.service: Add a systemd-resolved service dependency
    in order to make kdump-tools able to resolve DNS when in kdump boot.
    (LP: #1856323)
  * d/p/0003-Increase-SECTION_MAP_LAST_BIT-to-4.patch: x86_64: Fix an error due
    to makedumpfile being out-of-sync with recent kernels. (LP: #1857616)

 -- <email address hidden> (Guilherme G. Piccoli) Fri, 03 Jan 2020 16:10:19 -0300

Changed in makedumpfile (Ubuntu Eoan):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.5-1ubuntu1~18.04.4

---------------
makedumpfile (1:1.6.5-1ubuntu1~18.04.4) bionic; urgency=medium

  [ Thadeu Lima de Souza Cascardo ]
  * Fixes for DLPAR cpu add operation (LP: #1828596)
    - d/kdump-config.in: Add a reload command.
    - d/kdump-config.in: implement try-reload.
    - d/50-kdump-tools.rules: Use kdump-config reload after cpu or memory hotplug
    - d/50-kdump-tools.rules: use try-reload instead.
  * d/rules: Use reset_devices as a cmdline parameter. (LP: #1800566)

  [ Guilherme G. Piccoli ]
  * d/kdump-tools-dump.service: Add a systemd-resolved service dependency
    in order to make kdump-tools able to resolve DNS when in kdump boot.
    (LP: #1856323)
  * Fix an error due to makedumpfile being out-of-sync with recent kernels.
    (LP: #1857616)
    - d/p/0004-x86_64-fix-get_kaslr_offset_x86_64-to-return-kaslr_offset-correctly.patch
    - d/p/0005-Increase-SECTION_MAP_LAST_BIT-to-4.patch

 -- <email address hidden> (Guilherme G. Piccoli) Fri, 03 Jan 2020 13:14:39 -0300

Changed in makedumpfile (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.3-2~16.04.2

---------------
makedumpfile (1:1.6.3-2~16.04.2) xenial; urgency=medium

  * Let the kernel decide the crashkernel offset for ppc64el (LP: #1741860)
  * Reload kdump after memory/CPU hotplug. (LP: #1655280)
  * Use a different service for vmcore dump. (LP: #1811692)
  * Reload kdump when CPU is brought online. (LP: #1828596)
  * Add a reload command. (LP: #1828596)
  * kdump-config: implement try-reload (LP: #1828596)
  * udev: hotplug: use try-reload (LP: #1828596)
  * Use reset_devices as a cmdline parameter. (LP: #1800566)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Wed, 18 Dec 2019 16:06:16 -0300

Changed in makedumpfile (Ubuntu Xenial):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
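
For anyone checking whether a system already carries this fix, the fixed versions announced in this thread can be compared against the installed kdump-tools version. The sketch below is an illustrative Python port of dpkg's version ordering (epoch first, "~" sorting before everything, digit runs compared numerically). The function names and the FIXED_IN table layout are hypothetical; the version strings are the ones reported above, and dpkg --compare-versions remains the authoritative tool.

```python
# Fixed versions as reported in this bug, keyed by Ubuntu series.
FIXED_IN = {
    "xenial": "1:1.6.3-2~16.04.2",
    "bionic": "1:1.6.5-1ubuntu1~18.04.4",
    "eoan": "1:1.6.6-2ubuntu2",
    "focal": "1:1.6.6-4ubuntu1",
}


def _order(ch: str) -> int:
    """Sort weight of one character, following dpkg's rules."""
    if ch == "" or ch.isdigit():
        return 0
    if ch == "~":
        return -1          # tilde sorts before everything, even end of string
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256   # punctuation sorts after letters


def _compare_part(a: str, b: str) -> int:
    """Compare two upstream-version or revision strings."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit prefix character by character.
        while (i < len(a) and not a[i].isdigit()) or \
              (j < len(b) and not b[j].isdigit()):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Compare the following digit runs numerically.
        si, sj = i, j
        while i < len(a) and a[i].isdigit():
            i += 1
        while j < len(b) and b[j].isdigit():
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return na - nb
    return 0


def _split(version: str):
    """Split into (epoch, upstream, revision); missing parts default."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Negative if a < b, zero if equal, positive if a > b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return ea - eb
    return _compare_part(ua, ub) or _compare_part(ra, rb)


def has_fix(series: str, installed: str) -> bool:
    """True if `installed` is at least the fixed version for `series`.

    Raises KeyError for series with no fixed upload (e.g. cosmic, disco).
    """
    return compare_versions(installed, FIXED_IN[series]) >= 0
```

The installed version string can be obtained with dpkg-query -W -f='${Version}' kdump-tools and passed to has_fix() together with the series name.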