ISST-LTE:pVM:roselp4:ubuntu 16.04: cp: error reading '/proc/vmcore': Bad address when trying to dump vmcore

Bug #1655280 reported by bugproxy
This bug affects 1 person
Affects                            Status        Importance  Assigned to            Milestone
The Ubuntu-power-systems project   Fix Released  High        Canonical Kernel Team
makedumpfile (Ubuntu)              Fix Released  High        Unassigned
  Xenial                           Fix Released  High        Unassigned
  Bionic                           Fix Released  High        Unassigned
  Cosmic                           Fix Released  High        Unassigned
  Disco                            Fix Released  High        Unassigned

Bug Description

[Impact]
After a DLPAR memory/CPU add/remove operation, kdump tools need to be restarted or else kdump tools will fail to capture the crash.

[Test]

== Comment: #0 - Ping Tian Han <email address hidden> - 2017-01-09 02:51:00 ==
---Problem Description---
Vmcore cannot be saved when triggering bug 150353 on roselp4:

Copying data : [ 2.0 %] \/usr/sbin/kdump-config: line 591: 5502 Bus error makedumpfile $MAKEDUMP_ARGS $vmcore_file $KDUMP_CORETEMP
[ 512.833872] kdump-tools[5450]: * kdump-tools: makedumpfile failed, falling back to 'cp'
[ 573.595449] kdump-tools[5450]: cp: error reading '/proc/vmcore': Bad address
[ 573.605717] kdump-tools[5450]: * kdump-tools: failed to save vmcore in /var/crash/201701090223
[ 573.765417] kdump-tools[5450]: * running makedumpfile --dump-dmesg /proc/vmcore /var/crash/201701090223/dmesg.201701090223
[ 574.285506] kdump-tools[5450]: The kernel version is not supported.
[ 574.285672] kdump-tools[5450]: The makedumpfile operation may be incomplete.
[ 574.285767] kdump-tools[5450]: The dmesg log is saved to /var/crash/201701090223/dmesg.201701090223.
[ 574.305422] kdump-tools[5450]: makedumpfile Completed.
[ 574.315363] kdump-tools[5450]: * kdump-tools: saved dmesg content in /var/crash/201701090223
[ 574.615688] kdump-tools[5450]: Mon, 09 Jan 2017 02:24:26 -0600
[ 574.705384] kdump-tools[5450]: Rebooting.
         Stopping ifup for ib0...
[ OK ] Stopped ifup for ib0.
[ 1008.579897] reboot: Restarting system

Contact Information = Ping Tian <email address hidden> Carrie <email address hidden>

---uname output---
Linux roselp4 4.8.0-34-generic #36~16.04.1-Ubuntu SMP Wed Dec 21 18:53:20 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = lpar

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Configure kdump on roselp4
2. Try to trigger bug 150353

*Additional Instructions for Ping Tian <email address hidden> Carrie <email address hidden>:
-Post a private note with access information to the machine that the bug is occurring on.

== Comment: #3 - Brahadambal Srinivasan <email address hidden> - 2017-01-10 02:42:25 ==

root@roselp4:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinux-4.8.0-34-generic root=UUID=0bcf3431-df8b-499c-9a13-33070f242e0c ro splash quiet crashkernel=384M-:512M

root@roselp4:~# dmesg | grep Reser
[ 0.000000] Reserving 512MB of memory at 128MB for crashkernel (System RAM: 21760MB)

[Regression Potential]
The fix applies to makedumpfile, and could impact dump capture.

Revision history for this message
bugproxy (bugproxy) wrote : /etc/default/kdump-tools

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-150355 severity-high targetmilestone-inin---
Revision history for this message
bugproxy (bugproxy) wrote : sosreport

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-01-10 12:59 EDT-------
Assigning to Hari for some assistance. Can this issue be recreated consistently, and if so, how? Does it also occur on other LPARs running the same level?

tags: added: targetmilestone-inin16042
removed: targetmilestone-inin---
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-01-10 20:20 EDT-------
(In reply to comment #7)
> Assigning to Hari for some assistance but would like to know if this issue
> can be recreated consistently and if so how and does this occur on other
> LPARs running the same level?

It looks like this bug can only be reproduced on roselp4 when triggering bug 150353, but it is quite easy to trigger when testing DLPAR.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-01-23 01:45 EDT-------
Hi Canonical,

After a DLPAR operation, kdump-tools.service must be restarted for kdump/fadump
to work properly.

Thanks
Hari

Revision history for this message
Manoj Iyer (manjo) wrote : Re: ISST-LTE:pVM:roselp4:ubuntu 16.04.2: cp: error reading '/proc/vmcore': Bad address when trying to dump vmcore

Steve, could your team please take a look ?

Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Steve Langasek (vorlon)
importance: Undecided → High
Steve Langasek (vorlon)
Changed in linux (Ubuntu):
assignee: Steve Langasek (vorlon) → Canonical Kernel Team (canonical-kernel-team)
Manoj Iyer (manjo)
tags: added: ubuntu-16.04
Revision history for this message
bugproxy (bugproxy) wrote : console log

------- Comment (attachment only) From <email address hidden> 2017-07-14 03:58 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-07-14 05:02 EDT-------
The issue is observed even on 16.04.3, when a crash is triggered after a memory remove operation.

Thanks,
Pavithra

Manoj Iyer (manjo)
Changed in linux (Ubuntu):
importance: High → Medium
importance: Medium → High
Changed in ubuntu-power-systems:
importance: Undecided → High
Manoj Iyer (manjo)
tags: added: triage-r
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: New → Triaged
tags: added: kernel-da-key
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote : Re: ISST-LTE:pVM:roselp4:ubuntu 16.04.2: cp: error reading '/proc/vmcore': Bad address when trying to dump vmcore

Hi, hbathini.

Why is it needed to restart kdump-tools after a DLPAR? Does it refer to a memory hotplug operation? What is LTC bug #150353? Is it mirrored to launchpad already?

Thank you very much.
Cascardo.

Changed in linux (Ubuntu):
status: New → In Progress
assignee: Canonical Kernel Team (canonical-kernel-team) → Thadeu Lima de Souza Cascardo (cascardo)
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-08-22 10:32 EDT-------
(In reply to comment #20)
> Hi, hbathini.
>

Hello Cascardo,

> Why is it needed to restart kdump-tools after a DLPAR? Does it refer to a
> memory hotplug operation?

Yes. It refers to either memory or CPU hot add/remove operations, after which
the kdump kernel needs to be reloaded to rebuild the vmcore ELF notes.

> What is LTC bug #150353? Is it mirrored to launchpad already?

Yes, it is mirrored. LTC Bug 150353 refers to Launchpad bug 1658968.

Thanks
Hari

Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
status: Triaged → In Progress
tags: added: triage-g
removed: triage-r
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote : Re: ISST-LTE:pVM:roselp4:ubuntu 16.04.2: cp: error reading '/proc/vmcore': Bad address when trying to dump vmcore

Hi, Hari.

My understanding is that the kernel should handle that. Otherwise, whenever the user manually loads a different kdump kernel, a memory hotplug would cause the default kdump kernel to be loaded, not what the user has loaded.

Can you comment on that?

Regards.
Cascardo.

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-08-31 04:54 EDT-------
(In reply to comment #22)
> Hi, Hari.

Hello Cascardo,

> My understanding is that the kernel should handle that.

Sadly, it can't, as kexec-tools is the one that creates the ELF headers for /proc/vmcore.
So, with a change in available CPUs/memory, the kdump kernel needs to be reloaded,
probably with a udev event that does a try-restart of the kdump-tools service on
CPU/memory hot add/remove operations.

> Otherwise, whenever the user manually loads a different kdump kernel, a memory
> hotplug would cause the default kdump kernel to be loaded, not what the user
> has loaded.

Can't do much about it. If need be, a user has the option to work around this
problem by adjusting the settings in the /etc/default/kdump-tools file to load
a different kdump kernel.

Thanks
Hari

Revision history for this message
Joseph Salisbury (jsalisbury) wrote : Re: ISST-LTE:pVM:roselp4:ubuntu 16.04.2: cp: error reading '/proc/vmcore': Bad address when trying to dump vmcore

bug 1664545 may be related.

Manoj Iyer (manjo)
tags: added: triage-r
removed: triage-g
tags: added: triage-a
removed: triage-r
tags: added: ppc64el-kdump
Revision history for this message
Manoj Iyer (manjo) wrote :

IBM, could you please confirm whether this bug is already fixed using the workaround mentioned here and in bug 1664545?

summary: - ISST-LTE:pVM:roselp4:ubuntu 16.04.2: cp: error reading '/proc/vmcore':
- Bad address when trying to dump vmcore
+ ISST-LTE:pVM:roselp4:ubuntu 16.04: cp: error reading '/proc/vmcore': Bad
+ address when trying to dump vmcore
Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Frank Heimes (fheimes)
tags: added: triage-g
removed: triage-a
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

So, the workaround would be reloading the dump after DLPAR. In order to do that, run:

kdump-config unload ; kdump-config load

Regards.
Cascardo.
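
A minimal sketch of the workaround above wrapped in a shell function, for anyone who wants to run it by hand after a DLPAR operation. It uses only `kdump-config unload`, `kdump-config load` and `kdump-config show`; the function name `reload_kdump` and the `KDUMP_CONFIG` override variable are assumptions added for illustration and are not part of kdump-tools.

```shell
#!/bin/sh
# Sketch: reload the kdump kernel after a DLPAR operation.
# KDUMP_CONFIG is a hypothetical override so the function can be dry-run;
# by default it calls the real kdump-config shipped with kdump-tools.
KDUMP_CONFIG="${KDUMP_CONFIG:-kdump-config}"

reload_kdump() {
    # Unload may fail if no crash kernel was loaded; that is not fatal here.
    $KDUMP_CONFIG unload || true
    # Load the crash kernel again so the ELF headers are rebuilt against
    # the current CPU/memory layout.
    $KDUMP_CONFIG load || return 1
    # Print the resulting configuration for a quick visual check.
    $KDUMP_CONFIG show
}
```

Calling `reload_kdump` from a root shell once the DLPAR operation has completed should leave the crash kernel loaded against the new resource layout. A failed load is returned as a non-zero status so that a caller can notice it.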

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

It's not clear what the logs in comments #16 and #17 refer to. Could you please clarify?

Thanks.
Cascardo.

tags: removed: kernel-da-key
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-03-08 14:01 EDT-------
(In reply to comment #29)
> So, the workaround would be reloading the dump after DLPAR. In order to do
> that, run:
>
> kdump-config unload ; kdump-config load
>

The "console log" attachment is basically the error seen when the above workaround
is not used.

>
> It's not clear what the logs in comments #16 and #17 refer to. Could you
> please clarify?

The other attachment is the sosreport on the failed system.

Thanks
Hari

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Could you please comment as to whether the workaround resolved the issue?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-21 09:04 EDT-------
(In reply to comment #31)
> Could you please comment as to whether the workaround resolved the issue?

Yes, the workaround resolves the issue. But could we have a udev rule added in kdump-tools
to trigger a kdump-tools.service restart after a DLPAR memory/CPU add/remove operation,
instead of manually restarting the kdump-tools service after each DLPAR operation?

Thanks
Hari

Changed in ubuntu-power-systems:
status: Incomplete → Triaged
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-22 13:18 EDT-------
(In reply to comment #32)
> (In reply to comment #31)
> > Could you please comment as to whether the workaround resolved the issue?
>
> Yes, the workaround resolves the issue. But could we have a udev rule added
> in kdump-tools
> to trigger kdump-tools.service restart after a DLPAR memory/CPU add/remove
> operation
> instead of manually restarting kdump-tools service after a DLPAR operation..

Basically, a udev rule file that looks like this:

SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="add", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="remove", PROGRAM="/bin/systemctl try-restart kdump-tools.service"

placed in the "/lib/udev/rules.d" directory as part of the kdump-tools package, to avoid the need to manually
reload kdump.

Thanks
Hari
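
To try the rules proposed in this comment before a packaged fix is available, something like the following sketch could be used. It writes the four rules exactly as listed above into a local rules file. The function name `install_kdump_rules`, the file name `50-kdump-hotplug.rules` and the default directory `/etc/udev/rules.d` are assumptions for illustration and not what the kdump-tools package ships. Note that a later comment in this thread revises the CPU rules for ppc64.

```shell
#!/bin/sh
# Sketch: install the proposed hotplug rules locally for testing.
# The file name and the default directory are hypothetical choices.
install_kdump_rules() {
    rules_dir="${1:-/etc/udev/rules.d}"
    rules_file="$rules_dir/50-kdump-hotplug.rules"
    restart='/bin/systemctl try-restart kdump-tools.service'

    mkdir -p "$rules_dir" || return 1
    # One rule per subsystem/action pair, as listed in the comment above.
    {
        for pair in memory:online memory:offline cpu:add cpu:remove; do
            printf 'SUBSYSTEM=="%s", ACTION=="%s", PROGRAM="%s"\n' \
                "${pair%%:*}" "${pair##*:}" "$restart"
        done
    } > "$rules_file" || return 1
    # Report where the rules were written.
    echo "$rules_file"
}
```

After running `install_kdump_rules` as root, `udevadm control --reload` asks udev to pick up the new file. Passing a scratch directory as the first argument writes the file there instead, which is useful for inspecting the result without touching the system configuration.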

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
tags: added: triage-r
removed: triage-g
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Can you try the package in ppa:cascardo/ppa?

Thanks.
Cascardo.

Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-08-11 08:48 EDT-------
After upgrading to kdump-tools_1.6.4-1~16.04.0cascardo2_ppc64el.deb package,
dump capture succeeds without any complaints:

The udev rules trigger kdump-tools service reload as can be seen below:
--
root@ubuntu:~# dpkg -l | grep kdump-tools
ii kdump-tools 1:1.6.4-1~16.04.0cascardo2 ppc64el scripts and tools for automating kdump (Linux crash dumps)
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~# systemctl status kdump-tools.service -l
● kdump-tools.service - Kernel crash dump capture service
Loaded: loaded (/lib/systemd/system/kdump-tools.service; enabled; vendor pres
Active: active (exited) since Sat 2018-08-11 08:33:27 EDT; 2min 8s ago
Process: 25166 ExecStop=/etc/init.d/kdump-tools stop (code=exited, status=0/SU
Process: 25200 ExecStart=/etc/init.d/kdump-tools start (code=exited, status=0/
Main PID: 25200 (code=exited, status=0/SUCCESS)

Aug 11 08:33:26 ubuntu systemd[1]: Starting Kernel crash dump capture service...
Aug 11 08:33:26 ubuntu kdump-tools[25200]: Starting kdump-tools: * Creating sym
Aug 11 08:33:26 ubuntu kdump-tools[25200]: * Creating symlink /var/lib/kdump/in
Aug 11 08:33:26 ubuntu kdump-tools[25200]: Modified cmdline:BOOT_IMAGE=/boot/vml
Aug 11 08:33:26 ubuntu kdump-tools[25200]: * loaded kdump kernel
Aug 11 08:33:26 ubuntu kdump-tools[25248]: /sbin/kexec -p --command-line="BOOT_I
Aug 11 08:33:27 ubuntu kdump-tools[25249]: loaded kdump kernel
Aug 11 08:33:27 ubuntu systemd[1]: Started Kernel crash dump capture service.
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~# drmgr -c mem -r -q 1
Validating Memory DLPAR capability...yes.
[1459216.723209] pseries-hotplug-mem: Attempting to hot-remove 1 LMB(s)
[1459216.749014] Offlined Pages 4096
[1459193.925176] kdump-tools[25419]: Stopping kdump-tools: * unloaded kdump kernel
[1459193.985143] kdump-tools[25453]: Starting kdump-tools: * Creating symlink /var/lib/kdump/vmlinuz
[1459193.986120] kdump-tools[25453]: * Creating symlink /var/lib/kdump/initrd.img
[1459194.024961] kdump-tools[25453]: Modified cmdline:BOOT_IMAGE=/boot/vmlinux-4.15.0-24-generic root=UUID=1aa9458c-3974-4cb4-9ab3-9ee03c0f4e5e ro xmon=on nr_cpus=1 systemd.unit=kdump-tools.service irqpoll noirqdistrib nousb elfcorehdr=158144K
[1459217.005820] pseries-hotplug-mem: Memory at 40000000 was hot-removed
DR_TOTAL_RESOURCES=1
root@ubuntu:~# [1459194.297073] kdump-tools[25453]: * loaded kdump kernel

root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~# systemctl status kdump-tools.service -l
● kdump-tools.service - Kernel crash dump capture service
Loaded: loaded (/lib/systemd/system/kdump-tools.service; enabled; vendor pres
Active: active (exited) since Sat 2018-08-11 08:35:47 EDT; 12s ago
Process: 25419 ExecStop=/etc/init.d/kdump-tools stop (code=exited, status=0/SU
Process: 25453 ExecStart=/etc/init.d/kdump-tools start (code=exited, status=0/
Main PID: 25453 (code=exited, status=0/SUCCESS)

Aug 11 08:35:47 ubuntu systemd[1]: Starting Kernel crash dump capture service...
Aug 11 08:35:47 ubuntu kdump-tools[25453]: Starting kdump-tools: * Creating sym
Aug 11...


Changed in ubuntu-power-systems:
status: Incomplete → In Progress
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This fix has been uploaded to Debian, and will be synced to 19.04 when it opens. After that, it will be backported to cosmic, bionic and xenial.

Thanks.
Cascardo.

Changed in linux (Ubuntu):
status: In Progress → Invalid
Changed in makedumpfile (Ubuntu):
status: New → In Progress
importance: Undecided → High
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

I am working on the sync/merge from Debian to Ubuntu.

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This should be in disco soon.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.5-1ubuntu1

---------------
makedumpfile (1:1.6.5-1ubuntu1) disco; urgency=low

  [ Ubuntu Merge-o-Matic ]
  * Merge from Debian unstable. Remaining changes:
    - Bump amd64 crashkernel from 384M-:128M to 512M-:192M.

  [ Thadeu Lima de Souza Cascardo ]
  * Use a different service for vmcore dump. (LP: #1811692)

makedumpfile (1:1.6.5-1) unstable; urgency=medium

  * Update to new upstream version 1.6.5.
  * debian: remove debian/source/local-options
  * [i18n] Move PT debconf translation (Closes: #910465)

makedumpfile (1:1.6.4-3) unstable; urgency=medium

  * Reload kdump after memory/CPU hotplug. (LP: #1655280)
  * Fix adding crashkernel to zipl.conf when no quotation mark is used.
    (LP: #1790788)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Mon, 14 Jan 2019 15:42:44 -0200

Changed in makedumpfile (Ubuntu Disco):
status: In Progress → Fix Released
Revision history for this message
Manoj Iyer (manjo) wrote :

Required fix is to makedumpfile package therefore marking the linux track as invalid.

Changed in linux (Ubuntu Bionic):
status: New → Invalid
Revision history for this message
Manoj Iyer (manjo) wrote :

Since the required fix is in makedumpfile, marking the linux track as invalid.

Changed in linux (Ubuntu Xenial):
status: New → Invalid
Revision history for this message
Manoj Iyer (manjo) wrote :

@cascardo, the issue was reported in Xenial, now that the fix is released in Disco does it make it a good candidate for a backport to Xenial and Bionic?

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, @manjo.

This is in progress, with some versions of the package in the SRU queue. First thing we need is add the SRU template on this bug. Can you help with that?

Thanks.
Cascardo.

Manoj Iyer (manjo)
description: updated
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Thx Manoj for updating the SRU template.
@thadeu, does that give everything that's needed?

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, Andrew.

We needed a few other SRUs for bugs, including one introduced by this fix (LP #1811692), as they all went in the same upload. Now that that is out of the way, I will ask someone from the SRU team to look at this.

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Update: LP#1811692 is now "Fix Released". makedumpfile was added to the SRU queue on 11th Feb. Waiting on SRU queue approval.

Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.5-1ubuntu1~18.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Cosmic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-cosmic
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.5-1ubuntu1~18.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Bionic):
status: New → Fix Committed
tags: added: verification-needed-bionic
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-05-07 15:03 EDT-------
While I initially suggested the udev rules below:

SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="add", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="remove", PROGRAM="/bin/systemctl try-restart kdump-tools.service"

I request that you use the rules below instead:

SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"

because ppc64 does not follow the standard CPU hot-add framework, in which an add/remove event is
emitted on a CPU hot add/remove operation. It only emits online/offline events to user space
when a CPU is hot-added/removed. This is because /sys/devices/system/cpu/cpuX nodes are present
for all "possible" CPUs, irrespective of whether a CPU is hot-added/removed. So, a udev event only
for CPU hot add (online) is sufficient to pass on an updated DT blob for the kdump kernel
to scan for CPUs.
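
The distinction between "possible" and online CPUs described above can be inspected from sysfs. The sketch below prints the `possible`, `present` and `online` CPU ranges found under `/sys/devices/system/cpu`; the function name `show_cpu_masks` and its optional directory argument are assumptions added so the function can be pointed at a scratch directory.

```shell
#!/bin/sh
# Sketch: show which CPU ranges sysfs reports as possible, present and online.
# The optional first argument is a hypothetical override for dry runs.
show_cpu_masks() {
    cpu_dir="${1:-/sys/devices/system/cpu}"
    for name in possible present online; do
        if [ -r "$cpu_dir/$name" ]; then
            # Each file holds a range list such as 0-7 or 0-3,8-11.
            printf '%s: %s\n' "$name" "$(cat "$cpu_dir/$name")"
        else
            printf '%s: (not available)\n' "$name"
        fi
    done
}
```

Running `show_cpu_masks` before and after a CPU DLPAR operation should show the `online` range changing while `possible` stays the same. To see which uevents actually reach user space, `udevadm monitor --kernel --subsystem-match=cpu --subsystem-match=memory` can be left running in a second terminal during the operation.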

Also, ran into a couple of other bugs (IBM bug 177451 & IBM bug 177452) while validating this
fix on bionic 4.15.0-48-generic and 4.18.0-18-generic.

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Ok, in that case this seems like a verification-failed. @cascardo can you follow up on this SRU?

tags: added: verification-failed-bionic
removed: verification-needed-bionic
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, Hari.

Is this fixed for the memory hotplug case? Can you verify that? Then, we could proceed with the SRU, and open a new bug to fix the CPU hotplug case, if there is one.

And what are those other issues/bugs you mention? Have they been mirrored yet to launchpad? Are they regressions against makedumpfile/kdump-tools on bionic? I know newer kernels may have regressed when crashing from the non-boot CPU, which should affect 4.18 kernels and later, but not 4.15.

Thank you.
Cascardo.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-10 06:43 EDT-------
(In reply to comment #53)
> Hi, Hari.

Hi Cascardo,
> Is this fixed for the memory hotplug case? Can you verify that? Then, we
> could proceed with the SRU, and open a new bug to fix the CPU hotplug case,
> if there is one.

It does work for memory hotplug case. Sure.
Let me follow-up on the CPU hotplug issue with a separate bug..

> And what are those other issues/bugs you mention? Have they been mirrored
> yet to launchpad? Are they regressions against makedumpfile/kdump-tools on

No regression in the fix packages but issues observed while trying to validate them.
Pursuing them in separate bugs. Bug 177452 (mirrored as LP bug 1828187)
and Bug 177451 (being screened internally.. will let you know the LP bug # once it
is mirrored) Sorry if that confused you..

tags: added: verification-done verification-done-bionic verification-done-cosmic
removed: verification-failed-bionic verification-needed verification-needed-cosmic
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-10 07:10 EDT-------
(In reply to comment #53)
[...]
> could proceed with the SRU, and open a new bug to fix the CPU hotplug case,
> if there is one.

Raised bug 177551 to follow-up on the problem in CPU hot-add case which
is currently being screened internally. Will update the launchpad bug number
once we have it..

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, Hari.

Thanks a lot for the updates. No problem about the confusion. I will mark this one as verified, then, and follow up with the other bugs.

Thank you very much.
Cascardo.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-10 11:32 EDT-------
(In reply to comment #55)
> (In reply to comment #53)
> [...]
> > could proceed with the SRU, and open a new bug to fix the CPU hotplug case,
> > if there is one.
>
> Raised bug 177551 to follow-up on the problem in CPU hot-add case which

Bug 1828596 is the corresponding launchpad bug..

> is currently being screened internally. Will update the launchpad bug number
> once we have it..

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-10 11:40 EDT-------
(In reply to comment #53)
[...]

> bionic? I know newer kernels may have regressed when crashing from the
> non-boot CPU, which should affect 4.18 kernels and later, but not 4.15.

Raised launchpad bug 1828597 to follow-up on this..

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.5-1ubuntu1~18.10.1

---------------
makedumpfile (1:1.6.5-1ubuntu1~18.10.1) cosmic; urgency=low

  * Backport back to cosmic. (LP: #1655280) (LP: #1790788)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Thu, 07 Feb 2019 09:22:23 -0200

Changed in makedumpfile (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for makedumpfile has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.5-1ubuntu1~18.04.1

---------------
makedumpfile (1:1.6.5-1ubuntu1~18.04.1) bionic; urgency=low

  * Backport back to bionic. (LP: #1655280) (LP: #1790788)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Thu, 07 Feb 2019 09:22:23 -0200

Changed in makedumpfile (Ubuntu Bionic):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Issue is now "Fix Released" everywhere except Xenial. Should this now be backported to Xenial or would that incur too much regression risk?

Changed in ubuntu-power-systems:
status: Fix Committed → In Progress
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-05-20 13:34 EDT-------
(In reply to comment #61)
> Issue is now "Fix Released" everywhere except Xenial. Should this now be
> backported to Xenial or would that incur too much regression risk?

No regressions expected but would be better off taking this along with
the fix for follow-up LP bug 1828596

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Well, there was a regression when I first uploaded the fix: we noticed that if a hotplug event occurred during a dump, it would kill makedumpfile before it completed. When restarting, makedumpfile would find the same file already there, which would prevent the new dump from even starting, so the system would reboot with an incomplete dump. That was reported and fixed as LP bug 1811692.

So, I agree that a backport to xenial should already include the fix to bug 1828596, but these changes are not without their regression potentials. The good thing is that we caught this with our current testing, so I would say our testing so far has helped us prevent a big regression.

Brad Figg (brad-figg)
tags: added: cscc
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This has been blocked because we need to get the changes into the development release first, and recent changes to testing have caused ppc64el test failures. Those changes allowed tests to be run on VMs with more memory, which allows the test to run instead of being skipped. So now we are detecting a failure, though I haven't been able to reproduce it so far, which is what is blocking progress here.

I will try other approaches to try to reproduce and investigate.

Cascardo.

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Backport to xenial attached at https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/1828596/comments/26. Waiting for sponsorship now. Should get in -proposed by next week.

Cascardo.

Frank Heimes (fheimes)
Changed in makedumpfile (Ubuntu Xenial):
status: New → In Progress
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This is going to be fixed on xenial once we fix #1828596 as well.

Frank Heimes (fheimes) wrote :

Changing to Incomplete until LP 1828596 is fixed.

Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Andy Whitcroft (apw) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.3-2~16.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
removed: verification-done
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-01-16 11:01 EDT-------
Issue resolved on xenial with kdump-tools version
1:1.6.3-2~16.04.2

tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Incomplete → Fix Committed
Mathew Hodson (mhodson)
no longer affects: linux (Ubuntu)
no longer affects: linux (Ubuntu Xenial)
no longer affects: linux (Ubuntu Bionic)
no longer affects: linux (Ubuntu Cosmic)
no longer affects: linux (Ubuntu Disco)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.3-2~16.04.2

---------------
makedumpfile (1:1.6.3-2~16.04.2) xenial; urgency=medium

  * Let the kernel decide the crashkernel offset for ppc64el (LP: #1741860)
  * Reload kdump after memory/CPU hotplug. (LP: #1655280)
  * Use a different service for vmcore dump. (LP: #1811692)
  * Reload kdump when CPU is brought online. (LP: #1828596)
  * Add a reload command. (LP: #1828596)
  * kdump-config: implement try-reload (LP: #1828596)
  * udev: hotplug: use try-reload (LP: #1828596)
  * Use reset_devices as a cmdline parameter. (LP: #1800566)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Wed, 18 Dec 2019 16:06:16 -0300
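
Editorial note: the changelog mentions a try-reload command used by the udev hotplug rule, but this report does not show how it works. The following is only a minimal sketch of the idea, assuming try-reload means "reload the crash kernel only if one is already loaded, otherwise do nothing". The function name kdump_try_reload_action is hypothetical and is not taken from kdump-config; /sys/kernel/kexec_crash_loaded is the kernel's sysfs flag that reads 1 when a crash kernel is loaded.

```shell
# Hypothetical helper, NOT the actual kdump-config implementation.
# Prints "reload" if the state file reports a loaded crash kernel,
# and "skip" otherwise (file missing, unreadable, or not "1").
kdump_try_reload_action() {
    # Default to the kernel's sysfs flag; a path can be passed for testing.
    state_file=${1:-/sys/kernel/kexec_crash_loaded}
    if [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]; then
        echo reload
    else
        echo skip
    fi
}
```

Under this assumption, a hotplug event on a system where kdump was never loaded results in "skip", so the event does not start kdump on its own.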

Changed in makedumpfile (Ubuntu Xenial):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
Mathew Hodson (mhodson)
Changed in makedumpfile (Ubuntu Xenial):
importance: Undecided → High
Changed in makedumpfile (Ubuntu Bionic):
importance: Undecided → High