ISST-LTE:pVM:roselp4:ubuntu 16.04: cp: error reading '/proc/vmcore': Bad address when trying to dump vmcore

Bug #1655280 reported by bugproxy on 2017-01-10
This bug affects 1 person
Affects                           Importance  Assigned to
The Ubuntu-power-systems project  High        Canonical Kernel Team
linux (Ubuntu)                    Status tracked in Disco
  Xenial                          Undecided   Unassigned
  Bionic                          Undecided   Unassigned
  Cosmic                          High        Thadeu Lima de Souza Cascardo
  Disco                           High        Thadeu Lima de Souza Cascardo
makedumpfile (Ubuntu)             Status tracked in Disco
  Xenial                          Undecided   Unassigned
  Bionic                          Undecided   Unassigned
  Cosmic                          High        Unassigned
  Disco                           High        Unassigned

Bug Description

== Comment: #0 - Ping Tian Han <email address hidden> - 2017-01-09 02:51:00 ==
---Problem Description---
Vmcore cannot be saved when triggering bug 150353 on roselp4:

Copying data : [ 2.0 %] \/usr/sbin/kdump-config: line 591: 5502 Bus error makedumpfile $MAKEDUMP_ARGS $vmcore_file $KDUMP_CORETEMP
[ 512.833872] kdump-tools[5450]: * kdump-tools: makedumpfile failed, falling back to 'cp'
[ 573.595449] kdump-tools[5450]: cp: error reading '/proc/vmcore': Bad address
[ 573.605717] kdump-tools[5450]: * kdump-tools: failed to save vmcore in /var/crash/201701090223
[ 573.765417] kdump-tools[5450]: * running makedumpfile --dump-dmesg /proc/vmcore /var/crash/201701090223/dmesg.201701090223
[ 574.285506] kdump-tools[5450]: The kernel version is not supported.
[ 574.285672] kdump-tools[5450]: The makedumpfile operation may be incomplete.
[ 574.285767] kdump-tools[5450]: The dmesg log is saved to /var/crash/201701090223/dmesg.201701090223.
[ 574.305422] kdump-tools[5450]: makedumpfile Completed.
[ 574.315363] kdump-tools[5450]: * kdump-tools: saved dmesg content in /var/crash/201701090223
[ 574.615688] kdump-tools[5450]: Mon, 09 Jan 2017 02:24:26 -0600
[ 574.705384] kdump-tools[5450]: Rebooting.
         Stopping ifup for ib0...
[ OK ] Stopped ifup for ib0.
[ 1008.579897] reboot: Restarting system

Contact Information = Ping Tian <email address hidden> Carrie <email address hidden>

---uname output---
Linux roselp4 4.8.0-34-generic #36~16.04.1-Ubuntu SMP Wed Dec 21 18:53:20 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = lpar

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Configure kdump on roselp4
2. Try to trigger bug 150353

*Additional Instructions for Ping Tian <email address hidden> Carrie <email address hidden>:
-Post a private note with access information to the machine that the bug is occurring on.

== Comment: #3 - Brahadambal Srinivasan <email address hidden> - 2017-01-10 02:42:25 ==

root@roselp4:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinux-4.8.0-34-generic root=UUID=0bcf3431-df8b-499c-9a13-33070f242e0c ro splash quiet crashkernel=384M-:512M

root@roselp4:~# dmesg | grep Reser
[ 0.000000] Reserving 512MB of memory at 128MB for crashkernel (System RAM: 21760MB)
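
The two commands above show the `crashkernel=` parameter and the reservation the kernel made from it. A minimal sketch for checking the same thing on another system follows. The function names are illustrative and not part of kdump-tools, and the sketch assumes the running kernel exposes `/sys/kernel/kexec_crash_size`.

```shell
#!/bin/sh
# Illustrative helpers; the names are hypothetical, not part of kdump-tools.

# Print the value of the crashkernel= parameter found in a kernel
# command line string, or nothing if the parameter is absent.
crashkernel_param() (
    set -f  # do not glob-expand words taken from the command line
    for word in $1; do
        case "$word" in
            crashkernel=*) printf '%s\n' "${word#crashkernel=}" ;;
        esac
    done
)

# Convert a byte count to whole mebibytes.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Report the configured parameter and the memory actually reserved.
report_crashkernel() {
    echo "crashkernel parameter: $(crashkernel_param "$(cat /proc/cmdline)")"
    if [ -r /sys/kernel/kexec_crash_size ]; then
        echo "reserved: $(bytes_to_mib "$(cat /sys/kernel/kexec_crash_size)") MiB"
    fi
}
```

Calling `report_crashkernel` prints both values, so a mismatch between the requested and the reserved size is easy to spot.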

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-150355 severity-high targetmilestone-inin---
bugproxy (bugproxy) wrote : sosreport

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)

------- Comment From <email address hidden> 2017-01-10 12:59 EDT-------
Assigning to Hari for some assistance. Can this issue be recreated consistently, and if so, how? Does it also occur on other LPARs running the same level?

tags: added: targetmilestone-inin16042
removed: targetmilestone-inin---
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-01-10 20:20 EDT-------
(In reply to comment #7)
> Assigning to Hari for some assistance. Can this issue be recreated
> consistently, and if so, how? Does it also occur on other LPARs running the
> same level?

It looks like this bug can only be reproduced on roselp4 when triggering bug 150353, but it is quite easy to trigger when testing DLPAR.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-01-23 01:45 EDT-------
Hi Canonical,

After a DLPAR operation, kdump-tools.service must be restarted for kdump/fadump
to work properly.

Thanks
Hari

Steve, could your team please take a look?

Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Steve Langasek (vorlon)
importance: Undecided → High
Steve Langasek (vorlon) on 2017-01-23
Changed in linux (Ubuntu):
assignee: Steve Langasek (vorlon) → Canonical Kernel Team (canonical-kernel-team)
Manoj Iyer (manjo) on 2017-06-01
tags: added: ubuntu-16.04

------- Comment (attachment only) From <email address hidden> 2017-07-14 03:58 EDT-------

------- Comment From <email address hidden> 2017-07-14 05:02 EDT-------
The issue is observed even on 16.04.3, when a crash is triggered after a memory remove operation.

Thanks,
Pavithra

Manoj Iyer (manjo) on 2017-07-19
Changed in linux (Ubuntu):
importance: High → Medium
importance: Medium → High
Changed in ubuntu-power-systems:
importance: Undecided → High
Manoj Iyer (manjo) on 2017-08-07
tags: added: triage-r
Manoj Iyer (manjo) on 2017-08-14
Changed in ubuntu-power-systems:
status: New → Triaged
tags: added: kernel-da-key

Hi, hbathini.

Why is it needed to restart kdump-tools after a DLPAR? Does it refer to a memory hotplug operation? What is LTC bug #150353? Is it mirrored to launchpad already?

Thank you very much.
Cascardo.

Changed in linux (Ubuntu):
status: New → In Progress
assignee: Canonical Kernel Team (canonical-kernel-team) → Thadeu Lima de Souza Cascardo (cascardo)

------- Comment From <email address hidden> 2017-08-22 10:32 EDT-------
(In reply to comment #20)
> Hi, hbathini.
>

Hello Cascardo,

> Why is it needed to restart kdump-tools after a DLPAR? Does it refer to a
> memory hotplug operation?

Yes. It refers to either memory or CPU hot add/remove operations, after which
the kdump kernel needs to be reloaded to rebuild the vmcore ELF notes.

> What is LTC bug #150353? Is it mirrored to launchpad already?

Yes, it is mirrored. LTC Bug 150353 refers to Launchpad bug 1658968.

Thanks
Hari

Manoj Iyer (manjo) on 2017-08-28
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
status: Triaged → In Progress
tags: added: triage-g
removed: triage-r

Hi, Hari.

My understanding is that the kernel should handle that. Otherwise, whenever the user manually loads a different kdump kernel, a memory hotplug would cause the default kdump kernel to be loaded, not what the user has loaded.

Can you comment on that?

Regards.
Cascardo.

------- Comment From <email address hidden> 2017-08-31 04:54 EDT-------
(In reply to comment #22)
> Hi, Hari.

Hello Cascardo,

> My understanding is that the kernel should handle that.

Sadly, it can't, as kexec-tools is the one that creates the ELF headers for /proc/vmcore.
So, with a change in available CPUs/memory, the kdump kernel needs to be reloaded,
probably via a udev event that does a try-restart of the kdump-tools service on
CPU/memory hot add/remove operations.

> Otherwise, whenever the user manually loads a different kdump kernel, a
> memory hotplug would cause the default kdump kernel to be loaded, not what
> the user has loaded.

Can't do much about it. If need be, a user has the option to work around this
problem by adjusting the settings in the /etc/default/kdump-tools file to load
a different kdump kernel.

Thanks
Hari
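
Because the ELF headers are built at load time, one way to tell whether a loaded crash kernel may be stale is to record the CPU and memory layout when it is loaded and compare it later. The following is a rough sketch under that assumption. The function names and the idea of a state file are hypothetical and are not how kdump-tools works; the sketch only reads the standard sysfs memory block directories and `getconf _NPROCESSORS_ONLN`.

```shell
#!/bin/sh
# Illustrative sketch; names and the state-file approach are assumptions.

# Count memory blocks under the given sysfs directory whose state is "online".
count_online_memory_blocks() (
    dir=${1:-/sys/devices/system/memory}
    count=0
    for state in "$dir"/memory*/state; do
        [ -r "$state" ] || continue
        if [ "$(cat "$state")" = "online" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
)

# Build a one-line fingerprint of the current CPU and memory layout.
resource_fingerprint() {
    echo "cpus=$(getconf _NPROCESSORS_ONLN) memblocks=$(count_online_memory_blocks)"
}

# Return 0 when the layout recorded in the file given as $1 (written when the
# crash kernel was loaded) differs from the current one.
needs_reload() {
    [ "$(cat "$1" 2>/dev/null)" != "$(resource_fingerprint)" ]
}
```

A caller would write `resource_fingerprint` to a file right after loading the crash kernel and run `needs_reload` on that file after a hotplug operation.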

Manoj Iyer (manjo) on 2017-11-06
tags: added: triage-r
removed: triage-g
tags: added: triage-a
removed: triage-r
tags: added: ppc64el-kdump
Manoj Iyer (manjo) wrote :

IBM, could you please confirm that this bug is already fixed by the workaround mentioned here and in bug 1664545?

summary: - ISST-LTE:pVM:roselp4:ubuntu 16.04.2: cp: error reading '/proc/vmcore':
- Bad address when trying to dump vmcore
+ ISST-LTE:pVM:roselp4:ubuntu 16.04: cp: error reading '/proc/vmcore': Bad
+ address when trying to dump vmcore
Changed in ubuntu-power-systems:
status: In Progress → Incomplete
tags: added: triage-g
removed: triage-a


So, the workaround would be reloading the dump after DLPAR. In order to do that, run:

kdump-config unload ; kdump-config load

Regards.
Cascardo.
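
The workaround above can be wrapped with a check that the crash kernel really is loaded afterwards. This is only a sketch: the function names are illustrative, `kdump-config` is the tool from the kdump-tools package, and the check assumes the kernel exposes `/sys/kernel/kexec_crash_loaded`.

```shell
#!/bin/sh
# Sketch of the workaround with a verification step; function names are
# illustrative.

# Return 0 if the given sysfs file reports a loaded crash kernel ("1").
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}

# Unload and reload the crash kernel, then confirm that it is loaded.
# Must be run as root.
reload_kdump() {
    kdump-config unload
    kdump-config load
    if crash_kernel_loaded; then
        echo "crash kernel reloaded"
    else
        echo "crash kernel is not loaded" >&2
        return 1
    fi
}
```

Running `reload_kdump` as root after each DLPAR operation performs the same two steps and reports a failure if the reload did not take effect.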

It's not clear what the logs in comments #16 and #17 refer to. Could you please clarify?

Thanks.
Cascardo.

tags: removed: kernel-da-key

------- Comment From <email address hidden> 2018-03-08 14:01 EDT-------
(In reply to comment #29)
> So, the workaround would be reloading the dump after DLPAR. In order to do
> that, run:
>
> kdump-config unload ; kdump-config load
>

The "console log" attachment is basically the error seen when the above workaround
is not used.

>
> It's not clear what the logs in comments #16 and #17 refer to. Could you
> please clarify?

The other attachment is the sosreport on the failed system.

Thanks
Hari

Andrew Cloke (andrew-cloke) wrote :

Could you please comment as to whether the workaround resolved the issue?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-21 09:04 EDT-------
(In reply to comment #31)
> Could you please comment as to whether the workaround resolved the issue?

Yes, the workaround resolves the issue. But could we have a udev rule added in kdump-tools
to trigger a kdump-tools.service restart after a DLPAR memory/CPU add/remove operation,
instead of manually restarting the kdump-tools service after a DLPAR operation?

Thanks
Hari

Changed in ubuntu-power-systems:
status: Incomplete → Triaged
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-22 13:18 EDT-------
(In reply to comment #32)
> (In reply to comment #31)
> > Could you please comment as to whether the workaround resolved the issue?
>
> Yes, the workaround resolves the issue. But could we have a udev rule added
> in kdump-tools
> to trigger kdump-tools.service restart after a DLPAR memory/CPU add/remove
> operation
> instead of manually restarting kdump-tools service after a DLPAR operation..

Basically, a udev rule file that looks like this:

SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="add", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="remove", PROGRAM="/bin/systemctl try-restart kdump-tools.service"

placed in the "/lib/udev/rules.d" directory as part of the kdump-tools package, to avoid the need to manually
reload kdump.

Thanks
Hari
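
For local testing before a packaged fix is available, the proposed rules could be generated with a small helper like the one below. This is a sketch only: the function name and any file name passed to it are illustrative, and the rule text is copied from the proposal above rather than from a released kdump-tools package.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical. The rule text mirrors the
# four rules proposed in the comment above.

# Write the four proposed hotplug rules to the file given as $1.
write_kdump_hotplug_rules() {
    cmd='/bin/systemctl try-restart kdump-tools.service'
    {
        for pair in memory:online memory:offline cpu:add cpu:remove; do
            # ${pair%%:*} is the subsystem, ${pair##*:} is the action.
            printf 'SUBSYSTEM=="%s", ACTION=="%s", PROGRAM="%s"\n' \
                "${pair%%:*}" "${pair##*:}" "$cmd"
        done
    } > "$1"
}
```

After writing the file into a udev rules directory as root, `udevadm control --reload` asks udev to re-read its rules.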

Changed in ubuntu-power-systems:
status: Triaged → In Progress
tags: added: triage-r
removed: triage-g

Can you try the package in ppa:cascardo/ppa?

Thanks.
Cascardo.

Changed in ubuntu-power-systems:
status: In Progress → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-08-11 08:48 EDT-------
After upgrading to the kdump-tools_1.6.4-1~16.04.0cascardo2_ppc64el.deb package,
dump capture succeeds without any complaints.

The udev rules trigger kdump-tools service reload as can be seen below:
--
root@ubuntu:~# dpkg -l | grep kdump-tools
ii kdump-tools 1:1.6.4-1~16.04.0cascardo2 ppc64el scripts and tools for automating kdump (Linux crash dumps)
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~# systemctl status kdump-tools.service -l
● kdump-tools.service - Kernel crash dump capture service
Loaded: loaded (/lib/systemd/system/kdump-tools.service; enabled; vendor pres
Active: active (exited) since Sat 2018-08-11 08:33:27 EDT; 2min 8s ago
Process: 25166 ExecStop=/etc/init.d/kdump-tools stop (code=exited, status=0/SU
Process: 25200 ExecStart=/etc/init.d/kdump-tools start (code=exited, status=0/
Main PID: 25200 (code=exited, status=0/SUCCESS)

Aug 11 08:33:26 ubuntu systemd[1]: Starting Kernel crash dump capture service...
Aug 11 08:33:26 ubuntu kdump-tools[25200]: Starting kdump-tools: * Creating sym
Aug 11 08:33:26 ubuntu kdump-tools[25200]: * Creating symlink /var/lib/kdump/in
Aug 11 08:33:26 ubuntu kdump-tools[25200]: Modified cmdline:BOOT_IMAGE=/boot/vml
Aug 11 08:33:26 ubuntu kdump-tools[25200]: * loaded kdump kernel
Aug 11 08:33:26 ubuntu kdump-tools[25248]: /sbin/kexec -p --command-line="BOOT_I
Aug 11 08:33:27 ubuntu kdump-tools[25249]: loaded kdump kernel
Aug 11 08:33:27 ubuntu systemd[1]: Started Kernel crash dump capture service.
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~# drmgr -c mem -r -q 1
Validating Memory DLPAR capability...yes.
[1459216.723209] pseries-hotplug-mem: Attempting to hot-remove 1 LMB(s)
[1459216.749014] Offlined Pages 4096
[1459193.925176] kdump-tools[25419]: Stopping kdump-tools: * unloaded kdump kernel
[1459193.985143] kdump-tools[25453]: Starting kdump-tools: * Creating symlink /var/lib/kdump/vmlinuz
[1459193.986120] kdump-tools[25453]: * Creating symlink /var/lib/kdump/initrd.img
[1459194.024961] kdump-tools[25453]: Modified cmdline:BOOT_IMAGE=/boot/vmlinux-4.15.0-24-generic root=UUID=1aa9458c-3974-4cb4-9ab3-9ee03c0f4e5e ro xmon=on nr_cpus=1 systemd.unit=kdump-tools.service irqpoll noirqdistrib nousb elfcorehdr=158144K
[1459217.005820] pseries-hotplug-mem: Memory at 40000000 was hot-removed
DR_TOTAL_RESOURCES=1
root@ubuntu:~# [1459194.297073] kdump-tools[25453]: * loaded kdump kernel

root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~# systemctl status kdump-tools.service -l
● kdump-tools.service - Kernel crash dump capture service
Loaded: loaded (/lib/systemd/system/kdump-tools.service; enabled; vendor pres
Active: active (exited) since Sat 2018-08-11 08:35:47 EDT; 12s ago
Process: 25419 ExecStop=/etc/init.d/kdump-tools stop (code=exited, status=0/SU
Process: 25453 ExecStart=/etc/init.d/kdump-tools start (code=exited, status=0/
Main PID: 25453 (code=exited, status=0/SUCCESS)

Aug 11 08:35:47 ubuntu systemd[1]: Starting Kernel crash dump capture service...
Aug 11 08:35:47 ubuntu kdump-tools[25453]: Starting kdump-tools: * Creating sym
Aug 11...


Changed in ubuntu-power-systems:
status: Incomplete → In Progress

This fix has been uploaded to Debian, and will be synced to 19.04 when it opens. After that, it will be backported to cosmic, bionic and xenial.

Thanks.
Cascardo.

Changed in linux (Ubuntu):
status: In Progress → Invalid
Changed in makedumpfile (Ubuntu):
status: New → In Progress
importance: Undecided → High

I am working on the sync/merge from Debian to Ubuntu.

This should be in disco soon.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.5-1ubuntu1

---------------
makedumpfile (1:1.6.5-1ubuntu1) disco; urgency=low

  [ Ubuntu Merge-o-Matic ]
  * Merge from Debian unstable. Remaining changes:
    - Bump amd64 crashkernel from 384M-:128M to 512M-:192M.

  [ Thadeu Lima de Souza Cascardo ]
  * Use a different service for vmcore dump. (LP: #1811692)

makedumpfile (1:1.6.5-1) unstable; urgency=medium

  * Update to new upstream version 1.6.5.
  * debian: remove debian/source/local-options
  * [i18n] Move PT debconf translation (Closes: #910465)

makedumpfile (1:1.6.4-3) unstable; urgency=medium

  * Reload kdump after memory/CPU hotplug. (LP: #1655280)
  * Fix adding crashkernel to zipl.conf when no quotation mark is used.
    (LP: #1790788)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Mon, 14 Jan 2019 15:42:44 -0200

Changed in makedumpfile (Ubuntu Disco):
status: In Progress → Fix Released