[Ubuntu 16.04.4] Unable to analyze the vmcore generated by kdump on 4.13.0-26-generic kernel

Bug #1746088 reported by bugproxy on 2018-01-29
This bug affects 1 person
Affects                            Importance   Assigned to
The Ubuntu-power-systems project   Critical     Canonical Kernel Team
crash (Ubuntu)                     Critical     Thadeu Lima de Souza Cascardo
  Xenial                           Critical     Canonical Kernel Team
  Artful                           Critical     Canonical Kernel Team
  Bionic                           Critical     Thadeu Lima de Souza Cascardo

Bug Description

[Impact]
It won't be possible to analyze dumps produced by newer kernels (hwe on xenial, for example).

[Test Case]
Tested that this version of crash can analyze both GA (4.4) and hwe (4.15) kernels.

[Regression Potential]
New crash versions may have bugs, and some commands may not work with older kernels. The smoke test helps a little, but more testing may be desirable.

---Problem Description---
Unable to analyze the vmcore generated by kdump on 4.13.0-26-generic kernel (Ubuntu 16.04.4)

---uname output---
Linux ltc-briggs1 4.13.0-26-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 21:40:36 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = 8001-22C

---Steps to Reproduce---
This bug is a follow-up to https://bugzilla.linux.ibm.com/show_bug.cgi?id=163565
The steps to create the dump are as follows

Once the kdump has been generated,
use crash to analyze the vmcore; it fails with this error

================console logs ==========

root@ltc-briggs1:/var/crash/201801150227# ls
dmesg.201801150227 vmcore.201801150227
root@ltc-briggs1:/var/crash/201801150227# crash /usr/lib/debug/boot/vmlinux-4.13.0-26-generic vmcore.201801150227

crash 7.1.4
Copyright (C) 2002-2015 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-unknown-linux-gnu"...

please wait... (gathering module symbol data)
WARNING: cannot access vmalloc'd module memory

crash: invalid structure member offset: thread_info_task
       FILE: task.c LINE: 598 FUNCTION: irqstacks_init()

[/usr/bin/crash] error trace: 1008ade0 => 1011552c => 1017d220 => 100833e0

  100833e0: (undetermined)
  1017d220: OFFSET_verify+80
  1011552c: task_init+5084
  1008ade0: main_loop+336

== Comment from Hari Krishna Bathini ==

There are quite a few commits (all available upstream) that are needed for
crash tool to work fine. I think the right thing to do here would be to use
the latest crash tool version 7.2.0 to go with the kernel update. Also, the
below commit would be needed on top of 7.2.0 crash utility:

  commit c8178eca9c74f81a7f803a58d339635cc152e8d9
  Author: Dave Anderson <email address hidden>
  Date: Thu Nov 9 11:39:05 2017 -0500

    Update for support of Linux 4.14 and later PPC64 kernels where the
    hash page table geometry accomodates a larger virtual address range.
    Without the patch, the virtual-to-physical translation of user space
    virtual addresses by "vm -p", "vtop", and "rd -u" may generate an
    invalid translation or otherwise fail.
    (<email address hidden>)

The same holds true for the makedumpfile tool.

Thanks
Hari
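
A minimal sketch of the version check implied by the comment above, assuming the crash banner starts with a line such as "crash 7.1.4" as in the console log. The helper names `parse_crash_version` and `meets_minimum` are hypothetical, and reaching 7.2.0 is only a necessary condition here, since the comment also asks for commit c8178eca9c74 on top of it.

```python
import re

# Minimum version named in the comment above. It is a lower bound only:
# the same comment says commit c8178eca9c74 is still needed on top of it.
MINIMUM_CRASH = (7, 2, 0)


def parse_crash_version(banner):
    """Return (major, minor, patch) from a crash banner, or None if absent."""
    match = re.search(r"^crash (\d+)\.(\d+)\.(\d+)", banner, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def meets_minimum(banner, minimum=MINIMUM_CRASH):
    """True when the banner reports a crash version at or above `minimum`."""
    version = parse_crash_version(banner)
    return version is not None and version >= minimum


# Banner lines copied from the console log in this report.
old_banner = "crash 7.1.4\nCopyright (C) 2002-2015 Red Hat, Inc."
print(parse_crash_version(old_banner))
print(meets_minimum(old_banner))
```

Tuples compare element by element, so (7, 1, 4) sorts below (7, 2, 0) without any string handling.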

bugproxy (bugproxy) wrote : sosreport

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-163583 severity-critical targetmilestone-inin16044
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → crash (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → Critical
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g
Manoj Iyer (manjo) on 2018-02-05
Changed in crash (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
Changed in ubuntu-power-systems:
status: New → Triaged
Changed in crash (Ubuntu):
importance: Undecided → Critical

------- Comment From <email address hidden> 2018-02-06 07:41 EDT-------
Please let us know whether the required packages have been picked up for the latest kernel.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-13 01:26 EDT-------
Reminder 2: Please advise whether the fix has been picked up.

I am working on getting this on bionic, and will work to get this to xenial as well.

Cascardo.

Changed in crash (Ubuntu):
status: New → In Progress
assignee: Canonical Kernel Team (canonical-kernel-team) → Thadeu Lima de Souza Cascardo (cascardo)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-21 04:30 EDT-------
On Ubuntu 18.04 Bionic with the 4.15.0-10 kernel, crash analysis failed. I think the fixes are yet to be pushed to bionic.

# crash /share/10.10.10.43-201802210423/dump.201802210423 /usr/lib/debug/boot/vmlinux-4.15.0-10-generic

crash 7.2.0
Copyright (C) 2002-2017 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-unknown-linux-gnu"...

please wait... (gathering task table data)
crash: cannot resolve "init_task_union"

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-05 08:31 EDT-------
Are we going to get the fixes for 16.04.4?

We are working on getting those fixes to 16.04, on the -updates repository. But it's not part of 16.04.4.

Cascardo.

By the way, it's already available on bionic, you need to update the system.

Cascardo.

Changed in crash (Ubuntu Bionic):
status: In Progress → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-05 09:46 EDT-------
Hi Cascardo,

(In reply to comment #19)
> We are working on getting those fixes to 16.04, on the -updates repository.
> But it's not part of 16.04.4.

I am not sure I understood the following statement. Do you mean that the fix will be pushed to -updates archive after the 16.04.4, thus, it will not make the 16.04.4 cut date?

If that is the case, that might cause some impacts, because we are making a release that contains some known regressions. Should we document it somewhere?

Thank you,
Breno

Manoj Iyer (manjo) on 2018-03-05
Changed in crash (Ubuntu Xenial):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in crash (Ubuntu Artful):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in crash (Ubuntu Xenial):
importance: Undecided → Critical
Changed in crash (Ubuntu Artful):
importance: Undecided → Critical
Changed in crash (Ubuntu Artful):
status: New → In Progress
tags: added: ppc64el-kdump

Default Comment by Bridge

Andrew Cloke (andrew-cloke) wrote :

The first area of investigation is to see if we can bump the versions of kexec and makedumpfile for Xenial. Thadeu is investigating.

>------- Comment From <email address hidden> 2018-03-05 09:46 EDT-------
>Hi Cascardo,
>
>(In reply to comment #19)
>> We are working on getting those fixes to 16.04, on the -updates repository.
>> But it's not part of 16.04.4.
>
>I am not sure I understood the following statement. Do you mean that the fix will be pushed to -updates archive after the 16.04.4, thus, it will not make the 16.04.4 cut date?
>
>If that is the case, that might cause some impacts, because we are making a release that contains some known regressions. Should we document it somewhere?
>
>Thank you,
>Breno

Hi, Breno.

From my testing on xenial with the hwe kernel, kexec still works and makedumpfile will fall back to cp. One can then copy that dump to a bionic system and use crash from bionic to analyze it. Can you test that a dump produced this way can be analyzed by crash on bionic? I want to make sure we are still producing valid dumps in most cases, or identify when dumps are not valid. Some of the hwe kernels with kpti backports are not dumpable, but the next artful and hwe release, to be in -proposed by next week, should have that fixed on the kernel side.

I am still investigating the possibility of getting all this into a better situation, but I want to make sure that the current situation is not as bad as you are saying.

Thanks.
Cascardo.
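
A small sketch of the invocation used for the workaround above, assuming the debug vmlinux for the dumped kernel is available under /usr/lib/debug/boot on the analysis machine, as in the console logs of this report. The helper `crash_command` is hypothetical. The kernel release passed in must be that of the kernel that produced the dump, not that of the system running crash.

```python
import shlex

# Directory used for the debug vmlinux in the console logs of this report.
DEBUG_VMLINUX_DIR = "/usr/lib/debug/boot"


def crash_command(kernel_release, dump_path):
    """Build the argv for crash: debug vmlinux of the dumped kernel, then the dump."""
    vmlinux = f"{DEBUG_VMLINUX_DIR}/vmlinux-{kernel_release}"
    return ["crash", vmlinux, dump_path]


# Kernel release and dump name taken from the original problem description.
cmd = crash_command("4.13.0-26-generic", "vmcore.201801150227")
print(shlex.join(cmd))
```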

Dimitri John Ledkov (xnox) wrote :

@breno re: point release meaning vs updates

Please note that point releases refer to installation media only, which is a consistent snapshot of cloud images, container images, server ISO, d-i, etc., built from the xenial-updates and xenial-security pockets as published at the time of the respin. xenial-updates/-security themselves are a moving target.

It's impossible to ship something in "16.04.2" today, as there is no package archive that refers to "16.04.2". One simply continuously ships updates in xenial-updates.

Thus the point of point-release installation media is to have installers available with a newer kernel, which hopefully supports more hardware, and with newer packages, making post-install security/updates upgrades quicker and smaller, as one does not need to catch up on 2+ years of updates, just a few weeks of updates.

This issue does not appear to impact the installation media or installation success, therefore it is not critical to ship in installation media as respun for a point release. Hence it is targeted to ship simply as any other SRU in xenial-updates.

Once a solution to this bug is found, that is.

tags: added: triage-a
removed: triage-g
Changed in ubuntu-power-systems:
status: Triaged → In Progress
Manoj Iyer (manjo) on 2018-06-11
tags: added: triage-g
removed: triage-a

I have been working on the backport for this to xenial, but in order to do so, we are taking care of some build failures on the latest version that we are going to add to cosmic. I'll update when there is more progress on that.

Regards.
Cascardo.

There is a package on ppa:cascardo/ppa that should fix the problem. Can you test it and report back?

Cascardo.

------- Comment From <email address hidden> 2018-07-05 01:57 EDT-------
(In reply to comment #30)
> There is a package on ppa:cascardo/ppa that should fix the problem. Can you
> test it and report back?
>
> Cascardo.

Hi Cascardo,

Issue not reproducible with crash-7.2.3+real-1

Thanks
Hari

description: updated
Manoj Iyer (manjo) on 2018-07-16
Changed in crash (Ubuntu Artful):
status: In Progress → Won't Fix

Hello bugproxy, or anyone else affected,

Accepted crash into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/crash/7.2.3+real-1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in crash (Ubuntu Xenial):
status: New → Fix Committed
tags: added: verification-needed verification-needed-xenial
tags: added: triage-a
removed: triage-g verification-needed verification-needed-xenial
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
tags: added: triage-g
removed: triage-a

Hi, can you just verify this with xenial-proposed and mark it as verified?

Thank you very much.
Cascardo.


------- Comment From <email address hidden> 2018-08-07 07:01 EDT-------
We are able to analyse the kernel dump using crash on the 4.4.0-131-generic kernel (Ubuntu 16.04.5). We can close this defect.

====console logs===
root@ltc-briggs1:/var/crash/201808070540# crash /usr/lib/debug/boot/vmlinux-4.4.0-131-generic dump.201808070540

crash 7.1.4
Copyright (C) 2002-2015 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-unknown-linux-gnu"...

KERNEL: /usr/lib/debug/boot/vmlinux-4.4.0-131-generic
DUMPFILE: dump.201808070540 [PARTIAL DUMP]
CPUS: 160
DATE: Tue Aug 7 05:39:33 2018
UPTIME: 00:05:29
LOAD AVERAGE: 0.10, 0.29, 0.18
TASKS: 1364
NODENAME: ltc-briggs1
RELEASE: 4.4.0-131-generic
VERSION: #157-Ubuntu SMP Thu Jul 12 15:47:54 UTC 2018
MACHINE: ppc64le (2926 Mhz)
MEMORY: 256 GB
PANIC: "sysrq: SysRq : Trigger a crash"
PID: 3257
COMMAND: "bash"
TASK: c000001f61bc93e0 [THREAD_INFO: c000001fa9f14000]
CPU: 14
STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 3257 TASK: c000001f61bc93e0 CPU: 14 COMMAND: "bash"
#0 [c000001fa9f17680] crash_kexec at c0000000001776f4
#1 [c000001fa9f17820] die at c000000000020ed8
#2 [c000001fa9f178b0] bad_page_fault at c000000000051d98
#3 [c000001fa9f17920] handle_page_fault at c000000000008800
Data Access [300] exception frame:
R0: c00000000067d8a8 R1: c000001fa9f17c10 R2: c00000000160aa00
R3: 0000000000000063 R4: c000001ff4b89c50 R5: c000001ff4b9b4e0
R6: c000003fff010000 R7: 0000000000000573 R8: 0000000000000007
R9: 0000000000000001 R10: 0000000000000000 R11: c000003fff030208
R12: c00000000067c7a0 R13: c000000007ae8c00 R14: ffffffffffffffff
R15: 0000000022000000 R16: 0000000010170dd0 R17: 0000010015780298
R18: 0000000010140568 R19: 00000000100c7000 R20: 0000000000000000
R21: 000000001017dd78 R22: 0000000010140400 R23: 0000000000000000
R24: 00000000101532c0 R25: 000000001017b628 R26: c000000001549d18
R27: 0000000000000004 R28: c00000000154a0d8 R29: 0000000000000063
R30: c0000000015013bc R31: 0000000000000000
NIP: c00000000067c7d4 MSR: 9000000000009033 OR3: c000000000008498
CTR: c00000000067c7a0 LR: c00000000067d8a8 XER: 0000000020000000
CCR: 0000000028242222 MQ: 0000000000000001 DAR: 0000000000000000
D...


This was about analyzing a 4.13 kernel on xenial, which meant a linux-hwe kernel, now on version 4.15.0-30.32~16.04.1.

Can you verify the fix with a linux-hwe kernel?

Thank you very much.
Cascardo.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-09-04 05:12 EDT-------
The crash tool version in -proposed works fine with the 4.15.0-33-generic kernel:

--
root@ltc-briggs1:/var/crash/201809040315# dpkg -l | grep crash
[...]
ii crash 7.2.3+real-1~16.04.1 ppc64el kernel debugging utility, allowing gdb like syntax
ii kdump-tools 1:1.6.3-2~16.04.1 ppc64el scripts and tools for automating kdump (Linux crash dumps)
[...]
root@ltc-briggs1:/var/crash/201809040315#
root@ltc-briggs1:/var/crash/201809040315#
root@ltc-briggs1:/var/crash/201809040315# uname -a
Linux ltc-briggs1 4.15.0-33-generic #36~16.04.1-Ubuntu SMP Wed Aug 15 17:18:19 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-briggs1:/var/crash/201809040315#
root@ltc-briggs1:/var/crash/201809040315#
root@ltc-briggs1:/var/crash/201809040315#
root@ltc-briggs1:/var/crash/201809040315# crash /usr/lib/debug/boot/vmlinux-4.15.0-33-generic dump.201809040315

crash 7.2.3
Copyright (C) 2002-2017 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-unknown-linux-gnu"...

KERNEL: /usr/lib/debug/boot/vmlinux-4.15.0-33-generic
DUMPFILE: dump.201809040315 [PARTIAL DUMP]
CPUS: 160
DATE: Tue Sep 4 03:15:08 2018
UPTIME: 00:04:15
LOAD AVERAGE: 0.90, 0.81, 0.33
TASKS: 1587
NODENAME: ltc-briggs1
RELEASE: 4.15.0-33-generic
VERSION: #36~16.04.1-Ubuntu SMP Wed Aug 15 17:18:19 UTC 2018
MACHINE: ppc64le (2926 Mhz)
MEMORY: 256 GB
PANIC: "sysrq: SysRq : Trigger a crash"
PID: 3665
COMMAND: "bash"
TASK: c000003c813a0580 [THREAD_INFO: c000003c81478000]
CPU: 2
STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 3665 TASK: c000003c813a0580 CPU: 2 COMMAND: "bash"
#0 [c000003c8147b830] crash_kexec at c0000000001e4d34
#1 [c000003c8147b870] oops_end at c000000000026318
#2 [c000003c8147b8f0] bad_page_fault at c00000000006b424
#3 [c000003c8147b960] handle_page_fault at c00000000000a650
Data Access [300] exception frame:
R0: c0000000007c2d14 R1: c000003c8147bc50 R2: c0000000016baa00
R3: 0000000000000063 R4: c000001ff5c8ce18 R5: c000001ff5ca4368
R6: 9000000000009033 R7: 0000000000000012 R8: 0000000000000007
R9: 0000000000...


tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package crash - 7.2.3+real-1~16.04.1

---------------
crash (7.2.3+real-1~16.04.1) xenial; urgency=medium

  * Backport to xenial. LP: #1746088
    - Build-Depends on debhelper 9.

 -- Thadeu Lima de Souza Cascardo <email address hidden> Tue, 26 Jun 2018 14:32:30 -0300

Changed in crash (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for crash has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
Steve Langasek (vorlon) wrote :

An audit of package versions across Ubuntu releases shows that crash is at a higher version number in xenial-updates than it is in bionic. If this was appropriate to SRU to xenial, then it also needs to be SRUed to bionic so that users don't see inconsistent behavior on bionic for upgrades vs. new installs.
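
To illustrate how such an audit orders package versions, here is a simplified sketch of Debian-style version comparison (epoch, then upstream version, then revision, with "~" sorting before everything). The function names are hypothetical, and the bionic version string used below is a placeholder, since this report does not state it. On a real system, `dpkg --compare-versions` is the authoritative tool.

```python
import re

_DIGITS = "0123456789"


def _order(ch):
    """Sort weight of one non-digit character ('' stands for end of string)."""
    if ch == "" or ch in _DIGITS:
        return 0
    if ch == "~":
        return -1  # tilde sorts before everything, even the end of the string
    if ch.isalpha():
        return ord(ch)  # letters sort before other symbols
    return ord(ch) + 256


def _compare_fragment(a, b):
    """Compare two upstream-version or revision strings; return -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit runs are compared character by character.
        while (i < len(a) and a[i] not in _DIGITS) or (j < len(b) and b[j] not in _DIGITS):
            wa = _order(a[i] if i < len(a) else "")
            wb = _order(b[j] if j < len(b) else "")
            if wa != wb:
                return -1 if wa < wb else 1
            i += 1
            j += 1
        # Digit runs are compared numerically; an empty run counts as 0.
        da = re.match(r"[0-9]*", a[i:]).group()
        db = re.match(r"[0-9]*", b[j:]).group()
        na, nb = int(da or "0"), int(db or "0")
        if na != nb:
            return -1 if na < nb else 1
        i += len(da)
        j += len(db)
    return 0


def _split(version):
    """Split 'epoch:upstream-revision' into (int epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(left, right):
    """Return -1, 0 or 1 as `left` sorts before, equal to, or after `right`."""
    l_epoch, l_up, l_rev = _split(left)
    r_epoch, r_up, r_rev = _split(right)
    if l_epoch != r_epoch:
        return -1 if l_epoch < r_epoch else 1
    return _compare_fragment(l_up, r_up) or _compare_fragment(l_rev, r_rev)


XENIAL_UPDATES = "7.2.3+real-1~16.04.1"  # version released in this bug
BIONIC_PLACEHOLDER = "7.2.1-1"           # hypothetical, not taken from this report

print(compare_versions(XENIAL_UPDATES, BIONIC_PLACEHOLDER))
print(compare_versions(XENIAL_UPDATES, "7.2.3+real-1"))
```

The second comparison shows why backports carry a "~16.04.1" suffix: the backport sorts below the unsuffixed "7.2.3+real-1", so a later release shipping that version would still count as an upgrade.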

Changed in crash (Ubuntu Bionic):
status: Fix Released → Triaged
Changed in ubuntu-power-systems:
status: Fix Released → Triaged

The package is waiting for SRU approval on the -proposed queue. As I have done more tests, I have asked Andy to look at it.

Cascardo.

Changed in crash (Ubuntu Bionic):
status: Triaged → In Progress