[Ubuntu 16.04.4] Unable to analyze the vmcore generated by kdump on 4.13.0-26-generic kernel

Bug #1746088 reported by bugproxy on 2018-01-29
Affects                           Importance  Assigned to
The Ubuntu-power-systems project  Critical    Canonical Kernel Team
crash (Ubuntu)                    Critical    Thadeu Lima de Souza Cascardo
  Xenial                          Critical    Canonical Kernel Team
  Artful                          Critical    Canonical Kernel Team
  Bionic                          Critical    Thadeu Lima de Souza Cascardo

Bug Description

---Problem Description---
Unable to analyze the vmcore generated by kdump on 4.13.0-26-generic kernel (Ubuntu 16.04.4)

---uname output---
Linux ltc-briggs1 4.13.0-26-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 21:40:36 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = 8001-22C

---Steps to Reproduce---
This bug is a follow-up to https://bugzilla.linux.ibm.com/show_bug.cgi?id=163565
The steps to create the dump are as follows:

Once the kdump has been generated, use crash to analyze the vmcore.
The analysis fails with this error:

================console logs ==========

root@ltc-briggs1:/var/crash/201801150227# ls
dmesg.201801150227 vmcore.201801150227
root@ltc-briggs1:/var/crash/201801150227# crash /usr/lib/debug/boot/vmlinux-4.13.0-26-generic vmcore.201801150227

crash 7.1.4
Copyright (C) 2002-2015 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-unknown-linux-gnu"...

please wait... (gathering module symbol data)
WARNING: cannot access vmalloc'd module memory

crash: invalid structure member offset: thread_info_task
       FILE: task.c LINE: 598 FUNCTION: irqstacks_init()

[/usr/bin/crash] error trace: 1008ade0 => 1011552c => 1017d220 => 100833e0

  100833e0: (undetermined)
  1017d220: OFFSET_verify+80
  1011552c: task_init+5084
  1008ade0: main_loop+336

== Comment from Hari Krishna Bathini ==

There are quite a few commits (all available upstream) that are needed for
crash tool to work fine. I think the right thing to do here would be to use
the latest crash tool version 7.2.0 to go with the kernel update. Also, the
below commit would be needed on top of 7.2.0 crash utility:

  commit c8178eca9c74f81a7f803a58d339635cc152e8d9
  Author: Dave Anderson <email address hidden>
  Date: Thu Nov 9 11:39:05 2017 -0500

    Update for support of Linux 4.14 and later PPC64 kernels where the
    hash page table geometry accomodates a larger virtual address range.
    Without the patch, the virtual-to-physical translation of user space
    virtual addresses by "vm -p", "vtop", and "rd -u" may generate an
    invalid translation or otherwise fail.
    (<email address hidden>)

A similar thing holds true for the makedumpfile tool.

Thanks
Hari
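
As a minimal sketch of the version requirement in the comment above (not part of the original report), the following Python snippet reads the version from a crash banner such as "crash 7.1.4" and compares it with 7.2.0. The function names and the MIN_CRASH_VERSION constant are illustrative assumptions. A 7.2.0 banner alone does not show whether the additional commit c8178eca9c74 is included.

```python
import re

# Minimum crash release suggested in the comment above. The extra upstream
# commit it mentions cannot be detected from the banner, so this check is
# necessary but not sufficient.
MIN_CRASH_VERSION = (7, 2, 0)


def parse_crash_version(banner):
    """Return the version tuple from a banner line such as 'crash 7.1.4'."""
    match = re.search(r"^crash (\d+)\.(\d+)\.(\d+)", banner, re.MULTILINE)
    if match is None:
        raise ValueError("no 'crash X.Y.Z' line found in banner")
    return tuple(int(part) for part in match.groups())


def meets_minimum(banner, minimum=MIN_CRASH_VERSION):
    """True if the banner reports a crash version at or above `minimum`."""
    return parse_crash_version(banner) >= minimum
```

With the banner from the console log in this report, meets_minimum() would report that crash 7.1.4 is older than the suggested 7.2.0.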

bugproxy (bugproxy) wrote : sosreport

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-163583 severity-critical targetmilestone-inin16044
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → crash (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → Critical
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g
Manoj Iyer (manjo) on 2018-02-05
Changed in crash (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
Changed in ubuntu-power-systems:
status: New → Triaged
Changed in crash (Ubuntu):
importance: Undecided → Critical

------- Comment From <email address hidden> 2018-02-06 07:41 EDT-------
Please advise whether the required packages have been picked up into the latest kernel.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-13 01:26 EDT-------
Reminder 2: Please advise whether the fix has been picked up.

I am working on getting this on bionic, and will work to get this to xenial as well.

Cascardo.

Changed in crash (Ubuntu):
status: New → In Progress
assignee: Canonical Kernel Team (canonical-kernel-team) → Thadeu Lima de Souza Cascardo (cascardo)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-21 04:30 EDT-------
On Ubuntu 18.04 Bionic with the 4.15.0-10 kernel, crash analysis failed. I think the fixes are yet to be pushed to bionic.

# crash /share/10.10.10.43-201802210423/dump.201802210423 /usr/lib/debug/boot/vmlinux-4.15.0-10-generic

crash 7.2.0
Copyright (C) 2002-2017 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-unknown-linux-gnu"...

please wait... (gathering task table data)
crash: cannot resolve "init_task_union"
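
Both console logs in this report fail with a line that starts with "crash: ", and the first also prints a "WARNING: " line. As a rough sketch (an illustration, not something from the original thread), captured crash output could be scanned for these two prefixes as follows. The function name scan_crash_output is hypothetical, and the prefixes are taken only from the logs shown here, so other failure formats would be missed.

```python
def scan_crash_output(output):
    """Split captured crash output into error and warning messages.

    Only the two prefixes visible in this report's logs are recognised:
    'crash: ' for fatal errors and 'WARNING: ' for warnings.
    """
    errors, warnings = [], []
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("crash: "):
            errors.append(line[len("crash: "):])
        elif line.startswith("WARNING: "):
            warnings.append(line[len("WARNING: "):])
    return errors, warnings
```

Lines such as the "[/usr/bin/crash] error trace" entry are ignored by this sketch, because they do not begin with either prefix.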

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-05 08:31 EDT-------
Are we going to get the fixes for 16.04.4?

We are working on getting those fixes to 16.04, on the -updates repository. But it's not part of 16.04.4.

Cascardo.

By the way, it's already available on bionic, you need to update the system.

Cascardo.

Changed in crash (Ubuntu Bionic):
status: In Progress → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-05 09:46 EDT-------
Hi Cascardo,

(In reply to comment #19)
> We are working on getting those fixes to 16.04, on the -updates repository.
> But it's not part of 16.04.4.

I am not sure I understood the following statement. Do you mean that the fix will be pushed to -updates archive after the 16.04.4, thus, it will not make the 16.04.4 cut date?

If that is the case, that might cause some impacts, because we are making a release that contains some known regressions. Should we document it somewhere?

Thank you,
Breno

Manoj Iyer (manjo) on 2018-03-05
Changed in crash (Ubuntu Xenial):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in crash (Ubuntu Artful):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in crash (Ubuntu Xenial):
importance: Undecided → Critical
Changed in crash (Ubuntu Artful):
importance: Undecided → Critical
Changed in crash (Ubuntu Artful):
status: New → In Progress
tags: added: ppc64el-kdump

Andrew Cloke (andrew-cloke) wrote :

The first area of investigation is to see if we can bump the versions of kexec and makedumpfile for Xenial. Thadeu is investigating.

>------- Comment From <email address hidden> 2018-03-05 09:46 EDT-------
>Hi Cascardo,
>
>(In reply to comment #19)
>> We are working on getting those fixes to 16.04, on the -updates repository.
>> But it's not part of 16.04.4.
>
>I am not sure I understood the following statement. Do you mean that the fix will be pushed to -updates archive after the 16.04.4, thus, it will not make the 16.04.4 cut date?
>
>If that is the case, that might cause some impacts, because we are making a release that contains some known regressions. Should we document it somewhere?
>
>Thank you,
>Breno

Hi, Breno.

From my testing on xenial with the hwe kernel, kexec still works and makedumpfile will fall back to cp. One can then copy that dump to a bionic system and use crash from bionic to analyze it. Can you test whether a dump produced this way can be analyzed by crash on bionic? I want to make sure we are still producing valid dumps in most cases, or identify when dumps are not valid. Some of the hwe kernels with kpti backports are not dumpable, but the next artful and hwe release, due in -proposed by next week, should have that fixed on the kernel side.

I am still investigating the possibility of getting all this into a better situation, but I want to make sure that the current situation is not as bad as you are saying.

Thanks.
Cascardo.
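
The workaround described above (copy the dump to a bionic system and run crash there) can be sketched as a small Python helper that builds the crash command line. This is an illustration under stated assumptions, not a tested procedure: the helper name crash_argv is hypothetical, the debug vmlinux directory is the one seen in the console logs of this report, and the debug vmlinux of the kernel that produced the dump is assumed to be installed on the analysis machine.

```python
import posixpath

# Directory where the debug vmlinux appears in this report's console logs.
DEBUG_VMLINUX_DIR = "/usr/lib/debug/boot"


def crash_argv(kernel_release, vmcore_path):
    """Build the argument list for running crash on a copied dump.

    `kernel_release` is the release of the kernel that crashed
    (for example '4.13.0-26-generic'), not of the analysis machine.
    """
    vmlinux = posixpath.join(DEBUG_VMLINUX_DIR, "vmlinux-" + kernel_release)
    return ["crash", vmlinux, vmcore_path]
```

The resulting list uses the same argument order as the first console log (vmlinux first, then the vmcore) and could be passed to subprocess.run() on the bionic system.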

@breno re: point release meaning vs updates

Please note that point releases refer to installation media only: a consistent snapshot of cloud images, container images, server ISO, d-i, etc., built from the xenial-updates and xenial-security pockets as published at the time of the respin. xenial-updates and xenial-security themselves are a moving target.

It's impossible to ship something in "16.04.2" today, as there is no package archive that refers to "16.04.2". One simply ships updates continuously in xenial-updates.

Thus the point of point-release installation media is to have installers available with a newer kernel, which hopefully supports more hardware, and with newer packages, which make post-install security and update upgrades quicker and smaller: one does not need to catch up on 2+ years of updates, just a few weeks of them.

This issue does not appear to impact the installation media or installation success, so it is not critical to ship the fix in installation media as respun for a point release. Hence it is targeted to ship simply as any other SRU in xenial-updates.

Once the solution to this bug is found, that is.

tags: added: triage-a
removed: triage-g
Changed in ubuntu-power-systems:
status: Triaged → In Progress
Manoj Iyer (manjo) on 2018-06-11
tags: added: triage-g
removed: triage-a

I have been working on the backport for this to xenial, but in order to do so, we are taking care of some build failures on the latest version that we are going to add to cosmic. I'll update when there is more progress on that.

Regards.
Cascardo.
