ISST-LTE:pVM:roselp4:ubuntu 16.04.2: vmcore cannot be analysed by crash
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
The Ubuntu-power-systems project | Fix Released | High | Unassigned |
crash (Ubuntu) | Fix Released | High | Unassigned |
crash (Ubuntu Xenial) | Fix Released | High | Unassigned |
makedumpfile (Ubuntu) | Fix Released | High | Unassigned |
makedumpfile (Ubuntu Xenial) | Fix Released | High | Unassigned |
Bug Description
[SRU justification]
This fix is required to make the crash tool usable. It also improves makedumpfile's page filtering.
[Impact]
Kernel crashes cannot be analysed with the crash tool.
makedumpfile filters some pages incorrectly.
[Fix]
Cherry-pick upstream commits fixing those issues.
[Test Case]
Running the crash tool on a kernel crash file displays output like the following:
# crash -s usr/lib/
crash: read error: kernel virtual address: ffffffff81e29ff0 type: "pv_init_ops"
crash: this kernel may be configured with CONFIG_
renders /dev/mem unusable as a live memory source.
crash: trying /proc/kcore as an alternative to /dev/mem
crash: seek error: kernel virtual address: ffffffff81e29ff0 type: "pv_init_ops"
crash: seek error: kernel virtual address: ffffffff82166130 type: "shadow_timekeeper xtime_sec"
crash: seek error: kernel virtual address: ffffffff81e0d304 type: "init_uts_ns"
crash: usr/lib/
With the fix, the crash command works as expected.
Running the crash tool on a vmcore file produced by makedumpfile may also return:
crash: page excluded: kernel virtual address: <> type: "fill_task_struct"
[Regression]
None expected, as these modifications are already part of the Zesty and upstream versions.
The makedumpfile patches are in the Yakkety and Zesty packages (1.6.0 and later).
[Original description of the problem]
A vmcore captured by kdump cannot be opened with crash:
% sudo crash -d1 /usr/lib/
... ...
base kernel version: 0.8.0
linux_banner:
????????
crash: /usr/lib/
Usage:
crash [OPTION]... NAMELIST MEMORY-
crash [OPTION]... [NAMELIST] (live system form)
Enter "crash -h" for details.
It looks like crash cannot interpret 'linux_banner'.
While the vmcore was being dumped, the following messages were shown:
[ 729.609196] kdump-tools[5192]: The kernel version is not supported.
[ 729.609447] kdump-tools[5192]: The makedumpfile operation may be incomplete.
---uname output---
Linux roselp4 4.8.0-34-generic #36~16.04.1-Ubuntu SMP Wed Dec 21 18:53:20 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = lpar
---Debugger---
A debugger is not configured
---Steps to Reproduce---
1. Configure kdump.
2. Trigger kdump.
3. Analyse the vmcore with crash.
Userspace tool common name: crash/makedumpfile
The userspace tool has the following bit modes: 64-bit
Userspace rpm: makedumpfile 1.5.9-5ubuntu0.
Userspace tool obtained from project website: na
*Additional Instructions for Ping Tian <email address hidden>:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach ltrace and strace of userspace application.
xtime timespec.tv_sec: 586481e8: Wed Dec 28 21:24:24 2016
utsname:
sysname: Linux
nodename: boblp1
release: 4.8.0-32-generic
version: #34~16.04.1-Ubuntu SMP Tue Dec 13 17:01:57 UTC 2016
machine: ppc64le
domainname: (none)
base kernel version: 4.8.0
verify_namelist:
dumpfile /proc/version:
Linux version 4.8.0-32-generic (buildd@
/usr/lib/
Linux version 4.8.0-32-generic (buildd@
hypervisor: (undetermined)
crash: per_cpu_
ppc64_vmemmap_init: vmemmap base: f000000000000000
crash: PPC64: cannot find 'cpu_possible_map', 'cpu_present_map', 'cpu_online_map' or 'cpu_active_map' symbols
root@boblp1:
Linux boblp1 4.8.0-32-generic #34~16.04.1-Ubuntu SMP Tue Dec 13 17:01:57 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
root@boblp1:
1. Patches for v4.8 support that are missing from the crash tool:
commit 098cdab16dfa6a8
Author: Dave Anderson <email address hidden>
Date: Fri Feb 12 14:32:53 2016 -0500
Fix for the changes made to the kernel module structure introduced by
this kernel commit for Linux 4.5 and later kernels:
commit 8244062ef1e5450
modules: fix longstanding /proc/kallsyms vs module insertion race.
Without the patch, the crash session fails during initialization
with the error message: "crash: invalid structure member offset:
module_
(<email address hidden>)
commit 6f1f78e33474d00
Author: Dave Anderson <email address hidden>
Date: Wed Jan 20 09:56:36 2016 -0500
Fix for the changes made to the kernel module structure introduced by
this kernel commit for Linux 4.5 and later kernels:
commit 7523e4dc5057e15
module: use a structure to encapsulate layout.
Without the patch, the crash session fails during initialization
with the error message: "crash: invalid structure member offset:
module_
(<email address hidden>)
commit 1e92f9fad3a7e30
Author: Dave Anderson <email address hidden>
Date: Mon Feb 1 16:10:49 2016 -0500
Fix for the replacements made to the kernel's cpu_possible_mask,
cpu_
this kernel commit for Linux 4.5 and later kernels:
commit 5aec01b834fd6f8
kernel/cpu.c: eliminate cpu_*_mask
Without the patch, behavior is architecture-
whether the cpu mask values are used to calculate the number of cpus.
For example, ARM64 crash sessions fail during session initialization
with the error message "crash: zero-size memory allocation! (called
from <address>)", whereas X86_64 sessions come up normally, but
cpu mask values of zero are stored internally.
(<email address hidden>)
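To make the symbol change behind this commit concrete, here is a minimal Python model; it is not crash's C code. It assumes, based on Linux 4.5's kernel/cpu.c and include/linux/cpumask.h, that the old `cpu_<kind>_mask` symbols are pointers to a bitmap while the new `__cpu_<kind>_mask` symbols are the bitmaps themselves. The symbol table, the addresses, and the names `find_cpu_mask` and `count_cpus` are hypothetical.

```python
from typing import Mapping, Optional, Tuple


def find_cpu_mask(symbols: Mapping[str, int], kind: str) -> Optional[Tuple[int, bool]]:
    """Locate a cpu mask ("possible", "present", "online" or "active").

    `symbols` is a hypothetical name -> address table standing in for the
    vmlinux symbol lookup. Returns (address, is_pointer): when is_pointer is
    True, the address holds a pointer that must be dereferenced to reach the
    bitmap; otherwise the bitmap lives at the address itself.
    """
    new_name = f"__cpu_{kind}_mask"  # assumed Linux 4.5+ name: the mask object itself
    old_name = f"cpu_{kind}_mask"    # assumed pre-4.5 name: a const pointer to the mask
    if new_name in symbols:
        return symbols[new_name], False
    if old_name in symbols:
        return symbols[old_name], True
    return None  # caller may still try the older cpu_*_map symbols


def count_cpus(mask: int) -> int:
    """Number of CPUs whose bit is set in a mask value."""
    return bin(mask).count("1")
```

If neither name is found and the mask is treated as zero, `count_cpus` returns 0, which matches the zero-valued cpu masks the commit message describes.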
commit 182914debbb9a26
Author: Dave Anderson <email address hidden>
Date: Fri Sep 23 09:09:15 2016 -0400
With the introduction of radix MMU in Power ISA 3.0, there are
changes in kernel page table management accommodating it. This patch
series makes appropriate changes here to work for such kernels.
Also, this series fixes a few bugs along the way:
ppc64: fix vtop page translation for 4K pages
ppc64: Use kernel terminology for each level in 4-level page table
ppc64/book3s: address changes in kernel v4.5
ppc64/book3s: address change in page flags for PowerISA v3.0
ppc64: use physical addresses and unfold pud for 64K page size
ppc64/book3s: support big endian Linux page tables
The patches are needed for Linux v4.5 and later kernels on all
ppc64 hardware.
commit 8ceb1ac628bf6a0
Author: Dave Anderson <email address hidden>
Date: Mon May 23 11:23:01 2016 -0400
Fix for Linux commit 0139aa7b7fa12ce
renamed the page._count member to page._refcount. Without the patch,
certain "kmem" commands fail with the "kmem: invalid structure member
offset: page_count".
(<email address hidden>)
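A rename like page._count to page._refcount is usually handled by trying the newer member name first and falling back to the older one. The sketch below models that idea in Python; it is not crash's implementation. The `members` table, its offsets, and the name `member_offset` are hypothetical.

```python
from typing import Mapping, Optional, Sequence

# Newest name first: Linux commit 0139aa7b7fa12ce renamed page._count to page._refcount.
PAGE_REFCOUNT_NAMES = ("_refcount", "_count")


def member_offset(members: Mapping[str, int], candidates: Sequence[str]) -> Optional[int]:
    """Return the offset of the first candidate member present in a struct.

    `members` is a hypothetical name -> byte-offset table standing in for the
    debuginfo crash reads. None means no candidate exists, the situation
    behind an "invalid structure member offset" error.
    """
    for name in candidates:
        if name in members:
            return members[name]
    return None
```

With made-up tables such as `{"flags": 0, "_count": 28}` (older kernel) and `{"flags": 0, "_refcount": 28}` (newer kernel), the same call resolves the reference count on both.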
commit 7136bf8495948cb
Author: Dave Anderson <email address hidden>
Date: Thu May 19 14:01:19 2016 -0400
Fix for Linux commit edf14cdbf9a0e5a
has appended a NULL entry as the final member of the pageflag_names[]
array. Without the patch, a message that indicates "crash: failed to
read pageflag_names entry" is displayed during session initialization
in Linux 4.6 kernels.
(<email address hidden>)
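The fix amounts to treating the trailing NULL entry as a terminator rather than as an unreadable string. The Python sketch below models this under the assumption that each pageflag_names[] entry is a (mask, name) pair, as in the kernel's struct trace_print_flags; the function name and the sample entries are illustrative only.

```python
from typing import Dict, Iterable, Optional, Tuple

Entry = Tuple[int, Optional[str]]  # (mask, name); a name of None models a NULL pointer


def read_pageflag_names(entries: Iterable[Entry]) -> Dict[int, str]:
    """Collect mask -> flag name pairs, stopping at a {0, NULL} sentinel.

    Arrays without a sentinel simply end; arrays with one end on a NULL
    name, which must not be dereferenced as a string.
    """
    names: Dict[int, str] = {}
    for mask, name in entries:
        if name is None:
            break  # terminator, not a read failure
        names[mask] = name
    return names
```

Both array shapes then produce the same mapping, so the session initializes without the "failed to read pageflag_names entry" message.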
2. The following makedumpfile commits are needed:
commit 5bc1f520cc7ab6e
Author: Atsushi Kumagai <email address hidden>
Date: Tue Jan 26 10:11:33 2016 +0900
[PATCH] Looking for page.compound_
* Required for kernel 4.4
Due to some changes in struct page, hugepages wouldn't be removed on
linux 4.4. makedumpfile reads page.lru.prev to get "order" (number of hugepages)
and page.lru.next to get "dtor" (destructor for hugepages) to detect hugepages,
but the offsets of the two were changed in linux 4.4.
kernel version | where is order | where is dtor
---|---|---
up to v3.19 | lru.prev | lru.next
v4.0 - v4.3 | compound_
v4.4 and later | compound_order | compound_dtor
As above, OFFSET(
definitely necessary in VMCOREINFO on linux 4.4 and later.
Further, the content of page.compound_dtor was changed from direct address
of dtor to the ID of it in linux 4.4.
Signed-off-by: Atsushi Kumagai <email address hidden>
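One plausible way to apply the table above is to prefer the compound_order and compound_dtor offsets when VMCOREINFO provides them and fall back to lru.prev and lru.next otherwise. The Python sketch below shows that selection only; it is not makedumpfile's code, the `offsets` table and its values are made up, and the function name is hypothetical. Per the commit message, on 4.4 and later the value read at the dtor offset is a destructor ID, not an address, so the caller must compare it accordingly.

```python
from typing import Mapping, Tuple


def compound_fields(offsets: Mapping[str, int]) -> Tuple[int, int]:
    """Return (order_offset, dtor_offset) inside struct page.

    `offsets` is a hypothetical name -> offset table built from VMCOREINFO
    OFFSET() entries. When the compound_* offsets are exported, use them;
    otherwise fall back to lru.prev (order) and lru.next (dtor), the
    locations the table gives for kernels up to v3.19.
    """
    if "page.compound_order" in offsets and "page.compound_dtor" in offsets:
        return offsets["page.compound_order"], offsets["page.compound_dtor"]
    return offsets["page.lru.prev"], offsets["page.lru.next"]
```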
commit 13b4233e91a9d5a
Author: Atsushi Kumagai <email address hidden>
Date: Wed Feb 24 17:09:44 2016 +0900
[PATCH] Skip examining compound tail pages
* Required for kernel 4.5
For filtering user pages, we check whether each page's
page->mapping has the PAGE_MAPPING_ANON bit set.
However, unexcludable compound tail pages can have
PAGE_
as user page wrong.
Now, we don't need to check compound tail pages because
excludable compound pages must be excluded at a time by
exclude_range() when the corresponding head page is checked.
So just skipping tail pages can avoid wrong filtering.
Signed-off-by: Atsushi Kumagai <email address hidden>
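The filtering rule in this commit can be modeled in a few lines of Python. This is a simplified sketch, not makedumpfile's implementation: it assumes PAGE_MAPPING_ANON is the low bit of page->mapping, and the `Page` class and `user_pages_to_exclude` function are hypothetical. Anonymous head pages exclude their whole compound range, and tail pages are skipped instead of being judged by their own mapping field.

```python
from dataclasses import dataclass
from typing import List, Set

PAGE_MAPPING_ANON = 0x1  # assumed: low bit of page->mapping marks anonymous pages


@dataclass
class Page:
    mapping: int                 # raw page->mapping value
    compound_tail: bool = False  # True for every page of a compound page except the head
    order: int = 0               # compound order; only meaningful on a head page


def user_pages_to_exclude(pages: List[Page]) -> Set[int]:
    """Return the pfns (list indexes) that user-page filtering would drop."""
    excluded: Set[int] = set()
    for pfn, page in enumerate(pages):
        if page.compound_tail:
            # Never judge a tail page by its own mapping field: if its compound
            # page is excludable, the head page already excluded the whole range.
            continue
        if page.mapping & PAGE_MAPPING_ANON:
            end = min(pfn + (1 << page.order), len(pages))
            excluded.update(range(pfn, end))  # exclude_range()-like step
    return excluded
```

In a sample where a non-anonymous compound page has a tail whose mapping value happens to have bit 0 set, the tail stays in the dump; without the `compound_tail` check it would be dropped as a user page.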
3. The linux-image dbgsym version installed must be pulled from a different repo
instead of the one meant for 16.04.2 because the gcc version of kernel
image (/boot/
symbols(
Please use the following repos:
sudo tee /etc/apt/
deb http://
deb http://
deb http://
deb http://
EOF
to install linux-image-
Thanks
[snip]
>
> 3. The linux-image dbgsym version installed must be pulled from a different
> repo
s/must be pulled/must have been pulled/
Applied crash utility's missing patches on top of
crash-7.
makedumpfile-
patched binaries. The binaries were working as expected.
tags: | added: architecture-ppc64le bugnameltc-150136 severity-high targetmilestone-inin--- |
Changed in ubuntu: | |
assignee: | nobody → Taco Screen team (taco-screen-team) |
affects: | ubuntu → crash (Ubuntu) |
Changed in crash (Ubuntu Xenial): | |
assignee: | nobody → Louis Bouchard (louis-bouchard) |
Changed in makedumpfile (Ubuntu): | |
assignee: | nobody → Louis Bouchard (louis-bouchard) |
Changed in makedumpfile (Ubuntu Xenial): | |
assignee: | nobody → Louis Bouchard (louis-bouchard) |
Changed in crash (Ubuntu Xenial): | |
importance: | Undecided → High |
Changed in makedumpfile (Ubuntu): | |
importance: | Undecided → High |
Changed in makedumpfile (Ubuntu Xenial): | |
importance: | Undecided → High |
Changed in crash (Ubuntu Xenial): | |
status: | New → Confirmed |
Changed in makedumpfile (Ubuntu): | |
status: | New → Confirmed |
Changed in makedumpfile (Ubuntu Xenial): | |
status: | New → Confirmed |
Changed in crash (Ubuntu): | |
status: | New → Confirmed |
importance: | Undecided → High |
description: | updated |
tags: |
added: targetmilestone-inin16042 removed: targetmilestone-inin--- |
tags: |
added: verification-done removed: verification-needed |
Changed in crash (Ubuntu): | |
assignee: | Taco Screen team (taco-screen-team) → nobody |
description: | updated |
tags: |
added: verification-done removed: verification-needed |
Changed in makedumpfile (Ubuntu): | |
assignee: | Louis Bouchard (louis) → Nish Aravamudan (nacc) |
Changed in crash (Ubuntu): | |
assignee: | nobody → Nish Aravamudan (nacc) |
Changed in ubuntu-power-systems: | |
status: | New → Fix Committed |
Changed in crash (Ubuntu Xenial): | |
assignee: | Louis Bouchard (louis) → nobody |
Changed in makedumpfile (Ubuntu Xenial): | |
assignee: | Louis Bouchard (louis) → nobody |
Changed in ubuntu-power-systems: | |
importance: | Undecided → High |
tags: | added: triage-r |
Changed in makedumpfile (Ubuntu): | |
assignee: | Nish Aravamudan (nacc) → nobody |
status: | Confirmed → Fix Released |
tags: |
added: triage-g removed: triage-r |
Changed in ubuntu-power-systems: | |
status: | Fix Committed → Fix Released |
Hello,
First of all, two of the listed crash patches are in the 7.1.5-1ubuntu2 package currently waiting in zesty-proposed, where it is blocked by an FTBFS on s390x. But even with those patches crash still fails, so I assume that the other three patches are also needed.
However, when testing your solution on x86_64, crash still fails even with your patches. I suspect that other patches are needed for this specific architecture.
Regarding your makedumpfile patches, as far as I can tell, they are not required for the crash command to work. They only improve the filtering of some specific kinds of pages.
I am preparing an SRU for makedumpfile, so I will add them for completeness.
I am now trying to identify the missing commit for x86_64 and will work on getting that fixed.
Kind regards,
...Louis