makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid pmd_pte."

Bug #1970672 reported by Kellen Renshaw
This bug affects 2 people
Affects                 Status         Importance   Assigned to      Milestone
makedumpfile (Ubuntu)   Fix Released   Medium       Kellen Renshaw
Focal                   Fix Released   Medium       Heather Lemon

Bug Description

[Impact]
 * On Focal with an HWE (>=5.12) kernel, makedumpfile can sometimes fail with "__vtop4_x86_64: Can't get a valid pmd_pte."

 * makedumpfile falls back to cp for the dump, resulting in extremely large vmcores. This can impact both collection and analysis due to lack of space for the resulting vmcore.

 * This is fixed by an upstream commit that is present in versions 1.7.0 and 1.7.1:
https://github.com/makedumpfile/makedumpfile/commit/646456862df8926ba10dd7330abf3bf0f887e1b6

commit 646456862df8926ba10dd7330abf3bf0f887e1b6
Author: Kazuhito Hagio <email address hidden>
Date: Wed May 26 14:31:26 2021 +0900

    [PATCH] Increase SECTION_MAP_LAST_BIT to 5

    * Required for kernel 5.12

    Kernel commit 1f90a3477df3 ("mm: teach pfn_to_online_page() about
    ZONE_DEVICE section collisions") added a section flag
    (SECTION_TAINT_ZONE_DEVICE) and causes makedumpfile an error on
    some machines like this:

      __vtop4_x86_64: Can't get a valid pmd_pte.
      readmem: Can't convert a virtual address(ffffe2bdc2000000) to physical address.
      readmem: type_addr: 0, addr:ffffe2bdc2000000, size:32768
      __exclude_unnecessary_pages: Can't read the buffer of struct page.
      create_2nd_bitmap: Can't exclude unnecessary pages.

    Increase SECTION_MAP_LAST_BIT to 5 to fix this. The bit had not
    been used until the change, so we can just increase the value.

    Signed-off-by: Kazuhito Hagio <email address hidden>
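
The effect of the one-bit change can be illustrated with a minimal Python sketch. It assumes, as in the kernel's sparsemem code, that the low bits of a section's encoded mem_map pointer carry flags and that the mask is derived as ~(SECTION_MAP_LAST_BIT - 1). Placing SECTION_TAINT_ZONE_DEVICE at bit 4 is an assumption made here for illustration, and the sample pointer simply reuses the address from the error message above; it is not decoded from a real dump.

```python
# Sketch: a too-small SECTION_MAP_LAST_BIT leaves a flag bit inside the address.
# Assumption: SECTION_TAINT_ZONE_DEVICE (new in kernel 5.12) occupies bit 4.
MASK64 = (1 << 64) - 1
SECTION_TAINT_ZONE_DEVICE = 1 << 4


def section_map_mask(last_bit_shift: int) -> int:
    """Return ~(SECTION_MAP_LAST_BIT - 1) truncated to 64 bits."""
    last_bit = 1 << last_bit_shift
    return ~(last_bit - 1) & MASK64


def decode_mem_map(coded_mem_map: int, last_bit_shift: int) -> int:
    """Strip the flag bits from an encoded section mem_map pointer."""
    return coded_mem_map & section_map_mask(last_bit_shift)


# Hypothetical encoded pointer: an aligned address with flag bits 0-4 set.
real_addr = 0xFFFFE2BDC2000000
coded = real_addr | 0b11111

old = decode_mem_map(coded, 4)  # unpatched: SECTION_MAP_LAST_BIT = 1 << 4
new = decode_mem_map(coded, 5)  # patched:   SECTION_MAP_LAST_BIT = 1 << 5

print(hex(old))  # still carries the stray flag bit
print(hex(new))  # clean address
```

With the smaller mask the decoded value keeps the new flag bit, so every struct page address derived from it is off, which is consistent with the failed virtual-to-physical translation in the log.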

[Test Plan]
 * Confirm that makedumpfile works as expected by triggering a kdump.

 * Confirm that the patched makedumpfile works as expected on a system known to experience the issue.

 * Confirm that the patched makedumpfile is able to work with a cp-generated known affected vmcore to compress it. The unpatched version fails.

[Where problems could occur]

 * This change could adversely affect the collection/compression of vmcores during a kdump situation resulting in fallback to cp.

Changed in makedumpfile (Ubuntu):
assignee: nobody → Kellen Renshaw (krenshaw)
tags: added: sts
summary: - makedumpfile fails with __vtop4_x86_64: Can't get a valid pmd_pte.
+ makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid
+ pmd_pte."
Guilherme G. Piccoli (gpiccoli) wrote :

Hi Kellen, thanks a lot for reporting and fixing that!

I'd like to take the opportunity to discuss something related: no matter how many bugs we fix in makedumpfile / crash, more will come as the kernel version bumps. The kernel has no stable ABI, so kernel developers can "break" compatibility with such tools, although the makedumpfile maintainer (and crash's as well!) is really great at keeping up with that and releasing proactive fixes even before the kernel change is merged.

But the problem is: in the Ubuntu ecosystem, although we have the HWE concept for the kernel, these packages are not part of kernel HWE upgrades; hence, they get "stuck" and are subject to bugs when a kernel HWE is released. It happens all the time and will continue happening...

We had discussions in the past (and I'm hereby CCing the interested parties: DannF, Dan Streetman, Heitor and Cascardo) about sync'ing makedumpfile and crash with kernel HWE upgrades. So, that might be a good opportunity for doing it.

The idea was more or less like this: update makedump/crash on Release to make it sync'ed with Release +1 until the next LTS. So, in the end, we'll have LTS version == LTS +1 and then, we stop upgrading/syncing these packages. And the cycle restarts for LTS+1, up to the release of LTS+2.
Hopefully this plan (or something similar) eventually is followed, I bet all users/customers would be glad to not face makedump/crash bugs due to kernel upgrades anymore!

Cheers, and thanks for the attention =D

Kellen Renshaw (krenshaw) wrote :

Hi Guilherme!

I have been looking at the possibility of an SRU exception for makedumpfile, although I like your idea better (LTS N to LTS N+1).

How would we go about implementing that? I am willing to do the building/testing work, but would need guidance on how to do that and properly record/report the results.

Thanks!

Thadeu Lima de Souza Cascardo (cascardo) wrote :

There are two risks with that plan that we should overcome.

One is testing: such updates should not cause regressions. As of right now, the small amount of testing that makedumpfile receives is not sufficient and gives a lot of false negatives. We should be testing that new kernels are still dumpable (and fix either the kernel or makedumpfile when they are not), and testing that new makedumpfile versions do not break dumping all the supported kernel versions (which, in my opinion, is a little harder, and puts some burden on makedumpfile updates). Users do run outdated kernels and would expect dumps when they crash, so this is a bit of a challenge. We do not need to be perfect and test all kernels in all scenarios, but we definitely need to do better.

The second one is kernel support. It's not unusual that we release an Ubuntu version with a makedumpfile that cannot dump the GA kernel. So, even without considering HWE kernels, an LTS release may need a newer makedumpfile. One of the reasons is that as we don't test as we upload new kernels to the development series, we don't realize makedumpfile needs additional support for that new kernel. Sometimes, just having the latest released makedumpfile is sufficient. But it's too often the case that upstream makedumpfile is only able to catch up with latest kernel releases after a while.

Cascardo.

Guilherme G. Piccoli (gpiccoli) wrote :

Thanks Kellen and Cascardo.

Kellen, nice that you're willing to work on that - this is a long standing problem and that work would be definitely appreciated by the Ubuntu community, be it free users or the Ubuntu Advantage customers!

Cascardo, about your two risks:

(a) Partially agree with that. I agree with the part about testing; this is definitely the big chunk of work here. But I disagree with the retrocompatibility claim: of course we need to enforce that, but it's not that difficult in the LTS->LTS+1 model. See, we have a small number of HWE kernels per LTS release, I guess 4 or 5, correct? We need to be sure the makedumpfile updates are compatible with them, and that's it.

If I'm talking Focal and some makedumpfile update (unintentionally) breaks dumps for kernel 4.19 or prior, why should we care if that's an unsupported scenario?
IMHO it's much better to ensure that every HWE kernel receives a proper functional makedumpfile update, instead of an overly cautious attitude with older/unsupported kernels.

(b) I agree here, but I guess the effort of SRU exception/ LTS->LTS+1 model will only make it easier. Imagine if when Ubuntu version X is released the upstream makedumpfile is not handling well the recent kernel version used by X - so we could either fix (or report) the makedumpfile issue quickly (especially due to part (a) above, the improved testing). Then, once it's fixed either by Canonical or community, this could quickly be integrated through a fast process, a version bump for makedumpfile for example.

In the end, I think testing is the key word here - the more serious and thorough tests makedumpfile has, the more confidence in such model we'd have. But hopefully with Kellen's effort this stops preventing a more proactive approach with makedumpfile from happening, by updating it before users report bugs (which has been happening since forever for this package).

Cheers!

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, Guilherme.

I think you misunderstood and we are in agreement. What I mean by the first point is that, on Focal, we need to support 5.4 and 5.15. I don't even think we need to support 5.8 and 5.11 any longer, though 5.13, as it will still be supported for a while needs supporting. But I also mean that we should support not only 5.4.0-113 and 5.4.0-114 (versions on -updates and -proposed), we should also support older 5.4 versions (one could argue all the versions that are still livepatchable at least). I don't think the risk of regressions is big here, but it exists. We ought to balance that risk and how much we can test.

About the second point, it's just an argument that we should not restrict ourselves to the upstream major makedumpfile version that is in LTS+1. Take focal, which will receive 5.15 from jammy. It may require a newer makedumpfile than the one in Jammy because the one in Jammy may not be sufficient to support 5.15.

Cascardo.

dann frazier (dannf) wrote :

Thanks everyone for trying to tackle this long-standing issue. fwiw, here's my $0.02 on how we could proceed:

Someone should draft a special case page for makedumpfile:
https://wiki.ubuntu.com/StableReleaseUpdates#Documentation_for_Special_Cases
I'm happy to review/provide feedback, but I'd rather someone who would be carrying out the plan drive it.

As others have mentioned, testing is the hard part, and we need to define what will be tested in the special case documentation. Since makedumpfile is really just a filter, I don't think we need to (or reasonably could) boot a bunch of systems in different configs and generate crashdumps for every new update. Rather, I think we could build a repository of representative, unfiltered, /proc/vmcore files that focal's existing makedumpfile can parse. Then we can just check that all of those files can still be parsed by the proposed makedumpfile. With some scripting and a multi-architecture cloud, this could be automated. In fact, if this vmcore repo were online, we could implement this as an autopkgtest (w/ needs-internet set). But we should also do at least one end-to-end kdump, just to make sure the kdump-tools->makedumpfile interface hasn't been broken.
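
A rough idea of what such a filter-only regression run could look like, as a Python sketch. The pool layout (files named vmcore* in one directory), the function name filter_vmcores and the injectable runner are all hypothetical; only the makedumpfile -c -d 31 invocation is taken from this bug report.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, Sequence


def default_runner(cmd: Sequence[str]) -> int:
    """Run the command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def filter_vmcores(
    pool: Path,
    out_dir: Path,
    runner: Callable[[Sequence[str]], int] = default_runner,
    binary: str = "makedumpfile",
) -> Dict[str, bool]:
    """Filter every saved vmcore in `pool` and record pass/fail per file.

    Assumption: each sample is a raw copy of /proc/vmcore named vmcore*.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, bool] = {}
    for vmcore in sorted(pool.glob("vmcore*")):
        out = out_dir / (vmcore.name + ".filtered")
        rc = runner([binary, "-c", "-d", "31", str(vmcore), str(out)])
        results[vmcore.name] = rc == 0
    return results
```

Passing the runner as a parameter keeps the loop testable without real dumps; an SRU verification run would call filter_vmcores(Path(...), Path(...)) once with the released binary and once with the proposed one, then compare the two result maps.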

What is a representative sample? One of each of the current LTS and HWE kernels on amd64, arm64, ppc64el and s390x seems like an obvious start (or the subset of those that actually work today). I don't think the machine type is as important, VMs should be fine IMO. If we know of examples where different machines expose structures differently in a way that makedumpfile cares about, then perhaps add those as well. Once a new makedumpfile lands that adds support for a new HWE kernel, we should probably then update the repo w/ vmcore samples from that kernel, so we can make sure the next update doesn't regress that support (probably convenient to do when verifying the SRU, since I imagine we'd be testing that it works w/ the new HWE kernel then anyway).

It'd be good to note in the special case request that kdump-tools does fall back to a raw /proc/vmcore file cp if makedumpfile fails, which can mitigate regressions for a subset of users - those with the necessary disk space and lack of time constraints.

While I agree that crash falls into the same category, I don't think it necessarily needs to happen at the same time. Obviously users running focal need to dump their vmcore using focal - but for developers debugging a crash, I don't think it is too onerous to use a newer version of Ubuntu. Again, I'm not saying we *shouldn't* add crash to the special case, it just seems like a makedumpfile exception is significantly more important.

Finally, I don't think we need to commit to a frequency of backports, or a point at which they will stop. Rather we can just stick to agreeing on how it *can* be done when someone has the time/interest in doing it. Guilherme's LTS->LTS+1 scheme sounds like a reasonable pattern to shoot for, but if that doesn't happen every time, we're still improving the situation over the status quo.

Guilherme G. Piccoli (gpiccoli) wrote :

Thanks a lot Cascardo and Dann - very good points.

Cascardo: I agree with you, I misunderstood and didn't consider the minor kernel releases. I think that Dann's idea of testing makes it much simpler though. Maybe the kernel team could create the vmcore images, as part of the release process, for some random kernels (like generic + 3 cloud kernels chosen randomly) and "dump" them onto this server. The makedumpfile test infrastructure would then consume the vmcores and execute the test, checking for bugs/regressions. The test could be quite simple, just checking the return value of makedumpfile and whether the file created is in fact a compressed dump (the "file" tool could be used for that).
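
For the "is it really a compressed dump" check, a small Python sketch that looks at the leading bytes instead of shelling out to "file". It assumes the kdump-compressed format begins with the 8-byte signature "KDUMP" padded with three spaces and that an uncompressed vmcore is an ELF core; the function name is hypothetical.

```python
ELF_MAGIC = b"\x7fELF"
# Assumption: 8-byte signature at the start of a kdump-compressed dump file.
KDUMP_MAGIC = b"KDUMP   "


def dump_format(path) -> str:
    """Classify a dump file by its leading magic bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(KDUMP_MAGIC):
        return "kdump-compressed"
    if head.startswith(ELF_MAGIC):
        return "elf"
    return "unknown"
```

A test would then pass only when makedumpfile exits with 0 and dump_format() on its output returns "kdump-compressed"; an "elf" result would point at a cp fallback or an uncompressed run.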

I agree with you as well Dann, makedumpfile is much more important than the crash tool and should have priority. Also, testing makedump is easier than checking the crash tool I guess heheh

Cheers!

Kellen Renshaw (krenshaw) wrote :

Thanks everyone for the awesome input and ideas! I will draft the special case exception and figure out some way to post it here for review.

Regarding testing, I really like Dann and Guilherme's idea for automated testing of vmcores.

Where would that infrastructure live and how would we maintain a repo (or just directory) of vmcores in a well-known location suitable for package testing?

Kellen Renshaw (krenshaw) wrote :

During a discussion about the potential for autopkgtests it was brought up that a representative sample of unfiltered vmcores would be large (even compressed). This would impose bandwidth and disk space constraints on anyone attempting to run the autopkgtests locally.

That tends to make me lean toward setting up a library of cores with test infrastructure somewhere that is run as part of the SRU exception process, but not as an autopkgtest.

dann frazier (dannf) wrote :

Yeah, I don't think an autopkgtest is a requirement. Having the test live within the package itself would make it more accessible, and autopkgtest was my first thought as to how to do that. But I don't see a "disable by default" flag, and I agree that huge downloads on every test run could be a problem. It would still be nice IMO if the test was integrated into the package though, even if it needs to run manually.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in makedumpfile (Ubuntu):
status: New → Confirmed
Kellen Renshaw (krenshaw) wrote :

I have put up a draft SRU exception page at https://wiki.ubuntu.com/MakedumpfileUpdates. Comments/edits are welcome. I am working on finding an appropriate place for test vmcores to be stored.

dann frazier (dannf) wrote : Re: [Bug 1970672] Re: makedumpfile falls back to cp with "__vtop4_x86_64: Can't get a valid pmd_pte."

On Fri, Sep 23, 2022 at 11:15 AM Kellen Renshaw
<email address hidden> wrote:
>
> I have put up a draft SRU exception page at
> https://wiki.ubuntu.com/MakedumpfileUpdates. Comments/edits are welcome.
> I am working on finding an appropriate place for test vmcores to be
> stored.

Awesome! Thanks for starting this. I made some edits already, but
here's some other things I'd recommend:
- Provide links under resources (it's a web page!)
- be more precise about what will be tested, ideally in a checklist
form. Make it clear enough that you could give the list to another
developer and they would run the same tests and get the same results.
- Testing other architectures should not be optional IMO. makedumpfile
has architecture specific knowledge. You could easily break one w/o
noticing on another.
- Mention the plan to filter a pool of saved dump files as a
regression test. That is key IMO.
- If you have the regression testing pool, I don't see any reason to
have to do the full kdump process everywhere. Just do that once (one
kernel, one arch) to make sure the kdump-tools<->makedumpfile
interface has been preserved. Testing the full crash dump process is
painful, but makedumpfile really is just used as a filter, and we can
test that easily.

  -dann

Kellen Renshaw (krenshaw) wrote :

Thanks Dann!

Oof, the links didn't make it from the draft document, fixed now.

If I understand your previous comment correctly, a better variant of this testing procedure would be a regression testing checklist against a library (linked to in the page) of dump files and a single manual test of the kdump<>makedumpfile functionality?

dann frazier (dannf) wrote :

On Wed, Sep 28, 2022 at 12:41 PM Kellen Renshaw
<email address hidden> wrote:
>
> Thanks Dann!
>
> Oof, the links didn't make it from the draft document, fixed now.
>
> If I understand your previous comment correctly, a better variant of
> this testing procedure would be a regression testing checklist against a
> library (linked to in the page) of dump files and a single manual test
> of the kdump<>makedumpfile functionality?

As someone who in no way speaks for the SRU team, yes - that is what
I'd suggest.

  -dann

Heather Lemon (hypothetical-lemon) wrote :

focal debdiff of backport patch

Changed in makedumpfile (Ubuntu Focal):
assignee: nobody → Heather Lemon (hypothetical-lemon)
status: New → Confirmed
importance: Undecided → Medium
Changed in makedumpfile (Ubuntu):
importance: Undecided → Medium
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "lp1970672-focal.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Dimitri John Ledkov (xnox) wrote :

$ dput ubuntu makedumpfile_1.6.7-1ubuntu2.5_source.changes
D: Setting host argument.
Checking signature on .changes
gpg: /tmp/makedumpfile_1.6.7-1ubuntu2.5_source.changes: Valid signature from 9B8EC849D5EF70ED
Checking signature on .dsc
gpg: /tmp/makedumpfile_1.6.7-1ubuntu2.5.dsc: Valid signature from 9B8EC849D5EF70ED
Uploading to ubuntu (via ftp to upload.ubuntu.com):
  Uploading makedumpfile_1.6.7-1ubuntu2.5.dsc: done.
  Uploading makedumpfile_1.6.7-1ubuntu2.5.debian.tar.xz: done.
  Uploading makedumpfile_1.6.7-1ubuntu2.5_source.buildinfo: done.
  Uploading makedumpfile_1.6.7-1ubuntu2.5_source.changes: done.
Successfully uploaded packages.

Changed in makedumpfile (Ubuntu):
status: Confirmed → Fix Released
Changed in makedumpfile (Ubuntu Focal):
status: Confirmed → In Progress
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Kellen, or anyone else affected,

Accepted makedumpfile into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.7-1ubuntu2.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Heather Lemon (hypothetical-lemon) wrote :

### VERIFICATION-FAILED FOCAL-PROPOSED ###

makedumpfile original version: 1:1.6.7-1ubuntu2.4
makedumpfile proposed version: 1:1.6.7-1ubuntu2.5
kernel version: 5.15.0-89-generic hwe

sudo apt install -y linux-crashdump

sudo vim /etc/default/kdump-tools
 # Add the line below USE_KDUMP=1 at the top of the file
 LOAD_KEXEC=true
 # Uncomment the makedumpfile line and change 31 to 32
 MAKEDUMP_ARGS="-c -d 32"
 # Save and exit vim

sudo vim /etc/default/grub.d/kdump-tools.cfg
 # Change 192M to either 256M or 512M

sudo sysctl -w kernel.sysrq=1

sudo update-grub
sudo reboot

sudo su
kdump-config show

Needs to look similar to:
root@focal:/home/ubuntu# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x73000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.15.0-89-generic
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.15.0-89-generic
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-5.15.0-89-generic root=UUID=975b9a95-b58e-48da-bd23-dd01b13bcbad ro quiet splash vt.handoff=7 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

*most importantly the addr is not zero and the state is: ready to kdump.
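
The two conditions in that note can also be checked mechanically. A minimal Python sketch, assuming the "crashkernel addr:" and "current state:" lines look exactly like the output above; the function name is hypothetical and other kdump-tools versions may format these lines differently.

```python
import re


def kdump_ready(show_output: str) -> bool:
    """Return True if `kdump-config show` output reports a usable setup.

    Checks that the crashkernel address is present and non-zero and that
    the state line reads exactly "ready to kdump".
    """
    addr = re.search(r"^crashkernel addr:[ \t]*(0x[0-9a-fA-F]+)?",
                     show_output, re.M)
    state = re.search(r"^current state:[ \t]*(.+)$", show_output, re.M)
    if addr is None or state is None or addr.group(1) is None:
        return False
    return (int(addr.group(1), 16) != 0
            and state.group(1).strip() == "ready to kdump")
```

The text to check can be captured with something like subprocess.run(["kdump-config", "show"], capture_output=True, text=True).stdout when run as root.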

# trigger crash

echo c > /proc/sysrq-trigger

... wait for reboot and login

cd /var/crash/

There is a folder named with the date and time of the crash; inside it are the vmcore and dmesg files
There is also a file called linux-image*.crash in /var/crash/

*Note: In bionic we get the cp message

[ 6.946187] kdump-tools[622]: Starting kdump-tools: * running makedumpfile -c -d 32 /proc/vmcore /var/crash/202311221422/dump-incomplete
[ 6.959932] kdump-tools[622]: Dump_level(32) is invalid.
[ 6.964316] kdump-tools[622]: makedumpfile Failed.
[ 6.976231] kdump-tools[622]: * kdump-tools: makedumpfile failed, falling back to 'cp'
[ 25.084729] kdump-tools[622]: * kdump-tools: saved vmcore in /var/crash/202311221422
[ 26.355039] kdump-tools[622]: * running makedumpfile --dump-dmesg /proc/vmcore /var/crash/202311221422/dmesg.202311221422
[ 26.436513] kdump-tools[622]: The dmesg log is saved to /var/crash/202311221422/dmesg.202311221422.
[ 26.443208] kdump-tools[622]: makedumpfile Completed.
[ 26.449066] kdump-tools[622]: * kdump-tools: saved dmesg content in /var/crash/202311221422
[ 26.463950] kdump-tools[622]: Wed, 22 Nov 2023 14:22:50 +0000
[ 26.490318] kdump-tools[622]: Rebooting.

*Note: In Focal we don't

Nov 28 13:35:54 focal kdump-tools[604]: Starting kdump-tools:
Nov 28 13:35:54 focal kdump-tools[667]: Starting kdump-tools:
Nov 28 13:35:54 focal kdump-tools[667]: * Creating symlink /
Nov 28 13:35:54 focal kdump-tools[708]: * Creating symlink /
Nov 28 13:35:54 focal kdump-tools[708]: n: failed to create symbolic link '/va: No such file or directory
Nov 28 13:35:54 focal kdump-tools[713]: kdump-tools: Generating /var/lib/kdump/initrd.img-5.15.0-89-generic
Nov 28 13:35:57 focal kdump-tools[667]: * Creating syml...


tags: added: verification-failed-focal
removed: verification-needed-focal
Fabio Augusto Miranda Martins (fabio.martins) wrote :

I've tested makedumpfile from -proposed on Focal and it looks good to me.

Using a vmcore file with 2TB as an input:

- Original makedumpfile 1.6.7-1ubuntu2.4 fails:

ubuntu@kdump-instance:~$ sudo apt-cache policy makedumpfile
makedumpfile:
  Installed: 1:1.6.7-1ubuntu2.4
  Candidate: 1:1.6.7-1ubuntu2.4
  Version table:
 *** 1:1.6.7-1ubuntu2.4 500
        500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:1.6.7-1ubuntu2 500
        500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal/main amd64 Packages

ubuntu@kdump-instance:/mnt/202204202351$ makedumpfile -c -d 31 ./vmcore.202204202351 ./dump-incomplete-fabio
The kernel version is not supported.
The makedumpfile operation may be incomplete.
Checking for memory holes : [100.0 %] / __vtop4_x86_64: Can't get a valid pmd_pte.
readmem: Can't convert a virtual address(ffffecff81800000) to physical address.
readmem: type_addr: 0, addr:ffffecff81800000, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
create_2nd_bitmap: Can't exclude unnecessary pages.

makedumpfile Failed.

- Makedumpfile 1.6.7-1ubuntu2.5 from proposed works:

ubuntu@kdump-instance:~$ sudo apt-cache policy makedumpfile
makedumpfile:
  Installed: 1:1.6.7-1ubuntu2.5
  Candidate: 1:1.6.7-1ubuntu2.5
  Version table:
 *** 1:1.6.7-1ubuntu2.5 500
        500 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:1.6.7-1ubuntu2.4 500
        500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
     1:1.6.7-1ubuntu2 500
        500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal/main amd64 Packages

ubuntu@kdump-instance:/mnt/202204202351$ makedumpfile -c -d 31 ./vmcore.202204202351 ./dump-incomplete-fabio
The kernel version is not supported.
The makedumpfile operation may be incomplete.
Copying data : [100.0 %] - eta: 0s

The dumpfile is saved to ./dump-incomplete-fabio.

makedumpfile Completed.

It reduced the dump file from 2TB down to 4.5G:

ubuntu@kdump-instance:/mnt/202204202351$ ls -lh vmcore.202204202351
-r-------- 1 ubuntu ubuntu 2.0T Apr 21 2022 vmcore.202204202351

ubuntu@kdump-instance:/mnt/202204202351$ ls -lh dump-incomplete-fabio
-rw------- 1 ubuntu ubuntu 4.5G Dec 12 14:23 dump-incomplete-fabio

The vmcore in Heather's comment has the size of the installed RAM because makedumpfile was forced to fail in that test: "-c -d 32" asks for a dump level that doesn't exist (the max is 31). Moving the makedumpfile binary away has the same effect. In both cases kdump falls back to cp, which produces a vmcore file with the size of the installed RAM.
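
Why 31 is the ceiling: the dump level is a bitmask over five page types, so the largest valid value is 1+2+4+8+16. A short Python sketch, with the bit assignments as I understand them from the makedumpfile man page; the function name and the wording of the page-type labels are illustrative.

```python
# Page types that each dump-level bit excludes (per the makedumpfile man page).
PAGE_TYPES = {
    1: "zero pages",
    2: "non-private cache pages",
    4: "private cache pages",
    8: "user process data pages",
    16: "free pages",
}
MAX_DUMP_LEVEL = sum(PAGE_TYPES)  # 31


def excluded_pages(level: int) -> list:
    """List the page types a given -d level would exclude."""
    if not 0 <= level <= MAX_DUMP_LEVEL:
        raise ValueError(f"Dump_level({level}) is invalid.")
    return [name for bit, name in PAGE_TYPES.items() if level & bit]


print(excluded_pages(31))
```

Level 32 would need a sixth bit that has no page type behind it, which is why makedumpfile rejects it and kdump-tools then falls back to cp.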

Let me know if this is enough to have focal verification concluded.

tags: added: verification-done-focal
removed: verification-failed-focal verification-needed
Heather Lemon (hypothetical-lemon) wrote :

Ah okay that makes sense, thanks for re-testing. What you've done looks good to me.

Chris Halse Rogers (raof) wrote :

So, the test plan was:
```
[Test Plan]
 * Confirm that makedumpfile works as expected by triggering a kdump.

 * Confirm that the patched makedumpfile works as expected on a system known to experience the issue.

 * Confirm that the patched makedumpfile is able to work with a cp-generated known affected vmcore to compress it. The unpatched version fails.
```

I'm not familiar with this area, but as far as I can tell only the 3rd step has been tested here? Am I misunderstanding the comments, or have we missed some of the test plan?

Fabio Augusto Miranda Martins (fabio.martins) wrote :

Hi Chris,

You're correct, I'm sorry. My test on comment #23 is the 3rd item you listed.

Let me work on 1 and 2 and I'll get back here.

Fabio Augusto Miranda Martins (fabio.martins) wrote :

For item 2:

 * Confirm that the patched makedumpfile works as expected on a system known to experience the issue.

Unfortunately I'm no longer able to reproduce the original issue.

Even running on the same hardware where this was originally noticed, with the same kernel version (5.13.0-1027-oracle), makedumpfile from focal-updates/main (1:1.6.7-1ubuntu2.4) is just working well:

[ 53.223512] kdump-tools[693]: Starting kdump-tools:
[ 53.623944] kdump-tools[702]: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202312151415/dump-incomplete
Copying data : [ 196.965120] reboot: Restarting system
[ 22.0 %] |

Unfortunately I don't have the information and I don't have access to the original system to check what version of makedumpfile it was using back then, so I could test the exact same makedumpfile+kernel versions.

I also tested kernel 5.13.0-1027-oracle + makedumpfile 1:1.6.7-1ubuntu2 from focal/main, and in this combination, makedumpfile fails with a similar, but slightly different error, then falls back to cp:

[ 53.721130] kdump-tools[690]: Starting kdump-tools:
[ 54.121624] kdump-tools[699]: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202312151434/dump-incomplete
[ 54.249624] kdump-tools[719]: get_mm_sparsemem: Can't get the address of mem_section.
[ 54.345410] kdump-tools[719]: The kernel version is not supported.
[ 54.425405] kdump-tools[719]: The makedumpfile operation may be incomplete.
[ 54.517391] kdump-tools[719]: makedumpfile Failed.
[ 54.577916] kdump-tools[699]: * kdump-tools: makedumpfile failed, falling back to 'cp'

However, using the latest makedumpfile from focal-updates/main (1:1.6.7-1ubuntu2.4) fixes this situation, as mentioned / shown above.

For this reason, I can't complete item 2.

I'll work now on 1.

Fabio Augusto Miranda Martins (fabio.martins) wrote :

For item 1:

 * Confirm that makedumpfile works as expected by triggering a kdump.

I can confirm that makedumpfile 1:1.6.7-1ubuntu2.5 from focal-proposed/main worked well when I triggered a dump in a system:

ubuntu@fabio-small-makedumpfile:~$ sudo hostnamectl
   Static hostname: fabio-small-makedumpfile
         Icon name: computer-vm
           Chassis: vm
        Machine ID: dee0adfb9aa54246b4d1e2fc62dd50f7
           Boot ID: adba6ba3977f4c758a7008013a7a6d1e
    Virtualization: oracle
  Operating System: Ubuntu 20.04.6 LTS
            Kernel: Linux 5.15.0-1049-oracle
      Architecture: x86-64
ubuntu@fabio-small-makedumpfile:~$ sudo kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x2c000000
0xfd7f000000
   /boot/vmlinuz-5.15.0-1049-oracle
kdump initrd:
   /boot/initrd.img-5.15.0-1049-oracle
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-5.15.0-1049-oracle root=UUID=7d8611b4-d3e7-4f1a-a8f9-e1a7e5a2d2f9 ro console=tty1 console=ttyS0 nvme.shutdown_timeout=10 libiscsi.debug_libiscsi_eh=1 crash_kexec_post_notifiers reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb" --initrd=/boot/initrd.img-5.15.0-1049-oracle /boot/vmlinuz-5.15.0-1049-oracle
ubuntu@fabio-small-makedumpfile:~$ sudo dpkg -l makedumpfile
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-==================-============-=================================
ii makedumpfile 1:1.6.7-1ubuntu2.5 amd64 VMcore extraction tool
ubuntu@fabio-small-makedumpfile:~$ sudo apt-cache policy makedumpfile
makedumpfile:
  Installed: 1:1.6.7-1ubuntu2.5
  Candidate: 1:1.6.7-1ubuntu2.5
  Version table:
 *** 1:1.6.7-1ubuntu2.5 500
        500 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:1.6.7-1ubuntu2.4 500
        500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
     1:1.6.7-1ubuntu2 500
        500 http://phx-ad-3.clouds.archive.ubuntu.com/ubuntu focal/main amd64 Packages

Output showing that it completed well:

[ 54.490112] kdump-tools[676]: Starting kdump-tools:
[ 54.876357] kdump-tools[686]: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202312151524/dump-incomplete
Checking for memory holes : [100.0 %] \ [ 204.391465] reboot: Restarting system

And when I look at the crash, it's properly compressed (system had 1TB of RAM):

ubuntu@fabio-small-makedumpfile:~$ ls -lh /var/crash/202312151524
total 2.3G
-rw------- 1 root root 126K Dec 15 15:26 dmesg.202312151524
-rw------- 1 root root 2.3G Dec 15 15:26 dump.202312151524

Regards,
Fabio Martins

Chris Halse Rogers (raof) wrote :

Ok. My understanding of this is that when (3) fails it fails for the same reason that (2) would. We've verified that making a dumpfile works in general, and definitely fixed (3), so I'm going to release this now.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.7-1ubuntu2.5

---------------
makedumpfile (1:1.6.7-1ubuntu2.5) focal; urgency=medium

  * makedumpfile falls back to cp with 5.12 kernel (LP: #1970672)
    - can also fail with __vtop4_x86_64: Can't get a valid pmd_pte.
    - d/p/lp1970672-PATCH-Increase-SECTION_MAP_LAST_BIT-to-5.patch

 -- Heather Lemon <email address hidden> Tue, 21 Nov 2023 15:19:22 +0000

Changed in makedumpfile (Ubuntu Focal):
status: Fix Committed → Fix Released
Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for makedumpfile has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
