kdump just hung out of the box

Bug #1931779 reported by Rich
This bug affects 1 person
Affects                Status        Importance  Assigned to  Milestone
makedumpfile (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

I tried installing kdump-tools (1.6.7-1ubuntu2.2) on my up to date 20.04 system, installed specifically to try reproducing a bug.

But when I tried it, after kdump-config status reported "ready to kdump" on reboot, running echo 'c' | sudo tee /proc/sysrq-trigger printed the panic to the console and then just hung forever.

After some blind guessing and twiddling both variables, I found that crashkernel=512M-:256M works on this particular setup. (MS7850 motherboard, i5-4670 CPU, 5.4.0-42-generic kernel)
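For readers unfamiliar with the syntax: in crashkernel=512M-:256M, the part before the colon is a range of total system RAM (512M and up, with no upper bound) and the part after it is the amount reserved for the crash kernel. A minimal sketch for checking which value a kernel command line carries is below. The function name crashkernel_param is made up for illustration; it only parses a string and changes nothing on the system.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the crashkernel= parameter
# found in a kernel command line string (prints nothing if absent).
# If the parameter appears more than once, the last occurrence is kept.
crashkernel_param() {
    # $1: a kernel command line, e.g. the contents of /proc/cmdline
    value=""
    for word in $1; do
        case "$word" in
            crashkernel=*) value="${word#crashkernel=}" ;;
        esac
    done
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    fi
}
```

On a running system this would be called as crashkernel_param "$(cat /proc/cmdline)"; the number of bytes actually reserved can be read from /sys/kernel/kexec_crash_size.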

Rich (rincebrain)
description: updated
Guilherme G. Piccoli (gpiccoli) wrote :

Thank you for the report! I think the problem was memory, and you nailed it: by reserving more memory, you were able to make it work. The default is a policy choice, and it is currently a bit conservative in order not to waste memory. It works on most systems...

We have a memory estimator project [0] that (hopefully) will help users to determine the proper amount of reserved memory for their systems. For now, I guess we can consider this bug fixed (by the reporter!).
Cheers,

Guilherme

[0] https://salsa.debian.org/debian/kdump-tools/-/merge_requests/13

Changed in makedumpfile (Ubuntu):
status: New → Fix Released
Rich (rincebrain) wrote :

I suppose I should have made it clear that just changing the amount of memory reserved was not sufficient.

Perhaps it would be useful to print a message saying "hey, I know we set something by default, but we don't expect that to actually work, so you should probably test and adjust it"?

Because it sounds like it's not expected to work out of the box, and configuring it to "on, but doesn't actually work or warn you" when you install the package seems worse than not configuring it, similar to how I wouldn't expect installing ccache to result in it sticking itself in my PATH, reporting "ccache enabled and working" when asked, but then defaulting to a max cache size of 4 kilobytes.

Guilherme G. Piccoli (gpiccoli) wrote :

Oh yeah, it would be awesome to show such a message... but I'm not sure how we can do it! If we knew beforehand that the reserved memory amount won't work for your system, why wouldn't we just fix it instead of showing a message?

The problem is that this is a policy: there is (currently) no technical mechanism that can predict that X amount of RAM is required for kdump to work on your system. We have a heuristic (that is, setting crashkernel to 192M), which is expected to work on the majority of machines. Hence, the recommendation for kdump users is to install the package and perform a dummy kdump, as you did! If it fails, go ahead and fine-tune the value.
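The dummy kdump test recommended here can be sketched roughly as follows. The function name kdump_preflight and its file arguments are illustrative only; on a real system the two files would be /proc/cmdline and /sys/kernel/kexec_crash_loaded. This is just a pre-flight check and, as this bug shows, passing it does not guarantee that the dump itself succeeds.

```shell
#!/bin/sh
# Hypothetical pre-flight check before triggering a test crash.
# $1: file holding the kernel command line (normally /proc/cmdline)
# $2: file holding the "crash kernel loaded" flag
#     (normally /sys/kernel/kexec_crash_loaded, which reads 1 when loaded)
kdump_preflight() {
    if ! grep -q 'crashkernel=' "$1"; then
        echo "no crashkernel= on the kernel command line"
        return 1
    fi
    if [ "$(cat "$2")" != "1" ]; then
        echo "crash kernel is not loaded"
        return 1
    fi
    echo "pre-flight checks passed"
}

# On a disposable test machine only, the actual test would then be:
#   kdump_preflight /proc/cmdline /sys/kernel/kexec_crash_loaded &&
#       echo c | sudo tee /proc/sysrq-trigger
# If the dump works, the machine reboots and a dump should appear in the
# configured dump directory (assumed here to be the default, /var/crash).
```

The paths are passed in as arguments so the check can be tried against sample files without touching a live system.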

With the estimator I pointed you to in the last comment, things should improve: it provides a mechanism that will suggest a proper amount, pre-calculated at boot time. But until we get that merged, the situation is not ideal and will remain like this. I understand it's not the best; if you have a technical suggestion on how we could improve it, it's highly appreciated =)
Cheers,

Guilherme

Rich (rincebrain) wrote :

My suggestion would be to print a warning unconditionally on first install that you should test it and not assume it will just work, and/or to not set crashkernel=... with the default in the first place so people have to go explicitly set it (and, implicitly, read about the fact that one size does not fit all when finding out how). (I would even do the former if your very next commit landed the heuristic estimator to improve this, until you were confident it worked in the majority of cases.)

I know it's not readily programmatically possible to tell "will this work" short of actually doing it, because haha hardware. My lament is that, in my experience, the default has _never_ worked, on my VMs or real hardware, and I would not expect something to configure itself with defaults, resulting in e.g. kdump-config status reporting "ready to kdump" when asked, and then...not.
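The first suggestion (warn unconditionally on first install) could look something like the following in a Debian-style maintainer script. This is a hypothetical sketch, not the actual kdump-tools postinst; the function name and the message wording are invented. It relies only on the Debian convention that postinst is called with configure as the first argument and an empty second argument on a fresh install.

```shell
#!/bin/sh
# Hypothetical postinst fragment: print a warning on first install only.
# $1: maintainer-script action (e.g. "configure")
# $2: most recently configured version (empty on a fresh install)
first_install_warning() {
    if [ "$1" = "configure" ] && [ -z "$2" ]; then
        cat <<'EOF'
kdump-tools: a default crashkernel= reservation has been configured.
It may be too small for this machine. Please trigger a test crash
and confirm that a dump is written before relying on kdump.
EOF
    fi
}
```

In a postinst this would be invoked as first_install_warning "$@", so upgrades and other actions stay silent.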

Guilherme G. Piccoli (gpiccoli) wrote :

This is a good suggestion, Rich! I've looped in Dann and Cascardo, both maintainers of kdump-tools, to discuss this subject as well.

We could have a more explicit message when installing the package, maybe on the same screen that asks whether we want to enable kdump by default. I'm against the idea of _not_ setting the default crashkernel though: it's a good suggestion for intermediate/advanced users, but a bad one for beginners, I think. Also, I must say: my experience was pretty much the opposite of yours. The default always worked for me on regular/simple HW or VMs; it fails in more peculiar setups, like 200 PCI devices, 1 TiB of RAM, etc.

Thanks for bringing up the discussion =)
