ISST-LTE: kdump failed: second kernel boot hangs after /scripts/init-bottom when a large min_free_kbytes value is set

Bug #1528101 reported by bugproxy
This bug affects 1 person
Affects               Status   Importance  Assigned to            Milestone
kexec-tools (Ubuntu)  Invalid  Undecided   Louis Bouchard
linux (Ubuntu)        New      Undecided   Canonical Kernel Team

Bug Description

== Comment: #0 - Ping Tian Han <email address hidden> - 2015-07-15 04:21:23 ==
---Problem Description---
kdump can be triggered by 'echo c > /proc/sysrq-trigger', but the second kernel hangs here:

...
[ 7.311129] sd 0:2:4:0: alua: rtpg failed with 8070002
[ 7.311232] sd 0:2:4:0: alua: port group 1dc state A preferred supports TOlUSNA
done.
Begin: Running /scripts/local-premount ... done.
[ 12.894379] EXT4-fs (dm-12): mounted filesystem with ordered data mode. Opts: (null)
Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... done.
[ 13.463955] init: plymouth-upstart-bridge main process (1681) terminated with status 1
[ 13.463996] init: plymouth-upstart-bridge main process ended, respawning
[ 13.471552] init: plymouth-upstart-bridge main process (1691) terminated with status 1
[ 13.471586] init: plymouth-upstart-bridge main process ended, respawning
[ 13.479547] init: plymouth-upstart-bridge main process (1694) terminated with status 1
[ 13.479580] init: plymouth-upstart-bridge main process ended, respawning
[ 13.487503] init: plymouth-upstart-bridge main process (1696) terminated with status 1
[ 13.487536] init: plymouth-upstart-bridge main process ended, respawning
[ 13.496113] init: plymouth-upstart-bridge main process (1698) terminated with status 1
[ 13.496146] init: plymouth-upstart-bridge main process ended, respawning
[ 13.504363] init: plymouth-upstart-bridge main process (1700) terminated with status 1
[ 13.504397] init: plymouth-upstart-bridge main process ended, respawning
[ 13.512840] init: plymouth-upstart-bridge main process (1702) terminated with status 1
[ 13.512873] init: plymouth-upstart-bridge main process ended, respawning
[ 13.521159] init: plymouth-upstart-bridge main process (1704) terminated with status 1
[ 13.521196] init: plymouth-upstart-bridge main process ended, respawning
[ 13.531934] init: plymouth-upstart-bridge main process (1706) terminated with status 1
[ 13.531968] init: plymouth-upstart-bridge main process ended, respawning
[ 13.548264] init: plymouth-upstart-bridge main process (1708) terminated with status 1
[ 13.548301] init: plymouth-upstart-bridge main process ended, respawning
[ 14.774719] EXT4-fs (dm-12): re-mounted. Opts: errors=remount-ro
<-----------
and no vmcore is dumped.

Contact Information = Ping Tian <email address hidden>, Mikhail <email address hidden>

---uname output---
Linux dilllp1 3.19.0-22-generic #22~14.04.1-Ubuntu SMP Wed Jun 17 10:03:39 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

---System Hang---
The boot process of the second kernel hangs. We have to reboot the system.

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Install Ubuntu on dilllp1, whose root device is on an mpath device
2. sudo apt-get install linux-crashdump
3. Change the crashkernel= parameter in /boot/grub/grub.cfg to 'crashkernel=768M', then reboot
4. Trigger kdump with 'echo c > /proc/sysrq-trigger'
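
Before triggering the crash, it can help to confirm that the crashkernel= change from step 3 actually reached the running kernel. The following is a minimal sketch; the helper name crashkernel_value and the sample command line are made up for illustration and are not part of kdump-tools.

```shell
# Hypothetical helper: print the crashkernel= value found in a kernel
# command line string, or nothing if the parameter is absent.
crashkernel_value() {
    # Put each parameter on its own line, then keep what follows "crashkernel=".
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | head -n 1
}

# Made-up command line for illustration.
crashkernel_value "root=/dev/mapper/mpath0 ro crashkernel=768M"
```

On a live system the argument would be "$(cat /proc/cmdline)"; an empty result suggests the reservation was not requested at boot.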

Userspace tool common name: kdump-tools

The userspace tool has the following bit modes: 64-bit

== Comment: #8 - Ping Tian Han <email address hidden> - 2015-12-09 02:09:30 ==
We can reproduce this bug with kernel 4.2.0-19.

== Comment: #12 - Hari Krishna Bathini <email address hidden> - 2015-12-09 08:22:27 ==
Is the issue only seen with sysctl configuration
"vm.min_free_kbytes = 312721" ?

Thanks
Hari

== Comment: #13 - Ping Tian Han <email address hidden> - 2015-12-09 21:08:54 ==
(In reply to comment #12)
> Is the issue only seen with sysctl configuration
> "vm.min_free_kbytes = 312721" ?
>

Yes, I just found that if we use the default min_free_kbytes value, kdump works just fine!

== Comment: #14 - Hari Krishna Bathini <email address hidden> - 2015-12-10 00:51:43 ==
(In reply to comment #13)
> (In reply to comment #12)
> > Is the issue only seen with sysctl configuration
> > "vm.min_free_kbytes = 312721" ?
> >
>
> Yes, I just found that if use the default min_free_kbytes value then kdump
> works just fine!

The large min_free_kbytes value is blocking kdump from utilising the
memory it needs to boot. With crashkernel=768M and min_free_kbytes
set to 312721, only ~450 MB is left for booting the second (kdump)
kernel, which doesn't seem to be sufficient.

How to handle this:

1. Reserve more memory for crashkernel, increasing it by the min_free_kbytes value

    OR

2. Use default min_free_kbytes

Thanks
Hari
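
The "~450 MB" figure in comment #14 can be reproduced with simple arithmetic. This is only a rough estimate under the assumption that the whole min_free_kbytes reserve is unavailable to ordinary allocations; the function name usable_kb is hypothetical.

```shell
# Estimate the memory (in kB) left to the kdump kernel once the
# min_free_kbytes reserve is subtracted from the crashkernel reservation.
# $1 = crashkernel size in MB, $2 = vm.min_free_kbytes in kB
usable_kb() {
    echo $(( $1 * 1024 - $2 ))
}

left_kb=$(usable_kb 768 312721)
echo "left for the kdump kernel: ${left_kb} kB (about $(( left_kb / 1024 )) MB)"
```

With the values from this report the result is 473711 kB, roughly 462 MB, which is in line with the estimate given above.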

== Comment: #16 - Ping Tian Han <email address hidden> - 2015-12-10 01:27:40 ==
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > Is the issue only seen with sysctl configuration
> > > "vm.min_free_kbytes = 312721" ?
> > >
> >
> > Yes, I just found that if use the default min_free_kbytes value then kdump
> > works just fine!
>
> It is blocking kdump from utilising the memory needed to boot.
> So, crashkernel=768M with min_free_kbytes configured to
> 312721 translates to ~450 MB for second kernel boot (kdump)
> which doesn't seem to be sufficient for booting.
>
> How to handle this:
>
> 1. Reserve memory for crashkernel, that augments it by min_free_kbytes
>
> OR
>
> 2. Use default min_free_kbytes
>
> Thanks
> Hari

Is it possible to let kdump ignore min_free_kbytes when it is triggered? We need a large min_free_kbytes to run stress tests, but kdump doesn't need it.

== Comment: #19 - Hari Krishna Bathini <email address hidden> - 2015-12-20 13:44:12 ==
Ubuntu uses default initrd and kernel image to boot into kdump
as well. So, the same sysctl settings (runtime parameters)
apply for kdump as well. It is difficult to have one sysctl
setting for production kernel and another for kdump kernel
for this reason.

---
You can work around this by using a non-persistent sysctl
setting in the production kernel for vm.min_free_kbytes.

I mean, remove "vm.min_free_kbytes = 312721" from sysctl
configuration files and set it with the below command:

 # sudo sysctl vm.min_free_kbytes=312721

This way, kdump gets to use the default vm.min_free_kbytes
setting and production kernel could use the value you set.
---
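
To apply the workaround above, one first needs to find where the persistent setting lives. The following sketch lists configuration files that set vm.min_free_kbytes; the helper name find_persistent_min_free is hypothetical, and the file locations are the usual Ubuntu ones, assumed rather than taken from this report.

```shell
# Hypothetical helper: print each of the given sysctl configuration files
# that sets vm.min_free_kbytes persistently (commented-out lines are ignored).
find_persistent_min_free() {
    for f in "$@"; do
        # Skip files that are missing or unreadable.
        [ -r "$f" ] || continue
        if grep -Eq '^[[:space:]]*vm\.min_free_kbytes[[:space:]]*=' "$f"; then
            echo "$f"
        fi
    done
}

# Usual locations; an unmatched glob is passed literally and skipped.
find_persistent_min_free /etc/sysctl.conf /etc/sysctl.d/*.conf
```

Any file printed here would be read at boot by both the production and the kdump kernel, so the line would have to be removed from it for the workaround to take effect.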

If you wish to have a different solution, please mirror the
bug and see what Ubuntu has to say about this.

Thanks
Hari

bugproxy (bugproxy) wrote : console log when kdump being triggered

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-127681 severity-high targetmilestone-inin14043
bugproxy (bugproxy) wrote : console output of pinelp2 during kdump

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1528101/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-12-28 10:39 EDT-------
Hi Launchpad,

Can you please take a look and assign an appropriate developer to solve this bug?

Thanks for your support..!

- Henish

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-11 20:55 EDT-------
Hi,

This problem occurs on systems where a high vm.min_free_kbytes value has been set in /etc/sysctl.conf. On those systems, kdump will fail. The workaround is to set the value each time by running 'sysctl', which isn't convenient. We'd like a solution in which the value of vm.min_free_kbytes in the production system's /etc/sysctl.conf doesn't affect kdump.

Thanks.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-11 22:10 EDT-------
This bug can be reproduced on 14.04.4. I'd like to change the version to 14.04.4. Thanks.

Louis Bouchard (louis) wrote :

Hello,

The situation in this context is that the modification of vm.min_free_kbytes has a negative impact on the boot sequence of the kexec kernel (the kernel that allows for the capture of the kernel dump).

The output of the console that you provide clearly shows that the kdump sequence hasn't even started to execute so kexec-tools is not the culprit here. Now I agree that kernel parameter modifications made for normal execution of the kernel should not impair the execution of the kexec kernel when this one needs to run.

So we need to find a solution that will let us boot the kexec kernel with the default value for vm.min_free_kbytes and not the modified one.

I will look into that.

Changed in kexec-tools (Ubuntu):
assignee: nobody → Louis Bouchard (louis-bouchard)
bugproxy (bugproxy)
tags: added: targetmilestone-inin14044
removed: targetmilestone-inin14043
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-22 17:18 EDT-------
Any update to this bug?

Louis Bouchard (louis) wrote :

Hello,

The context here is that your modification of vm.min_free_kbytes brings the value of min_free_kbytes above the available memory defined by the crashkernel boot parameter.

A definitive fix for this situation requires some non-trivial development, which will take time.

I can only suggest for the time being to raise the value of crashkernel for your system, in order to alleviate the problem until a definitive solution is implemented.

Changed in kexec-tools (Ubuntu):
status: New → Confirmed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-28 12:28 EDT-------
(In reply to comment #31)
> Hello,
>
> The context here is that your modification of vm.min_free_kbytes brings the
> value of vm_free_kbytes above the available memory defined by the
> crashkernel boot parameter.
>
> A definitive fix for this situation requires some non trivial development,
> which will take time.
>
> I can only suggest for the time being to raise the value of crashkernel for
> your system, in order to alleviate the problem until a definitive solution
> is implemented.

Hello, Canonical.

Just for clarification, is this something you'll be working to fix?

Louis Bouchard (louis) wrote :

Yes, I have started to work on a possible solution.

Louis Bouchard (louis)
Changed in kexec-tools (Ubuntu):
status: Confirmed → Invalid
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-03-15 12:53 EDT-------
(In reply to comment #33)
> Yes, I have started to work on a possible solution.

Do you have any update on the solution to this issue?

Louis Bouchard (louis) wrote :

Hello,

After investigating the issue, it turns out that an adequate solution is not possible without significant modification to the kernel.

As explained before, the kernel allows min_free_kbytes to be set to a value above the total memory available.

This parameter is applied very early during the boot phase and cannot easily be disabled.

kdump is not at fault here, since it doesn't even get a chance to run before the OOM killer starts killing processes.

The only workaround is to remove the definition of vm.min_free_kbytes from the /etc/sysctl.conf file so the kernel dump may proceed.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-03-16 23:03 EDT-------
(In reply to comment #35)
> Hello,
>
> After investigating the issue, it turns out that an adequate solution is not
> possible without sensible modification to the kernel.
>
> As explained before, the kernel allows for the definition of a
> vm_free_kbytes which is above the size of the total memory available.
>
> This parameter is enable very early during the boot phase and cannot easily
> be disabled.
>
> kdump is not at fault here since it doesn't even get a chance to run prior
> to when the OOM starts killing processes.
>
> The only workaround is to remove the definition of vm.vm_free_kbytes from
> the /etc/sysctl.conf file so the kernel dump may proceed.

It looks like kdump on Ubuntu uses the default initrd of the production kernel, and so it reads /etc/sysctl.conf. I think we can add some code that works out which context we are in before reading the file, and skips it when running kdump.
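
A minimal sketch of such a context check follows. It assumes that /proc/vmcore exists only in the kdump (capture) kernel, which is worth verifying on the target release; the function name in_kdump_kernel is hypothetical, and the path is a parameter so the check can be exercised without crashing a machine.

```shell
# Sketch: decide whether we are running in the kdump (capture) kernel.
# Assumption: /proc/vmcore is only present in the capture kernel.
# $1 = path to test (defaults to /proc/vmcore)
in_kdump_kernel() {
    [ -e "${1:-/proc/vmcore}" ]
}

if in_kdump_kernel; then
    echo "kdump kernel: skip the production vm.min_free_kbytes setting"
else
    echo "production kernel: apply sysctl settings as usual"
fi
```

Where such a check would be hooked in (the initrd scripts, or the job that applies sysctl settings at boot) is not settled in this thread.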

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-03-28 16:01 EDT-------
Minor changes to the kernel's handling of user_min_free_kbytes appear to be the ideal solution here, since they would have benefits outside of the kdump environment. Currently, it appears to be too easy to manually set a completely inappropriate value. Elsewhere, the value is capped at either 65536 or 5% of lowmem, but it appears that the user-provided value is not run through the same sanity checks (see init_per_zone_wmark_min() as an example).
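
To illustrate the kind of sanity check being suggested, here is a userspace sketch. It is not the kernel's logic: the function name min_free_within_cap is hypothetical, and the 5% figure is simply taken from the comment above.

```shell
# Hypothetical sanity check (not the kernel's own logic): succeed only if
# the requested min_free_kbytes stays within a percentage of total memory.
# $1 = requested value in kB, $2 = total memory in kB, $3 = cap in percent
min_free_within_cap() {
    [ "$1" -le $(( $2 * $3 / 100 )) ]
}

total_kb=$(( 768 * 1024 ))   # memory reserved by crashkernel=768M
if min_free_within_cap 312721 "$total_kb" 5; then
    echo "312721 kB is within 5% of ${total_kb} kB"
else
    echo "312721 kB exceeds 5% of ${total_kb} kB"
fi
```

With a 768 MB reservation, 5% is about 39321 kB, so the value used in this report is roughly eight times larger than such a cap would allow.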

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-03-28 21:29 EDT-------
Hi,

This bug only occurs when vm.min_free_kbytes=312721 is written in /etc/sysctl.conf. If we don't write it into /etc/sysctl.conf and just set it with "sysctl vm.min_free_kbytes=312721", the bug isn't triggered when kdump runs.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-04-04 21:50 EDT-------
(In reply to comment #42)
> The documentation is not fixed and recommends the usage of 5% of your whole
> memory as the min_free_kbyte is documented at:
>
> https://wiki.ubuntu.com/ppc64el/Recommendations#Min_free_kbytes_kernel_configuration

Hi,

In some situations, such as bug 139648, we have to use a large min_free_kbytes value to prevent the OOM killer from being triggered when running stress tests. So it looks like a large min_free_kbytes is needed.

I also think this bug isn't a problem of min_free_kbytes being too large. It happens because, when performing kdump, the value is read from /etc/sysctl.conf. If we can change this (by not using the min_free_kbytes value from /etc/sysctl.conf), this bug can be fixed.

Thanks.

bugproxy (bugproxy) wrote : console output of pinelp2 during kdump

Default Comment by Bridge
