Incorrect handling of overcommit_memory kernel sysctl.

Bug #1808225 reported by Chris Friesen
This bug affects 1 person
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  Medium      Eric MacDonald

Bug Description

Title
-----
Incorrect handling of overcommit_memory kernel sysctl.

Brief Description
-----------------
In the code for collectd (and similar code for rmon) there is logic that reads /proc/sys/vm/overcommit_memory and treats the setting as "strict" if the value is 1.

Unfortunately, this is wrong. A value of 0 means "heuristic overcommit", a value of 1 means "always overcommit", while a value of 2 means "don't overcommit". (See https://www.kernel.org/doc/Documentation/vm/overcommit-accounting)

As a result, we raise critical memory alarms when 50% of the memory is still unused.
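To make the three values concrete, here is a minimal Python sketch of how the sysctl could be read and classified. The function and constant names (parse_overcommit_mode, is_strict, read_overcommit_mode, OVERCOMMIT_MODES) are hypothetical and are not taken from the collectd or rmon sources; only the file path and the meaning of each value come from the description above.

```python
from pathlib import Path

# Meanings of each value, as listed in the kernel overcommit-accounting doc.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "don't overcommit (strict)",
}


def parse_overcommit_mode(text):
    """Return the integer mode parsed from the sysctl file's contents."""
    value = int(text.strip())
    if value not in OVERCOMMIT_MODES:
        raise ValueError("unexpected overcommit_memory value: %d" % value)
    return value


def is_strict(mode):
    """Only mode 2 disables overcommit; modes 0 and 1 both allow it."""
    return mode == 2


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Read the current mode from the running kernel (Linux only)."""
    return parse_overcommit_mode(Path(path).read_text())
```

On a Linux host, is_strict(read_overcommit_mode()) would then be True only when the kernel is configured not to overcommit.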

Severity
--------
Major: it results in a spurious critical memory alarm, which causes the node to go "degraded".

Steps to Reproduce
------------------
Install StarlingX, configure it with "--kubernetes", and install the openstack application. The active controller node will be degraded due to a spurious critical memory alarm.

Expected Behavior
------------------
More than 50% of memory is still unused, so there should not be a critical memory alarm.

Actual Behavior
----------------
There's a spurious critical memory alarm.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Dedicated storage, kubernetes

Branch/Pull Time/Commit
-----------------------
starlingx master branch, 2018-12-10_20-18-00

Timestamp/Logs
--------------
N/A

Chris Friesen (cbf123)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-integ (master)

Reviewed: https://review.openstack.org/624821
Committed: https://git.openstack.org/cgit/openstack/stx-integ/commit/?id=0ec172537192932c11f7a9cdc799fbc7e49a22e1
Submitter: Zuul
Branch: master

commit 0ec172537192932c11f7a9cdc799fbc7e49a22e1
Author: Eric MacDonald <email address hidden>
Date: Wed Dec 12 17:15:10 2018 -0500

    Fix collectd Memory plugin Strict Mode learning

    Existing code sets overcommit strict mode to True
    if any non-zero value is returned from a read
    of /proc/sys/vm/overcommit_memory.

    This is incorrect.

    Strict mode should only be set when the returned
    value is 2.

    Change-Id: I2c5328624571bb3b2f478d5a79615650bb92cbd2
    Closes-Bug: 1808225
    Signed-off-by: Eric MacDonald <email address hidden>
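
The difference between the old and new checks described in this commit message can be sketched as two predicates. This is an illustration only: the function names are hypothetical and the code is not copied from the collectd plugin.

```python
def strict_before_fix(value):
    # Behaviour described in the commit message: any non-zero value
    # read from overcommit_memory was treated as strict mode.
    return value != 0


def strict_after_fix(value):
    # Behaviour after the fix: only the value 2 means strict mode.
    return value == 2


def misclassified_values(values=(0, 1, 2)):
    """Return the values on which the two checks disagree."""
    return [v for v in values if strict_before_fix(v) != strict_after_fix(v)]
```

Under these assumptions the two checks disagree only for the value 1 ("always overcommit"), which the old check wrongly treated as strict.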

Changed in starlingx:
status: New → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
tags: added: stx.metal
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.2019.03
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-integ (f/centos76)

Fix proposed to branch: f/centos76
Review: https://review.openstack.org/626688

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-integ (f/centos76)

Reviewed: https://review.openstack.org/626688
Committed: https://git.openstack.org/cgit/openstack/stx-integ/commit/?id=26cb76275997f060064d439e050384bead77b21b
Submitter: Zuul
Branch: f/centos76

commit acc1863b269fa974cd6c19b31c224dd88154e09d
Author: zhipengl <email address hidden>
Date: Tue Dec 11 00:30:56 2018 +0800

    Refactor source code patches for dhcp package

    3 source patches can be removed.
    2 patches adds support for wrs_install_uuid in the dhclient script.
    This added script part just copy the whole content of dhclient-enter-hooks.
    Following this script part, it will call this hook script if the hook
    exist under /etc/. However, our hook file existed in /etc/dhcp/ folder will
    be called by sbin/dhclient-script as well. I'd like to use dhcp config
    package to creat /etc/dhclient-enter-hooks soft linked to
    /etc/dhcp/dhclient-enter-hooks, so that it can call dhclient script and
    no need to add this 2 patches.

    Support-disable-nsupdate.patch can be removed as we already fixed port
    conflict issue in https://review.openstack.org/#/c/622711/

    Deployment test pass and related script file check pass!

    Story: 2004473
    Task: 28164

    Change-Id: If50ae697062a7d0c8a2831fbcc0f5641aaa41ec7
    Signed-off-by: zhipengl <email address hidden>

commit 61b8055a14f61851b9f70c76849bbb4f8f28ed55
Author: Steven Webster <email address hidden>
Date: Mon Dec 17 12:22:48 2018 -0500

    Fix remote logging traffic control filter priority

    Previous commit 01f5fdd made a required change to filter
    infrastructure traffic on the management interface with an 802.1q
    protocol in the case of a consolidated interface.

    However, this has caused the remote logging tc script to have a
    failure. The script tries to install 'ip' protocol filters at the
    same priority as the 802.1q filters, which is rejected by the
    kernel.

    This commit detects a consolidated interface situation and bumps
    the priority of the remote logging tc filter priority on the
    management interface, similarly to what is done in the main
    cgcs_tc_setup script.

    The file has also been cleaned up to pass bashate.

    Related-Bug: #1807055
    Change-Id: Id11625c0f9bcbf109f574563ff284d4a36bc6377
    Signed-off-by: Steven Webster <email address hidden>

commit 4dd1d96eddc84433ee3f6cf6f61db5b71a2d3b4c
Author: zhipengl <email address hidden>
Date: Sat Dec 15 01:34:18 2018 +0800

    Fix SFTP service is not working issue

    The root cause is that sftp path in sshd_config is not right.
    It should be changed from /usr/libexec/sftp-server
    to /usr/libexec/openssh/sftp-server

    Verified in my deployment environment
    sftp can connect to controller.

    Closes-Bug: 1808054

    Change-Id: Ia8d00abc1f18bc3b46faadd87f8ed153a446b7b0
    Signed-off-by: zhipengl <email address hidden>

commit 43514ea7fbd18d518511a165b59c82b7e20ebd8d
Author: Kwan, Louie <email address hidden>
Date: Wed Dec 12 15:54:30 2018 -0500

    [Enhancement] Add system active alarms in collect logs

    Currently the collect tool does not c...


tags: added: in-f-centos76
Ken Young (kenyis)
tags: added: stx.2019.05
removed: stx.2019.03
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05