Reducing 2M hugepages to 0 is allowed, but the 'vm_hp_total_2M' value after unlock did not actually go to 0 (vm_hp_pending_2M remains in the 'Pending: 0' state)

Bug #1795066 reported by Wendy Mitchell
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Tao Liu
Milestone:

Bug Description

Brief Description
-----------------
Reducing 2M hugepages to 0 is allowed, but the 'vm_hp_total_2M' value after unlock did not actually go to 0 (vm_hp_pending_2M remains in the 'Pending: 0' state)

[After reducing 2M hugepages to 0 on a single NUMA node system, the sysinv-agent does not send a memory audit report after the host is unlocked and enabled. As a result, the CLI/Horizon does not display the updated memory information.]

Severity
--------
Standard

Steps to Reproduce
------------------
*This appears to be a problem where there is a single processor (single NUMA node).
1G hugepages were already 0 prior to the configuration change.

1. Lock the host and reduce all hugepages to 0 so only 4K pages exist
(or continue to reduce incrementally).
2. Unlock the host.

Although this would not be a practical configuration change in a real deployment scenario, the change is accepted but the reported values do not update (an illustrative CLI sequence is sketched below).
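
A minimal CLI sketch of the reproduction, assuming the sysinv client of this build; the host-memory-modify option names (-2M/-1G) are recalled from the sysinv CLI and should be verified against the installed release:

$ system host-lock controller-0
$ system host-memory-modify controller-0 0 -2M 0 -1G 0
$ system host-unlock controller-0
# After the host is unlocked and enabled, re-check the reported values
$ system host-memory-list controller-0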

Expected Behavior
------------------
Expected vm_hp_total_2M to go to 0 instead of, e.g., 24177 in this case (it also should not have stayed at 'Pending: 0' long after the host unlock operation).

Actual Behavior
----------------
The 4K pages appear to increase on the controller node, for example to 13287168, but the 2M hugepages (i.e. vm_hp_total_2M) do not go to 0; the value stays at 'Pending: 0' long after the host is unlocked:
"Total: 24177 Pending: 0 (and Available: 24177)"

$ system host-memory-list controller-0
processor:                 0
mem_total(MiB):            52878
mem_platform(MiB):         11000
mem_avail(MiB):            51854
hugepages(hp)_configured:  True
vs_hp_size(MiB):           1024
vs_hp_total:               1
vs_hp_avail:               0
vm_total_4K:               13287168
vm_hp_total_2M:            24177
vm_hp_avail_2M:            24177
vm_hp_pending_2M:          0
vm_hp_total_1G:            0
vm_hp_avail_1G:            0
vm_hp_pending_1G:          0
vm_hp_use_1G:              True

Reproducibility
---------------
Reproducible

System Configuration
--------------------
2 node (single processor)

Branch/Pull Time/Commit
-----------------------
Master as of date: 2018-06-11

Timestamp/Logs
--------------
See horizon.log (controller-1 is the active controller; controller-0 was the standby host where the change was posted, setting 2M hugepages to 0).

2018-06-13 16:12:00,185 [INFO] horizon.operation_log: [admin a0319006b53c41248584f38265014321] [admin eaea21324a5643fb8487a7be36ffb1f5] [POST /admin/inventory/ 302] parameters:[{"action": "hostscontroller__lock__1", "csrfmiddlewaretoken": "HI9VFlV55BHDrBk4WpgEOjtFrr7ey9DD"}] message:[success: Locking Host: controller-0]
2018-06-13 16:12:26,639 [INFO] horizon.operation_log: [admin a0319006b53c41248584f38265014321] [admin eaea21324a5643fb8487a7be36ffb1f5] [POST /admin/inventory/1/updatememory/ 200] parameters:[{"vm_hugepages_nr_2M_three": "", "vm_hugepages_nr_2M": "0", "platform_memory": "11000", "vm_hugepages_nr_1G_two": "", "platform_memory_two": "", "platform_memory_three": "", "platform_memory_four": "", "host": "<Host: {'requires_reboot': 'N/A', 'ttys_dcd': False, 'subfunctions': [u'controller', u'compute'], 'bm_ip': None, 'updated_at': u'2018-06-13T16:12:07.022686+00:00', 'install_state': None, 'rootfs_device': u'/dev/disk/by-path/pci-0000:00:1f.2-ata-1.0', 'bm_username': None, 'id': 1, 'serialid': None, 'availability': <django.utils.functional.__proxy__ object at 0x452a410>, 'vim_progress_status': u'services-disabled', 'uptime': 91009, 'console': u'ttyS0,115200n8', 'uuid': u'54108ca3-cd05-48f5-8612-c7812f0e1574', 'mgmt_ip': u'abcd::3', 'config_status': None, 'hostname': u'controller-0', 'capabilities': {u'Personality': u'Controller-Standby'}, 'operational': <django.utils.functional.__proxy__ object at 0x452a210>, 'location': '', 'patch_state': <django.utils.functional.__proxy__ object at 0x452a550>, 'invprovision': u'provisioned', 'administrative': <django.utils.functional.__proxy__ object at 0x452a190>, 'personality': <django.utils.functional.__proxy__ object at 0x9fe2410>, 'patch_current': 'N/A', 'boot_device': u'/dev/disk/by-path/pci-0000:00:1f.2-ata-1.0', 'mgmt_mac': u'0c:c4:7a:97:20:55', 'subfunction_oper': <django.utils.functional.__proxy__ object at 0x452a210>, 'peers': '', 'task': u'', 'allow_insvc_patching': True, 'install_state_info': None, 'created_at': u'2018-06-12T14:14:31.964860+00:00', 'subfunction_avail': <django.utils.functional.__proxy__ object at 0x452a410>, 'install_output': u'text', 'bm_type': ''}>", "vm_hugepages_nr_1G_four": "", "host_id": "1", "vm_hugepages_nr_2M_two": "", "csrfmiddlewaretoken": "HI9VFlV55BHDrBk4WpgEOjtFrr7ey9DD", "vm_hugepages_nr_2M_four": "", "vm_hugepages_nr_1G_three": "", "vm_hugepages_nr_1G": "0"}] message:[success: Memory allocation has been successfully updated.]
2018-06-13 16:12:44,902 [INFO] openstack_dashboard.dashboards.admin.inventory.tables: Unlocked Host: "controller-0"
2018-06-13 16:12:44,904 [INFO] horizon.operation_log: [admin a0319006b53c41248584f38265014321] [admin eaea21324a5643fb8487a7be36ffb1f5] [POST /admin/inventory/ 302] parameters:[{"action": "hostscontroller__unlock__1", "csrfmiddlewaretoken": "HI9VFlV55BHDrBk4WpgEOjtFrr7ey9DD"}] message:[success: Unlocked Host: controller-0]

Tao Liu (tliu88)
Changed in starlingx:
assignee: nobody → Tao Liu (tliu88)
description: updated
Ghada Khalil (gkhalil) wrote :

Targeting stx.2019.03 as this is a very specific test scenario

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.2019.03 stx.metal
Changed in starlingx:
status: New → Triaged
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/607083

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/607083
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=0cd0d284a6f680960bf7e581393816875fb5020b
Submitter: Zuul
Branch: master

commit 0cd0d284a6f680960bf7e581393816875fb5020b
Author: Tao Liu <email address hidden>
Date: Mon Oct 1 20:06:45 2018 -0500

    Reducing 2M hugepages to 0 did not update the display

    After reducing 2M hugepages to 0 on a single numa node system,
    sysinv-agent does not send memory audit report after the host is
    unlocked. As a result, CLI/horizon does not display the updated
    memory information.

    This problem is due to the method that determines whether the hugepages
    have been allocated: it checks whether the output of /proc/sys/vm/nr_hugepages
    is greater than 0, and that output is 0 in this case.

    This update uses the compute config completed flag to ensure the
    hugepages have been allocated by manifest.

    Closes-Bug: #1795066

    Change-Id: I5ab48fdc5864969b25f9745ac7de2f603670e095
    Signed-off-by: Tao Liu <email address hidden>
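
For context, a minimal shell sketch of the logic change described in the commit message above. The real check is implemented in the sysinv-agent Python code; send_memory_audit_report and the flag file path below are placeholders chosen for illustration, not the actual names used by sysinv.

send_memory_audit_report() { echo "send memory audit report"; }   # placeholder stub

# Old heuristic: only report memory once the kernel shows allocated hugepages.
# With both 2M and 1G hugepages configured to 0, nr_hugepages stays 0 and the
# audit report is never sent after unlock.
if [ "$(cat /proc/sys/vm/nr_hugepages)" -gt 0 ]; then
    send_memory_audit_report
fi

# New approach: key off the compute-config-completed flag written once the
# manifests have applied, independent of the resulting hugepage count.
if [ -f /var/run/.compute_config_complete ]; then   # flag path is illustrative
    send_memory_audit_report
fi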

Changed in starlingx:
status: In Progress → Fix Released
Ken Young (kenyis)
tags: added: stx.2019.05
removed: stx.2019.03
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05