error messages popped up when changing huge page settings

Bug #1797187 reported by mhg
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Tao Liu
Milestone:

Bug Description

Brief Description
-----------------
When modifying the memory settings of a processor on a compute node (processor 0 of compute-0 in this test) through the GUI, by removing 512 2M pages and adding one 1G page, an error message window popped up, showing
'Error: Processor 1: No available space for ...'

Severity
--------
Minor

Steps to Reproduce
------------------
1 log in to the Horizon dashboard as 'admin'
2 lock the compute node to test (compute-0 in this test)
3 select page Admin -> Inventory -> Memory -> Update Memory
    a. reduce the number of 2M pages by 512
    b. add one 1G page
   Save the changes

Expected Behavior
------------------
The changes were accepted without any errors

Actual Behavior
----------------
The changes were accepted, but an error message popped up, showing:
'Error: Processor 1: No available space for 2M huge page allocation, max 2M pages: 30349'

Other than the error messages, the changes were actually made to the system; the compute node was able to be unlocked to 'unlocked' + 'available' status; a VM using 1G page was able to be launched on the compute node.

Reproducibility
---------------
yes

System Configuration
--------------------
Multi-node system

Branch/Pull Time/Commit
-----------------------
r/2018.10 as of 2018-10-09_01-52-01

Timestamp/Logs
--------------
2018-10-10 09:34:00

Ghada Khalil (gkhalil)
description: updated
Ghada Khalil (gkhalil) wrote :

Targeting stx.2019.03 as this is not a common use-case

Changed in starlingx:
assignee: nobody → Tao Liu (tliu88)
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.2019.03 stx.metal
Erich Cordoba (ericho) wrote :

Thanks for the report.

This could be an issue that can be solved by configuring your compute's kernel command line.

Could you please share the output of /proc/cmdline and cat /proc/meminfo | grep Huge from the compute machine?
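
The huge page counters requested above can also be collected programmatically. The following is a minimal Python sketch, assuming the standard Linux /proc/meminfo layout of `Name: value [kB]` lines. The function name `parse_hugepage_info` and the sample text are illustrative only and do not come from this report.

```python
def parse_hugepage_info(meminfo_text):
    """Return the fields of /proc/meminfo whose name contains 'Huge'.

    This mirrors `grep Huge`. Values keep the kernel's unit: a page
    count for HugePages_* fields and kB for the size fields.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or "Huge" not in name:
            continue
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


# Illustrative sample only, NOT output captured from compute-0.
SAMPLE = """MemTotal:       131780000 kB
AnonHugePages:      2048 kB
HugePages_Total:     512
HugePages_Free:      512
Hugepagesize:       2048 kB
"""

print(parse_hugepage_info(SAMPLE))
```

On the compute node itself, the same function could be fed the contents of /proc/meminfo, for example `parse_hugepage_info(open("/proc/meminfo").read())`.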

Ken Young (kenyis)
tags: added: stx.2019.05
removed: stx.2019.03
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
Ghada Khalil (gkhalil)
tags: added: stx.retestneeded
Abraham Arce (xe1gyq) wrote :

The "Steps to Reproduce" were executed on a Bare Metal Dedicated Storage 2+3+2 system. Depending on the value and the CPU core, the result was either the same error message as reported or no error message at all. The system used for this initial test had previously been used for CEPH validation; another system is being installed to repeat the steps and give more clarity on the process followed and the results obtained.

For specific details about the process followed including the requests from Erich:
https://github.com/xe1gyq/starlingx/blob/master/bugs/1797187.md

A high-level overview of the tasks done:

Bare Metal Overall Huge Pages Information

- From controller-1, look at the number of hosts and memory information for compute-0 via host-memory-list
- From compute-0, look at the memory information via kernel boot arguments (/etc/default/grub /proc/cmdline) and memory information (/proc/meminfo)

Bare Metal 2M Hugepages

- Get default values for compute-0 2M Hugepages via host-memory-list
- Modify to default values for compute-0 2M Hugepages via host-memory-modify:

$ system host-memory-modify compute-0 0 -2M 42477 [Fail]
Processor 0:No available space for 1G vswitch huge page allocation, max 1G vswitch pages: 0
$ system host-memory-modify compute-0 1 -2M 43232 [Ok]

- Decreasing to 512 2M huge pages

$ system host-memory-modify compute-0 0 -2M 512 [Ok]
$ system host-memory-modify compute-0 1 -2M 512 [Ok]

- Going back to default 2M huge pages values

$ system host-memory-modify compute-0 0 -2M 42477 [Fail]
Processor 0:No available space for 1G vswitch huge page allocation, max 1G vswitch pages: 0
$ system host-memory-modify compute-0 1 -2M 43232 [Ok]

Bare Metal 1G Hugepages

- From controller-0, reserve at least one 1G hugepage

$ system host-memory-modify compute-0 0 -1G 1 [Fail]
Processor 0:No available space for new VM hugepage settings.Max 1G pages is 0 when 2M is 42477, or Max 2M pages is 39966 when 1G is 5.

- Reboot, modify bootargs to include: hugepagesz=1G hugepages=2
- reserve at least one 1G hugepage:

$ system host-memory-modify compute-0 0 -1G 1 [Ok]
$ system host-memory-modify compute-0 0 -1G 2 [Ok]

- Verify 1G values

$ system host-memory-show compute-0 0
| Application Huge Pages (1G): Total | 0
| Total Pending | 2
| Available | 0

$ system host-memory-modify compute-0 0 -1G 10 [Ok]

Will get back with more information...

Abraham Arce (xe1gyq) wrote :

This was tested again in a brand new deployment:

  BUILD_ID="20190523T013000Z"
  Configuration: Bare Metal 2 + 2 + 2

The message is seen only when the number of huge pages requested exceeds the limit, as in the following commands:

  [wrsroot@controller-0 ~(keystone_admin)]$ system host-memory-modify compute-0 0 -2M 42482
  Processor 0:No available space for 2M VM huge page allocation, max 2M VM pages: 42204

  [wrsroot@controller-0 ~(keystone_admin)]$ system host-memory-modify compute-0 0 -2M 42204
  Processor 0:No available space for 1G vswitch huge page allocation, max 1G vswitch pages: 0

The error message suggests a maximum, but that number does not account for the 1G vswitch huge page. Factoring one 1G vswitch huge page into the calculation gives the maximum number of 2M / 1G VM pages that can actually be allocated:

      max 2M VM pages 42204 * 2048 KiB = 86433792 KiB (compute-0 memory)
      compute-0 memory 86433792 KiB - 1G huge page size 1048576 KiB = 85385216 KiB (compute-0 memory available)
      compute-0 memory available 85385216 KiB / 2048 KiB = 41692 (max 2M VM pages)

  [wrsroot@controller-0 ~(keystone_admin)]$ system host-memory-modify compute-0 0 -2M 41692
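
The arithmetic above can be reproduced with a short Python sketch. It assumes all sizes are expressed in KiB (a 2M page is 2048 KiB, a 1G page is 1048576 KiB) and that one 1G vswitch page has to be set aside on the processor. The function name is hypothetical, and this is not the sysinv implementation.

```python
PAGE_2M_KIB = 2048          # size of one 2M huge page in KiB
PAGE_1G_KIB = 1024 * 1024   # size of one 1G huge page in KiB


def max_2m_pages_after_vswitch(suggested_max_2m, vswitch_1g_pages=1):
    """Reduce the suggested 2M page maximum by the memory that the
    1G vswitch huge pages occupy (hypothetical helper)."""
    total_kib = suggested_max_2m * PAGE_2M_KIB
    available_kib = total_kib - vswitch_1g_pages * PAGE_1G_KIB
    return max(available_kib, 0) // PAGE_2M_KIB


# 42204 is the maximum suggested by the error message quoted above.
print(max_2m_pages_after_vswitch(42204))  # 41692
```

Because one 1G page occupies the same memory as 512 2M pages, the result is simply 42204 - 512.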

Full details:
  https://github.com/xe1gyq/starlingx/edit/master/bugs/1797187.md

Tao Liu (tliu88) wrote :

Most of the reported error messages have been eliminated by my patch, which was integrated into https://review.opendev.org/#/c/667811/.

There are two scenarios that still trigger unwanted error messages. First, when the vswitch huge page size changes, the total requested memory is calculated using the previous vswitch page size. Second, when the platform reserved memory is decreased to allow allocation of additional VM huge pages, an error message is displayed that blocks the allocation. This is due to a check against the possible VM huge pages setting, which is calculated using the previous platform reserved value. This check was removed in my original patch but was added back in the final code submission.

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/671553

Tao Liu (tliu88) wrote :

A clarification on the following test steps:
1. reduce 512 from the number of 2M-page
2. add 1 1G-page
The changes are expected to be accepted without any errors.

However, in a few exceptional cases an error message “No available space for …” may still be shown. This can happen because the semantic check uses the currently available memory to determine whether the requested pages exceed what is available, whereas the existing allocation is based on the memory that was available before the last reboot. The host's available memory might have changed after the reboot.

The error message should tell you how many 2M or 1G pages are supported, based on the total requested pages versus how much memory is available. You should be able to use the suggested number to make the changes.
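
To illustrate the kind of semantic check described above, here is a minimal Python sketch. It is a hypothetical simplification and not the sysinv code: the function name, its parameters, and the exact wording of the returned message are assumptions modelled on the messages quoted in this report. Sizes are in MiB.

```python
PAGE_2M_MIB = 2      # size of one 2M huge page in MiB
PAGE_1G_MIB = 1024   # size of one 1G huge page in MiB


def check_vm_hugepages(avail_mib, req_2m, req_1g, processor=0):
    """Return None if the request fits into the currently available
    memory, otherwise an error string that suggests a 2M maximum.

    Hypothetical helper: a simplified stand-in for the real check.
    """
    requested_mib = req_2m * PAGE_2M_MIB + req_1g * PAGE_1G_MIB
    if requested_mib <= avail_mib:
        return None
    # Memory left for 2M pages once the requested 1G pages are placed.
    left_mib = max(avail_mib - req_1g * PAGE_1G_MIB, 0)
    max_2m = left_mib // PAGE_2M_MIB
    return ("Processor %d: No available space for 2M huge page "
            "allocation, max 2M pages: %d" % (processor, max_2m))


# Illustrative numbers only: memory available before the last reboot.
before_reboot_mib = 30000 * PAGE_2M_MIB

# Removing 512 2M pages and adding one 1G page is memory-neutral.
print(check_vm_hugepages(before_reboot_mib, 30000 - 512, 1))

# If 100 MiB less is available after the reboot, the same request fails
# and the message carries a suggested maximum.
print(check_vm_hugepages(before_reboot_mib - 100, 30000 - 512, 1))
```

The second call shows why an otherwise neutral change can still be rejected: the request is compared with the memory available now, not with the memory on which the existing allocation was based.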

In the future, we will change the user-requested page setting from a number to a percentage of the available memory to remedy this issue.
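
A percentage-based request could be turned into a page count along the following lines. This Python sketch is only an assumption about how such a conversion might look; the function name and parameters are hypothetical, and nothing in this report describes the eventual implementation.

```python
def pages_from_percent(avail_mib, percent, page_size_mib):
    """Convert a percentage of the available memory into a whole
    number of huge pages of the given size (hypothetical helper)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    budget_mib = int(avail_mib * percent / 100)
    return budget_mib // page_size_mib


# Illustrative values: 50% of 60000 MiB as 2M pages.
print(pages_from_percent(60000, 50, 2))  # 15000
```

Because the page count is derived from whatever memory is available at the time, a request expressed this way would not exceed the available memory after a reboot changes it.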

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/671553
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=16eb0ce4e0492a56458537d8d5205227e79e80c5
Submitter: Zuul
Branch: master

commit 16eb0ce4e0492a56458537d8d5205227e79e80c5
Author: Tao Liu <email address hidden>
Date: Thu Jul 18 14:59:22 2019 -0400

    Fixed unwanted error message when changing huge pages

    Most of the unnecessary error messages have been eliminated by
    a recent update: https://review.opendev.org/#/c/667811/.

    There are two scenarios that still triggers error messages.
    One, when the vswitch huge page size changes, the requested vswitch
    memory is calculated using the previous vswitch page size.
    Two, when the platform reserved memory is decreased to allow
    additional VM huge pages allocation, an error message is displayed
    which hinders allocation of additional VM huge pages. This is due
    to a check against the VM huge pages possible setting that is
    calculated using the previous platform reserved value.
    This check was removed from my original patch,
    which was integrated to https://review.opendev.org/#/c/667811/,
    but was added back in the final code submission.

    This update fixes the above two error scenarios.

    Closes-Bug: 1797187

    Change-Id: I2383e0d949e0af1c86e2546e63d6cc3a8a693175
    Signed-off-by: Tao Liu <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Peng Peng (ppeng) wrote :

Verified on 2019-08-12_20-59-00

tags: removed: stx.retestneeded