stx-openstack fails to apply - libvirt pod error

Bug #1844576 reported by Cristopher Lemus
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Tao Liu

Bug Description

Brief Description
-----------------
Application stx-openstack fails to apply due to errors on pod libvirt-libvirt-default-XXXXX.

Severity
--------
Major: Cannot complete the setup of starlingx

Steps to Reproduce
------------------
Follow either the wiki or the docs procedure. The error appears during the "Bring up services" stage, on the command `system application-apply stx-openstack`.

Expected Behavior
------------------
system application-apply stx-openstack --> Application stx-openstack is successfully applied.

Actual Behavior
----------------
Application fails to apply due to an error on pod libvirt-libvirt-default-XXXXX.

Reproducibility
---------------
100% reproducible on baremetal. Not seen on virtual deployments.

System Configuration
--------------------
Baremetal only. Simplex, Duplex, Standard 2+2, Standard 2+2+2

Branch/Pull Time/Commit
-----------------------
BUILD_ID="20190918T013000Z"

Last Pass
---------
This error did not appear on build "20190915T230000Z".

Timestamp/Logs
--------------
http://paste.openstack.org/show/777448/

A full collect from a simplex system is attached; the same behavior appears across all configurations.

Test Activity
-------------
Sanity

Cristopher Lemus (cjlemusc) wrote :

As an additional comment, this is probably related: there is a commit, https://opendev.org/starlingx/config/commit/fdb2159953eb6bf03ff8408caac0ba3f1c621f18, that makes changes to huge pages.

Also recently, we added the following steps to our automation:
system host-memory-modify -f vswitch -1G 1 ${host} 0
system host-memory-modify -f vswitch -1G 1 ${host} 1
system host-memory-modify -1G 10 ${host} 0 True
system host-memory-modify -1G 10 ${host} 1 True

Are these commands still required? Do they need to be adjusted? Note that the system is showing free hugepages as well as free memory.
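To confirm the "free hugepages" observation per NUMA node and per page size, the kernel's sysfs counters can be read directly. The following is a minimal sketch and not part of the StarlingX tooling: the function name `read_hugepage_pools` is hypothetical, and it assumes the standard Linux per-node sysfs layout under /sys/devices/system/node.

```python
import os
import re


def read_hugepage_pools(node_root="/sys/devices/system/node"):
    """Return {(node, size_kb): (total, free)} from per-node sysfs counters.

    Assumes the usual layout:
    <node_root>/nodeN/hugepages/hugepages-<size>kB/{nr_hugepages,free_hugepages}
    """
    pools = {}
    if not os.path.isdir(node_root):
        return pools
    for node in sorted(os.listdir(node_root)):
        node_match = re.fullmatch(r"node(\d+)", node)
        if not node_match:
            continue  # skip non-node entries in the directory
        hp_dir = os.path.join(node_root, node, "hugepages")
        if not os.path.isdir(hp_dir):
            continue
        for entry in sorted(os.listdir(hp_dir)):
            size_match = re.fullmatch(r"hugepages-(\d+)kB", entry)
            if not size_match:
                continue
            counts = []
            for name in ("nr_hugepages", "free_hugepages"):
                with open(os.path.join(hp_dir, entry, name)) as fh:
                    counts.append(int(fh.read().strip()))
            key = (int(node_match.group(1)), int(size_match.group(1)))
            pools[key] = tuple(counts)
    return pools
```

Calling `read_hugepage_pools()` on a host returns one entry per NUMA node and page size, for example a key of `(0, 1048576)` for 1G pages on node 0. Free pages in these pools do not by themselves show which page size is mounted at a given hugetlbfs mount point.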

Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 / high priority as this is failing sanity.
Assigning to Tao to investigate; perhaps the steps need to be adjusted based on the commit above

Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Tao Liu (tliu88)
tags: added: stx.3.0 stx.config
Tao Liu (tliu88) wrote :

The libvirt helm chart assumes that the default hugepage size is mounted at /dev/hugepages.

The size mounted at /dev/hugepages was fixed at 2M, while the kernel default huge page size was set to 1G on Baremetal. As a result, the libvirt pod failed to start because it was unable to write to the hugepage mount.

The fix is to change the default hugepage size mounted at /dev/hugepages to match the kernel setting.
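The mismatch described above can be checked on a host by comparing the kernel default huge page size with the page size of the hugetlbfs mounted at /dev/hugepages. The following is a minimal diagnostic sketch, not the actual fix (which is in stx-puppet): the function names are hypothetical, and it assumes that /proc/meminfo carries a `Hugepagesize:` line and that /proc/mounts reports a `pagesize=` option for hugetlbfs mounts.

```python
import re

# Multipliers to convert the pagesize= suffix into kB.
_UNITS_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def kernel_default_hugepage_kb(meminfo_text):
    """Return the kernel default huge page size in kB from /proc/meminfo text."""
    match = re.search(r"^Hugepagesize:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def mounted_hugepage_kb(mounts_text, mount_point="/dev/hugepages"):
    """Return the page size in kB of the hugetlbfs mounted at mount_point.

    If several entries share the mount point, the last one listed wins.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point or fields[2] != "hugetlbfs":
            continue
        match = re.search(r"(?:^|,)pagesize=(\d+)([KMG])", fields[3])
        if match:
            result = int(match.group(1)) * _UNITS_KB[match.group(2)]
    return result


def sizes_match(meminfo_text, mounts_text):
    """True only when both sizes are known and equal."""
    default_kb = kernel_default_hugepage_kb(meminfo_text)
    mounted_kb = mounted_hugepage_kb(mounts_text)
    return default_kb is not None and default_kb == mounted_kb
```

On a live system, the inputs would be `open("/proc/meminfo").read()` and `open("/proc/mounts").read()`. A result of `False` from `sizes_match` corresponds to the situation reported here: a 1G kernel default with a 2M mount at /dev/hugepages.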

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/683148

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/683148
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=5545195b1a5c4d4b6536ace3f96121bfb08df1c2
Submitter: Zuul
Branch: master

commit 5545195b1a5c4d4b6536ace3f96121bfb08df1c2
Author: Tao Liu <email address hidden>
Date: Thu Sep 19 08:02:32 2019 -0400

    Change the page size mounted at /dev/hugepages

    The libvirt helm chart assumes that the default hugepage
    size is mounted at /dev/hugepages.

    The default size mounted at /dev/hugepages was fixed to 2M,
    while the kernel default huge page size was set to 1G on
    Baremetal, as a result the libvert pod failed to start since
    it was unable to write to the hugepage mount.

    This update changes the page size mounted at /dev/hugepages
    to be the same as the kernel default hugepage size.

    Change-Id: Icc0326b99338ca7c06b113e6991f01b838030aca
    Closes-Bug: 1844576
    Signed-off-by: Tao Liu <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Cristopher Lemus (cjlemusc) wrote :

Tao, thanks for fixing this bug. I confirm that the libvirt pod is running properly:

openstack libvirt-libvirt-default-6cdl9 1/1 Running 0 10h

The application was successfully applied:
| stx-openstack | 1.0-18-centos-stable- | armada-manifest | stx-openstack.yaml | applied | completed |

A sanity report will be sent; I expect Baremetal to be back to green status.

Thanks again.
