stx-openstack fails to apply - libvirt pod error

Bug #1844576 reported by Cristopher Lemus
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Tao Liu

Bug Description

Brief Description
-----------------
Application stx-openstack fails to apply due to errors in pod libvirt-libvirt-default-XXXXX

Severity
--------
Major: Cannot complete the setup of StarlingX

Steps to Reproduce
------------------
Follow either the wiki or the docs installation procedure. The error appears during the "Bring up services" stage, on the command `system application-apply stx-openstack`

Expected Behavior
------------------
system application-apply stx-openstack --> Application stx-openstack is successfully applied.

Actual Behavior
----------------
Application fails to apply due to an error in pod libvirt-libvirt-default-XXXXX

Reproducibility
---------------
100% reproducible on baremetal. Does not affect virtual environments.

System Configuration
--------------------
Baremetal only. Simplex, Duplex, Standard 2+2, Standard 2+2+2

Branch/Pull Time/Commit
-----------------------
BUILD_ID="20190918T013000Z"

Last Pass
---------
This error did not appear on build: "20190915T230000Z"

Timestamp/Logs
--------------
http://paste.openstack.org/show/777448/

Full collect attached from a simplex system; the same behavior appears across all configurations.

Test Activity
-------------
Sanity

Revision history for this message
Cristopher Lemus (cjlemusc) wrote :

As an additional comment, this is probably related: commit https://opendev.org/starlingx/config/commit/fdb2159953eb6bf03ff8408caac0ba3f1c621f18 makes changes to huge pages.

Also recently, we added the following steps to our automation:
system host-memory-modify -f vswitch -1G 1 ${host} 0
system host-memory-modify -f vswitch -1G 1 ${host} 1
system host-memory-modify -1G 10 ${host} 0 True
system host-memory-modify -1G 10 ${host} 1 True

Are these commands still required? Do they need to be adjusted? Note, however, that the system is showing free hugepages and free memory.
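
For anyone checking the "free hugepages" observation on a host, the following is a minimal sketch that reports total and free huge pages per page size. It assumes the standard Linux sysfs layout under /sys/kernel/mm/hugepages; the function name `hugepage_counts` is hypothetical and is not part of any StarlingX tooling.

```python
import os
import re


def _read_int(path):
    """Read a single integer from a sysfs-style file."""
    with open(path) as handle:
        return int(handle.read().strip())


def hugepage_counts(base="/sys/kernel/mm/hugepages"):
    """Return {page_size_kB: {"total": n, "free": n}} for each huge page size.

    Assumes the usual sysfs layout: one hugepages-<size>kB directory per
    supported size, each with nr_hugepages and free_hugepages files.
    """
    counts = {}
    for entry in sorted(os.listdir(base)):
        match = re.fullmatch(r"hugepages-(\d+)kB", entry)
        if not match:
            continue  # ignore anything that is not a per-size directory
        directory = os.path.join(base, entry)
        counts[int(match.group(1))] = {
            "total": _read_int(os.path.join(directory, "nr_hugepages")),
            "free": _read_int(os.path.join(directory, "free_hugepages")),
        }
    return counts
```

Calling `hugepage_counts()` with no arguments reads the live system. Splitting the counts by page size matters here because free 1G pages say nothing about whether a 2M mount is usable.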

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 / high priority as this is failing sanity.
Assigning to Tao to investigate; perhaps the steps need to be adjusted based on the commit above

Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Tao Liu (tliu88)
tags: added: stx.3.0 stx.config
Revision history for this message
Tao Liu (tliu88) wrote :

The libvirt helm chart assumes that the default hugepage size is mounted at /dev/hugepages.

The page size mounted at /dev/hugepages was fixed at 2M, while the kernel default huge page size was set to 1G on Baremetal. As a result, the libvirt pod failed to start because it was unable to write to the hugepage mount.

The fix is to change the default hugepage size mounted at /dev/hugepages to match the kernel setting.
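
The mismatch described above can be checked on a host with a small script. This is a minimal sketch, not the fix itself: it assumes the kernel default size appears on the `Hugepagesize:` line of /proc/meminfo and that the hugetlbfs entry in /proc/mounts carries a `pagesize=` option (some kernels omit it, in which case the function returns None). All function names are hypothetical.

```python
import re

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '2M', '1G', '1024M' or '2048 kB' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kKmMgG])?[bB]?\s*", text)
    if not match:
        raise ValueError("unrecognized size: %r" % text)
    number, unit = match.groups()
    return int(number) * (_UNITS[unit.lower()] if unit else 1)


def kernel_default_hugepage_size(meminfo_text):
    """Return the kernel default huge page size in bytes, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            return parse_size(line.split(":", 1)[1])
    return None


def mounted_hugepage_size(mounts_text, mount_point="/dev/hugepages"):
    """Return the pagesize= option of the hugetlbfs mount, in bytes.

    Returns None when there is no hugetlbfs mount at mount_point or the
    mount entry does not list an explicit pagesize option.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point and fields[2] == "hugetlbfs":
            for option in fields[3].split(","):
                if option.startswith("pagesize="):
                    return parse_size(option.split("=", 1)[1])
    return None


def sizes_mismatch(meminfo_text, mounts_text):
    """True only when both sizes are known and they differ."""
    default_size = kernel_default_hugepage_size(meminfo_text)
    mounted_size = mounted_hugepage_size(mounts_text)
    return None not in (default_size, mounted_size) and default_size != mounted_size
```

On a live system, pass the contents of /proc/meminfo and /proc/mounts to `sizes_mismatch`. A True result corresponds to the situation in this bug: a 2M mount at /dev/hugepages on a host whose kernel default is 1G.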

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/683148

Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/683148
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=5545195b1a5c4d4b6536ace3f96121bfb08df1c2
Submitter: Zuul
Branch: master

commit 5545195b1a5c4d4b6536ace3f96121bfb08df1c2
Author: Tao Liu <email address hidden>
Date: Thu Sep 19 08:02:32 2019 -0400

    Change the page size mounted at /dev/hugepages

    The libvirt helm chart assumes that the default hugepage
    size is mounted at /dev/hugepages.

    The default size mounted at /dev/hugepages was fixed to 2M,
    while the kernel default huge page size was set to 1G on
    Baremetal, as a result the libvert pod failed to start since
    it was unable to write to the hugepage mount.

    This update changes the page size mounted at /dev/hugepages
    to be the same as the kernel default hugepage size.

    Change-Id: Icc0326b99338ca7c06b113e6991f01b838030aca
    Closes-Bug: 1844576
    Signed-off-by: Tao Liu <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Cristopher Lemus (cjlemusc) wrote :

Tao, thanks for fixing this bug. I confirm that the libvirt pod is running properly:

openstack libvirt-libvirt-default-6cdl9 1/1 Running 0 10h

Application was successfully applied:
| stx-openstack | 1.0-18-centos-stable- | armada-manifest | stx-openstack.yaml | applied | completed |

A sanity report will be sent; I anticipate that Baremetal will be back to green status.

Thanks again.
