Simultaneous allocation of 1Gb and 2 M HugePages break VMs launching

Bug #1566293 reported by Ksenia Svechnikova
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Committed | High | Sergey Kolekonov |
Mitaka | Fix Released | High | Ivan Berezovskiy |
Newton | Fix Committed | High | Sergey Kolekonov |

Bug Description

ISO 9.0 150 - version http://paste.openstack.org/show/492973/

Steps to reproduce:

    Prepare 2+1 nodes

[root@fuel ~]# fuel node --env 2
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---|--------|------------------|---------|-----------|-------------------|-----------------|---------------|--------|---------
5 | ready | Untitled (58:06) | 2 | 10.20.0.6 | 0c:c4:7a:6c:58:06 | cinder, compute | | True | 2
2 | ready | Untitled (55:2c) | 2 | 10.20.0.5 | 0c:c4:7a:34:55:2c | cinder, compute | | True | 2
6 | ready | Untitled (53:8e) | 2 | 10.20.0.4 | 0c:c4:7a:34:53:8e | controller | | True | 2

    There are 4 NUMA nodes on compute node-2. Allocate Nova Huge pages: 2.0 MB - 2030 and 1.0 GB - 10
    Deploy env
    Verify that huge pages are present on node-2
    Create an aggregate for compute nodes with huge pages and CPU pinning:

nova aggregate-create performance_3_cpu
nova aggregate-set-metadata performance_3_cpu hpgs=true
nova aggregate-set-metadata performance_3_cpu pinned=true

    Add the host to the aggregate:

nova aggregate-add-host performance_3_cpu node-2.domain.tld

    Create a flavor for each huge page size available on the host:

nova flavor-create h1.huge.hpgs auto 512 1 1
nova flavor-create h1.small.hpgs auto 1024 1 1

nova flavor-key h1.huge.hpgs set hw:mem_page_size=1048576
nova flavor-key h1.huge.hpgs set aggregate_instance_extra_specs:hpgs=true

nova flavor-key h1.small.hpgs set hw:mem_page_size=2048
nova flavor-key h1.small.hpgs set aggregate_instance_extra_specs:hpgs=true

    Add the CPU pinning requirement to the flavors:

nova flavor-key h1.huge.hpgs set hw:cpu_policy=dedicated
nova flavor-key h1.huge.hpgs set aggregate_instance_extra_specs:pinned=true

nova flavor-key h1.small.hpgs set hw:cpu_policy=dedicated
nova flavor-key h1.small.hpgs set aggregate_instance_extra_specs:pinned=true

    Create an instance with the 2 MB huge page flavor (h1.small.hpgs)
    Delete the instance
    Create an instance with the 1 GB huge page flavor (h1.huge.hpgs)
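
The verification step above (huge pages present on node-2) can be sketched as a small shell function that reads the per-size pools from sysfs. The function name `hugepage_pools` and its optional directory argument are illustrative assumptions, not part of Fuel; the `/sys/kernel/mm/hugepages/hugepages-<size>kB/` layout is the standard kernel interface.

```shell
# Print one line per huge page size: "<size_kB> total=<n> free=<n>".
# $1 is the sysfs directory to read (hypothetical parameter, added so the
# function can be pointed at a test directory); defaults to the real path.
hugepage_pools() {
    root="${1:-/sys/kernel/mm/hugepages}"
    for dir in "$root"/hugepages-*kB; do
        [ -d "$dir" ] || continue
        size="${dir##*/hugepages-}"   # strip everything up to "hugepages-"
        size="${size%kB}"             # strip the trailing unit
        total=$(cat "$dir/nr_hugepages")
        free=$(cat "$dir/free_hugepages")
        echo "$size total=$total free=$free"
    done
}
```

Running `hugepage_pools` on node-2 with the allocation from the steps above would be expected to show a 2048 kB pool with total=2030 and a 1048576 kB pool with total=10, provided the allocation succeeded.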

Expected result:

   VMs are created

Actual result:

The VM with the 1 GB page size goes into the Error state with a "No valid host" error.

From libvirt log (node-2): qemuBuildNumaArgStr:6760 : internal error: Unable to find any usable hugetlbfs mount for 1048576 KiB

No mount point for 1 GB huge pages is present.

summary: - Simultaneously allocation 1Gb and 2 M HugePages prevent launching VM
+ Simultaneous allocation of 1Gb and 2 M HugePages break VMs launching
description: updated
Arthur Svechnikov (asvechnikov) wrote :

From node-2:

 root@node-2:~# mount
 /dev/mapper/os-root on / type ext4 (rw,errors=panic)
 proc on /proc type proc (rw,noexec,nosuid,nodev)
 sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
 none on /sys/fs/cgroup type tmpfs (rw)
 none on /sys/fs/fuse/connections type fusectl (rw)
 none on /sys/kernel/debug type debugfs (rw)
 none on /sys/kernel/security type securityfs (rw)
 udev on /dev type devtmpfs (rw,mode=0755)
 devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
 tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
 none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
 none on /run/shm type tmpfs (rw,nosuid,nodev)
 none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
 none on /sys/fs/pstore type pstore (rw)
 /dev/sda3 on /boot type ext2 (rw)
 /dev/mapper/vm-nova on /var/lib/nova type xfs (rw)
 systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)
 hugetlbfs-kvm on /run/hugepages/kvm type hugetlbfs (rw,mode=775,gid=114)
 none on /sys/kernel/config type configfs (rw)
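
Only the hugetlbfs entries in the listing above matter here. A minimal sketch for extracting them, assuming `/proc/mounts`-style input (device, mount point, type, options); the function name `hugetlbfs_mounts` and its file argument are hypothetical:

```shell
# List hugetlbfs mounts as "<mount point> <page size option>".
# Reads /proc/mounts-style lines from the file given as $1 (hypothetical
# parameter for testing; defaults to /proc/mounts).
# "default" is printed when the mount carries no pagesize= option, which
# means the kernel's default huge page size is used.
hugetlbfs_mounts() {
    awk '$3 == "hugetlbfs" {
        size = "default"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^pagesize=/) size = substr(opts[i], 10)
        print $2, size
    }' "${1:-/proc/mounts}"
}
```

For the node-2 state shown above this would print a single line for /run/hugepages/kvm with no explicit page size, which matches the libvirt complaint that no mount serves 1048576 KiB pages.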

The libvirt configuration file /etc/libvirt/qemu.conf contains the following lines:

 # If provided by the host and a hugetlbfs mount point is configured,
 # a guest may request huge page backing. When this mount point is
 # unspecified here, determination of a host mount point in /proc/mounts
 # will be attempted. Specifying an explicit mount overrides detection
 # of the same in /proc/mounts. Setting the mount point to "" will
 # disable guest hugepage backing. If desired, multiple mount points can
 # be specified at once, separated by comma and enclosed in square
 # brackets, for example:
 #
 # hugetlbfs_mount = ["/dev/hugepages2M", "/dev/hugepages1G"]
 #
 # The size of huge page served by specific mount point is determined by
 # libvirt at the daemon startup.
 #
 # NB, within these mount points, guests will create memory backing
 # files in a location of $MOUNTPOINT/libvirt/qemu
 #
 #hugetlbfs_mount = "/dev/hugepages"

 hugetlbfs_mount = "/run/hugepages/kvm"

It seems there should be two mount points, one for 2M pages and one for 1G pages. However, when hugetlbfs_mount is set to ["/run/hugepages/kvm_2M", "/run/hugepages/kvm_1G"], only the same single mount point (/run/hugepages/kvm) is created after reboot.

Atsuko Ito (yottatsa) wrote :

The /run/hugepages/kvm path is predefined in /etc/init*/qemu-kvm.

We explicitly specify hugetlbfs_mount in /etc/libvirt/qemu.conf to disable autodiscovery of mount points (autodiscovery would lead to a bug if DPDK is used).

Dmitry Klenov (dklenov)
Changed in fuel:
status: New → Confirmed
tags: added: team-telco
Arthur Svechnikov (asvechnikov) wrote :

Additional configuration should be applied:

Add creation of the 1 GB mount point to the file /usr/share/qemu/init/qemu-kvm-init:

            mkdir -p /run/hugepages/kvm
            mount -t hugetlbfs hugetlbfs-kvm -o mode=775,gid=kvm /run/hugepages/kvm

changed to

            mkdir -p /run/hugepages/kvm
            mount -t hugetlbfs hugetlbfs-kvm -o mode=775,gid=kvm /run/hugepages/kvm
            mkdir -p /run/hugepages/kvm_1GB
            mount -t hugetlbfs hugetlbfs-kvm -o mode=775,gid=kvm,pagesize=1GB /run/hugepages/kvm_1GB

Grant access to the new mount point in the file /etc/apparmor.d/abstractions/libvirt-qemu:

  # for access to hugepages
  owner "/run/hugepages/kvm/libvirt/qemu/**" rw,

changed to

  # for access to hugepages
  owner "/run/hugepages/kvm/libvirt/qemu/**" rw,
  owner "/run/hugepages/kvm_1GB/libvirt/qemu/**" rw,

Add the new mount point to the libvirt configuration file /etc/libvirt/qemu.conf:

 hugetlbfs_mount = "/run/hugepages/kvm"

changed to

 hugetlbfs_mount = ["/run/hugepages/kvm", "/run/hugepages/kvm_1GB"]
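
After applying the three changes above, it may help to confirm that every path named in hugetlbfs_mount is actually mounted as hugetlbfs. The following is a rough sketch, not part of the proposed fix; both function names are hypothetical, and it assumes the hugetlbfs_mount setting sits on a single line and that paths contain no whitespace.

```shell
# Print every path listed in hugetlbfs_mount in a qemu.conf-style file ($1),
# one per line. Handles both the single-string and the bracketed-list form;
# commented-out lines are ignored.
configured_hugetlbfs_mounts() {
    grep -E '^[[:space:]]*hugetlbfs_mount[[:space:]]*=' "$1" |
        grep -o '"[^"]*"' | tr -d '"'
}

# Report configured paths that are not mounted as hugetlbfs according to a
# /proc/mounts-style file ($2, defaults to /proc/mounts).
# Returns non-zero if any configured path is missing.
check_hugetlbfs_mounts() {
    conf="$1"
    mounts="${2:-/proc/mounts}"
    rc=0
    for mp in $(configured_hugetlbfs_mounts "$conf"); do
        if ! awk -v p="$mp" '$2 == p && $3 == "hugetlbfs" { found = 1 }
                             END { exit !found }' "$mounts"; then
            echo "not mounted: $mp"
            rc=1
        fi
    done
    return "$rc"
}
```

A call such as `check_hugetlbfs_mounts /etc/libvirt/qemu.conf` prints one "not mounted" line per missing path and returns a non-zero status, so it can be used as a quick post-deployment check.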

Changed in fuel:
assignee: Fuel Telco (fuel-telco-team) → Sergey Kolekonov (skolekonov)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/303580

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/303580
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=10c3d88b55fbcd4427c4cb39690af9e0ae60175b
Submitter: Jenkins
Branch: master

commit 10c3d88b55fbcd4427c4cb39690af9e0ae60175b
Author: Sergey Kolekonov <email address hidden>
Date: Fri Apr 8 16:11:47 2016 +0300

    Fix simultaneous usage of 1Gb and 2Mb huge pages

    Currently it's impossible to use 1Gb and 2Mb huge pages on one compute node
    as a helper script from qemu package allows to use only one mount point
    for huge pages, so 1 Gb huge pages support has to be managed explicitly.

    This patch adds the following steps:
    - create a mount point for 1 Gb huge pages and mount pages in runtime
    - create a persistent record in /etc/fstab to mount huge pages on boot
    - add related settings for libvirt
    - update apparmor settings to allow an additional mount point

    Due to LP #1560532 Nailgun doesn't provide information about allocated
    huge pages when 1 Gb huge pages are used, so this information is computed
    in a custom fact.

    Closes-bug: #1566293

    Change-Id: Iba421abdc354afa7d89f6f10a94c6ba3edb99148
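
The commit message mentions a persistent /etc/fstab record for the 1 GB mount. A minimal sketch of adding such a record idempotently follows. The function name is hypothetical, and the entry simply mirrors the mount options proposed earlier in this report (mode=775,gid=kvm,pagesize=1GB); the entry actually written by the fuel-library patch may differ.

```shell
# Append a hugetlbfs entry for 1 GB pages to an fstab-style file unless an
# entry for the same mount point already exists.
# $1: fstab file (hypothetical parameter, defaults to /etc/fstab)
# $2: mount point (defaults to the path proposed earlier in this report)
ensure_hugetlbfs_fstab_entry() {
    fstab="${1:-/etc/fstab}"
    mp="${2:-/run/hugepages/kvm_1GB}"
    # Field 2 of an fstab line is the mount point; skip comment lines.
    if awk -v p="$mp" '$1 !~ /^#/ && $2 == p { found = 1 }
                       END { exit !found }' "$fstab"; then
        return 0
    fi
    # Assumed options, copied from the mount command in the comment above.
    printf '%s\n' "hugetlbfs-kvm $mp hugetlbfs mode=775,gid=kvm,pagesize=1GB 0 0" >> "$fstab"
}
```

Calling the function twice leaves a single entry in the file, which keeps repeated deployment runs from duplicating the record.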

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/305741

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/305741
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=c2b9eb320798357dc77557d638f995254edea2a9
Submitter: Jenkins
Branch: stable/mitaka

commit c2b9eb320798357dc77557d638f995254edea2a9
Author: Sergey Kolekonov <email address hidden>
Date: Fri Apr 8 16:11:47 2016 +0300

    Fix simultaneous usage of 1Gb and 2Mb huge pages

    (cherry picked from commit 10c3d88b55fbcd4427c4cb39690af9e0ae60175b)

Mikhail Chernik (mchernik) wrote :

Verified on MOS 9.0 ISO 217, fixed.
