Reserve host pages on compute nodes

Bug #1543149 reported by Sahid Orentino
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Sahid Orentino

Bug Description

In some use cases we may want to prevent Nova from using a certain number of hugepages on compute nodes (for example, when using ovs-dpdk). We should provide an option 'reserved_memory_pages' that gives a way to specify the number of pages we want to reserve for third-party components.

Tags: libvirt numa
Changed in nova:
assignee: nobody → sahid (sahid-ferdjaoui)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/277422

Changed in nova:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/277422
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=70604db8dacd1d8ea8a054a9f548b24dcffc292c
Submitter: Jenkins
Branch: master

commit 70604db8dacd1d8ea8a054a9f548b24dcffc292c
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Mon Feb 8 09:37:37 2016 -0500

    virt: reserved hugepages on compute host

    For some use cases we may need to reserve an amount of pages for
    third-party components.

    This commit adds new option 'reserved_memory_pages' which takes a list
    of string format to select on which host NUMA node and from which
    pagesize we want to reserve a certain amount of pages.

    Change-Id: I9d4c07da3594847917c9dc67e6663717d9ab4ba2
    Closes-Bug: #1543149
    DocImpact: reserved_memory_pages

Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Critical
importance: Critical → Wishlist
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

This is a bug because it makes a feature we've put in unusable for a set of our users.

We couldn't predict it when we designed the feature, but now that we have users that complain about it, we want to fix it.

It's not a coding error; it's a misspecification. This is a common class of bugs and really should not be treated differently.

The fix for it is wrong though, and the revert that explains why is here: https://review.openstack.org/292290

Changed in nova:
importance: Wishlist → High
Changed in nova:
status: Fix Released → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/292499

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/292500

Matt Riedemann (mriedem)
tags: added: libvirt numa
tags: added: mitaka-rc-potential
Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

Not sure we can block cutting RC1 for this one. That sounds like a design miss, not really a regression.

tags: removed: mitaka-rc-potential
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/nova 13.0.0.0rc1

This issue was fixed in the openstack/nova 13.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by sahid (<email address hidden>) on branch: master
Review: https://review.openstack.org/292500

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/292499
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d52ceaf269ae64575c48aa45002aa4fc5cfb2a86
Submitter: Jenkins
Branch: master

commit d52ceaf269ae64575c48aa45002aa4fc5cfb2a86
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Mon Feb 8 09:37:37 2016 -0500

    virt: reserved number of mempages on compute host

    Users need to mark as reserved some amount of pages for third party
    components.

    The most common use case for using huge/large pages is NFV. In the
    current state of that feature we can't guarantee the necessary amount
    of pages to allow OVS-DPDK to run properly on the compute node, which
    results in the instance failing to boot on a well-selected
    compute node. OVS-DPDK needs 1 GB hugepages reserved. Since Nova does
    not take into account the pages reserved for OVS-DPDK, the process
    cannot acquire the necessary memory, which results in a failed boot.

    This commit adds a new option 'reserved_huge_pages' which takes a list
    of string format to select on which host NUMA nodes and from which
    pagesize we want to reserve a certain amount of pages. It also updates
    NUMAPageTopology to contain a reserved memory pages attribute, which
    helps compute the available pages size on host for scheduling/claiming
    resources.

    Change-Id: Ie04d6362a4e99dcb2504698fc831a366ba746b44
    Closes-Bug: #1543149
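
To illustrate the idea described in the commit message above, here is a minimal Python sketch. It is not nova's implementation. It assumes the `node=<id>,size=<page size>,count=<pages>` string form quoted in a later comment in this report, and assumes the page size is expressed in KiB. The function names `parse_reserved_pages` and `available_pages` are hypothetical.

```python
from collections import defaultdict


def parse_reserved_pages(entries):
    """Parse strings such as 'node=0,size=2048,count=4'.

    Returns {node_id: {page_size: reserved_count}}. The string format and
    the KiB unit are assumptions made for this sketch.
    """
    reserved = defaultdict(dict)
    for entry in entries:
        # Split 'key=value' pairs separated by commas into a small dict.
        fields = dict(item.strip().split("=", 1) for item in entry.split(","))
        node = int(fields["node"])
        size = int(fields["size"])
        count = int(fields["count"])
        # Accumulate in case the same node/size pair is listed twice.
        reserved[node][size] = reserved[node].get(size, 0) + count
    return dict(reserved)


def available_pages(total, reserved, node, size):
    """Pages left for instances: total minus reserved, never below zero."""
    held_back = reserved.get(node, {}).get(size, 0)
    return max(total - held_back, 0)


reserved = parse_reserved_pages([
    "node=0,size=2048,count=4",
    "node=1,size=1048576,count=1",
])
print(available_pages(total=1024, reserved=reserved, node=0, size=2048))
```

With 1024 pages of 2048 KiB on node 0 and 4 of them reserved, the sketch reports 1020 pages as usable for scheduling or claiming.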

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
Charlotte Han (hanrong) wrote :

@sahid (sahid-ferdjaoui)
reserved_huge_pages = node=0,size=2048,count=4

2016-06-02 18:56:04.521 CRITICAL nova [req-e9dd76d9-4a4b-4571-bb88-78d751f74274 None None] TypeError: value_type must be callable

2016-06-02 18:56:04.521 TRACE nova Traceback (most recent call last):
2016-06-02 18:56:04.521 TRACE nova File "/usr/bin/nova-compute", line 10, in <module>
2016-06-02 18:56:04.521 TRACE nova sys.exit(main())
2016-06-02 18:56:04.521 TRACE nova File "/opt/stack/nova/nova/cmd/compute.py", line 76, in main
2016-06-02 18:56:04.521 TRACE nova service.wait()
2016-06-02 18:56:04.521 TRACE nova File "/opt/stack/nova/nova/service.py", line 491, in wait
2016-06-02 18:56:04.521 TRACE nova _launcher.wait()
2016-06-02 18:56:04.521 TRACE nova File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 309, in wait
2016-06-02 18:56:04.521 TRACE nova status, signo = self._wait_for_exit_or_signal()
2016-06-02 18:56:04.521 TRACE nova File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 284, in _wait_for_exit_or_signal
2016-06-02 18:56:04.521 TRACE nova self.conf.log_opt_values(LOG, logging.DEBUG)
2016-06-02 18:56:04.521 TRACE nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2525, in log_opt_values
2016-06-02 18:56:04.521 TRACE nova _sanitize(opt, getattr(group_attr, opt_name)))
2016-06-02 18:56:04.521 TRACE nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2946, in __getattr__
2016-06-02 18:56:04.521 TRACE nova return self._conf._get(name, self._group)
2016-06-02 18:56:04.521 TRACE nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2567, in _get
2016-06-02 18:56:04.521 TRACE nova value = self._do_get(name, group, namespace)
2016-06-02 18:56:04.521 TRACE nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2604, in _do_get
2016-06-02 18:56:04.521 TRACE nova return convert(opt._get_from_namespace(namespace, group_name))
2016-06-02 18:56:04.521 TRACE nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2595, in convert
2016-06-02 18:56:04.521 TRACE nova self._substitute(value, group, namespace), opt)
2016-06-02 18:56:04.521 TRACE nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2671, in _convert_value
2016-06-02 18:56:04.521 TRACE nova return [opt.type(v) for v in value]
2016-06-02 18:56:04.521 TRACE nova File "/usr/lib/python2.7/site-packages/oslo_config/types.py", line 478, in __init__
2016-06-02 18:56:04.521 TRACE nova raise TypeError('value_type must be callable')
2016-06-02 18:56:04.521 TRACE nova TypeError: value_type must be callable
2016-06-02 18:56:04.521 TRACE nova

Revision history for this message
Sahid Orentino (sahid-ferdjaoui) wrote :

Is your oslo_config up to date? The trace doesn't seem to be related to the change, except that the change uses oslo_config.types, which is something new.

BTW, since [1] is not merged yet, you will not be able to reserve pages.

[1] https://review.openstack.org/#/c/292500/

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/nova 14.0.0.0b1

This issue was fixed in the openstack/nova 14.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/292500
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fc96434a2c1d1f0319726d7e91be323825ef5e7a
Submitter: Jenkins
Branch: master

commit fc96434a2c1d1f0319726d7e91be323825ef5e7a
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Mon Mar 14 11:42:02 2016 -0400

    libvirt: handle reserved pages size

    Make libvirt handle reserved pages size

    Closes-Bug: #1543149
    Change-Id: I7c63c308296eb7f6e6282257d06f802410b8fd53

Revision history for this message
Ian Wells (ijw-ubuntu) wrote :

I have issues with the fix.

Firstly, and perhaps the most problematic point: it assumes the non-OpenStack consumption of pages ('reserved' pages, though they're not 'reserved' in the same sense as normal memory is 'reserved') is static. There's no reason to believe that's true. This isn't like the reserved memory in normal memory allocation, remember; that tracks system available memory as well, and this doesn't track available hugepages. In this instance it's quite possible to have other processes consuming hugepages over time, and it's also possible to configure the kernel to break up hugepages if necessary, so it's a lot more like normal memory than a countable resource such as number of cores or PCI devices.

Secondly, why calculate total minus reserved in the scheduler and not on the compute host? The scheduler needs to know the number of free pages; it doesn't need to know all the information returned. If it were told *only* the number of free pages, I think we'd have more flexibility to improve the scheduler code and the compute node, and to improve them independently, so it's more future-proof. Telling it the number of reserved pages can't help it.

Aside from that, the total number of hugepages is not fixed, as the pre-patch version of the code assumes. It can be changed after initial deployment, but for some reason it has been treated as a fixed feature of the system, like the number of cores, sockets or CPU flags. This is not a criticism of this bug, but another sign that a rethink might be in order.
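
As a rough illustration of the point about tracking free hugepages on the compute host rather than subtracting a static reservation, here is a Python sketch. It is not part of nova. It assumes a Linux host that exposes per-NUMA-node counters as `/sys/devices/system/node/node<N>/hugepages/hugepages-<size>kB/free_hugepages`. The function name `read_free_hugepages` and its `sysfs_root` parameter are hypothetical.

```python
import os
import re


def read_free_hugepages(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: {page_size_kib: free_pages}} from a sysfs-like tree.

    Returns an empty dict when the directory does not exist, for example
    on a non-Linux host.
    """
    result = {}
    if not os.path.isdir(sysfs_root):
        return result
    for node_dir in sorted(os.listdir(sysfs_root)):
        node_match = re.fullmatch(r"node(\d+)", node_dir)
        hp_root = os.path.join(sysfs_root, node_dir, "hugepages")
        if not node_match or not os.path.isdir(hp_root):
            continue
        sizes = {}
        for size_dir in os.listdir(hp_root):
            size_match = re.fullmatch(r"hugepages-(\d+)kB", size_dir)
            if not size_match:
                continue
            # Each counter file holds a single integer.
            path = os.path.join(hp_root, size_dir, "free_hugepages")
            with open(path) as counter:
                sizes[int(size_match.group(1))] = int(counter.read().strip())
        result[int(node_match.group(1))] = sizes
    return result


if __name__ == "__main__":
    print(read_free_hugepages())
```

Because the counters are read at call time, a periodic report built this way would reflect hugepages consumed or released by other processes, which a fixed reserved count cannot do.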

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 14.0.0.0b2

This issue was fixed in the openstack/nova 14.0.0.0b2 development milestone.

Revision history for this message
Alexander Bozhenko (alexbozhenko) wrote :

Hmm. I see that commit was reverted:
https://review.openstack.org/#/c/292290/

So should this bug be reopened?

Revision history for this message
Alexander Bozhenko (alexbozhenko) wrote :

nova]$ git log --pretty=format:"%H%x09%x09%ad%x09%s" | grep 'reserved hugepages'
7581f23804b641d9a98892d17bb5628daed2f811 Mon Mar 14 12:39:30 2016 +0000 Merge "Revert "virt: reserved hugepages on compute host""
56d12936d9cd1ec8f3cfa6be7edab072048503d1 Mon Mar 14 10:24:06 2016 +0000 Revert "virt: reserved hugepages on compute host"
4022d578ccc09e260ad82d684b7f61a6e291e43d Fri Mar 11 17:26:46 2016 +0000 Merge "virt: reserved hugepages on compute host"
70604db8dacd1d8ea8a054a9f548b24dcffc292c Mon Feb 8 09:37:37 2016 -0500 virt: reserved hugepages on compute host

Revision history for this message
Alexander Bozhenko (alexbozhenko) wrote :

Oh, looks like it actually was merged:
nova]$ git log --pretty=format:"%H%x09%x09%ad%x09%s" | grep 'mempages'
755c2cb0c76a4c642b2d89053b63a528ee79013c Fri May 13 10:56:37 2016 +0000 Merge "virt: reserved number of mempages on compute host"
d52ceaf269ae64575c48aa45002aa4fc5cfb2a86 Mon Feb 8 09:37:37 2016 -0500 virt: reserved number of mempages on compute host
