nova-scheduler exception when trying to use hugepages

Bug #1417201 reported by Chris Friesen
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Sahid Orentino
Milestone: 2015.1.0

Bug Description

I'm trying to make use of huge pages as described in http://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/virt-driver-large-pages.html. I'm running nova Kilo as of Jan 27th; the other OpenStack services are Juno, and libvirt is 1.2.8.

I've allocated 10000 2 MB pages on a compute node (5000 per NUMA cell). "virsh capabilities" on that node contains:

    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>67028244</memory>
          <pages unit='KiB' size='4'>16032069</pages>
          <pages unit='KiB' size='2048'>5000</pages>
          <pages unit='KiB' size='1048576'>1</pages>
...
        <cell id='1'>
          <memory unit='KiB'>67108864</memory>
          <pages unit='KiB' size='4'>16052224</pages>
          <pages unit='KiB' size='2048'>5000</pages>
          <pages unit='KiB' size='1048576'>1</pages>
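
For reference, hugepages can be allocated per NUMA node by writing to sysfs before restarting nova-compute. Below is a minimal sketch, assuming 2 MB pages and the per-node count shown above; it is illustrative of one way to do it, not an exact record of the commands used, and it must run as root:

    # Minimal sketch (assumption, not the exact commands used): allocate 2 MB
    # hugepages per NUMA node by writing to sysfs. Must run as root.
    import glob

    PAGES_PER_NODE = 5000  # 2 nodes x 5000 pages x 2 MB ~= 20 GB of hugepages

    paths = sorted(glob.glob(
        '/sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages'))
    for path in paths:
        with open(path, 'w') as f:
            f.write(str(PAGES_PER_NODE))
        with open(path) as f:
            print('%s -> %s' % (path, f.read().strip()))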

I then restarted nova-compute, set "hw:mem_page_size=large" on a flavor, and tried to boot an instance with that flavor. I got the error logs below from nova-scheduler. Is this a bug?
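
(For completeness, the flavor step can be scripted with python-novaclient along these lines; a hedged sketch in which the credentials, auth URL and flavor name are placeholders rather than values from this environment.)

    # Hedged sketch: set hw:mem_page_size=large on an existing flavor using the
    # Kilo-era python-novaclient API. Credentials and flavor name are placeholders.
    from novaclient import client

    nova = client.Client('2',
                         'admin', 'admin', 'admin',       # user, password, tenant (placeholders)
                         'http://127.0.0.1:5000/v2.0')    # auth URL (placeholder)

    flavor = nova.flavors.find(name='m1.large')           # hypothetical flavor name
    flavor.set_keys({'hw:mem_page_size': 'large'})
    print(flavor.get_keys())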

Feb 2 16:23:10 controller-0 nova-scheduler Exception during message handling: Cannot load 'mempages' in the base class
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher incoming.message))
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/server.py", line 139, in inner
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher return func(*args, **kwargs)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/scheduler/manager.py", line 86, in select_destinations
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher filter_properties)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 67, in select_destinations
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher filter_properties)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 138, in _schedule
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher filter_properties, index=num)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/scheduler/host_manager.py", line 391, in get_filtered_hosts
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher hosts, filter_properties, index)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/filters.py", line 77, in get_filtered_objects
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher list_objs = list(objs)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/filters.py", line 43, in filter_all
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher if self._filter_one(obj, filter_properties):
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/scheduler/filters/__init__.py", line 27, in _filter_one
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher return self.host_passes(obj, filter_properties)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/scheduler/filters/numa_topology_filter.py", line 45, in host_passes
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher limits_topology=limits))
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/virt/hardware.py", line 1161, in numa_fit_instance_to_host
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher host_cell, instance_cell, limit_cell)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/virt/hardware.py", line 851, in _numa_fit_instance_cell
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher host_cell, instance_cell)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/virt/hardware.py", line 692, in _numa_cell_supports_pagesize_request
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher avail_pagesize = [page.size_kb for page in host_cell.mempages]
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/objects/base.py", line 72, in getter
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher self.obj_load_attr(name)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/nova/objects/base.py", line 507, in obj_load_attr
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher _("Cannot load '%s' in the base class") % attrname)
2015-02-02 16:23:10.746 37521 TRACE oslo.messaging.rpc.dispatcher NotImplementedError: Cannot load 'mempages' in the base class
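
The "Cannot load 'mempages' in the base class" message comes from nova's versioned-object machinery: reading a field that was never set on an object triggers obj_load_attr(), and the NovaObject base-class implementation simply raises NotImplementedError. A stripped-down sketch of that behaviour (not the actual nova code):

    # Stripped-down sketch (not the actual nova code) of how reading an unset,
    # lazy-loadable field ends up as "Cannot load '<attr>' in the base class".
    class FakeVersionedObject(object):
        fields = ('id', 'mempages')

        def __init__(self, **kwargs):
            self._data = dict(kwargs)

        def obj_load_attr(self, attrname):
            # The base class has no way to reload data; subclasses must override this.
            raise NotImplementedError(
                "Cannot load '%s' in the base class" % attrname)

        def __getattr__(self, name):
            if name in type(self).fields and name not in self._data:
                self.obj_load_attr(name)   # field declared but never populated
            return self._data[name]

    cell = FakeVersionedObject(id=0)       # 'mempages' was dropped somewhere
    try:
        cell.mempages
    except NotImplementedError as exc:
        print(exc)                         # Cannot load 'mempages' in the base class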

On the nova-compute side, at the end of nova.virt.libvirt.driver.LibvirtDriver.get_available_resource() I've confirmed that data['numa_topology'] looks like this:

'{"nova_object.version": "1.2", "nova_object.changes": ["cells"], "nova_object.name": "NUMATopology", "nova_object.data": {"cells": [{"nova_object.version": "1.2", "nova_object.changes": ["cpu_usage", "memory_usage", "cpuset", "pinned_cpus", "siblings", "memory", "mempages", "id"], "nova_object.name": "NUMACell", "nova_object.data": {"cpu_usage": 0, "memory_usage": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "pinned_cpus": [], "siblings": [], "memory": 65457, "mempages": [{"nova_object.version": "1.0", "nova_object.changes": ["total", "size_kb", "used"], "nova_object.name": "NUMAPagesTopology", "nova_object.data": {"total": 16032069, "used": 0, "size_kb": 4}, "nova_object.namespace": "nova"}, {"nova_object.version": "1.0", "nova_object.changes": ["total", "size_kb", "used"], "nova_object.name": "NUMAPagesTopology", "nova_object.data": {"total": 5000, "used": 0, "size_kb": 2048}, "nova_object.namespace": "nova"}, {"nova_object.version": "1.0", "nova_object.changes": ["total", "size_kb", "used"], "nova_object.name": "NUMAPagesTopology", "nova_object.data": {"total": 1, "used": 0, "size_kb": 1048576}, "nova_object.namespace": "nova"}], "id": 0}, "nova_object.namespace": "nova"}, {"nova_object.version": "1.2", "nova_object.changes": ["cpu_usage", "memory_usage", "cpuset", "pinned_cpus", "siblings", "memory", "mempages", "id"], "nova_object.name": "NUMACell", "nova_object.data": {"cpu_usage": 0, "memory_usage": 0, "cpuset": [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], "pinned_cpus": [], "siblings": [], "memory": 65536, "mempages": [{"nova_object.version": "1.0", "nova_object.changes": ["total", "size_kb", "used"], "nova_object.name": "NUMAPagesTopology", "nova_object.data": {"total": 16052224, "used": 0, "size_kb": 4}, "nova_object.namespace": "nova"}, {"nova_object.version": "1.0", "nova_object.changes": ["total", "size_kb", "used"], "nova_object.name": "NUMAPagesTopology", "nova_object.data": {"total": 5000, "used": 0, "size_kb": 2048}, "nova_object.namespace": "nova"}, {"nova_object.version": "1.0", "nova_object.changes": ["total", "size_kb", "used"], "nova_object.name": "NUMAPagesTopology", "nova_object.data": {"total": 1, "used": 0, "size_kb": 1048576}, "nova_object.namespace": "nova"}], "id": 1}, "nova_object.namespace": "nova"}]}, "nova_object.namespace": "nova"}'
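
That primitive does carry all three page sizes for both cells, which is easy to confirm by parsing it. A small sketch, where numa_json stands for the JSON string above:

    # Small sketch: parse the serialized NUMATopology primitive above and list the
    # page sizes each cell carries. 'numa_json' stands for the JSON string above.
    import json

    def page_sizes_per_cell(numa_json):
        topo = json.loads(numa_json)
        for cell in topo['nova_object.data']['cells']:
            data = cell['nova_object.data']
            sizes = [p['nova_object.data']['size_kb'] for p in data['mempages']]
            print('cell %s: page sizes (KiB) = %s' % (data['id'], sizes))

    # page_sizes_per_cell(numa_json)
    # -> cell 0: page sizes (KiB) = [4, 2048, 1048576]
    #    cell 1: page sizes (KiB) = [4, 2048, 1048576]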

I printed out str(host_topology) in NUMATopologyFilter.host_passes() and it gave:

Feb 2 17:07:43 controller-0 nova-scheduler host_topology: NUMATopology(cells=[NUMACell(UNKNOWN),NUMACell(1)])

Chris Friesen (cbf123)
description: updated
Changed in nova:
assignee: nobody → sahid (sahid-ferdjaoui)
importance: Undecided → High
Sahid Orentino (sahid-ferdjaoui) wrote :

I was not able to reproduce the problem with trunk. Please reopen with more information about your environment if the problem is still present.

Changed in nova:
status: New → Incomplete
Chris Friesen (cbf123) wrote :

I was able to reproduce the problem with current devstack, using the local.conf file below. The nova-compute log contained the following; note in particular the "Cannot load 'mempages' in the base class" messages.

2015-02-04 07:40:35.395 DEBUG oslo_concurrency.lockutils [-] Lock "compute_resources" acquired by "instance_claim" :: waited 0.000s from (pid=25343) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:430
2015-02-04 07:40:35.396 DEBUG nova.compute.resource_tracker [-] Memory overhead for 4096 MB instance; 0 MB from (pid=25343) instance_claim /opt/stack/nova/nova/compute/resource_tracker.py:130
2015-02-04 07:40:35.398 AUDIT nova.compute.claims [-] [instance: 2de1b982-13e7-44e9-96ce-ab40ad7b975d] Attempting claim: memory 4096 MB, disk 40 GB
2015-02-04 07:40:35.399 AUDIT nova.compute.claims [-] [instance: 2de1b982-13e7-44e9-96ce-ab40ad7b975d] Total memory: 15691 MB, used: 512.00 MB
2015-02-04 07:40:35.399 AUDIT nova.compute.claims [-] [instance: 2de1b982-13e7-44e9-96ce-ab40ad7b975d] memory limit: 23536.50 MB, free: 23024.50 MB
2015-02-04 07:40:35.399 AUDIT nova.compute.claims [-] [instance: 2de1b982-13e7-44e9-96ce-ab40ad7b975d] Total disk: 82 GB, used: 0.00 GB
2015-02-04 07:40:35.400 AUDIT nova.compute.claims [-] [instance: 2de1b982-13e7-44e9-96ce-ab40ad7b975d] disk limit not specified, defaulting to unlimited
2015-02-04 07:40:35.401 DEBUG oslo_concurrency.lockutils [-] Lock "compute_resources" released by "instance_claim" :: held 0.005s from (pid=25343) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:442
2015-02-04 07:40:35.402 DEBUG nova.compute.utils [-] [instance: 2de1b982-13e7-44e9-96ce-ab40ad7b975d] Cannot load 'mempages' in the base class from (pid=25343) notify_about_instance_usage /opt/stack/nova/nova/compute/utils.py:324
2015-02-04 07:40:35.402 DEBUG nova.compute.manager [-] [instance: 2de1b982-13e7-44e9-96ce-ab40ad7b975d] Build of instance 2de1b982-13e7-44e9-96ce-ab40ad7b975d was re-scheduled: Cannot load 'mempages' in the base class from (pid=25343) _do_build_and_run_instance /opt/stack/nova/nova/compute/manager.py:2080

The local.conf file for devstack looked like this:

[[local|localrc]]
HOST_IP=192.168.100.249
FLOATING_RANGE=192.168.100.33/27
FIXED_RANGE=10.11.12.0/24
FIXED_NETWORK_SIZE=256
FLAT_INTERFACE=eth0
NETWORK_GATEWAY=10.11.12.1
PUBLIC_NETWORK_GATEWAY=192.168.100.33

ADMIN_PASSWORD=admin
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
SERVICE_TOKEN=a682f596-76f3-11e3-b3b2-e716f9080d50
SCREEN_LOGDIR=$DEST/logs/screen

NOVA_BRANCH=master
CEILOMETER_BRANCH=origin/stable/juno
CINDER_BRANCH=origin/stable/juno
GLANCE_BRANCH=origin/stable/juno
HEAT_BRANCH=origin/stable/juno
HORIZON_BRANCH=origin/stable/juno
IRONIC_BRANCH=origin/stable/juno
KEYSTONE_BRANCH=origin/stable/juno
NEUTRON_BRANCH=origin/stable/juno
NEUTRON_FWAAS_BRANCH=origin/stable/juno
NEUTRON_LBAAS_BRANCH=origin/stable/juno
NEUTRON_VPNAAS_BRANCH=origin/stable/juno
SAHARA_BRANCH=origin/stable/juno
SWIFT_BRANCH=origin/stable/juno
TROVE_BRANCH=origin/stable/juno

"virsh capabilities" contains:
    <pages unit='KiB' size='4'/>
    <pages unit='KiB' size='...

Changed in nova:
status: Incomplete → New
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/152930

Changed in nova:
status: New → In Progress
Sahid Orentino (sahid-ferdjaoui) wrote :

Thanks for the report and the confirmation, Chris. The fixes are now waiting for review.

Changed in nova:
assignee: sahid (sahid-ferdjaoui) → Jay Pipes (jaypipes)
Changed in nova:
assignee: Jay Pipes (jaypipes) → sahid (sahid-ferdjaoui)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/152930
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=67e11d45c43facb1b0aab718b48aa8e2f7d3f161
Submitter: Jenkins
Branch: master

commit 67e11d45c43facb1b0aab718b48aa8e2f7d3f161
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Wed Feb 4 05:39:20 2015 -0500

    objects: fix numa obj relationships

    The attributes used in obj_relationships for mempages
    and cells are not correctly set. This commit fix the
    errors.

    Related-Bug: #1417201
    Change-Id: I0ae2db6d1cc14787f2d6b4b41047e51a5de61ed8
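
For context: NovaObject subclasses declare obj_relationships, a mapping from each nested-object field to the list of (parent_version, child_version) pairs that the versioned-object code consults when backporting an object for an older consumer. If that mapping is wrong for a field such as NUMACell.mempages, the field can be dropped when the primitive is rebuilt. The shape of such a mapping looks roughly like this (illustrative version numbers, not the values from the actual fix):

    # Illustrative only -- the version pairs are placeholders, not the values from
    # the actual fix. obj_relationships maps a nested field to the child version
    # that corresponds to each version of the parent object.
    class NUMACellSketch(object):
        VERSION = '1.2'
        obj_relationships = {
            # field: [(parent_version, child_version), ...]
            'mempages': [('1.2', '1.0')],   # mempages added to the cell in parent 1.2
        }

    class NUMATopologySketch(object):
        VERSION = '1.2'
        obj_relationships = {
            'cells': [('1.0', '1.0'), ('1.2', '1.2')],  # placeholder pairs
        }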

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/152931
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6b32c16329dbae332d4f52a9c14e41a24ac07ea1
Submitter: Jenkins
Branch: master

commit 6b32c16329dbae332d4f52a9c14e41a24ac07ea1
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Wed Feb 4 09:31:18 2015 -0500

    hardware: fix reported host mempages in numa cell

    In commit b11dbfa4902cdd74bad3745db177d80b1c8b07c6 we lost
    the mempages information when no guests are using huge pages

    Closes-Bug: #1417201
    Change-Id: Id0871bf08e8ba43b386c1565e73bf2cb3f6a3a9d
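
This second commit addresses the other half of the problem: when per-cell usage is recomputed on the host, the updated cell has to keep carrying the host's mempages list even if no guest is currently using huge pages. A generic sketch of the pitfall and the safer pattern (not the actual nova code):

    # Generic sketch (not the actual nova code) of the pitfall: if the usage
    # calculation rebuilds the mempages list from guest requests alone, the host's
    # page information vanishes whenever no guest uses huge pages.

    def usage_from_instances_buggy(host_mempages, guest_page_usage):
        # Only what guests request survives -> [] when nothing requests hugepages.
        return [dict(size_kb=size, used=used)
                for (size, used) in guest_page_usage]

    def usage_from_instances_fixed(host_mempages, guest_page_usage):
        # Start from the host's own pages and only bump the 'used' counters.
        used_by_size = {}
        for (size, used) in guest_page_usage:
            used_by_size[size] = used_by_size.get(size, 0) + used
        return [dict(page, used=page['used'] + used_by_size.get(page['size_kb'], 0))
                for page in host_mempages]

    host_pages = [dict(size_kb=4, total=16032069, used=0),
                  dict(size_kb=2048, total=5000, used=0)]
    print(usage_from_instances_buggy(host_pages, []))   # [] -- mempages lost
    print(usage_from_instances_fixed(host_pages, []))   # host pages preserved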

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-3 → 2015.1.0