OpenStack Compute (nova)

Scheduler selects deleted baremetal nodes

Bug #1138184 reported by aeva black on 2013-03-01

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Compute (nova)	Fix Released	High	aeva black	OpenStack Compute (nova) 2013.1 "grizzly"

Bug Description

When a baremetal node is deleted, the associated compute_node record stops receiving periodic updates (but is not actually deleted). However, the scheduler's ComputeFilter seems to be unaware of this and continues to try to assign Nova instances to the deleted node.

To reproduce, start devstack with the baremetal driver, enroll a node (nova baremetal-node-create ...), wait a minute for the PeriodicTask to update compute, then delete the node (nova baremetal-node-delete ...). Then try to launch an instance (nova boot ...) and observe the failure.

To see whether this was just a timeout issue, I left devstack running for many hours after deleting the baremetal node, as can be seen from the database records below (some columns snipped for brevity).

stack@ubuntu:~/devstack$ mysql nova -e 'select * from compute_nodes\G'
*************************** 1. row ***************************
          created_at: 2013-02-28 18:22:38
          updated_at: 2013-02-28 18:49:08
          deleted_at: NULL
                  id: 1
          service_id: 2
     hypervisor_type: baremetal
  hypervisor_version: 1
hypervisor_hostname: 653b6c79-35a1-4af8-99a5-edd62fe9625b
             deleted: 0

stack@ubuntu:~/devstack$ mysql nova_bm -e 'select * from bm_nodes where uuid="653b6c79-35a1-4af8-99a5-edd62fe9625b"\G'
*************************** 1. row ***************************
         created_at: 2013-02-28 18:22:04
         updated_at: 2013-02-28 18:48:25
         deleted_at: 2013-02-28 21:08:03
            deleted: 1
                 id: 1
      instance_uuid: NULL
registration_status: NULL
         task_state: deleted
               uuid: 653b6c79-35a1-4af8-99a5-edd62fe9625b
      instance_name: NULL

stack@ubuntu:~/devstack$ mysql -e 'select now()'
2013-03-01 16:51:34

Here is a snippet from n-schd in devstack when calling "nova boot". What I don't understand, and what seems to be causing this issue, is why the servicegroup API believes this compute_node is still up! Note the 'updated_at' value logged by servicegroup.api is recent, whereas in the db, it is much older.

2013-03-01 16:50:12.271 DEBUG nova.scheduler.filter_scheduler [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] Attempting to build 1 instance(s) from (pid=8693) schedule_run_instance /opt/stack/nova/nova/scheduler/filter_scheduler.py:75
2013-03-01 16:50:12.280 DEBUG nova.servicegroup.api [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] Check if the given member [{'binary': u'nova-compute', 'deleted': 0L, 'created_at': datetime.datetime(2013, 2, 28, 17, 40, 45), 'updated_at': datetime.datetime(2013, 3, 1, 16, 50, 2), 'report_count': 8172L, 'topic': u'compute', 'host': u'ubuntu', 'disabled': False, 'deleted_at': None, 'id': 2L}] is part of the ServiceGroup, is up from (pid=8693) service_is_up /opt/stack/nova/nova/servicegroup/api.py:93
2013-03-01 16:50:12.281 DEBUG nova.servicegroup.drivers.db [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] DB_Driver.is_up last_heartbeat = 2013-03-01 16:50:02 elapsed = 10.281252 from (pid=8693) is_up /opt/stack/nova/nova/servicegroup/drivers/db.py:68
2013-03-01 16:50:12.281 DEBUG nova.scheduler.filters.compute_filter [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] ComputeFilter: Service {'binary': u'nova-compute', 'deleted': 0L, 'created_at': datetime.datetime(2013, 2, 28, 17, 40, 45), 'updated_at': datetime.datetime(2013, 3, 1, 16, 50, 2), 'report_count': 8172L, 'topic': u'compute', 'host': u'ubuntu', 'disabled': False, 'deleted_at': None, 'id': 2L} is True from (pid=8693) host_passes /opt/stack/nova/nova/scheduler/filters/compute_filter.py:39

Tags:

aeva black (tenbrae) on 2013-03-01

Changed in nova:
importance:	Undecided → High
assignee:	nobody → Devananda van der Veen (devananda)
milestone:	none → grizzly-rc1

aeva black (tenbrae) on 2013-03-01

tags:

added: baremetal

Russell Bryant (russellb) on 2013-03-01

Changed in nova:
status:	New → Confirmed

Revision history for this message

aeva black (tenbrae) wrote on 2013-03-01:

I did some more digging, and it looks like the issue is that:
- the nova compute service is still running
- servicegroup/drivers/db.py starts a FixedIntervalLoopingCall for self._report_state when the service is started
- which continues to report its state
- which includes a reference to compute_node_ref['compute_node'][0], as returned from service_get_by_compute_host.
- and so that compute_node appears to be getting updated every 10 seconds, even when the compute driver knows it is dead.

I've roughly validated this by observing the following:
- adding additional baremetal nodes doesn't affect which compute_node is included in servicegroup _report_state RPC call
- deleting baremetal nodes also has no effect on the RPC call
- the scheduler continues to believe a compute_node is online when it is included in the _report_state RPC call, even if the compute driver knows otherwise, because its 'updated_at' value is never more than 10 seconds old.
- deleting the oldest compute_node causes _report_state to include the next-oldest compute_node at the next update interval.
- deleting all the compute_nodes results in the _report_state RPC call properly including no nodes.

I'm convinced at this point that there needs to be a way to inform Nova that a compute_node is dead/deleted, besides merely relying on the last update_at timestamp for the associated compute service. This should allow deployers to remove baremetal nodes from production without breaking the scheduler's ability to find available nodes.

An alternative (but I think more complex) solution would be for the service group API to understand that a compute service may have any number of compute nodes (not just 1) and then to track their status' distinctly.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2013-03-02: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/23333

Changed in nova:
status:	Confirmed → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2013-03-04: Fix merged to nova (master)

Reviewed: https://review.openstack.org/23333
Committed: http://github.com/openstack/nova/commit/ac0f6eb063fc5a5c0a9410402ecf57fae1faf594
Submitter: Jenkins
Branch: master

commit ac0f6eb063fc5a5c0a9410402ecf57fae1faf594
Author: Devananda van der Veen <email address hidden>
Date: Fri Mar 1 14:05:35 2013 -0800

Compute manager should remove dead resources

    While most hypervisors return a single - and constant - value from
    driver.get_available_nodes, baremetal does not. When a node is deleted
    from the baremetal database, it is no longer returned from
    driver.get_available_nodes. However, Nova's compute_node record is not
    directly updated.

    This patch allows Compute Manager to detect missing nodes within
    update_available_resources. It then invokes resource_tracker to update
    the dead node and remove it from compute.

This in turn allows the ServiceGroup API to properly update the
servicegroup when a baremetal node is no longer in service.

Fixes bug 1138184

Change-Id: Icfff3f8e3099668806633a6a58a152b32ec8b49b

Changed in nova:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2013-03-20

Changed in nova:
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2013-04-04

Changed in nova:
milestone:	grizzly-rc1 → 2013.1

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.