Scheduler selects deleted baremetal nodes

Bug #1138184 reported by aeva black
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: aeva black

Bug Description

When a baremetal node is deleted, the associated compute_node record stops receiving periodic updates (but is not actually deleted). However, the scheduler's ComputeFilter seems to be unaware of this and continues to try to assign Nova instances to the deleted node.
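
For context, the scheduler-side check works roughly like this (a simplified sketch of the ComputeFilter logic, not the exact nova code): the filter only consults the nova-compute service record and its heartbeat; the compute_node record for the deleted baremetal node is never examined, so the host keeps passing.

    def host_passes(host_state, servicegroup_api):
        # Simplified sketch: the filter checks the nova-compute *service*
        # record, not the compute_node record, so a compute_node whose
        # baremetal node was deleted is never filtered out as long as its
        # parent service keeps heartbeating.
        service = host_state.service
        if service['disabled']:
            return False
        return servicegroup_api.service_is_up(service)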

To reproduce, start devstack with the baremetal driver, enroll a node (nova baremetal-node-create ...), wait a minute for the PeriodicTask to update compute, then delete the node (nova baremetal-node-delete ...). Then try to launch an instance (nova boot ...) and observe the failure.

To see whether this was just a timeout issue, I left devstack running for many hours after deleting the baremetal node, as can be seen from the database records below (some columns snipped for brevity).

stack@ubuntu:~/devstack$ mysql nova -e 'select * from compute_nodes\G'
*************************** 1. row ***************************
          created_at: 2013-02-28 18:22:38
          updated_at: 2013-02-28 18:49:08
          deleted_at: NULL
                  id: 1
          service_id: 2
     hypervisor_type: baremetal
  hypervisor_version: 1
 hypervisor_hostname: 653b6c79-35a1-4af8-99a5-edd62fe9625b
             deleted: 0

stack@ubuntu:~/devstack$ mysql nova_bm -e 'select * from bm_nodes where uuid="653b6c79-35a1-4af8-99a5-edd62fe9625b"\G'
*************************** 1. row ***************************
         created_at: 2013-02-28 18:22:04
         updated_at: 2013-02-28 18:48:25
         deleted_at: 2013-02-28 21:08:03
            deleted: 1
                 id: 1
      instance_uuid: NULL
registration_status: NULL
         task_state: deleted
               uuid: 653b6c79-35a1-4af8-99a5-edd62fe9625b
      instance_name: NULL

stack@ubuntu:~/devstack$ mysql -e 'select now()'
  2013-03-01 16:51:34

Here is a snippet from n-sch in devstack when calling "nova boot". What I don't understand, and what seems to be causing this issue, is why the servicegroup API believes this compute_node is still up! Note that the 'updated_at' value logged by servicegroup.api comes from the services record and is recent, whereas the compute_nodes record's 'updated_at' in the db above is much older.

2013-03-01 16:50:12.271 DEBUG nova.scheduler.filter_scheduler [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] Attempting to build 1 instance(s) from (pid=8693) schedule_run_instance /opt/stack/nova/nova/scheduler/filter_scheduler.py:75
2013-03-01 16:50:12.280 DEBUG nova.servicegroup.api [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] Check if the given member [{'binary': u'nova-compute', 'deleted': 0L, 'created_at': datetime.datetime(2013, 2, 28, 17, 40, 45), 'updated_at': datetime.datetime(2013, 3, 1, 16, 50, 2), 'report_count': 8172L, 'topic': u'compute', 'host': u'ubuntu', 'disabled': False, 'deleted_at': None, 'id': 2L}] is part of the ServiceGroup, is up from (pid=8693) service_is_up /opt/stack/nova/nova/servicegroup/api.py:93
2013-03-01 16:50:12.281 DEBUG nova.servicegroup.drivers.db [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] DB_Driver.is_up last_heartbeat = 2013-03-01 16:50:02 elapsed = 10.281252 from (pid=8693) is_up /opt/stack/nova/nova/servicegroup/drivers/db.py:68
2013-03-01 16:50:12.281 DEBUG nova.scheduler.filters.compute_filter [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] ComputeFilter: Service {'binary': u'nova-compute', 'deleted': 0L, 'created_at': datetime.datetime(2013, 2, 28, 17, 40, 45), 'updated_at': datetime.datetime(2013, 3, 1, 16, 50, 2), 'report_count': 8172L, 'topic': u'compute', 'host': u'ubuntu', 'disabled': False, 'deleted_at': None, 'id': 2L} is True from (pid=8693) host_passes /opt/stack/nova/nova/scheduler/filters/compute_filter.py:39
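
For reference, the DB servicegroup driver's liveness check boils down to comparing the service's last heartbeat against service_down_time, something like the following (a rough sketch with an assumed 60-second service_down_time, not nova's actual code):

    from datetime import datetime

    SERVICE_DOWN_TIME = 60  # assumed default for CONF.service_down_time, in seconds

    def service_is_up(service, now):
        # The check only looks at the services row's heartbeat (updated_at);
        # it knows nothing about compute_nodes, deleted or otherwise.
        last_heartbeat = service['updated_at'] or service['created_at']
        elapsed = (now - last_heartbeat).total_seconds()
        return abs(elapsed) <= SERVICE_DOWN_TIME

    # With the values from the log above, the service is considered up even
    # though the baremetal node behind it was deleted the day before:
    svc = {'updated_at': datetime(2013, 3, 1, 16, 50, 2), 'created_at': None}
    print(service_is_up(svc, now=datetime(2013, 3, 1, 16, 50, 12)))  # True, elapsed ~10s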

Tags: baremetal
aeva black (tenbrae)
Changed in nova:
importance: Undecided → High
assignee: nobody → Devananda van der Veen (devananda)
milestone: none → grizzly-rc1
aeva black (tenbrae)
tags: added: baremetal
Changed in nova:
status: New → Confirmed
Revision history for this message
aeva black (tenbrae) wrote :

I did some more digging, and it looks like the issue is that:
- the nova compute service is still running
- servicegroup/drivers/db.py starts a FixedIntervalLoopingCall for self._report_state when the service is started
- which continues to report its state
- which includes a reference to compute_node_ref['compute_node'][0], as returned from service_get_by_compute_host.
- and so that compute_node appears to be getting updated every 10 seconds, even when the compute driver knows it is dead (see the sketch below).
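
Roughly, the reporting loop looks like this (an illustrative sketch only; the real code uses FixedIntervalLoopingCall in nova/servicegroup/drivers/db.py):

    import threading
    import time

    REPORT_INTERVAL = 10  # CONF.report_interval defaults to 10 seconds

    def start_heartbeat(service):
        # Illustrative stand-in for FixedIntervalLoopingCall(self._report_state):
        # as long as nova-compute is running, this keeps bumping the service
        # row, so the scheduler keeps seeing the host as up regardless of what
        # has happened to the underlying baremetal node.
        def _report_state():
            while True:
                service['report_count'] += 1
                service['updated_at'] = time.time()  # in nova this is a DB update
                time.sleep(REPORT_INTERVAL)
        threading.Thread(target=_report_state, daemon=True).start()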

I've roughly validated this by observing the following:
- adding additional baremetal nodes doesn't affect which compute_node is included in the servicegroup _report_state RPC call
- deleting baremetal nodes also has no effect on the RPC call
- the scheduler continues to believe a compute_node is online when it is included in the _report_state RPC call, even if the compute driver knows otherwise, because its 'updated_at' value is never more than 10 seconds old.
- deleting the oldest compute_node causes _report_state to include the next-oldest compute_node at the next update interval.
- deleting all the compute_nodes results in the _report_state RPC call properly including no nodes.

I'm convinced at this point that there needs to be a way to inform Nova that a compute_node is dead/deleted, besides merely relying on the last updated_at timestamp of the associated compute service. This should allow deployers to remove baremetal nodes from production without breaking the scheduler's ability to find available nodes.

An alternative (but, I think, more complex) solution would be for the servicegroup API to understand that a compute service may have any number of compute nodes (not just one) and then to track their statuses distinctly.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/23333

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/23333
Committed: http://github.com/openstack/nova/commit/ac0f6eb063fc5a5c0a9410402ecf57fae1faf594
Submitter: Jenkins
Branch: master

commit ac0f6eb063fc5a5c0a9410402ecf57fae1faf594
Author: Devananda van der Veen <email address hidden>
Date: Fri Mar 1 14:05:35 2013 -0800

    Compute manager should remove dead resources

    While most hypervisors return a single - and constant - value from
    driver.get_available_nodes, baremetal does not. When a node is deleted
    from the baremetal database, it is no longer returned from
    driver.get_available_nodes. However, Nova's compute_node record is not
    directly updated.

    This patch allows Compute Manager to detect missing nodes within
    update_available_resources. It then invokes resource_tracker to update
    the dead node and remove it from compute.

    This in turn allows the ServiceGroup API to properly update the
    servicegroup when a baremetal node is no longer in service.

    Fixes bug 1138184

    Change-Id: Icfff3f8e3099668806633a6a58a152b32ec8b49b
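
In other words, the approach in the commit above is roughly: diff the nodes the virt driver currently reports against the compute_node records being tracked, and drop the ones that have disappeared. A hypothetical sketch of the idea (not the actual patch; prune_dead_nodes is an illustrative helper):

    def prune_dead_nodes(tracked_nodes, driver_nodes):
        # tracked_nodes: dict of nodename -> resource tracker / compute_node
        # driver_nodes: what driver.get_available_nodes() currently returns
        dead = set(tracked_nodes) - set(driver_nodes)
        for nodename in dead:
            # the node is gone from the driver (e.g. a deleted baremetal
            # node), so stop tracking it and remove its compute_node record
            del tracked_nodes[nodename]
        return dead

    # Example: the baremetal node from this bug is deleted, so the driver
    # stops returning it and it gets pruned on the next periodic update.
    trackers = {'653b6c79-35a1-4af8-99a5-edd62fe9625b': object()}
    print(prune_dead_nodes(trackers, driver_nodes=[]))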

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-rc1 → 2013.1