Cannot get ComputeNodeStat by DB utility of compute_node_get_all()

Bug #1224712 reported by Yong Feng
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Joe Cropper

Bug Description

When a hypervisor is removed, compute_node_get_all() does not return stats for newly added hypervisors.

The following code in compute_node_get_all() (nova/db/sqlalchemy) assumes that every record in compute_node_stats has a matching compute node. However, the current implementation of compute_node_delete() in the nova conductor API does not delete the records in compute_node_stats. Therefore, when a hypervisor is removed, no node matches the compute_node_stats records that belong to the removed hypervisor. The inner loop then consumes every remaining node while searching for a match, setting each one's 'stats' to [], so the stat groups that follow find no nodes left to attach to. As a result, all the remaining nodes are left with 'stats' of [].

    # Join ComputeNode & ComputeNodeStat manually.
    # NOTE(msdubov): ComputeNode and ComputeNodeStat map 1-to-Many.
    # Running time is (asymptotically) optimal due to the use
    # of iterators (itertools.groupby() for ComputeNodeStat and
    # iter() for ComputeNode) - we handle each record only once.
    compute_nodes.sort(key=lambda node: node['id'])
    compute_nodes_iter = iter(compute_nodes)
    for nid, nsts in itertools.groupby(stats, lambda s: s['compute_node_id']):
        for node in compute_nodes_iter:
            if node['id'] == nid:
                node['stats'] = list(nsts)
                break
            else:
                node['stats'] = []

    return compute_nodes
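
The failure can be reproduced outside nova with plain dicts. The sketch below is an illustration only: `join_stats` is a hypothetical wrapper around the loop quoted above, and the node and stat rows are made-up sample data, assuming the stats arrive sorted by compute_node_id.

```python
import itertools


def join_stats(compute_nodes, stats):
    """Hypothetical wrapper: the same join loop as quoted above."""
    compute_nodes.sort(key=lambda node: node['id'])
    compute_nodes_iter = iter(compute_nodes)
    for nid, nsts in itertools.groupby(stats, lambda s: s['compute_node_id']):
        for node in compute_nodes_iter:
            if node['id'] == nid:
                node['stats'] = list(nsts)
                break
            else:
                node['stats'] = []
    return compute_nodes


# Sample data (assumed): node 1 was deleted, but its stat row is still live.
nodes = [{'id': 2}, {'id': 3}]
stats = [
    {'compute_node_id': 1, 'key': 'num_instances', 'value': '0'},  # orphan
    {'compute_node_id': 2, 'key': 'num_instances', 'value': '4'},
    {'compute_node_id': 3, 'key': 'num_instances', 'value': '7'},
]

# The orphaned group (id 1) drains the node iterator without finding a match,
# so the groups for nodes 2 and 3 have no nodes left to attach to.
result = join_stats(nodes, stats)
print(result)
```

Both surviving nodes come back with an empty 'stats' list even though live stat rows exist for them.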

We need to enhance the nova conductor API to clean up all the records related to the removed compute node.

Tags: db scheduler
Yong Feng (fengyong-gm)
summary: Cannot get ComputeNodeStat by DB utility of compute_node_get_all()
- cannot get()
Joe Cropper (jwcroppe)
Changed in nova:
assignee: nobody → Joe Cropper (jwcroppe)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/46379

Yong Feng (fengyong-gm)
description: updated
Joe Cropper (jwcroppe)
Changed in nova:
assignee: Joe Cropper (jwcroppe) → nobody
Joe Cropper (jwcroppe) wrote :

Will try to take a look at this.

Joe Cropper (jwcroppe) wrote :

After further investigation, this does seem to behave appropriately. I've gone ahead and added some test cases to show that deleting either the compute node or service (two new test cases) does indeed cascade the deletes to the compute node stats.

Changed in nova:
assignee: nobody → Joe Cropper (jwcroppe)
Joe Cropper (jwcroppe) wrote :

Scratch my previous comment; I misunderstood what the compute_node_statistics call returned. After closer inspection, this does seem to be problematic. Continuing investigation.

Joe Cropper (jwcroppe) wrote :

A fix and test case have been submitted.

Thanks,
Joe

Joe Gordon (jogo)
Changed in nova:
importance: Undecided → Critical
milestone: none → havana-rc1
importance: Critical → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/46379
Committed: http://github.com/openstack/nova/commit/8b9d6f6fedbbd47932cd672f51d4db7031724e84
Submitter: Jenkins
Branch: master

commit 8b9d6f6fedbbd47932cd672f51d4db7031724e84
Author: Joe Cropper <email address hidden>
Date: Thu Sep 12 17:16:47 2013 -0500

    Prune node stats at compute node delete time

    This commit addresses the situation in which a compute node is deleted, but
    its compute node stats still remain in the database as **not** deleted. This
    causes ill side effects in compute_node_get_all when it's retrieving host stats
    as it doesn't expect there to be compute node stats for which there is no
    corresponding compute node (i.e., causing some nodes' stats to be empty).

    As such, when a compute node is deleted, its stats should also be implicitly
    deleted.

    The new test case that's been created fails without the code changes, which
    illustrates the problem that compute node stats are empty when they should
    not be.

    Also included is a simple DB migration script that will update old stats that
    were not marked soft-deleted as they should have been.

    Change-Id: Ief0f7cf1a506e71898b5a45a0513d34167432d67
    Closes-Bug: #1224712
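
The two ideas in the commit message (deleting a node's stats together with the node, and a one-off cleanup of stats that were orphaned earlier) can be sketched with in-memory rows. This is not the merged nova code: `soft_delete_compute_node` and `prune_orphaned_stats` are hypothetical helper names, the rows are made-up sample data, and the boolean 'deleted' flag is a simplification of a soft-delete column.

```python
def soft_delete_compute_node(nodes, stats, node_id):
    """Mark a node and all of its stat rows as deleted in one step."""
    for node in nodes:
        if node['id'] == node_id:
            node['deleted'] = True
    for stat in stats:
        if stat['compute_node_id'] == node_id:
            stat['deleted'] = True


def prune_orphaned_stats(nodes, stats):
    """Migration-style cleanup: soft-delete live stats whose node is gone."""
    live_ids = {n['id'] for n in nodes if not n['deleted']}
    pruned = 0
    for stat in stats:
        if not stat['deleted'] and stat['compute_node_id'] not in live_ids:
            stat['deleted'] = True
            pruned += 1
    return pruned


# Sample data (assumed): node 1 was deleted earlier without its stats.
nodes = [
    {'id': 1, 'deleted': True},
    {'id': 2, 'deleted': False},
    {'id': 3, 'deleted': False},
]
stats = [
    {'compute_node_id': 1, 'key': 'num_instances', 'deleted': False},
    {'compute_node_id': 2, 'key': 'num_instances', 'deleted': False},
    {'compute_node_id': 3, 'key': 'num_instances', 'deleted': False},
]

pruned = prune_orphaned_stats(nodes, stats)   # cleans up the old orphan
soft_delete_compute_node(nodes, stats, 3)     # new deletes take stats along
live_stats = [s for s in stats if not s['deleted']]
print(pruned, [s['compute_node_id'] for s in live_stats])
```

After both steps, every live stat row belongs to a live node, which is the invariant the join in compute_node_get_all() relies on.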

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-rc1 → 2013.2