Comment 7 for bug 1654102

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/416822
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2f1245a56c238646a056a20caa543f2a254b83d7
Submitter: Jenkins
Branch: master

commit 2f1245a56c238646a056a20caa543f2a254b83d7
Author: Matt Riedemann <email address hidden>
Date: Wed Jan 4 22:06:23 2017 -0500

    Fix TypeError in _update_from_compute_node race

    A compute node record is created in the database from the
    _init_compute_node method in the resource tracker in the
    nova-compute service. It is created without the updated_at
    or free_disk_gb fields being set.

    Shortly after it's created, update_resource_stats() is called
    on the scheduler report client which calls compute_node.save()
    which calls the update_compute_node() method in the DB API which
    *always* updates the updated_at field even if nothing else
    was changed.

    So at this point, we have a compute node with updated_at set
    but not free_disk_gb. The free_disk_gb field gets set by the
    _update_usage_from_instances() method in the resource tracker
    and then that value is later saved off when the compute node
    is updated in the database on a second call to the
    update_resource_stats() method in the scheduler report client.
    At that point the free_disk_gb field is set on the compute
    node record.

    There is a race in between the compute node create and initial
    update but before free_disk_gb is set where the scheduler host
    manager can be attempting to update the HostState object from
    the same compute node when selecting host destinations during
    a server build. If that picks up the compute node before its
    free_disk_gb field is set it will result in a TypeError when
    trying to multiply None * 1024.

    Change 36a0ba9c8141b445f2c6bfc093fde4cc98d229b2 was an earlier
    attempt at fixing this bug and shortened the race window but
    didn't handle the problem where updated_at is set but free_disk_gb
    is not yet.

    This change builds on that by simply checking for the thing
    the scheduler host manager actually cares about, which is the
    free_disk_gb field.

    Closes-Bug: #1654102
    Closes-Bug: #1610679

    Change-Id: I37b75dabb3ea7ec2d5678550d9aff30b1a0c29e6