nova/compute/manager.py throws TypeError: 'NoneType' object is not iterable

Bug #1796981 reported by Chris DeVita
This bug affects 2 people

Affects                  | Status        | Importance | Assigned to    | Milestone
OpenStack Compute (nova) | Fix Released  | Medium     | Matt Riedemann |
Ocata                    | Triaged       | Medium     | Unassigned     |
Pike                     | Fix Committed | Medium     | Matt Riedemann |
Queens                   | Fix Committed | Medium     | Matt Riedemann |
Rocky                    | Fix Committed | Medium     | Matt Riedemann |

Bug Description

Problem Description: nova evacuate <instance_id> <target_host_name> fails, and instances are reported in ERROR state after executing “nova evacuate”.

Summary: An instance with its boot volume on a shared storage device (HP 3Par or SUSE CEPH cluster) errors out during an evacuate. The error can be avoided by changing nova/compute/manager.py to catch the error raised when dereferencing vol_stats.

Included:
* Steps to produce it
* nova evacuate with --debug
* Command line output showing the state of the VM before and after the evacuate
* code that was changed
* Command line output showing the successful state of the VM before and after the evacuate

Chris DeVita (cdev3) wrote :
tags: added: compute notifications
Matt Riedemann (mriedem) wrote :

Looking at what fails:

def _notify_volume_usage_detach(self, context, instance, bdm):
    if CONF.volume_usage_poll_interval <= 0:
        return

    vol_stats = []
    mp = bdm.device_name
    # Handle bootable volumes which will not contain /dev/
    if '/dev/' in mp:
        mp = mp[5:]
    try:
        vol_stats = self.driver.block_stats(instance, mp)
    except NotImplementedError:
        return

    LOG.debug("Updating volume usage cache with totals", instance=instance)
    rd_req, rd_bytes, wr_req, wr_bytes, flush_ops = vol_stats

To reach this code, volume_usage_poll_interval has to be set to a value greater than 0, which it is not by default. Which compute driver are you using that returns None but doesn't raise NotImplementedError?
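
As an illustration of the failure mode (a minimal standalone sketch, not nova code), unpacking None into five names raises a TypeError. The helper names unpack_stats and unpack_fails_with_type_error are hypothetical, and the exact wording of the error message varies by Python version.

```python
def unpack_stats(vol_stats):
    """Mimic the tuple unpacking on the last line of the method above."""
    rd_req, rd_bytes, wr_req, wr_bytes, flush_ops = vol_stats
    return rd_req, rd_bytes, wr_req, wr_bytes, flush_ops


def unpack_fails_with_type_error(vol_stats):
    """Return True if unpacking vol_stats raises TypeError."""
    try:
        unpack_stats(vol_stats)
    except TypeError:
        # None is not iterable, so the unpacking cannot even start.
        return True
    return False
```

A driver that returns None instead of a five-item sequence triggers exactly this path, while a well-formed list or tuple unpacks normally.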

Matt Riedemann (mriedem) wrote :

Looks like the libvirt driver is the only driver that implements the block_stats() method, and if that method encounters an error it logs it and returns None:

https://github.com/openstack/nova/blob/6bf11e1dc14afad78b11d980c2544a3dc41579ff/nova/virt/libvirt/driver.py#L6364
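
The "logs it and returns None" pattern can be sketched as follows. This is a simplified stand-in and not the libvirt driver code at the link above: GuestGoneError and the read_stats callable are hypothetical names introduced only for illustration.

```python
import logging

LOG = logging.getLogger(__name__)


class GuestGoneError(Exception):
    """Hypothetical stand-in for a hypervisor error; not a nova or libvirt class."""


def block_stats(read_stats):
    """Return a list of stats, or None if reading the stats fails.

    read_stats is a hypothetical callable standing in for the hypervisor call.
    """
    try:
        return list(read_stats())
    except GuestGoneError as exc:
        # The error is logged and swallowed. Falling off the end of the
        # function means the caller receives None rather than an exception.
        LOG.info("Getting block stats failed: %s", exc)
```

Because the except branch has no return statement, the caller cannot distinguish "no stats" from a normal result unless it checks for None explicitly.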

Matt Riedemann (mriedem) wrote :

Do you see one of those log messages in the compute log before the NoneType failure?

Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Matt Riedemann (mriedem)
tags: added: libvirt
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/609518

Changed in nova:
status: Triaged → In Progress
Chris DeVita (cdev3) wrote :

Based on the calls to LOG.info in

https://github.com/openstack/nova/blob/6bf11e1dc14afad78b11d980c2544a3dc41579ff/nova/virt/libvirt/driver.py#L6364

cd /var/log/nova
grep -e "Updating volume usage cache with totals" -e "Getting block stats failed" -e "Could not find domain in libvirt" *.log

No entries are found. Is this the correct text to look for?

Chris DeVita (cdev3) wrote :

Bump. Is this the correct text to look for in the log files?
"Updating volume usage cache with totals"
"Getting block stats failed"
"Could not find domain in libvirt"

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/609518
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4da54c07861a1542a8e30a768d4506d3e81b5598
Submitter: Zuul
Branch: master

commit 4da54c07861a1542a8e30a768d4506d3e81b5598
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 10 15:07:00 2018 -0400

    Fix NoneType error in _notify_volume_usage_detach

    If the driver.block_stats() method returns None, like the
    libvirt driver will if the guest is gone during an evacuate,
    we'll get a NoneType error trying to unpack the return value
    from the driver. Instead, simply return as if the driver
    raised NotImplementedError.

    Since handling None is changing the contract on the virt
    driver API, the docstring is updated to explain the acceptable
    return values of the driver method.

    Change-Id: I98a2785c07f7af02ad83650c72d9e1868290ece4
    Closes-Bug: #1796981
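
The approach described in the commit message can be sketched in standalone form as follows. This is a simplified illustration under stated assumptions, not the merged patch: FakeDriver and the free function notify_volume_usage_detach are hypothetical stand-ins for the virt driver and the compute manager method.

```python
class FakeDriver:
    """Hypothetical driver whose block_stats() returns a preset value."""

    def __init__(self, result=None, implemented=True):
        self._result = result
        self._implemented = implemented
        self.last_disk_id = None

    def block_stats(self, instance, disk_id):
        if not self._implemented:
            raise NotImplementedError()
        self.last_disk_id = disk_id
        return self._result


def notify_volume_usage_detach(driver, instance, device_name):
    """Return the unpacked stats, or None when there is nothing to report."""
    mp = device_name
    # Handle bootable volumes which will not contain /dev/
    if '/dev/' in mp:
        mp = mp[5:]
    try:
        vol_stats = driver.block_stats(instance, mp)
    except NotImplementedError:
        return None
    if vol_stats is None:
        # Treat None the same way as a driver that raised NotImplementedError.
        return None
    rd_req, rd_bytes, wr_req, wr_bytes, flush_ops = vol_stats
    return rd_req, rd_bytes, wr_req, wr_bytes, flush_ops
```

With the None check in place, a driver that returns None no longer reaches the unpacking line, so the detach path returns quietly instead of raising TypeError.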

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/611326

melanie witt (melwitt) wrote :

Chris, yes, that is the correct text to look for in the log files. But you said you had none of those entries, so that means there was no error encountered while the libvirt driver was getting block stats. It implies that the domain.blockStats() libvirt python binding returned None, but I don't know why it would do that. This is the method that's called at the lower level:

https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockStats

melanie witt (melwitt) wrote :

Re-reading the commit message of the fix, indeed it's probably that the method will return None if the guest is gone during the evacuate.

We handle InstanceNotFound in the block_stats method:

https://github.com/openstack/nova/blob/6bf11e1dc14afad78b11d980c2544a3dc41579ff/nova/virt/libvirt/driver.py#L6364-L6381

But this is probably a race where we are able to get a guest reference to the instance on the host _before_ it's gone (so we don't get InstanceNotFound) but by the time we call dev.blockStats(), the guest is gone, so we get None returned.

Anyway, the merged patch should fix the issue.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/614868

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/614872

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/611326
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=940034c27c611b8b344f8371eb0458b16fb58f1d
Submitter: Zuul
Branch: stable/rocky

commit 940034c27c611b8b344f8371eb0458b16fb58f1d
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 10 15:07:00 2018 -0400

    Fix NoneType error in _notify_volume_usage_detach

    If the driver.block_stats() method returns None, like the
    libvirt driver will if the guest is gone during an evacuate,
    we'll get a NoneType error trying to unpack the return value
    from the driver. Instead, simply return as if the driver
    raised NotImplementedError.

    Since handling None is changing the contract on the virt
    driver API, the docstring is updated to explain the acceptable
    return values of the driver method.

    Change-Id: I98a2785c07f7af02ad83650c72d9e1868290ece4
    Closes-Bug: #1796981
    (cherry picked from commit 4da54c07861a1542a8e30a768d4506d3e81b5598)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/614868
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=62fbfdfa8f2dda1a5d9c3733b2f3da3a9d00c93a
Submitter: Zuul
Branch: stable/queens

commit 62fbfdfa8f2dda1a5d9c3733b2f3da3a9d00c93a
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 10 15:07:00 2018 -0400

    Fix NoneType error in _notify_volume_usage_detach

    If the driver.block_stats() method returns None, like the
    libvirt driver will if the guest is gone during an evacuate,
    we'll get a NoneType error trying to unpack the return value
    from the driver. Instead, simply return as if the driver
    raised NotImplementedError.

    Since handling None is changing the contract on the virt
    driver API, the docstring is updated to explain the acceptable
    return values of the driver method.

    Change-Id: I98a2785c07f7af02ad83650c72d9e1868290ece4
    Closes-Bug: #1796981
    (cherry picked from commit 4da54c07861a1542a8e30a768d4506d3e81b5598)
    (cherry picked from commit 940034c27c611b8b344f8371eb0458b16fb58f1d)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/614872
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d004b0ecfa6f20ceefd029f4ed09de191f0d76b3
Submitter: Zuul
Branch: stable/pike

commit d004b0ecfa6f20ceefd029f4ed09de191f0d76b3
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 10 15:07:00 2018 -0400

    Fix NoneType error in _notify_volume_usage_detach

    If the driver.block_stats() method returns None, like the
    libvirt driver will if the guest is gone during an evacuate,
    we'll get a NoneType error trying to unpack the return value
    from the driver. Instead, simply return as if the driver
    raised NotImplementedError.

    Since handling None is changing the contract on the virt
    driver API, the docstring is updated to explain the acceptable
    return values of the driver method.

    Conflicts:
          nova/tests/unit/compute/test_compute_mgr.py

    NOTE(mriedem): The conflict is due to not having the
    test_get_scheduler_hints test from change
    I49ffebcd129990f1835f404d98b51732a32171eb which was
    added in Queens.

    Change-Id: I98a2785c07f7af02ad83650c72d9e1868290ece4
    Closes-Bug: #1796981
    (cherry picked from commit 4da54c07861a1542a8e30a768d4506d3e81b5598)
    (cherry picked from commit 940034c27c611b8b344f8371eb0458b16fb58f1d)
    (cherry picked from commit 62fbfdfa8f2dda1a5d9c3733b2f3da3a9d00c93a)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.1.0

This issue was fixed in the openstack/nova 18.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.7

This issue was fixed in the openstack/nova 16.1.7 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.9

This issue was fixed in the openstack/nova 17.0.9 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.
