_populate_assigned_resources raises "TypeError: argument of type 'NoneType' is not iterable" during active migration

Bug #1849165 reported by Matt Riedemann
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | High | Eric Fried |
Train | Fix Committed | High | Eric Fried |

Bug Description

Seen here:

https://zuul.opendev.org/t/openstack/build/2b10b4a240b84245bcee3366db93951d/log/logs/screen-n-cpu.txt.gz?severity=4#2675

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager [None req-dd5ddbad-4234-4288-bbab-2c3d20b7f4ad None None] Error updating resources for node ubuntu-bionic-rax-iad-0012404623.: TypeError: argument of type 'NoneType' is not iterable

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager Traceback (most recent call last):

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/compute/manager.py", line 8925, in _update_available_resource_for_node

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager startup=startup)

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 883, in update_available_resource

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup)

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 328, in inner

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager return f(*args, **kwargs)

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 965, in _update_available_resource

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager self._populate_assigned_resources(context, instance_by_uuid)

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 482, in _populate_assigned_resources

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager if mig.dest_compute == self.host and 'new_resources' in mig_ctx:

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager TypeError: argument of type 'NoneType' is not iterable

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager

This was added late in Train:

https://review.opendev.org/#/c/678452/

Matt Riedemann (mriedem) wrote :
summary: - _populate_assigned_resources raises TypeError: argument of type
- 'NoneType' is not iterable
+ _populate_assigned_resources raises "TypeError: argument of type
+ 'NoneType' is not iterable" during active migration
Changed in nova:
status: New → Confirmed
Matt Riedemann (mriedem) wrote :

This doesn't appear to cause any failures since it's probably a separate update_available_resource periodic task that is hitting a window and failing and then just runs on the next opportunity and is resolved, but it's still really ugly to see in the logs when you're trying to debug another issue.

Matt Riedemann (mriedem) wrote :

I think what happens here is that the resource tracker (RT) is running on the dest host and tracking the incoming migration before the claim is made; the claim is what creates the migration_context in the DB. The migration record is created in the control plane but the claim happens in the compute, so we can have a race where the resource tracker's update_available_resource periodic task runs between those two points on the dest host and hits the NoneType error. As seen from the logs:

Oct 21 13:35:16.757137 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: DEBUG nova.compute.resource_tracker [None req-dd5ddbad-4234-4288-bbab-2c3d20b7f4ad None None] [instance: cd4148a2-4550-4e83-b6f7-c91752eaf779] Starting to track incoming migration 407fd025-e8ba-4012-ade7-d0255d2a1837 with flavor 42 {{(pid=26938) _update_usage_from_migration /opt/stack/new/nova/nova/compute/resource_tracker.py:1337}}

...

Oct 21 13:35:16.977968 ubuntu-bionic-rax-iad-0012404623 nova-compute[26938]: ERROR nova.compute.manager [None req-dd5ddbad-4234-4288-bbab-2c3d20b7f4ad None None] Error updating resources for node ubuntu-bionic-rax-iad-0012404623.: TypeError: argument of type 'NoneType' is not iterable
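
To make the failure mode above concrete, here is a minimal standalone sketch of the condition from the traceback. It is not nova code: FakeMigration, incoming_has_new_resources and the host names are hypothetical stand-ins, and the migration context is modelled as a plain dict. It only shows that a membership test against None raises TypeError when the periodic task runs after the migration record exists but before the claim has created the migration context.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeMigration:
    # Hypothetical stand-in for the migration record created in the control plane.
    dest_compute: str


def incoming_has_new_resources(host: str, mig: FakeMigration,
                               mig_ctx: Optional[dict]) -> bool:
    # Same shape as the condition in the traceback. The 'in' test raises
    # TypeError when mig_ctx is None and the host comparison is True.
    return mig.dest_compute == host and 'new_resources' in mig_ctx


mig = FakeMigration(dest_compute='dest-host')

# After the claim: the migration context exists, so the check works.
print(incoming_has_new_resources('dest-host', mig, {'new_resources': []}))

# Inside the race window: the record exists but the context is still None.
try:
    incoming_has_new_resources('dest-host', mig, None)
except TypeError as exc:
    print('TypeError:', exc)
```

Note that the `and` short-circuits, so in this sketch only the host named as the migration's destination reaches the membership test, which matches the error being seen on the dest host.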

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/689842

Changed in nova:
assignee: nobody → Eric Fried (efried)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/689866

Eric Fried (efried) wrote :

> This doesn't appear to cause any failures since it's probably a separate
> update_available_resource periodic task that is hitting a window and failing and then just runs
> on the next opportunity and is resolved

So is this still "High"?

Eric Fried (efried) wrote :

Regression test: https://review.opendev.org/#/c/689866/

(I created that review with the Related-Bug tag; why didn't it show up here?)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/689866
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=761be5d0cb364333cde267b431b1ef87920f7400
Submitter: Zuul
Branch: master

commit 761be5d0cb364333cde267b431b1ef87920f7400
Author: Eric Fried <email address hidden>
Date: Mon Oct 21 12:18:10 2019 -0500

    Func: bug 1849165: mig race with _populate_assigned_resources

    Add a functional regression test for the referenced bug:

    If a migration is initiated, and update_available_resource runs on the
    destination between when the migration record is associated with the
    destination and when the migration context is added to the instance, it
    will raise a TypeError attempting to _populate_assigned_resources for
    that instance, because that method attempts to access the
    (as-yet-nonexistent) migration context.

    Note that this doesn't fail the migration; it just leaves ugly logs. In
    real life it probably also leaves other pieces of
    update_available_resource unfinished on the destination.

    Related-Bug: #1849165
    Change-Id: I7e96cd24049c205f76a684a2e7425f85b4376f73

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/690099

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/690100

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/689842
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=80385a22ee480a4c9775148d4729ab5d9c52e76d
Submitter: Zuul
Branch: master

commit 80385a22ee480a4c9775148d4729ab5d9c52e76d
Author: Eric Fried <email address hidden>
Date: Mon Oct 21 11:50:25 2019 -0500

    Don't populate resources for not-yet-migrated inst

    Per the referenced bug, it is possible for update_available_resource to
    race with a migration such that the migration record exists, but the
    instance's migration context doesn't. In such cases we shouldn't try to
    track the instance's assigned resources on this host (because there
    aren't any yet).

    Change-Id: I69f99adfa8c91b50086052ca1b15c55e86ed614d
    Closes-Bug: #1849165
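
As an illustration of the approach this commit message describes (this is not the merged patch; see the review linked above for the real change), a guard of roughly this shape skips instances whose migration context does not exist yet. The helper name should_track_new_resources and the dict-based context are assumptions made for the sketch.

```python
from typing import Optional


def should_track_new_resources(host: str, dest_compute: str,
                               mig_ctx: Optional[dict]) -> bool:
    # Hypothetical helper, not nova's API. No migration context means the
    # claim has not happened yet, so there is nothing to track on this host.
    if mig_ctx is None:
        return False
    return dest_compute == host and 'new_resources' in mig_ctx


# Race window: the migration record exists, the context does not.
print(should_track_new_resources('dest-host', 'dest-host', None))

# After the claim: the context carries the new resources.
print(should_track_new_resources('dest-host', 'dest-host',
                                 {'new_resources': []}))
```

With a check like this, a periodic run that lands inside the window simply does nothing for that instance, and the next run picks it up once the context has been created.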

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/690099
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=91a2056d526dd07edecba46f93119697190f112e
Submitter: Zuul
Branch: stable/train

commit 91a2056d526dd07edecba46f93119697190f112e
Author: Eric Fried <email address hidden>
Date: Mon Oct 21 12:18:10 2019 -0500

    Func: bug 1849165: mig race with _populate_assigned_resources

    Add a functional regression test for the referenced bug:

    If a migration is initiated, and update_available_resource runs on the
    destination between when the migration record is associated with the
    destination and when the migration context is added to the instance, it
    will raise a TypeError attempting to _populate_assigned_resources for
    that instance, because that method attempts to access the
    (as-yet-nonexistent) migration context.

    Note that this doesn't fail the migration; it just leaves ugly logs. In
    real life it probably also leaves other pieces of
    update_available_resource unfinished on the destination.

    Related-Bug: #1849165
    Change-Id: I7e96cd24049c205f76a684a2e7425f85b4376f73
    (cherry picked from commit 761be5d0cb364333cde267b431b1ef87920f7400)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/690100
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1a7be0d62e0b5320338effd5acb6086acdbee346
Submitter: Zuul
Branch: stable/train

commit 1a7be0d62e0b5320338effd5acb6086acdbee346
Author: Eric Fried <email address hidden>
Date: Mon Oct 21 11:50:25 2019 -0500

    Don't populate resources for not-yet-migrated inst

    Per the referenced bug, it is possible for update_available_resource to
    race with a migration such that the migration record exists, but the
    instance's migration context doesn't. In such cases we shouldn't try to
    track the instance's assigned resources on this host (because there
    aren't any yet).

    Change-Id: I69f99adfa8c91b50086052ca1b15c55e86ed614d
    Closes-Bug: #1849165
    (cherry picked from commit 80385a22ee480a4c9775148d4729ab5d9c52e76d)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.1

This issue was fixed in the openstack/nova 20.0.1 release.
