TypeError: object of type 'object' has no len() from resources_from_request_spec when cells are down

Bug #1857139 reported by Matt Riedemann on 2019-12-20
This bug affects 1 person
Affects                   Importance  Assigned to
OpenStack Compute (nova)  Low         Matt Riedemann
Train                     Low         Unassigned

Bug Description

Seen here:

https://zuul.opendev.org/t/openstack/build/c187e207bc1c48a0a7fa49ef9798b696/log/logs/screen-n-sch.txt.gz#2529

cell1 is down, so the call to scatter_gather_cells in get_compute_nodes_by_host_or_node yields a result for that cell, but the result is not a ComputeNodeList; it is the did_not_respond_sentinel object:

https://github.com/openstack/nova/blob/02019d2660dfce3facdd64ecdb2bd60ba4a91c6d/nova/scheduler/host_manager.py#L705

https://github.com/openstack/nova/blob/02019d2660dfce3facdd64ecdb2bd60ba4a91c6d/nova/context.py#L454

which results in an error here:

https://github.com/openstack/nova/blob/02019d2660dfce3facdd64ecdb2bd60ba4a91c6d/nova/scheduler/utils.py#L612
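
For illustration, the failure mode can be reproduced without nova at all. This is only a sketch: the sentinel below is a plain object() standing in for did_not_respond_sentinel, and count_nodes is a hypothetical caller that, like the code linked above, assumes it was handed a list-like result.

```python
# Stand-in for the sentinel returned for a cell that did not respond.
# Assumption: a bare object() instance, as the error message suggests.
did_not_respond_sentinel = object()


def count_nodes(result):
    """Hypothetical caller that expects a list-like result."""
    return len(result)


try:
    count_nodes(did_not_respond_sentinel)
except TypeError as exc:
    # A bare object() defines no __len__, so len() raises TypeError.
    message = str(exc)
    print(message)
```

The message printed matches the one in the bug title.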

The HostManager.get_compute_nodes_by_host_or_node method should filter failure and timeout results out of what scatter_gather_cells returns. We'll get a NoValidHost either way, but that is better than a traceback with the TypeError in it.
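
A minimal sketch of that filtering idea, not the actual patch: it assumes the gathered results are a dict keyed by cell, where each value is either a list of nodes, the did-not-respond sentinel, or an exception raised in that cell. The names is_cell_failure and usable_compute_nodes are hypothetical, and plain lists stand in for ComputeNodeList.

```python
# Stand-in for the sentinel used when a cell times out (assumption).
did_not_respond_sentinel = object()


def is_cell_failure(result):
    """Hypothetical check: True for a timeout sentinel or a raised exception."""
    return result is did_not_respond_sentinel or isinstance(result, Exception)


def usable_compute_nodes(results):
    """Collect nodes only from cells that returned a real result."""
    nodes = []
    for cell_uuid, result in results.items():
        if is_cell_failure(result):
            # Skip down or failed cells instead of passing them to callers.
            continue
        nodes.extend(result)
    return nodes


results = {
    "cell0": [],
    "cell1": did_not_respond_sentinel,       # cell timed out
    "cell2": ["node-a", "node-b"],
    "cell3": RuntimeError("db unreachable"),  # called function raised
}
nodes = usable_compute_nodes(results)
print(nodes)
```

With this shape, a caller that does len(nodes) always gets a list. If every cell fails the list is simply empty, which leads to the NoValidHost outcome described above rather than a TypeError.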

Fix proposed to branch: master
Review: https://review.opendev.org/700186

Changed in nova:
status: Triaged → In Progress

Fix proposed to branch: master
Review: https://review.opendev.org/700752

Changed in nova:
assignee: Matt Riedemann (mriedem) → Choi-Sung-Hoon (knu-cse)

Change abandoned by Choi-Sung-Hoon (<email address hidden>) on branch: master
Review: https://review.opendev.org/700752

Change abandoned by Choi-Sung-Hoon (<email address hidden>) on branch: master
Review: https://review.opendev.org/700753
Reason: Following Brin Zhang's comment, I abandon this change.

Changed in nova:
assignee: Choi-Sung-Hoon (knu-cse) → Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Choi-Sung-Hoon (knu-cse)
Changed in nova:
assignee: Choi-Sung-Hoon (knu-cse) → Matt Riedemann (mriedem)

Reviewed: https://review.opendev.org/700186
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0d9622f581e830e7b7bc9763aaa09ba02e99b8bb
Submitter: Zuul
Branch: master

commit 0d9622f581e830e7b7bc9763aaa09ba02e99b8bb
Author: Matt Riedemann <email address hidden>
Date: Fri Dec 20 10:03:23 2019 -0500

    Handle cell failures in get_compute_nodes_by_host_or_node

    get_compute_nodes_by_host_or_node uses the scatter_gather_cells
    function but was not handling the case that a failure result
    was returned, which could be the called function raising some
    exception or the cell timing out. This causes issues when the
    caller of get_compute_nodes_by_host_or_node expects to get a
    ComputeNodeList back and can do something like len(nodes) on it
    which fails when the result is not iterable.

    To be clear, if a cell is down there are going to be problems
    which likely result in a NoValidHost error during scheduling, but
    this avoids an ugly TypeError traceback in the scheduler logs.

    Change-Id: Ia54b5adf0a125ae1f9b86887a07dd1d79821dd54
    Closes-Bug: #1857139

Changed in nova:
status: In Progress → Fix Released