Multinode job fails with "Compute host X not found"

Bug #1661014 reported by Vladyslav Drok
Affects                   Status        Importance  Assigned to  Milestone
Ironic                    Won't Fix     Undecided   Unassigned
OpenStack Compute (nova)  Fix Released  Undecided   Chris Dent

Bug Description

Example failure:

http://logs.openstack.org/75/427675/2/check/gate-tempest-dsvm-ironic-ipa-wholedisk-agent_ipmitool-tinyipa-multinode-ubuntu-xenial-nv/3ff2401/console.html#_2017-02-01_14_55_05_875428

2017-02-01 14:55:05.875428 | Details: {u'code': 500, u'message': u'Compute host 5 could not be found.\nTraceback (most recent call last):\n\n File "/opt/stack/new/nova/nova/conductor/manager.py", line 92, in _object_dispatch\n return getattr(target, method)(*args, **kwargs)\n\n File "/usr/local/lib/python2.7/dist-packages', u'created': u'2017-02-01T14:44:56Z', u'details': u' File "/opt/stack/new/nova/nova/compute/manager.py", line 1780, in _do_build_and_run_instance\n filter_properties)\n File "/opt/stack/new/nova/nova/compute/manager.py", line 2016, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n'}

Vasyl Saienko (vsaienko)
description: updated
Matt Riedemann (mriedem) wrote :

I think you're going to need to enable this new config option in that job so that it discovers hosts when n-sch starts up, and periodically based on what you set it to:

https://review.openstack.org/#/c/426826/

Otherwise, you need to run 'nova-manage cell_v2 discover_hosts' whenever new compute nodes are added to the deployment.

Vasyl Saienko (vsaienko) wrote :

@Matt we already run nova-manage cell_v2 discover_hosts at the end of stack.sh, once all ironic nodes are enrolled and registered in nova with the correct CPU/RAM/DISK.

I've found that reverting https://review.openstack.org/#/c/417961/ fixes the issue; the problem no longer occurs.

The bug is also not limited to the multinode job: "gate-tempest-dsvm-ironic-pxe_ipmitool-postgres-ubuntu-xenial-nv" is affected as well.

Changed in nova:
assignee: nobody → Chris Dent (cdent)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/428375
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=965fffc09d6fffba7918117e170d5799c69fe99b
Submitter: Jenkins
Branch: master

commit 965fffc09d6fffba7918117e170d5799c69fe99b
Author: EdLeafe <email address hidden>
Date: Thu Feb 2 18:48:35 2017 +0000

    Delete a compute node's resource provider when node is deleted

    Currently when a compute node is deleted, its record in the cell DB is
    deleted, but its representation as a resource provider in the placement
    service remains, along with any inventory and allocations. This could
    cause the placement engine to return that provider record, even though
    the compute node no longer exists. And since the periodic "healing" by
    the resource tracker only updates compute node resources for records in
    the compute_nodes table, these old records are never removed.

    This patch adds a call to delete the resource provider when the compute
    node is deleted. It also adds a method to the scheduler report client
    to make these calls to the placement API.

    Partial-Bug: #1661258
    Closes-Bug: #1661014

    Change-Id: I6098d186d05ff8b9a568e23f860295a7bc2e6447
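
The behaviour described in the commit message above can be sketched with in-memory stand-ins. This is only an illustration under stated assumptions, not nova's code: `FakePlacement`, `FakeCell`, `ComputeHostNotFound`, and the `cleanup_provider` flag are all hypothetical names invented for the example. It shows one way a stale resource provider could lead the scheduler to pick a host that no longer exists, and how deleting the provider together with the compute node avoids that.

```python
"""Illustrative sketch only: in-memory stand-ins, not nova's real classes."""


class ComputeHostNotFound(Exception):
    """Hypothetical error raised when a chosen host has no compute node record."""


class FakePlacement:
    """Hypothetical stand-in for the placement service."""

    def __init__(self):
        self.providers = {}  # provider uuid -> inventory dict

    def create_resource_provider(self, uuid, inventory):
        self.providers[uuid] = dict(inventory)

    def delete_resource_provider(self, uuid):
        # In this sketch, dropping the provider also drops its inventory.
        self.providers.pop(uuid, None)

    def providers_with(self, resource, amount):
        # Return provider uuids that claim enough of the requested resource.
        return sorted(
            uuid
            for uuid, inventory in self.providers.items()
            if inventory.get(resource, 0) >= amount
        )


class FakeCell:
    """Hypothetical stand-in for the cell DB's compute_nodes table."""

    def __init__(self, placement):
        self.placement = placement
        self.compute_nodes = {}  # node uuid -> inventory dict

    def add_node(self, uuid, inventory):
        self.compute_nodes[uuid] = dict(inventory)
        self.placement.create_resource_provider(uuid, inventory)

    def delete_node(self, uuid, cleanup_provider):
        del self.compute_nodes[uuid]
        if cleanup_provider:
            # The step the patch adds: remove the provider as well.
            self.placement.delete_resource_provider(uuid)

    def schedule(self, resource, amount):
        # Ask placement for candidates, then look the first one up in the cell.
        for uuid in self.placement.providers_with(resource, amount):
            if uuid not in self.compute_nodes:
                raise ComputeHostNotFound(uuid)
            return uuid
        return None


if __name__ == "__main__":
    cell = FakeCell(FakePlacement())
    cell.add_node("node-a", {"VCPU": 4})
    cell.delete_node("node-a", cleanup_provider=False)
    try:
        cell.schedule("VCPU", 1)
    except ComputeHostNotFound as exc:
        print("stale provider selected:", exc)
```

With `cleanup_provider=True` the stale record is gone, so `schedule` either returns a node that still exists or `None`.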

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0rc1

This issue was fixed in the openstack/nova 15.0.0.0rc1 release candidate.

Vladyslav Drok (vdrok)
Changed in ironic:
status: New → Won't Fix