KeyError: 'service' during schedule of baremetal instance

Bug #1221620 reported by Derek Higgins
This bug affects 2 people
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  Critical    Nikola Đipanov
tripleo                   Fix Released  Critical    Unassigned

Bug Description

Traceback while scheduling both overcloud nodes on tripleo ci

The last successful run was 05-Sep-2013 01:54:10 (UTC),
so something changed after this run: https://review.openstack.org/#/c/43968/

Scheduling of a baremetal node does seem to work on the seed, though.

INFO nova.scheduler.filter_scheduler [req-c754d309-92fc-461a-81fb-d5bfe97a0676 99fa1214e35a4cc6b99c9332b8ca66fb d86556c4d57c4dfc87b30f6c66c40a98] Attempting to build 1 instance(s) uuids: [u'f71e3e47-f2a2-4a13-9
WARNING nova.scheduler.utils [req-c754d309-92fc-461a-81fb-d5bfe97a0676 99fa1214e35a4cc6b99c9332b8ca66fb d86556c4d57c4dfc87b30f6c66c40a98] Failed to scheduler_run_instance: 'service'
WARNING nova.scheduler.utils [req-c754d309-92fc-461a-81fb-d5bfe97a0676 99fa1214e35a4cc6b99c9332b8ca66fb d86556c4d57c4dfc87b30f6c66c40a98] [instance: f71e3e47-f2a2-4a13-92c0-c3397acaf409] Setting instance to ERR
ERROR nova.openstack.common.rpc.amqp [req-c754d309-92fc-461a-81fb-d5bfe97a0676 99fa1214e35a4cc6b99c9332b8ca66fb d86556c4d57c4dfc87b30f6c66c40a98] Exception during message handling
TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
TRACE nova.openstack.common.rpc.amqp File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
TRACE nova.openstack.common.rpc.amqp **args)
TRACE nova.openstack.common.rpc.amqp File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
TRACE nova.openstack.common.rpc.amqp File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/scheduler/manager.py", line 160, in run_instance
TRACE nova.openstack.common.rpc.amqp context, ex, request_spec)
TRACE nova.openstack.common.rpc.amqp File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/scheduler/manager.py", line 147, in run_instance
TRACE nova.openstack.common.rpc.amqp legacy_bdm_in_spec)
TRACE nova.openstack.common.rpc.amqp File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 87, in schedule_run_instance
TRACE nova.openstack.common.rpc.amqp filter_properties, instance_uuids)
TRACE nova.openstack.common.rpc.amqp File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 326, in _schedule
TRACE nova.openstack.common.rpc.amqp hosts = self.host_manager.get_all_host_states(elevated)
TRACE nova.openstack.common.rpc.amqp File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 432, in get_all_host_states
TRACE nova.openstack.common.rpc.amqp service = compute['service']
TRACE nova.openstack.common.rpc.amqp KeyError: 'service'
TRACE nova.openstack.common.rpc.amqp

Tags: db baremetal
tags: added: db
Roman Podoliaka (rpodolyaka) wrote :

This particular error is caused by this change https://review.openstack.org/#/c/43151/ , but the real problem is deeper:
ComputeNode <--> Service must be a 1-1 relationship, but there is no unique constraint on service_id column of compute_nodes table. Because of this, duplicate entries (referring to the same service row) can be added to compute_nodes table. This error had passed silently before https://review.openstack.org/#/c/43151/ was merged and changed the way the tables were joined.

Changed in nova:
assignee: nobody → Roman Podolyaka (rpodolyaka)
Hans Lindgren (hanlind) wrote :

Baremetal actually changed that to be a 1-to-many relationship. See https://review.openstack.org/13920 where the commit message states:

"With this patch, one service entry with multiple compute_node entries
can be registered by nova-compute."

From this, it looks like https://review.openstack.org/#/c/43151/ introduced a regression.

Roman Podoliaka (rpodolyaka) wrote :

Ahh... Thanks for pointing this out! Var names and comments can be very misleading... It really does look like a regression introduced by https://review.openstack.org/#/c/43151/ then, as it assumes ComputeNode <--> Service to be a 1-1 relationship, which is not true for Nova Baremetal.

Should we revert the broken commit?
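
To illustrate the failure mode discussed in this comment, here is a minimal Python sketch of a manual join that assumes one compute node per service. It is not the nova code from the change in question: the function name, the dict-based rows and the sample data are all assumptions made for illustration. With two compute nodes sharing one service row, one node never receives a 'service' key, and a later compute['service'] lookup fails with the same kind of KeyError as in the traceback.

```python
# Hypothetical sketch of a 1-1 join; names and data are illustrative only
# and do not reproduce the actual nova DB API code.

def join_assuming_one_to_one(compute_nodes, services):
    """Attach each service to a compute node, assuming one node per service."""
    # Index nodes by service_id. If two nodes share a service_id, the later
    # node silently replaces the earlier one in this index.
    node_by_service = {node["service_id"]: node for node in compute_nodes}
    for service in services:
        node = node_by_service.get(service["id"])
        if node is not None:
            node["service"] = service
    return compute_nodes


# One nova-compute service that registered two baremetal compute nodes.
services = [{"id": 1, "host": "seed"}]
compute_nodes = [
    {"id": 10, "service_id": 1},
    {"id": 11, "service_id": 1},
]

joined = join_assuming_one_to_one(compute_nodes, services)

# Collect the nodes that were left without a 'service' entry.
missing = [node["id"] for node in joined if "service" not in node]

# A consumer that indexes the key directly now raises KeyError.
try:
    joined[0]["service"]
    error_key = None
except KeyError as exc:
    error_key = exc.args[0]
```

In this sketch the node with id 10 ends up without a 'service' entry, while the node with id 11 is joined correctly.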

Changed in nova:
assignee: Roman Podolyaka (rpodolyaka) → nobody
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/45416

Changed in nova:
assignee: nobody → Hans Lindgren (hanlind)
status: New → In Progress
Hans Lindgren (hanlind)
Changed in nova:
importance: Undecided → Critical
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/45422

Changed in nova:
assignee: Hans Lindgren (hanlind) → Roman Podolyaka (rpodolyaka)
tags: added: baremetal
Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/45698

Changed in nova:
assignee: Roman Podolyaka (rpodolyaka) → Nikola Đipanov (ndipanov)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/45422
Committed: http://github.com/openstack/nova/commit/59fb3c18759bb2529a9c1dea445c2d5caf6746da
Submitter: Jenkins
Branch: master

commit 59fb3c18759bb2529a9c1dea445c2d5caf6746da
Author: Roman Podolyaka <email address hidden>
Date: Fri Sep 6 15:27:07 2013 +0300

    Fix compute_node_get_all() for Nova Baremetal

    Change Ie5ef00c974b810336787e88c78c93c15ca2890d3 introduced
    a regression leading to KeyError when a new baremetal node
    is scheduled. This is due to the fact, that the mentioned
    change assumes, that ComputeNode <--> Service is a 1-1
    relationship, which is not true for Nova Baremetal driver.

    This patch fixes the tables join in compute_node_get_all()
    DB API method to work with 1-M relationship between ComputeNode
    and Service models.

    Closes-Bug: #1221620

    Change-Id: I7c218d06f63cc2bf7d0e358f2f76366601179b0c
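
The idea of a join that tolerates a 1-M relationship between services and compute nodes can be sketched in Python as follows. This is an illustrative sketch under assumed names and data (join_one_to_many, dict-based rows, the sample host name), not the code of the patch itself. It iterates over the compute nodes and looks up each node's service, so several nodes may share one service row.

```python
# Hypothetical sketch of a join that allows many compute nodes per service;
# names and data are illustrative and are not taken from the nova patch.

def join_one_to_many(compute_nodes, services):
    """Return copies of the compute nodes, each with its service attached."""
    services_by_id = {service["id"]: service for service in services}
    joined = []
    for node in compute_nodes:
        row = dict(node)  # copy so the input rows are left untouched
        # Every node gets a 'service' key; it is None if no service matches.
        row["service"] = services_by_id.get(node["service_id"])
        joined.append(row)
    return joined


# One service entry with two compute_node entries, as Nova Baremetal allows.
sample_services = [{"id": 1, "host": "seed"}]
sample_nodes = [
    {"id": 10, "service_id": 1},
    {"id": 11, "service_id": 1},
]

result = join_one_to_many(sample_nodes, sample_services)
hosts = [row["service"]["host"] for row in result]
```

Driving the loop from the compute nodes rather than from the services is the design choice that matters here: each node is visited exactly once, so no node can be left without a 'service' key.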

Changed in nova:
status: In Progress → Fix Committed
Changed in tripleo:
status: Triaged → Fix Released
Changed in nova:
milestone: none → havana-rc1
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-rc1 → 2013.2