kilo controller can't conduct juno compute nodes

Bug #1431201 reported by Lan Qi song
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Critical
Assigned to: Dan Smith

Bug Description

When I tried to use a Kilo controller to manage Juno compute nodes, the Juno nova-compute service started with the following two errors:

1.
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup return self._update_available_resource(context, resources)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py", line 272, in inner
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup return f(*args, **kwargs)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup self.gen.throw(type, value, traceback)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py", line 236, in lock
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup yield int_lock
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py", line 272, in inner
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup return f(*args, **kwargs)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/compute/resource_tracker.py", line 377, in _update_available_resource
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup self._sync_compute_node(context, resources)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/compute/resource_tracker.py", line 388, in _sync_compute_node
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup compute_node_refs = service['compute_node']
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup KeyError: 'compute_node'

We can revert this commit to fix this error:
https://github.com/openstack/nova/commit/83b64ceb871b1553b1bb1e0bb9270816db892552

2.
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver File "/usr/lib/python2.6/site-packages/nova/rpc.py", line 111, in deserialize_entity
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver return self._base.deserialize_entity(context, entity)
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver File "/usr/lib/python2.6/site-packages/nova/objects/base.py", line 649, in deserialize_entity
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver entity = self._process_object(context, entity)
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver File "/usr/lib/python2.6/site-packages/nova/objects/base.py", line 615, in _process_object
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver e.kwargs['supported'])
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver File "/usr/lib/python2.6/site-packages/nova/conductor/api.py", line 217, in object_backport
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver return self._manager.object_backport(context, objinst, target_version)
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver File "/usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py", line 358, in object_backport
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver target_version=target_version)
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/client.py", line 152, in call

We can revert this commit to fix this error:
https://github.com/openstack/nova/commit/f287b75138129542436b2085d52d6fe201ca7e14

Does anybody know whether there is something like a gatekeeper that would let a Kilo controller keep managing Juno compute nodes? Thanks!

Tags: compute
Sylvain Bauza (sylvain-bauza) wrote :

So, there are two problems:
 - The old Juno ResourceTracker (RT) uses the conductor API instead of objects, so it proxies directly on top of the DB API. Instead of reverting https://github.com/openstack/nova/commit/83b64ceb871b1553b1bb1e0bb9270816db892552 I'm in favor of adding logic to the conductor manager method that checks whether the compute_node field is present and re-adds it if it is not. That way, the nested compute_node field can stay removed for Kilo nodes while Juno nodes still work.

 - The second stack trace looks a bit odd to me. Could you please provide a more complete stack trace so I can understand what's wrong with the second commit you mentioned?
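As an illustration of the first point, re-adding the field on the conductor side could look roughly like the sketch below. This is not the actual nova code: the function name `service_with_compute_node` and the dict shapes for the service and compute node records are assumptions made only for this example.

```python
import copy


def service_with_compute_node(service, compute_nodes):
    """Return a copy of `service` that carries the legacy 'compute_node' key.

    `service` is a plain dict as a DB-API call might return it, and
    `compute_nodes` is a list of compute node dicts. Both shapes are
    hypothetical and chosen only for this illustration.
    """
    result = copy.deepcopy(service)
    if 'compute_node' not in result:
        # The Juno resource tracker reads service['compute_node'], so the
        # nested field is re-added when the newer schema no longer has it.
        result['compute_node'] = [
            node for node in compute_nodes
            if node.get('host') == result.get('host')
        ]
    return result


kilo_service = {'id': 1, 'host': 'compute-1', 'binary': 'nova-compute'}
nodes = [{'id': 7, 'host': 'compute-1'}, {'id': 8, 'host': 'compute-2'}]
patched = service_with_compute_node(kilo_service, nodes)
```

With this shape, a lookup such as `patched['compute_node']` no longer raises `KeyError`, and the original Kilo-style record is left untouched.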

Changed in nova:
assignee: nobody → Sylvain Bauza (sylvain-bauza)
importance: Undecided → Critical
status: New → Confirmed
tags: added: compute
Lan Qi song (lqslan) wrote :

The full stack trace is a loop of what I pasted in the description of this issue. I think it happens when the Juno compute node tries to check its compatibility with the conductor node and the check fails.

I think this line may be causing the problem:
https://github.com/openstack/nova/blob/master/nova/objects/service.py#L81

When I changed target_version='1.10' to target_version='1.5' (the Juno ComputeNode object version), the problem went away.
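The looping behaviour described above can be modelled with a small sketch. It assumes, as a simplification, that a receiver accepts an object only if the major version matches and the minor version is not newer than its own, and that it asks for a backport otherwise. The names `is_compatible`, `receive`, and the two backport callables are hypothetical and are not part of nova.

```python
def parse_version(version):
    """Turn '1.10' into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def is_compatible(sent, supported):
    """True if a receiver supporting `supported` can read version `sent`."""
    sent_major, sent_minor = parse_version(sent)
    sup_major, sup_minor = parse_version(supported)
    return sent_major == sup_major and sent_minor <= sup_minor


def receive(sent, supported, backport, max_attempts=3):
    """Request backports until the version is readable, or give up."""
    attempts = 0
    while not is_compatible(sent, supported):
        if attempts == max_attempts:
            raise RuntimeError('backport never produced a readable version')
        sent = backport(sent, supported)
        attempts += 1
    return sent


def always_latest(sent, supported):
    # Models a backport that ignores the requested version.
    return '1.10'


def honour_target(sent, supported):
    # Models a backport that returns the version the receiver asked for.
    return supported
```

Note that comparing the raw strings would be misleading here, since `'1.10' < '1.5'` is true for strings; that is why the sketch parses versions into integer tuples. With `always_latest`, `receive('1.10', '1.5', always_latest)` keeps retrying until it gives up, while `honour_target` succeeds after one backport.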

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/163797

Changed in nova:
milestone: none → kilo-3
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/163867

Changed in nova:
status: Confirmed → In Progress
Jay Pipes (jaypipes)
summary: - kilo controller cann't conduct juno compute nodes
+ kilo controller can't conduct juno compute nodes
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/163797
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cb7977466833df01c6ed7e8405bb1f7b6b5ea523
Submitter: Jenkins
Branch: master

commit cb7977466833df01c6ed7e8405bb1f7b6b5ea523
Author: Sylvain Bauza <email address hidden>
Date: Thu Mar 12 12:15:45 2015 +0100

    Fix Juno nodes checking service.compute_node

    Old Juno ResourceTracker is calling the conductor instead of calling the
    Service object, so it can't benefit from the backwards compatibility.
    As conductor_api.service_get_by_compute_node is only called by old Juno
    computes (thru a RT method), we can safely backport the compute_node field
    into what the DB method provides so it doesn't break old RTs.

    Related-Bug: #1431201

    Change-Id: I9afd3c65a088d218cb9c452b18881e94e888950b

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/164206

Changed in nova:
assignee: Sylvain Bauza (sylvain-bauza) → Dan Smith (danms)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/164206
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dddd37d091d89a813fcb97a726c2bf932d6b69ae
Submitter: Jenkins
Branch: master

commit dddd37d091d89a813fcb97a726c2bf932d6b69ae
Author: Dan Smith <email address hidden>
Date: Fri Mar 13 07:51:57 2015 -0700

    Break out the child version calculation logic from obj_make_compatible()

    This pulls out the child version logic from the compatibility routine,
    so that it can be used in other contexts. I left the tests that verify
    the versions pointing at the caller to show that this doesn't break
    anything. Moving them to actually test the core function would be good.

    However, now the inner version calculation logic is a little more
    precise, which changed one of the test cases simply because it was
    not considering the actual child version properly. That same issue
    plagued other cases because we were not properly calling each case on
    a clean copy of the primitive (which didn't matter before).

    Change-Id: Id4071b6bc6e9d4419e9142fa339095f04b182d92
    Related-Bug: #1431201
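The idea of calculating a child object's version from the parent's target version can be sketched as follows. This is only an illustration, not the code from the commit above: the `RELATIONSHIPS` table, its version numbers, and the function name `child_version_for` are all invented for the example.

```python
def parse_version(version):
    """Turn '1.10' into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


# Hypothetical history: each pair says "from this parent version on, the
# child field carries this child version". Sorted by parent version.
RELATIONSHIPS = [('1.1', '1.4'), ('1.3', '1.5'), ('1.9', '1.10')]


def child_version_for(target_parent, relationships):
    """Pick the child version matching a requested parent version.

    Returns None when the parent version predates the child field.
    """
    target = parse_version(target_parent)
    chosen = None
    for parent, child in relationships:
        if parse_version(parent) <= target:
            # Keep the newest child version the target parent knows about.
            chosen = child
    return chosen
```

Under these made-up numbers, a request for parent version `1.4` yields child version `1.5` rather than the latest `1.10`, which is the kind of result an older node needs.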

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/163867
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=708b576ee9c585394f7b309bb70784c63061027d
Submitter: Jenkins
Branch: master

commit 708b576ee9c585394f7b309bb70784c63061027d
Author: Sylvain Bauza <email address hidden>
Date: Thu Mar 12 16:15:23 2015 +0100

    Fix ComputeNode backport for Service.obj_make_compatible

    The previous implementation was setting anyway the ComputeNode object version
    to 1.10. It consequently breaks compatibility with Juno compute nodes that can
    only understand ComputeNode <1.6

    Change-Id: Ib2b63d0fb482410233ee42b1ff1c697229c77958
    Closes-Bug: #1431201

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-3 → 2015.1.0