Nailgun code does not allow for safe get of old node attributes

Bug #1594443 reported by Vladimir Kuklin
This bug affects 2 people
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Committed  Critical    Georgy Kibardin
Mitaka              Fix Released   Critical    Arthur Svechnikov
Newton              Fix Committed  Critical    Georgy Kibardin

Bug Description

Install Fuel 8.0
Deploy bvt_2 scenario
Upgrade Fuel 8.0 to 9.0 by using fuel 8.0 and fuel 9.0 octane versions for backup/restore.
Try to add compute node

Node metadata lookups then fail for the 8.0 cluster with errors like this one:

2016-06-20 14:49:44.099 ERROR [7f7a3b124880] (manager) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nailgun/task/manager.py", line 58, in _call_silently
    to_return = method(task, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nailgun/task/task.py", line 1488, in execute
    cls._check_dpdk_properties(task)
  File "/usr/lib/python2.7/site-packages/nailgun/task/task.py", line 1867, in _check_dpdk_properties
    objects.NodeAttributes.distribute_node_cpus(node)
  File "/usr/lib/python2.7/site-packages/nailgun/objects/node.py", line 1438, in distribute_node_cpus
    numa_nodes = node.meta['numa_topology']['numa_nodes']
KeyError: 'numa_topology'

This means that it is not possible to deploy nodes in pre-9.0 clusters after upgrading the master node to 9.0. I think this affects major Fuel functionality, so I am setting the priority of this bug to Critical.
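
The failure in the traceback can be reproduced outside Nailgun with a plain dictionary. The sketch below assumes that `node.meta` behaves like an ordinary Python dict; the `get_numa_nodes` helper and the sample metadata shapes are hypothetical and are not part of Nailgun.

```python
def get_numa_nodes(meta):
    """Return the NUMA node list from node metadata, or [] if it is absent.

    Hypothetical helper; assumes `meta` is a plain dict like node.meta.
    """
    return meta.get('numa_topology', {}).get('numa_nodes', [])


# Metadata without the 'numa_topology' key, as an old agent might report it
# (assumed shape).
old_meta = {'cpu': {'total': 4}}
# Metadata that includes NUMA information (assumed shape).
new_meta = {'numa_topology': {'numa_nodes': [{'id': 0, 'cpus': [0, 1]}]}}

try:
    old_meta['numa_topology']['numa_nodes']
except KeyError as exc:
    # Same failure mode as in the traceback above.
    print('direct indexing fails with KeyError: %s' % exc)

print(get_numa_nodes(old_meta))
print(get_numa_nodes(new_meta))
```

Note that a safe lookup only avoids the exception; code that consumes the result still has to cope with an empty NUMA node list.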

summary: - Nailgun code does not allow for safe get of new node attributes
+ Nailgun code does not allow for safe get of old node attributes
Revision history for this message
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Dmitry Klenov (dklenov)
Changed in fuel:
assignee: Fuel Telco (fuel-telco-team) → Arthur Svechnikov (asvechnikov)
Revision history for this message
Arthur Svechnikov (asvechnikov) wrote :

This problem occurs because of the old fuel-nailgun-agent in the bootstrap image. Nailgun has no way to know about numa_topology if fuel-nailgun-agent doesn't send this information. Thus, the proper way to upgrade is to rebuild the bootstrap image with fuel-nailgun-agent 9.0 and probably remove all 'discover' nodes from Nailgun, so that they are rediscovered.

Does this issue occur for already deployed nodes?

Changed in fuel:
status: Confirmed → Incomplete
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

This occurs for all 8.0 nodes, because they run the old nailgun agent. Thus, it is impossible to do anything with old clusters. This makes Fuel unusable after the upgrade and is a major regression in functionality. Also, updating the nailgun agent is not a solution here, as nodes running the old 8.0 nailgun-agent should not contain code related to new features.

Changed in fuel:
status: Incomplete → Confirmed
Revision history for this message
Andrey Maximov (maximov) wrote :

My understanding is that fuel-octane and the upgrade scripts are not part of the 9.0 deliverables, so this should be tagged with the non-release tag.
@Nastya?

Revision history for this message
Nastya Urlapova (aurlapova) wrote :

+ 1 to add tag "non-release"

Andrey Maximov (maximov)
tags: added: non-release
tags: removed: need-info
Revision history for this message
Arthur Svechnikov (asvechnikov) wrote :

ETA: 24 Jun

RCA: Each cluster's release can be described by a set of requirements that are applied to check that everything is configured correctly. However, Nailgun doesn't have versioning that would let it apply the appropriate checks based on the cluster's release. Some new features require particular entities in the db. Previously, this problem was solved by adding default values for these entities; however, some entities don't have default values. It's hard to foresee which place will fail during an upgrade while the code is being written.
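
One way to picture the missing versioning is a registry of checks, each tagged with the first release it applies to. This is only a sketch under stated assumptions: the `check_since` decorator, the `CHECKS` registry, `run_checks`, and the plain dotted version strings are all hypothetical and do not exist in Nailgun.

```python
def parse_version(version):
    """Turn '9.0' into (9, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


# Hypothetical registry of (minimum release version, check function) pairs.
CHECKS = []


def check_since(min_version):
    """Register a check that applies only from `min_version` onwards."""
    def decorator(func):
        CHECKS.append((parse_version(min_version), func))
        return func
    return decorator


@check_since('9.0')
def check_numa_topology(node_meta):
    # Only releases whose agent reports NUMA data can satisfy this check.
    if 'numa_topology' not in node_meta:
        raise ValueError('numa_topology is missing')


def run_checks(release_version, node_meta):
    """Run the checks that apply to the release and return their names."""
    current = parse_version(release_version)
    ran = []
    for min_version, check in CHECKS:
        if current >= min_version:
            check(node_meta)
            ran.append(check.__name__)
    return ran
```

With such a registry, `run_checks('8.0', {})` would skip the NUMA check entirely, while `run_checks('9.0', {})` would still report the missing key.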

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/332230

Changed in fuel:
assignee: Arthur Svechnikov (asvechnikov) → Sergey Slipushenko (sslypushenko)
status: Confirmed → In Progress
Revision history for this message
Sergey Slipushenko (sslypushenko) wrote :

Partial fix for rules to pick boot disk.

Changed in fuel:
assignee: Arthur Svechnikov (asvechnikov) → Sergey Slipushenko (sslypushenko)
Revision history for this message
Ilya Kharin (akscram) wrote :

A separate bug report was created for the problem with the absent 'rule_to_pick_boot_disk' key [1]. The current report will be updated to track only the problem with 'numa_topology'.

[1] https://bugs.launchpad.net/fuel/+bug/1595209

description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/333578

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/333578
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=79e28facc84f3cd9bfc50292d3d11be6895d86cf
Submitter: Jenkins
Branch: master

commit 79e28facc84f3cd9bfc50292d3d11be6895d86cf
Author: Artur Svechnikov <email address hidden>
Date: Thu Jun 23 23:31:08 2016 +0300

    Do not check NFV features for old envs

    NFV features (DPDK, SR-IOV, NUMA/CPU pinning, HugePages)
    can't be checked for old clusters, due to old nailgun-agent.
    Old nailgun-agent doesn't send NFV specific information.
    So all NFV related checks and functional should be disabled for old
    environments.

    Change-Id: Ib589d67658f45414b8049398316af5c7298d459e
    Closes-Bug: #1594443

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/335513

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

The fix provided does not seem to fix the error in the bug description.

There are at least 3 places where it should be fixed:

objects/node.py:

diff --git a/nailgun/nailgun/objects/node.py b/nailgun/nailgun/objects/node.py
index 8833f7c..fd2b76c 100644
--- a/nailgun/nailgun/objects/node.py
+++ b/nailgun/nailgun/objects/node.py
@@ -1435,7 +1435,7 @@ class NodeAttributes(object):

     @classmethod
     def distribute_node_cpus(cls, node, attributes=None):
-        numa_nodes = node.meta['numa_topology']['numa_nodes']
+        numa_nodes = node.meta.get('numa_topology', {}).get('numa_nodes', [])
         components = cls.node_cpu_pinning_info(node, attributes)['components']
         dpdk_nics = Node.dpdk_nics(node)

@@ -1497,7 +1497,7 @@ class NodeAttributes(object):
         """

         hugepages = collections.defaultdict(int)
-        numa_count = len(node.meta['numa_topology']['numa_nodes'])
+        numa_count = len(node.meta.get('numa_topology', {}).get('numa_nodes', []))

         hugepages_attributes = cls._safe_get_hugepages(node)
         for name, attrs in six.iteritems(hugepages_attributes):
@@ -1557,7 +1557,7 @@ class NodeAttributes(object):
             return {}

         dpdk_memory = hugepages['dpdk']['value']
-        numa_nodes_len = len(node.meta['numa_topology']['numa_nodes'])
+        numa_nodes_len = len(node.meta.get('numa_topology', {}).get('numa_nodes', []))

         return {
             'ovs_socket_mem':
@@ -1567,7 +1567,7 @@ class NodeAttributes(object):
     def distribute_hugepages(cls, node, attributes=None):
         hugepages = cls._safe_get_hugepages(
             node, attributes=attributes)
-        topology = node.meta['numa_topology']
+        topology = node.meta.get('numa_topology', {})

         # split components to 2 groups:
         # components that should have pages on all numa nodes (such as dpdk)
@@ -1595,7 +1595,7 @@ class NodeAttributes(object):

             nova_cpus = set(cpu_distribution['components'].get('nova', []))
             numa_values = collections.defaultdict(int)
-            for numa_node in topology['numa_nodes']:
+            for numa_node in topology.get('numa_nodes', {}):
                 for cpu in numa_node['cpus']:
                     if cpu in nova_cpus:
                         numa_values[numa_node['id']] += 1

and here

diff --git a/nailgun/nailgun/policy/hugepages_distribution.py b/nailgun/nailgun/policy/hugepages_distribution.py
index 264f52e..d6b4412 100644
--- a/nailgun/nailgun/policy/hugepages_distribution.py
+++ b/nailgun/nailgun/policy/hugepages_distribution.py
@@ -132,7 +132,7 @@ def distribute_hugepages(numa_topology, components, numa_sort_func):
     any_comps = [Component(comp) for comp in components['any']]

     numa_nodes = []
-    for numa_node in numa_topology['numa_nodes']:
+    for numa_node in numa_topology.get('numa_nodes', {}):
         # converting memory to KiBs
         memory = numa_node['memory'] // 1024

@@ -145,8 +145,8 @@ def distribute_hugepages(numa_topology, components, numa_sort_func):

     numa_nodes.sort(key=lambda x: numa_sort_func(x.id))

-    _allocate_all(numa_nodes, all_comps)
-    _allocate_any(numa_nodes, any_...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/338177

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (master)

Change abandoned by Artur Svechnikov (<email address hidden>) on branch: master
Review: https://review.openstack.org/338177
Reason: This patch doesn't fix upgrade issue

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/341366

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/335513
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=2c8a42159e431f05e9287b17a4acd584500b76c0
Submitter: Jenkins
Branch: stable/mitaka

commit 2c8a42159e431f05e9287b17a4acd584500b76c0
Author: Artur Svechnikov <email address hidden>
Date: Thu Jun 23 23:31:08 2016 +0300

    Do not check NFV features for old envs

    NFV features (DPDK, SR-IOV, NUMA/CPU pinning, HugePages)
    can't be checked for old clusters, due to old nailgun-agent.
    Old nailgun-agent doesn't send NFV specific information.
    So all NFV related checks and functional should be disabled for old
    environments.

    Change-Id: Ib589d67658f45414b8049398316af5c7298d459e
    Closes-Bug: #1594443
    (cherry picked from commit 79e28facc84f3cd9bfc50292d3d11be6895d86cf)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/341366
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=9bd61c68dc5cc05cdb8277e1f2e02c4a1a9a37aa
Submitter: Jenkins
Branch: master

commit 9bd61c68dc5cc05cdb8277e1f2e02c4a1a9a37aa
Author: Artur Svechnikov <email address hidden>
Date: Wed Jul 13 11:42:08 2016 +0300

    Do not call CPU&HugePages distributors

    If there is no specified hugepages or CPU pinning
    distributors shouldn't be called. Also changed
    initialization of custom_hugepages type.

    Change-Id: I3835c9d163ba2692adf34193b57510060158b8e3
    Closes-Bug: #1594443
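
The idea in this commit message, calling a distributor only when something was actually requested, can be sketched as follows. Both functions are hypothetical stand-ins written for illustration; they are not the Nailgun implementations, and the metadata shape is assumed.

```python
def distribute_cpus(required, numa_nodes):
    """Hypothetical stand-in distributor: hand out CPU ids in order."""
    cpus = [cpu for numa_node in numa_nodes for cpu in numa_node['cpus']]
    result, start = {}, 0
    for name, count in sorted(required.items()):
        result[name] = cpus[start:start + count]
        start += count
    return result


def cpu_distribution(meta, required):
    """Call the distributor only when some component requests CPUs."""
    requested = {name: count for name, count in required.items() if count > 0}
    if not requested:
        # Nothing is pinned, so the NUMA topology is never consulted and
        # metadata from an old agent cannot trigger a KeyError here.
        return {}
    return distribute_cpus(requested, meta['numa_topology']['numa_nodes'])
```

In this sketch a node whose metadata lacks `numa_topology` passes through untouched as long as no CPU pinning is configured; once pinning is requested, the topology is still required.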

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/348190

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (stable/mitaka)

Change abandoned by Artur Svechnikov (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/348190
Reason: Lack of testing was found, it's impossible to specify hugepages if host support only 2M huge pages due to incorrect validation

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/349866

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/349866
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=661ce479a6f191b3286ab2a45027f67a038b7a8d
Submitter: Jenkins
Branch: stable/mitaka

commit 661ce479a6f191b3286ab2a45027f67a038b7a8d
Author: Artur Svechnikov <email address hidden>
Date: Wed Jul 13 11:42:08 2016 +0300

    Skip empty cpu pinning and hugepages

    If there is no specified hugepages or CPU pinning
    distributors shouldn't be called. Also changed
    initialization of custom_hugepages type.

    Change-Id: Iedb819b1da7dcb3877a6a94b9e7cfb93aa949a9e
    Closes-Bug: #1594443

Revision history for this message
Alexey Shtokolov (ashtokolov) wrote :
Revision history for this message
Ilya Kharin (akscram) wrote :

The actual change request was merged in the stable/mitaka branch: https://review.openstack.org/349866

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/355649

Changed in fuel:
assignee: Arthur Svechnikov (asvechnikov) → Ilya Kharin (akscram)
status: Confirmed → In Progress
Changed in fuel:
assignee: Ilya Kharin (akscram) → Arthur Svechnikov (asvechnikov)
Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :

No key errors on CI - snapshot 299

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (master)

Change abandoned by Fuel DevOps Robot (<email address hidden>) on branch: master
Review: https://review.openstack.org/355649
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-web 10.0.0rc1

This issue was fixed in the openstack/fuel-web 10.0.0rc1 release candidate.

Ilya Kharin (akscram)
Changed in fuel:
assignee: Ilya Kharin (akscram) → Fuel Sustaining (fuel-sustaining-team)
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Georgy Kibardin (gkibardin)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/355649
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=a6434023f18a57947be17933b29cd659ad3ddf66
Submitter: Jenkins
Branch: master

commit a6434023f18a57947be17933b29cd659ad3ddf66
Author: Artur Svechnikov <email address hidden>
Date: Wed Jul 13 11:42:08 2016 +0300

    Skip empty cpu pinning and hugepages

    If there is no specified hugepages or CPU pinning
    distributors shouldn't be called. Also changed
    initialization of custom_hugepages type.

    Change-Id: Iedb819b1da7dcb3877a6a94b9e7cfb93aa949a9e
    Closes-Bug: #1594443
    (cherry picked from commit 661ce479a6f191b3286ab2a45027f67a038b7a8d)

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/newton)

Reviewed: https://review.openstack.org/404133
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=d04c1108b2b19e5d681a07fd3492b93d133c4617
Submitter: Jenkins
Branch: stable/newton

commit d04c1108b2b19e5d681a07fd3492b93d133c4617
Author: Artur Svechnikov <email address hidden>
Date: Wed Jul 13 11:42:08 2016 +0300

    Skip empty cpu pinning and hugepages

    If there is no specified hugepages or CPU pinning
    distributors shouldn't be called. Also changed
    initialization of custom_hugepages type.

    Change-Id: Iedb819b1da7dcb3877a6a94b9e7cfb93aa949a9e
    Closes-Bug: #1594443
    (cherry picked from commit 661ce479a6f191b3286ab2a45027f67a038b7a8d)
    (cherry picked from commit a6434023f18a57947be17933b29cd659ad3ddf66)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-web 10.0.0

This issue was fixed in the openstack/fuel-web 10.0.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-web 11.0.0.0rc1

This issue was fixed in the openstack/fuel-web 11.0.0.0rc1 release candidate.
