[Upgrade] After upgrading the Fuel master from 8.0 to 9.1, the old 8.0 cluster doesn't scale

Bug #1606823 reported by Sergey Novikov
This bug affects 1 person
Affects             Status         Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Committed  High        Dmitry Guryanov
Mitaka              Fix Released   High        Dmitry Guryanov

Bug Description

Detailed bug description:

After the Fuel master node is upgraded from 8.0 to 9.1 with an existing 8.0 cluster, further scaling of this cluster fails with the following error in the Puppet log:

 Could not run: Could not find file /etc/puppet/modules/osnailyfacter/modular/hiera/hiera.pp

The root cause is that the task "rsync_core_puppet" is not executed, so the manifests are not delivered to the node being deployed.

It was found that granular_deploy modifies the graph when some nodes are affected by the deployment (e.g. when scaling a cluster):

    2016-08-16 09:00:31.242 DEBUG [7f0057829880] (manager) There are nodes to deploy: node-9.test.domain.local
    2016-08-16 09:00:31.285 DEBUG [7f0057829880] (manager) There are nodes affected by deployment: node-7.test.domain.local node-4.test.domain.local node-5.test.domain.local node-8.test.domain.local node-2.test.domain.local node-6.test.domain.local node-3.test.domain.local node-1.test.domain.local

This modification of the graph marks all non-reexecutable tasks as `skipped`:

    ...
    2016-08-16 14:59:03.297 DEBUG [7f0057829880] (orchestrator_graph) Task rsync_core_puppet will be skipped.
    2016-08-16 14:59:03.297 DEBUG [7f0057829880] (orchestrator_graph) Task clear_nodes_info will be skipped.
    2016-08-16 14:59:03.297 DEBUG [7f0057829880] (orchestrator_graph) Task generate_keys will be skipped.
    2016-08-16 14:59:03.297 DEBUG [7f0057829880] (orchestrator_graph) Task copy_keys will be skipped.
    2016-08-16 14:59:03.297 DEBUG [7f0057829880] (orchestrator_graph) Task generate_haproxy_keys will be skipped.
    2016-08-16 14:59:03.297 DEBUG [7f0057829880] (orchestrator_graph) Task copy_haproxy_keys will be skipped.
    2016-08-16 14:59:03.298 DEBUG [7f0057829880] (orchestrator_graph) Task sync_time will be skipped.
    ...

After that, the graph handles only the affected nodes and is incorrect for the nodes that have to be deployed.
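
The effect described above can be reproduced with a toy model. This is only a sketch under stated assumptions: `Task`, `Graph` and the task name `some_reexecutable_task` are hypothetical stand-ins and not nailgun's real classes or tasks; only `rsync_core_puppet` and `generate_keys` are taken from the log.

```python
# Toy model only: Task and Graph are hypothetical, not nailgun classes.
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    reexecutable: bool = False
    skipped: bool = False


@dataclass
class Graph:
    tasks: list = field(default_factory=list)

    def reexecutable_tasks(self):
        """Mark every non-reexecutable task as skipped, in place."""
        for task in self.tasks:
            if not task.reexecutable:
                task.skipped = True

    def runnable(self):
        """Names of the tasks that would still be serialized for a node."""
        return [t.name for t in self.tasks if not t.skipped]


graph = Graph([
    Task("rsync_core_puppet"),
    Task("generate_keys"),
    Task("some_reexecutable_task", reexecutable=True),  # hypothetical name
])

# Filtering for the affected nodes mutates the only graph there is ...
graph.reexecutable_tasks()

# ... so a node that is deployed for the first time is served from the same
# filtered graph and no longer gets rsync_core_puppet.
tasks_for_new_node = graph.runnable()
print(tasks_for_new_node)
```

In this model the new node is left with only the re-executable task, which matches the symptom of the missing hiera.pp manifest.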

Steps to reproduce:
 1. Deploy a Fuel 8.0 master node
 2. Create a cluster with 3 controller nodes and 3 compute+ceph-osd nodes
 3. Upgrade the Fuel master node from 8.0 to 9.1 (http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-install-guide/upgrade/upgrade-fuel.html)
 4. Add a node with the role "compute" to the created cluster
 5. Deploy changes

Expected result:
 Deployment succeeds

Actual result:
 Deployment fails as rsync_core_puppet task does not run on the newly added node.
 This happens because the pre_deployment and post_deployment stages do not contain any tasks. Here is an excerpt of the debug output from task.py for granular_deploy:

   'post_deployment': [],
   'pre_deployment': []

 It is also important that the deployment succeeds if we first provision the node and then deploy it.

Workaround:
 The code can be modified to use two graphs, but this causes the whole deployment graph to run on the affected nodes too:

    diff --git a/nailgun/nailgun/task/task.py b/nailgun/nailgun/task/task.py
    index b5f3b6a..e985652 100644
    --- a/nailgun/nailgun/task/task.py
    +++ b/nailgun/nailgun/task/task.py
    @@ -330,9 +330,10 @@ class DeploymentTask(BaseDeploymentTask):
             cls._save_deployment_info(transaction, serialized_cluster)

             if affected_nodes:
    -            graph.reexecutable_tasks(events)
    +            reexec_graph = graph.copy()
    +            reexec_graph.reexecutable_tasks(events)
                 serialized_cluster.extend(deployment_serializers.serialize(
    -                graph, transaction.cluster, affected_nodes
    +                reexec_graph, transaction.cluster, affected_nodes
                 ))
                 ))
                 nodes = nodes + affected_nodes
             pre_deployment = stages.pre_deployment_serialize(
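
The idea behind the diff above, filtering a copy instead of the original graph, can be sketched as follows. This is an illustration under assumptions, not the nailgun code: the graph is modelled as a plain dict, `reexecutable_tasks` and `runnable` are hypothetical helpers, and `copy.deepcopy` stands in for `graph.copy()` because the toy graph holds nested dicts.

```python
import copy

# Hypothetical stand-in for the deployment graph: task name -> attributes.
graph = {
    "rsync_core_puppet": {"reexecutable": False, "skipped": False},
    "some_reexecutable_task": {"reexecutable": True, "skipped": False},
}


def reexecutable_tasks(g):
    """Toy in-place filter: skip every task that is not re-executable."""
    for attrs in g.values():
        if not attrs["reexecutable"]:
            attrs["skipped"] = True


def runnable(g):
    """Sorted names of the tasks that are not skipped."""
    return sorted(name for name, attrs in g.items() if not attrs["skipped"])


# Filter a copy, so the original graph stays intact for the new nodes.
reexec_graph = copy.deepcopy(graph)
reexecutable_tasks(reexec_graph)

tasks_for_affected_nodes = runnable(reexec_graph)
tasks_for_new_nodes = runnable(graph)
print(tasks_for_affected_nodes, tasks_for_new_nodes)
```

With two separate graphs, the new nodes keep `rsync_core_puppet` while the affected nodes are serialized from the filtered copy.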

Octane's versions:
 8.0.0-1.mos1192
 9.0.0-1.mos1208

ISO's versions:
 8.0 - http://paste.openstack.org/show/538992/
 9.1 - http://paste.openstack.org/show/542518/

Changed in fuel:
importance: Undecided → High
description: updated
Ilya Kharin (akscram)
Changed in fuel:
status: New → Confirmed
Changed in fuel:
assignee: nobody → Fuel Octane (fuel-octane-team)
tags: added: blocker-for-qa
Ilya Kharin (akscram)
Changed in fuel:
assignee: Fuel Octane (fuel-octane-team) → Ilya Kharin (akscram)
Ilya Kharin (akscram) wrote :

It seems that a regression was introduced in 9.0 in the granular_deploy method, which now handles the deployment graph improperly when there are affected nodes.

description: updated
Changed in fuel:
assignee: Ilya Kharin (akscram) → Fuel Sustaining (fuel-sustaining-team)
tags: added: area-python
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Dmitry Guryanov (dguryanov)
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 9.1 → 10.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/362673

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/362673
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=1cdc01bb687b2e7d50ef30221f8db239ebdac63a
Submitter: Jenkins
Branch: master

commit 1cdc01bb687b2e7d50ef30221f8db239ebdac63a
Author: Dmitry Guryanov <email address hidden>
Date: Mon Sep 5 17:53:03 2016 +0300

    Fix granular deployment on operational cluster

    Tasks on new nodes shouldn't be skipped in reexecute
    filter.

    Change-Id: I09148b81bd157e1884785b12e2438614f13e700b
    Closes-Bug: 1606823

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/366795

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/366795
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=c3f3d40b22ecb30acf76a8a87fb489f97115752f
Submitter: Jenkins
Branch: stable/mitaka

commit c3f3d40b22ecb30acf76a8a87fb489f97115752f
Author: Dmitry Guryanov <email address hidden>
Date: Wed Sep 7 17:51:37 2016 +0300

    Fix granular deployment on operational cluster

    Tasks on new nodes shouldn't be skipped in reexecute
    filter.

    Backported from 1cdc01bb687b2e7d50ef30221f8db239ebdac63a
    Closes-Bug: 1606823

    Change-Id: Iafbb486219ad007d424979b51c9a2db4f713127f

Vladimir Khlyunev (vkhlyunev) wrote :

After https://bugs.launchpad.net/fuel/+bug/1622579 was resolved, the issue came back with new symptoms (snapshot 255):

 2016-09-13 14:42:05.839 ERROR [7fe3ce45a880] (manager) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nailgun/task/manager.py", line 61, in _call_silently
    to_return = method(task, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nailgun/task/task.py", line 268, in message
    dry_run=dry_run, **kwargs
  File "/usr/lib/python2.7/site-packages/nailgun/task/task.py", line 146, in call_deployment_method
    args = getattr(cls, method)(transaction, **kwargs)
  File "/usr/lib/python2.7/site-packages/nailgun/task/task.py", line 382, in granular_deploy
    cls._extend_tasks_list(pre_deployment, pre_deployment_affected)
  File "/usr/lib/python2.7/site-packages/nailgun/task/task.py", line 320, in _extend_tasks_list
    t['uids'].extend(src_dict[t['id']]['uids'])
AttributeError: 'set' object has no attribute 'extend'
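
The traceback above comes down to calling a list method on a set. A minimal sketch of the failure and of one way to avoid it, assuming the uids arrive as a set; `extend_uids` and the sample uids are hypothetical and are not the actual fix that was merged:

```python
def extend_uids(task, extra_uids):
    """Append extra uids to task['uids'], converting a set to a list first."""
    task["uids"] = list(task["uids"])
    task["uids"].extend(extra_uids)
    return task


# 'uids' is a set here, as in the traceback; the values are made up.
task = {"id": "rsync_core_puppet", "uids": {"9"}}

error = None
try:
    task["uids"].extend(["1", "2"])  # sets have no extend()
except AttributeError as exc:
    error = str(exc)

fixed = extend_uids(task, ["1", "2"])
print(error)
print(fixed["uids"])
```

Converting to a list at the boundary keeps the code that merges task lists unchanged, at the cost of no longer deduplicating uids automatically.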

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/369528

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/369535

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/369535
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=5801a22173378541dfc4e4bb85873571a042539a
Submitter: Jenkins
Branch: stable/mitaka

commit 5801a22173378541dfc4e4bb85873571a042539a
Author: Dmitry Guryanov <email address hidden>
Date: Tue Sep 13 18:24:27 2016 +0300

    convert uids to list before passing to make_*_task

    RoleResolver returns set of uids, but functions, which
    make tasks want list.

    Change-Id: I959a61a53ff55da400423ac871a2b61366c75f9a
    Closes-Bug: #1606823

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-web 10.0.0rc1

This issue was fixed in the openstack/fuel-web 10.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/369528
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=68c0d42ae03f137589d38c94b5e7274b8eec8525
Submitter: Jenkins
Branch: master

commit 68c0d42ae03f137589d38c94b5e7274b8eec8525
Author: Dmitry Guryanov <email address hidden>
Date: Tue Sep 13 18:24:27 2016 +0300

    convert uids to list before passing to make_*_task

    RoleResolver returns set of uids, but functions, which
    make tasks want list.

    Change-Id: I959a61a53ff55da400423ac871a2b61366c75f9a
    Closes-Bug: #1606823

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-web 10.0.0

This issue was fixed in the openstack/fuel-web 10.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/425010

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/newton)

Reviewed: https://review.openstack.org/425010
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=0b5b853317bc759d6dec338e72f2242ba76584be
Submitter: Jenkins
Branch: stable/newton

commit 0b5b853317bc759d6dec338e72f2242ba76584be
Author: Dmitry Guryanov <email address hidden>
Date: Tue Sep 13 18:24:27 2016 +0300

    convert uids to list before passing to make_*_task

    RoleResolver returns set of uids, but functions, which
    make tasks want list.

    Change-Id: I959a61a53ff55da400423ac871a2b61366c75f9a
    Closes-Bug: #1606823
    (cherry picked from commit 68c0d42ae03f137589d38c94b5e7274b8eec8525)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-web 11.0.0.0rc1

This issue was fixed in the openstack/fuel-web 11.0.0.0rc1 release candidate.
