New controller can not be added to the ready cluster

Bug #1494507 reported by slava valyavskiy on 2015-09-10
This bug affects 4 people
Affects              Status   Importance   Assigned to                Milestone
Fuel for OpenStack            High         Fuel Python (Deprecated)
7.0.x                         High         Unassigned

Bug Description

ISO info:

####################
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "263"
  build_id: "263"
######################

Steps to reproduce:
1. Update the Fuel Master (FM) node from the 5.1 release to 7.0
2. Create a Ubuntu cluster with 3 controller/mongo nodes and 2 computes
3. Deploy it
4. Add one more controller node to the cluster once the deployment process has finished
5. Start deployment

Expected results:
Cluster with the new controller is successfully deployed.

Actual results:
Deployment fails on the cluster-haproxy task:
################
priority: 1500
type: puppet
uids:
- '15'
parameters:
  puppet_modules: "/etc/puppet/modules"
  puppet_manifest: "/etc/puppet/modules/osnailyfacter/modular/cluster-haproxy/cluster-haproxy.pp"
  timeout: 3600
  cwd: "/"
################

puppet log from failed node: http://pastebin.com/cYra4FQD

Digging a little deeper, I found that the new node has built its own cluster (e.g. the haproxy resource is not present in its CIB):

################
root@node-15:~# crm_mon --one-shot
Last updated: Thu Sep 10 21:37:00 2015
Last change: Thu Sep 10 13:49:03 2015
Stack: corosync
Current DC: node-15.test.domain.local (15) - partition WITHOUT quorum
Version: 1.1.12-561c4cf
1 Nodes configured
7 Resources configured

Online: [ node-15.test.domain.local ]

 Clone Set: clone_p_vrouter [p_vrouter]
     Started: [ node-15.test.domain.local ]
 vip__management (ocf::fuel:ns_IPaddr2): Started node-15.test.domain.local
 vip__vrouter_pub (ocf::fuel:ns_IPaddr2): Started node-15.test.domain.local
 vip__vrouter (ocf::fuel:ns_IPaddr2): Started node-15.test.domain.local
 vip__public (ocf::fuel:ns_IPaddr2): Started node-15.test.domain.local
 vip__zbx_vip_mgmt (ocf::fuel:ns_IPaddr2): Started node-15.test.domain.local
 Master/Slave Set: master_p_conntrackd [p_conntrackd]
     Masters: [ node-15.test.domain.local ]
#################
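The symptom above (a single-node partition without quorum) is easy to spot programmatically. A minimal illustrative sketch, not part of Fuel, that flags it from crm_mon text:

```python
import re

def cluster_is_isolated(crm_mon_output):
    """Return True when crm_mon reports a partition WITHOUT quorum
    that contains only a single configured node, i.e. the node has
    built its own one-member cluster."""
    no_quorum = "partition WITHOUT quorum" in crm_mon_output
    m = re.search(r"(\d+)\s+Nodes configured", crm_mon_output)
    return no_quorum and m is not None and int(m.group(1)) == 1

sample = ("Current DC: node-15.test.domain.local (15) - partition WITHOUT quorum\n"
          "1 Nodes configured\n"
          "7 Resources configured\n")
print(cluster_is_isolated(sample))  # True for the output shown above
```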

I inspected the /etc/corosync/corosync.conf file on an old controller and didn't find the new node in the cluster's node list. I also looked at the astute.yaml file on the old controller nodes and didn't find the 'cluster' task in the task list - http://pastebin.com/ATuduYBB - which matters because, in my opinion, this task is in charge of configuring the corosync.conf file:

################
- priority: 1100
  type: puppet
  uids:
  - '15'
  parameters:
    puppet_modules: "/etc/puppet/modules"
    puppet_manifest: "/etc/puppet/modules/osnailyfacter/modular/cluster/cluster.pp"
    timeout: 3600
    cwd: "/"
###############
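Whether the new node ever made it into the corosync membership can also be checked mechanically by parsing the nodelist section of /etc/corosync/corosync.conf. A rough sketch (the helper name is ours; the ring0_addr syntax follows corosync 2.x):

```python
import re

def nodes_in_corosync_conf(conf_text):
    """Collect ring0_addr values from nodelist { node { ... } } blocks."""
    return set(re.findall(r"ring0_addr\s*:\s*(\S+)", conf_text))

conf = """
nodelist {
  node {
    ring0_addr: node-1.test.domain.local
    nodeid: 1
  }
}
"""
# The new controller is absent from the old controller's config:
print("node-15.test.domain.local" in nodes_in_corosync_conf(conf))  # False
```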

So it seems that we don't re-apply this task for new controller nodes, and the existing cluster nodes cannot successfully communicate with the new one. I don't know how we calculate the set of tasks for controller nodes that are already in 'ready' state, but it seems that we should include the 'cluster' task in this case.
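To illustrate the point, here is a deliberately simplified, entirely hypothetical task-selection rule (not actual Nailgun code): nodes in 'ready' state skip deployment tasks, which silently drops 'cluster' even though it must be re-applied on every controller whenever membership changes.

```python
def tasks_for_node(node_status, all_tasks, membership_changed=False):
    # New ('discover') nodes run the full task list.
    if node_status != "ready":
        return list(all_tasks)
    # Suggested fix: keep the 'cluster' task for 'ready' controllers
    # when cluster membership has changed; skip everything else.
    return [t for t in all_tasks if membership_changed and t == "cluster"]

print(tasks_for_node("ready", ["cluster", "cluster-haproxy"],
                     membership_changed=True))  # ['cluster']
```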

slava valyavskiy (slava-val-al) wrote :

Astute log from the Fuel Master node; 'node-15' is the name of the new controller node.

description: updated
description: updated
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Changed in fuel:
milestone: none → 7.0
Nastya Urlapova (aurlapova) wrote :

1) Please try ISO #288. 2) When you upgraded the master node from 5.1 to 7.0, did you skip the upgrade from 6.0 to 6.1?

Changed in fuel:
status: New → Incomplete
slava valyavskiy (slava-val-al) wrote :

How can a new ISO version resolve this issue? Was there a similar bug where the same problem was resolved?
The upgrade path was 5.1 -> 6.0 -> 6.1 -> 7.0.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Oleg S. Gelbukh (gelbuhos)
status: Incomplete → In Progress
Changed in fuel:
importance: Undecided → High
milestone: 7.0 → 7.0-updates
Oleg S. Gelbukh (gelbuhos) wrote :

The root cause of this problem is a change in the serialization of deployment information for nodes in 'ready' status.

Since in 6.1 the default deployment info for 'ready' nodes wasn't serialized, we had to merge the deployment information for 'ready' nodes with the default deployment information to get all nodes in the cluster properly configured.

In 7.0, however, the default deployment information for a 'New' cluster includes all nodes, both 'ready' and 'discover'. On the other hand, deployment information is no longer serialized on the fly, but rather fetched from the 'replaced_deployment_info' attribute in the 'nodes' table. Because we modified and uploaded that information before the deployment, the replaced deployment info became inconsistent once new controllers were added: the changes didn't make it into the replaced deployment settings of the new controller.

This, in turn, caused primary controller to ignore addition of subsequent controllers.
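The behavioral difference can be sketched as follows (an illustrative toy model of the description above, not the actual octane/nailgun code):

```python
def effective_deployment_info(default_info, replaced_info, release="7.0"):
    """default_info / replaced_info: dicts keyed by node uid.
    6.1: defaults for 'ready' nodes were not serialized, so replaced
         info had to be merged in to cover all nodes.
    7.0: defaults already include every node ('ready' and 'discover')
         and must win over the now-outdated replaced info."""
    if release == "7.0":
        return dict(default_info)
    merged = dict(replaced_info)
    merged.update(default_info)
    return merged

default = {"1": {"role": "primary-controller"}, "15": {"role": "controller"}}
replaced = {"1": {"role": "primary-controller"}}  # stale: lacks new node '15'
print(sorted(effective_deployment_info(default, replaced)))  # ['1', '15']
```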

no longer affects: fuel/8.0.x

Change abandoned by Oleg Gelbukh (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/229305
Reason: Need to backport more generic fix from master branch.

Reviewed: https://review.openstack.org/229305
Committed: https://git.openstack.org/cgit/stackforge/fuel-octane/commit/?id=d58ab97613f1d1fe989386656a3584328ce2ba95
Submitter: Jenkins
Branch: stable/7.0

commit d58ab97613f1d1fe989386656a3584328ce2ba95
Author: Oleg Gelbukh <email address hidden>
Date: Tue Sep 29 12:30:34 2015 +0000

    Properly use deployment info in 7.0

    In 7.0, default deployment information is returned
    for 'ready' nodes and must be used instead of outdated
    replaced deployment information.

    Closes-bug: 1494507
    Closes-bug: 1501728
    Change-Id: I545bfabb6c548667464de03fd5aa005d3336002d

tags: added: feature-upgrade
tags: added: module-octane
Dmitry Pyzhov (dpyzhov) on 2015-10-22
tags: added: area-octane
Dmitry Pyzhov (dpyzhov) on 2015-10-29
tags: added: area-python
removed: area-octane

Fix proposed to branch: master
Review: https://review.openstack.org/246676

Changed in fuel:
assignee: Oleg S. Gelbukh (gelbuhos) → Yuriy Taraday (yorik-sar)

Reviewed: https://review.openstack.org/246676
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=cae1d8af5d3ed7a6901fdb363f40e3bf7c37f26d
Submitter: Jenkins
Branch: master

commit e27cea930477dcaf5be29bfa4febd270dbe99e97
Author: Oleg Gelbukh <email address hidden>
Date: Thu Nov 12 15:15:57 2015 +0000

    Version bump to 1.0.0

    Change-Id: I51735d21337d51f52eead9625e2d9f3378c88799

commit 4d3c949e1f6c8319658ac713461e9f8f4a345eb4
Author: Oleg Gelbukh <email address hidden>
Date: Thu Nov 12 14:30:09 2015 +0000

    Revert "Set RPM version 0.2.0"

    This reverts commit 1c722afa0f78cfa3601f3a2b71085c6788052885.
    Version downgrade must be avoided.

    Change-Id: I6b3817ffa48ab321fe99bed7cb2df6a0941211ce

commit 1c722afa0f78cfa3601f3a2b71085c6788052885
Author: Oleg Gelbukh <email address hidden>
Date: Sun Nov 8 14:41:40 2015 +0000

    Set RPM version 0.2.0

    Change-Id: Ie0c75b979375e716c33082b9a4e83ffaffc7f42c

commit 3d06b9d8ddd3714166bdedeb91da5df6b8c9f7a6
Author: Valyavskiy Viacheslav <email address hidden>
Date: Sun Oct 4 03:14:37 2015 +0300

    Use more correct approach to detect resources status

    Closes-bug: 1499696
    Change-Id: Ied72bc68ba41e47334796f09b407ebd8a3fb7f40

commit c315735e681928648fd4549956c3621c4caba4f9
Author: Sergey Novikov <email address hidden>
Date: Thu Oct 29 18:57:21 2015 +0300

    Add handling of case when node may haven't full name in services data

    Change-Id: Icf8dc176fc74fc1c5ee888ddb920b1acc58c127d

commit cf83b87b846e6613dc996d8a8dc70723bcbe84d1
Author: Oleg Gelbukh <email address hidden>
Date: Tue Oct 27 13:57:04 2015 +0000

    Try to revert patches to Puppet modules every time

    If octane command was interrupted or failed before patches
    to puppet were reverted, the patch will be applied and repeated
    command will fail.

    Try to revert the patch every time before apply it and skip
    failures. Only actually apply the patch if revert is not assumed.

    Change-Id: I5719dba3c621f307fe5d6ae2356f15d38ee28ff9

commit a3226740bfacc15da032746d3093f77ecc0323ad
Author: Oleg Gelbukh <email address hidden>
Date: Fri Oct 23 12:48:27 2015 +0000

    Start corosync services after control plane switch for multiple controllers

    Before cluster is stopped, some corosync services are in 'Stop' status.
    Starting cluster doesn't bring them up, just restores state of the cluster.
    Explicitly start corosync services after the cluster is back.

    Related-bug: 1506398
    Change-Id: I38e90180f71e2786e18ffb67ec43a50aa2c5bee6

commit b3e496b43ecdd92a2a696147e2a487fcfe0b2c0d
Author: Oleg Gelbukh <email address hidden>
Date: Fri Oct 23 11:58:35 2015 +0000

    Delete shell scripts and unused shell libs

    As migration to Python is complete, remove unused shell scripts and libs

    Change-Id: I5cc13bebfd0423f73c46722d276807154f2b79e8

commit 6b0a7649cfcf546ffe410e62aaaaf72bec9e5c2d
Author: Oleg Gelbukh <email address hidden>
Date: Tue Oct 20 11:22:12 2015 +0000

    Stop corosync in upgrade seed during control plane switch

    Stop corosync cluster before switching to upgrade...

Changed in fuel:
status: In Progress → Fix Committed

Change abandoned by Oleg Gelbukh (<email address hidden>) on branch: master
Review: https://review.openstack.org/228871
Reason: Obsoleted by merge with stable/7.0

Bogdan Dobrelya (bogdando) wrote :

Reopening as it was reproduced in https://bugs.launchpad.net/fuel/+bug/1528488

Changed in fuel:
status: Fix Committed → Confirmed

This bug is currently fixed by a patch in the fuel-octane project but, in fact, if we want it to work for separate provision/deploy calls, the problem should be fixed in the fuel-web project.

Changed in fuel:
assignee: Yuriy Taraday (yorik-sar) → nobody
assignee: nobody → Fuel Python Team (fuel-python)
Bogdan Dobrelya (bogdando) wrote :

When deploying new controllers, the cluster.pp task must be ensured on *all* of the controllers; otherwise we will get this bug or one like https://bugs.launchpad.net/fuel/+bug/1528488
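In task-graph terms this means the uids of the cluster task have to cover every controller, not only the newly added one. A sketch (the helper is hypothetical; the task shape mirrors the YAML fragments earlier in this report):

```python
def ensure_on_all_controllers(task, controller_uids):
    """Extend a granular task's uids so it runs on every controller."""
    task = dict(task)
    task["uids"] = sorted(set(task.get("uids", [])) | set(controller_uids))
    return task

cluster_task = {
    "type": "puppet",
    "uids": ["15"],  # only the new controller: the buggy state
    "parameters": {
        "puppet_manifest":
            "/etc/puppet/modules/osnailyfacter/modular/cluster/cluster.pp",
    },
}
print(ensure_on_all_controllers(cluster_task, ["1", "2", "3", "15"])["uids"])
# ['1', '15', '2', '3']  (uids sort as strings)
```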

tags: added: granular life-cycle-management
Bogdan Dobrelya (bogdando) wrote :

The bug https://bugs.launchpad.net/fuel/+bug/1528488 is not a duplicate; returning this back to Fix Committed.

Changed in fuel:
status: Confirmed → Fix Committed
Vladimir (vushakov) on 2016-01-15
tags: added: on-verification
Tatyanka (tatyana-leontovich) wrote :

Verified on:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "466"
  build_id: "466"
  fuel-nailgun_sha: "f81311bbd6fee2665e3f96dcac55f72889b2f38c"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "6823f1d4005a634b8436109ab741a2194e2d32e0"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "fe03d887361eb80232e9914eae5b8d54304df781"
  fuel-ostf_sha: "ab5fd151fc6c1aa0b35bc2023631b1f4836ecd61"
  fuel-mirror_sha: "b62f3cce5321fd570c6589bc2684eab994c3f3f2"
  fuelmenu_sha: "fac143f4dfa75785758e72afbdc029693e94ff2b"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "9f0ba4577915ce1e77f5dc9c639a5ef66ca45896"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "727f7076f04cb0caccc9f305b149a2b5b5c2af3a"

Changed in fuel:
status: Fix Committed → Fix Released
Vladimir (vushakov) on 2016-01-29
tags: removed: on-verification