Adding HA controllers one by one fails

Bug #1350266 reported by Nikita Gubenko
This bug affects 1 person
Affects                    Status     Importance  Assigned to                Milestone
Fuel for OpenStack         Invalid    High        Fuel Library (Deprecated)
Fuel for OpenStack 5.0.x   Won't Fix  High        Fuel Library (Deprecated)

Bug Description

Fuel 5.0 release on Ubuntu

Tried to deploy 3 controllers one by one - failed on the 3rd controller with
 (/Stage[corosync_setup]/Osnailyfacter::Cluster_ha::Virtual_ips/Cluster::Virtual_ips[public_old]/Cluster::Virtual_ip[public_old]/Cs_commit[vip__public_old]/cib) change from absent to vip__public_old failed: Execution of '/usr/sbin/crm_shadow --force --commit vip__public_old' returned 50: Could not commit shadow instance 'vip__public_old' to the CIB: Application of an update diff failed
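
For anyone hitting this, the failing step can be inspected by hand on the affected controller with the standard Pacemaker shadow-CIB tools (a sketch only; whether a shadow instance is still present depends on where the run aborted):

crm_shadow --which                      # show which shadow instance, if any, is active
crm_shadow --diff                       # diff the active shadow CIB against the live CIB
crm_verify --live-check                 # sanity-check the live CIB itself
cibadmin --query > /root/cib-live.xml   # dump the live CIB for comparison

Return code 50 from 'crm_shadow --force --commit vip__public_old' matches the message above: the diff computed from the shadow copy could not be applied to the live CIB, which typically happens when the live CIB changed underneath it (for example, another controller configuring resources at the same time).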

How to replicate (a rough CLI equivalent is sketched after the list)
1. create a "Multi-node with HA" env
2. add the 1st controller -> deploy changes
3. add the 2nd controller -> deploy changes
4. add the 3rd controller -> deploy changes
5. deployment of the 3rd controller fails
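
Roughly the same sequence can be driven from the Fuel master via the CLI (a sketch only; the exact flags vary between Fuel versions, and the env/node IDs are assumptions):

fuel env create --name ha-test --rel 1 --mode ha    # 1. "Multi-node with HA" env
fuel node set --node 1 --role controller --env 1    # 2. first controller
fuel deploy-changes --env 1
fuel node set --node 2 --role controller --env 1    # 3. second controller
fuel deploy-changes --env 1
fuel node set --node 3 --role controller --env 1    # 4. third controller
fuel deploy-changes --env 1                         # 5. fails on the third controller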

Attaching diagnostic snapshot.

If this type of deploy order is not recommended or won't work, we should prohibit users from doing it.

Revision history for this message
Nikita Gubenko (nikita-gubenko) wrote :
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 5.1
Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: nobody → Dmitry Ilyin (idv1985)
Dmitry Ilyin (idv1985)
Changed in fuel:
status: New → In Progress
Revision history for this message
Dmitry Ilyin (idv1985) wrote :

It looks like you are using the 5.0.x code base. We had several issues with the scalability of HA deployments there, but in the 5.1 code base they were fixed according to this blueprint: https://blueprints.launchpad.net/fuel/+spec/ha-pacemaker-improvements
In the 5.1 release, adding controllers one by one should work properly.

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
status: Fix Committed → Incomplete
importance: Undecided → High
Revision history for this message
Mike Scherbakov (mihgen) wrote :

Marked as Won't Fix for 5.0.1: we can't backport the related changes at the moment, as we are one step before the acceptance phase for 5.0.1.

Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → nobody
assignee: nobody → Fuel Library Team (fuel-library)
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :
Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

Still present in 5.1.
It is unusual in practice to deploy this way, but for clarity we should either prohibit it or document it. I'm lowering the severity.

Changed in fuel:
status: Incomplete → Confirmed
importance: High → Medium
milestone: 5.1 → next
Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

Diagnostic snapshot for future reference

Revision history for this message
Andrew Woodward (xarses) wrote :

If the following workflows don't work, then this is in fact a high-priority issue.

start with -> end with
1 controller -> 3 (+2) controllers
3 controllers -> 5 (+2) controllers
5 controllers -> 3 (-2) controllers

Also check the case where one or two failed controllers are replaced in the same task, i.e.:

start with 3 controllers
remove 1 or 2 controllers in the UI
add the same number of controllers back from unprovisioned nodes
deploy changes (removing and adding the nodes in the same task)
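
A rough CLI equivalent of this replace-in-one-task workflow (node IDs and exact flags are assumptions and vary per install):

fuel node remove --node 2,3 --env 1                 # drop two existing controllers
fuel node set --node 7,8 --role controller --env 1  # add two unprovisioned nodes as controllers
fuel deploy-changes --env 1                         # removal and addition run as one task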

If this fails, then the priority is critical.

If neither of these is an issue, then it can be lowered and targeted for a later release.

Changed in fuel:
milestone: next → 5.1
importance: Medium → High
Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

I'm testing this right now; it will take some time.

Anyway, we should prevent forming a cluster with an even number of controllers, either programmatically or in the docs.
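
A minimal sketch of what such a programmatic check could look like on the master node, assuming the fuel CLI lists one node per line with its roles (this is not an existing Fuel feature):

controllers=$(fuel node --env 1 2>/dev/null | grep -c controller)
if [ "$controllers" -gt 0 ] && [ $((controllers % 2)) -eq 0 ]; then
    echo "Refusing to deploy: even number of controllers ($controllers)" >&2
    exit 1
fi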

Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

After adding the 2nd and 3rd controllers, they both finished in an offline state. Some crucial processes (cib, mysql) were blocked at the kernel level, but after a while I was able to log in, although Fuel couldn't recognize those nodes as active.
The Galera cluster ended up with only one cluster member, on the first deployed controller:
| wsrep_cluster_size | 1 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_provider_name | Galera |
| wsrep_ready | ON |
+------------------------------+--------------------------------------+

I have limited trust in this test as it was run on an already-used installation after removing a previous cluster. I will redo this on a fresh install tomorrow.
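
For reference, the Galera membership above can be re-checked on each controller with a plain wsrep status query (assuming local client credentials are in place):

mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"

A healthy 3-controller cluster should report wsrep_cluster_size = 3 and wsrep_cluster_status = Primary on every node.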

Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote : Re: [Bug 1350266] Re: Adding HA controllers one by one fails

There are a couple of problems when you add nodes one by one:

1. astute.yaml should be regenerated on the current nodes to reflect the changes (new IPs, for example)
2. puppet should be re-applied to the current nodes to regenerate some sensitive configs (Corosync, HAProxy, RabbitMQ, Galera)
3. puppet should be applied to the new nodes to install and configure them
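
A rough manual sketch of what steps 1-3 amount to on an existing controller (the paths are the usual Fuel locations but are assumptions here; normally the Astute orchestrator drives this rather than the operator):

ls -l /etc/astute.yaml                          # 1. verify the regenerated astute.yaml reached the node
puppet apply /etc/puppet/manifests/site.pp      # 2./3. re-apply the site manifest, which regenerates
                                                #       Corosync, HAProxy, RabbitMQ and Galera configs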

Dmitry Ilyin made some changes to the Fuel logic. I believe these patches are in review right now.



Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

I am not sure that we support this workflow. You need to have 1 controller, or 3 or more; a 2-controller setup is not supported at all (a two-node cluster cannot keep quorum after losing either node).

Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

We should not; it does not make sense at all to have an even number of controllers.
Anyway, adding 2 controllers to an existing 1-controller setup succeeded 3 times in a row, so that workflow works fine.

Changed in fuel:
status: Confirmed → Invalid
tags: added: release-notes