Adding HA controllers one by one fails

Bug #1350266 reported by Nikita Gubenko
This bug affects 1 person
Affects                    Status     Importance  Assigned to                Milestone
Fuel for OpenStack         Invalid    High        Fuel Library (Deprecated)
Fuel for OpenStack 5.0.x   Won't Fix  High        Fuel Library (Deprecated)

Bug Description

Fuel 5.0 release on Ubuntu

Tried to deploy 3 controllers one by one - failed on the 3rd controller with
 (/Stage[corosync_setup]/Osnailyfacter::Cluster_ha::Virtual_ips/Cluster::Virtual_ips[public_old]/Cluster::Virtual_ip[public_old]/Cs_commit[vip__public_old]/cib) change from absent to vip__public_old failed: Execution of '/usr/sbin/crm_shadow --force --commit vip__public_old' returned 50: Could not commit shadow instance 'vip__public_old' to the CIB: Application of an update diff failed
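
For anyone hitting this, the failing step can be inspected by hand on the affected controller with the standard Pacemaker shadow-CIB tools (a sketch only; whether a shadow instance is still present depends on where the run aborted):

crm_shadow --which                      # show which shadow instance, if any, is active
crm_shadow --diff                       # diff the active shadow CIB against the live CIB
crm_verify --live-check                 # sanity-check the live CIB itself
cibadmin --query > /root/cib-live.xml   # dump the live CIB for comparison

Return code 50 from 'crm_shadow --force --commit vip__public_old' matches the message above: the diff computed from the shadow copy could not be applied to the live CIB, which typically happens when the live CIB changed underneath it (for example, another controller configuring resources at the same time).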

How to replicate (a rough CLI equivalent is sketched after the list)
1. create a "Multi-node with HA" env
2. add the 1st controller -> deploy changes
3. add the 2nd controller -> deploy changes
4. add the 3rd controller -> deploy changes
5. deployment of the 3rd controller fails
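
Roughly the same sequence can be driven from the Fuel master via the CLI (a sketch only; the exact flags vary between Fuel versions, and the env/node IDs are assumptions):

fuel env create --name ha-test --rel 1 --mode ha    # 1. "Multi-node with HA" env
fuel node set --node 1 --role controller --env 1    # 2. first controller
fuel deploy-changes --env 1
fuel node set --node 2 --role controller --env 1    # 3. second controller
fuel deploy-changes --env 1
fuel node set --node 3 --role controller --env 1    # 4. third controller
fuel deploy-changes --env 1                         # 5. fails on the third controller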

Attaching diagnostic snapshot.

If this type of deploy order is not recommended or won't work, we should prohibit users from doing it.

Revision history for this message
Nikita Gubenko (nikita-gubenko) wrote :
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 5.1
Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: nobody → Dmitry Ilyin (idv1985)
Dmitry Ilyin (idv1985)
Changed in fuel:
status: New → In Progress
Revision history for this message
Dmitry Ilyin (idv1985) wrote :

It looks like you are using the 5.0.x code base. We had several issues with the scalability of HA deployments there, but in the 5.1 code base they were fixed according to this blueprint: https://blueprints.launchpad.net/fuel/+spec/ha-pacemaker-improvements
In the 5.1 release, adding controllers one by one should work properly.

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
status: Fix Committed → Incomplete
importance: Undecided → High
Revision history for this message
Mike Scherbakov (mihgen) wrote :

Marked as Won't Fix for 5.0.1: we can't backport the related changes at the moment, as we are one step before the acceptance phase for 5.0.1.

Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → nobody
assignee: nobody → Fuel Library Team (fuel-library)
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :
Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

Still present in 5.1.
It is unusual in practice to deploy this way, but for clarity we should either prohibit it or document it. I'm lowering the severity.

Changed in fuel:
status: Incomplete → Confirmed
importance: High → Medium
milestone: 5.1 → next
Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

Diagnostic snapshot for future reference

Revision history for this message
Andrew Woodward (xarses) wrote :

If the following workflows don't work, then this is in fact a high-priority issue.

start with -> end with
1 controller -> 3 (+2) controllers
3 controllers -> 5 (+2) controllers
5 controllers -> 3 (-2) controllers

Also check the case where one or two failed controllers are replaced in the same task, i.e.:

start with 3 controllers
remove 1 or 2 controllers in the UI
add the same number of controllers back from unprovisioned nodes
deploy changes (removing and adding the nodes in the same task)
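
A rough CLI equivalent of this replace-in-one-task workflow (node IDs and exact flags are assumptions and vary per install):

fuel node remove --node 2,3 --env 1                 # drop two existing controllers
fuel node set --node 7,8 --role controller --env 1  # add two unprovisioned nodes as controllers
fuel deploy-changes --env 1                         # removal and addition run as one task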

If this fails, then the priority is critical.

If neither of these is an issue, then it can be lowered and targeted for a later release.

Changed in fuel:
milestone: next → 5.1
importance: Medium → High
Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

I'm testing this right now; it will take some time.

Anyway, we should prevent forming a cluster with an even number of controllers, either programmatically or in the docs.
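
A minimal sketch of what such a programmatic check could look like on the master node, assuming the fuel CLI lists one node per line with its roles (this is not an existing Fuel feature):

controllers=$(fuel node --env 1 2>/dev/null | grep -c controller)
if [ "$controllers" -gt 0 ] && [ $((controllers % 2)) -eq 0 ]; then
    echo "Refusing to deploy: even number of controllers ($controllers)" >&2
    exit 1
fi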

Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

After adding the 2nd and 3rd controllers, they both finished in an offline state. Some crucial processes (cib, mysql) were blocked at the kernel level, but after a while I was able to log in, although Fuel couldn't recognize those nodes as active.
The Galera cluster ended up with only one cluster member, on the first deployed controller:
| wsrep_cluster_size | 1 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_provider_name | Galera |
| wsrep_ready | ON |
+------------------------------+--------------------------------------+

I have limited trust in this test as it was run on an already-used installation after removing a previous cluster. I will redo this on a fresh install tomorrow.
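
For reference, the Galera membership above can be re-checked on each controller with a plain wsrep status query (assuming local client credentials are in place):

mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"

A healthy 3-controller cluster should report wsrep_cluster_size = 3 and wsrep_cluster_status = Primary on every node.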

Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote : Re: [Bug 1350266] Re: Adding HA controllers one by one fails

There are a couple of problems when you add nodes one by one:

1. astute.yaml should be regenerated on the current nodes to reflect the changes (new IPs, for example)
2. puppet should be re-applied to the current nodes to regenerate some sensitive configs (Corosync, HAProxy, RabbitMQ, Galera)
3. puppet should be applied to the new nodes to install and configure them
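
A rough manual sketch of what steps 1-3 amount to on an existing controller (the paths are the usual Fuel locations but are assumptions here; normally the Astute orchestrator drives this rather than the operator):

ls -l /etc/astute.yaml                          # 1. verify the regenerated astute.yaml reached the node
puppet apply /etc/puppet/manifests/site.pp      # 2./3. re-apply the site manifest, which regenerates
                                                #       Corosync, HAProxy, RabbitMQ and Galera configs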

Dmitry Ilyin made some changes to the Fuel logic. I believe these patches are in review right now.



Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

I am not sure that we support this workflow. You need to have 1 controller, or 3 or more; a 2-controller setup is not supported at all (a two-node cluster cannot keep quorum after losing either node).

Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

We should not; it does not make sense at all to have an even number of controllers.
Anyway, adding 2 controllers to an existing 1-controller setup succeeded 3 times in a row, so that workflow works fine.

Changed in fuel:
status: Confirmed → Invalid
tags: added: release-notes