Controllers scale up fails due to galera epoch divergence when new controller's id is smaller than old ones

Bug #1398378 reported by Tatyanka
Affects: Fuel for OpenStack
Status: Won't Fix
Importance: High
Assigned to: Sergii Golovatiuk
Milestone: (none listed)

Bug Description

Scenario:
            1. Create cluster
            2. Add 1 controller node
            3. Deploy the cluster
            4. Add 2 controller nodes
            5. Deploy changes
            6. Run network verification
            7. Add 2 controller nodes
            8. Deploy changes
            9. Run network verification
            10. Run OSTF

Actual result:
Deployment failed at step 5 (add 2 controllers, deploy changes) with "Failed to call refresh: execution expired".
http://paste.openstack.org/show/143488/ (see nodes 3 and 2 in the snapshot)

Expected:
Cluster is ready, OSTF passes

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "5.1.1"
  api: "1.0"
  build_number: "45"
  build_id: "2014-11-27_23-41-13"
  astute_sha: "ef8aa0fd0e3ce20709612906f1f0551b5682a6ce"
  fuellib_sha: "15a387462f7be50c4f87ad986d0c81535025c125"
  ostf_sha: "64cb59c681658a7a55cc2c09d079072a41beb346"
  nailgun_sha: "500e36d08a45dbb389bf2bd97673d9bff48ee84d"
  fuelmain_sha: "51e66db7750e9c856ba128f35cfb6724895bf479"

Tatyanka (tatyana-leontovich) wrote :
Changed in fuel:
status: New → Confirmed
Vladimir Kuklin (vkuklin) wrote :

This issue is caused by GTID divergence: when new nodes are added, one of them may become the primary controller because it has a lower ID than the existing controllers.
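To illustrate the failure mode, here is a minimal sketch. It assumes, as the comment above implies, that the primary controller is simply the controller with the lowest node ID. The function name and the sample IDs are hypothetical and are not taken from Fuel code.

```python
def pick_primary(controller_ids):
    """Return the ID treated as primary (assumption: the lowest ID wins)."""
    if not controller_ids:
        raise ValueError("no controllers given")
    return min(controller_ids)


# Hypothetical IDs: one deployed controller, then two new ones are added.
existing = [5]
added = [3, 7]

before = pick_primary(existing)          # primary before scale-up
after = pick_primary(existing + added)   # primary after scale-up
primary_moved = before != after          # True when a new node takes over
```

Under this assumption, adding a node with ID 3 moves the primary role away from the controller that already holds the Galera state.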

The workaround is pretty simple:

If you are adding controllers, ensure that the nodes you are adding have higher IDs than your current controllers. If a new node has a lower ID and you need it to get a higher one, just delete it from nailgun and re-bootstrap it. Evgeniy Li will comment on the details of how to do this in the next comment.
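The check behind this workaround can be sketched as a small pre-flight helper. This is only an illustration under stated assumptions: the function name is hypothetical, node IDs are plain integers, and the rule applied is the conservative one from this thread (every new node ID must be higher than every current controller ID). It does not call any Fuel or nailgun API.

```python
def ids_blocking_scale_up(current_controller_ids, new_node_ids):
    """Return the new node IDs that are not higher than all current controller IDs.

    An empty result means the new nodes satisfy the workaround's rule.
    """
    if not current_controller_ids:
        # No deployed controllers yet, so nothing can diverge from them.
        return []
    highest_current = max(current_controller_ids)
    return sorted(i for i in new_node_ids if i <= highest_current)
```

For example, with a current controller of ID 5 and new nodes 3 and 7, the helper reports node 3 as the one to delete and re-bootstrap before deploying.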

tags: added: release-notes
Vladimir Kuklin (vkuklin) wrote :
Evgeniy L (rustyrobot) wrote :

To delete a node, follow the standard flow [1]: select the node, click the Delete button, and then click the Deploy button. After the node is discovered again, it should get a new, incremented ID.

[1] http://docs.mirantis.com/openstack/fuel/fuel-5.1/operations.html?highlight=delete#remove-a-controller-node

Vladimir Kuklin (vkuklin) wrote : Re: Controllers scale up fails due to galera epoch divergence

We will also need to fix the system tests for the 5.1.x branch, as they should add only nodes with bigger IDs to the cluster.

summary: - [system_tests] Scalability tests failed with Failed to call refresh:
- execution expired
+ Controllers scale up fails due to galera epoch divergence
summary: - Controllers scale up fails due to galera epoch divergence
+ Controllers scale up fails due to galera epoch divergence when new
+ controller's id is smaller than old ones
Sergii Golovatiuk (sgolovatiuk) wrote :

This issue is no longer reproducible with the new OCF script. The MySQL Galera cluster was assembled properly.

Sergii Golovatiuk (sgolovatiuk) wrote :

We need to backport OCF script from 5.1 to 6.0

Changed in fuel:
status: Confirmed → Triaged
Changed in fuel:
status: Triaged → Won't Fix