Galera is not syncing on the slaves sometimes

Bug #1354479 reported by Kirill Omelchenko
This bug affects 7 people
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Committed  Critical    Sergii Golovatiuk
6.0.x               Fix Committed  Critical    Sergii Golovatiuk

Bug Description

http://jenkins-product.srt.mirantis.net:8080/view/0_master_swarm/job/master_fuelmain.system_test.ubuntu.thread_4/127/testReport/%28root%29/deploy_neutron_gre_ha/deploy_neutron_gre_ha/

Deployment failed on one of the controllers with the following errors in puppet.log:

root@node-3:/var/log# grep -ri '(err)' puppet.log
Thu Aug 07 15:13:02 +0000 2014 Puppet (err): Command exceeded timeout
Thu Aug 07 15:13:02 +0000 2014 /Stage[main]/Galera/Exec[wait-for-synced-state]/returns (err): change from notrun to 0 failed: Command exceeded timeout
Thu Aug 07 15:18:58 +0000 2014 /Stage[main]/Glance::Registry/Service[glance-registry] (err): Failed to call refresh: Could not find init script or upstart conf file for 'glance-registry'
Thu Aug 07 15:18:58 +0000 2014 /Stage[main]/Glance::Registry/Service[glance-registry] (err): Could not find init script or upstart conf file for 'glance-registry'
Thu Aug 07 15:19:11 +0000 2014 /Stage[main]/Neutron::Agents::L3/Service[neutron-l3] (err): Failed to call refresh: undefined method `attributes' for nil:NilClass
Thu Aug 07 15:19:11 +0000 2014 /Stage[main]/Neutron::Agents::L3/Service[neutron-l3] (err): undefined method `attributes' for nil:NilClass
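
The grep output above mixes several failures, and it can help to group them by the Puppet resource that raised them so the earliest failing resource stands out. The following is a minimal sketch, assuming the log line layout shown above (timestamp, source, `(err):`, message); the names `ERR_LINE` and `group_errors` are hypothetical and not part of Fuel or Puppet.

```python
import re

# Assumed layout, taken from the lines quoted above:
#   Thu Aug 07 15:13:02 +0000 2014 <source> (err): <message>
ERR_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} \d{2} [\d:]{8} \+\d{4} \d{4}) "
    r"(?P<source>.+?) \(err\): (?P<msg>.*)$"
)


def group_errors(lines):
    """Group (err) messages by their source, preserving first-seen order."""
    grouped = {}
    for line in lines:
        match = ERR_LINE.match(line.strip())
        if match is None:
            continue  # not an (err) line in the assumed layout
        grouped.setdefault(match.group("source"), []).append(match.group("msg"))
    return grouped


if __name__ == "__main__":
    # Hypothetical usage against the log file named in the report.
    with open("puppet.log") as log:
        for source, messages in group_errors(log).items():
            print(source)
            for message in messages:
                print("    " + message)
```

Because the dictionary keeps insertion order (Python 3.7+), the first key printed is the first resource that reported an error, here the Galera `wait-for-synced-state` exec rather than the later glance-registry and neutron-l3 failures.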

Kirill Omelchenko (komelchenko) wrote :
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
Bogdan Dobrelya (bogdando) wrote :

I can see the following errors from the log: http://pastebin.com/1dkjhttr
And nothing about glance. Wrong snapshot?

Changed in fuel:
status: New → Incomplete
Kirill Omelchenko (komelchenko) wrote :

Could be. This snapshot should be the right one.

Bogdan Dobrelya (bogdando) wrote :

That one looks wrong as well.

Kirill Omelchenko (komelchenko) wrote :

Strange thing. Ok, I'll try to reproduce the bug on the latest build and give you an update.

Changed in fuel:
status: Incomplete → Confirmed
assignee: Fuel Library Team (fuel-library) → Bogdan Dobrelya (bogdando)
summary: - HA Could not find init script or upstart conf file for 'glance-registry'
+ [Ubuntu] HA Could not find init script or upstart conf file for 'glance-registry'
Bogdan Dobrelya (bogdando) wrote : Re: [Ubuntu] HA Could not find init script or upstart conf file for 'glance-registry'

I found the same error in the logs from the https://bugs.launchpad.net/fuel/+bug/1352964 issue.

Changed in fuel:
assignee: Bogdan Dobrelya (bogdando) → Fuel Library Team (fuel-library)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Stanislaw Bogatkin (sbogatkin)
Kirill Omelchenko (komelchenko) wrote :

Here's the Diagnostic snapshot from an env with the same error.

Stanislaw Bogatkin (sbogatkin) wrote :

Unable to reproduce; I am going to investigate it later.

Stanislaw Bogatkin (sbogatkin) wrote :

It seems that the problem is a MySQL timeout.

summary: - [Ubuntu] HA Could not find init script or upstart conf file for 'glance-registry'
+ Galera is not syncing on the slaves sometimes
Changed in fuel:
importance: High → Critical
Changed in fuel:
assignee: Stanislaw Bogatkin (sbogatkin) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/109606
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=989a64693aa330ebf5bdc59cc363e053ecb60a16
Submitter: Jenkins
Branch: master

commit 989a64693aa330ebf5bdc59cc363e053ecb60a16
Author: Sergii Golovatiuk <email address hidden>
Date: Fri Jul 25 14:18:08 2014 +0000

    Enable xtrabackup methods for Galera SST

    - Enable xtrabackup as SST method for Galera
    - Turn off performance_schema
    - Remove old openssl package
    - Slightly decrease RAM for innodb_buffer_pool_size
    - Set 1M buffer for send/receive socket for socat to
      speed up SST between nodes

    Change-Id: I9b0b5eee6deab366324f1f808e32f64e290b444e
    Implements: bp/galera-improvements
    Closes-Bug: 1354479

Changed in fuel:
status: In Progress → Fix Committed
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #506

"build_id": "2014-09-10_00-01-12", "ostf_sha": "1de6ed1c0b72f6687ffb4bebc2c939b135a88e34", "build_number": "506", "auth_required": true, "api": "1.0", "nailgun_sha": "82091e0d61f252619a0842d0f8debb6b602a61fe", "production": "docker", "fuelmain_sha": "ca1b4839a70a10041f8eaf8b9ac995c8b0d4521a", "astute_sha": "b622d9b36dbdd1e03b282b9ee5b7435ba649e711", "feature_groups": ["mirantis"], "release": "5.1", "release_versions": {"2014.1.1-5.1": {"VERSION": {"build_id": "2014-09-10_00-01-12", "ostf_sha": "1de6ed1c0b72f6687ffb4bebc2c939b135a88e34", "build_number": "506", "api": "1.0", "nailgun_sha": "82091e0d61f252619a0842d0f8debb6b602a61fe", "production": "docker", "fuelmain_sha": "ca1b4839a70a10041f8eaf8b9ac995c8b0d4521a", "astute_sha": "b622d9b36dbdd1e03b282b9ee5b7435ba649e711", "feature_groups": ["mirantis"], "release": "5.1", "fuellib_sha": "e3f947cc4142210499a282bc5f183c333552fa23"}}}, "fuellib_sha": "e3f947cc4142210499a282bc5f183c333552fa23"

1. Create new environment (Ubuntu, HA mode)
2. Choose GRE segmentation
3. Add 3 controllers+cinder, 1 compute
4. Start deployment. It fails.

Controllers - node-1,2,3

Error on second controller (node-2):

2014-09-10 12:50:22 ERR (/Stage[main]/Galera/Exec[wait-initial-sync]) Failed to call refresh: /usr/bin/mysql -uwsrep_sst -ppassword -Nbe "show status like 'wsrep_local_state_comment'" | /bin/grep -q -e Synced -e Initialized && sleep 10 returned 1 instead of one of [0]
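
The failing exec runs a one-shot check: query `wsrep_local_state_comment` and succeed only if the output contains `Synced` or `Initialized`. When debugging a node by hand, the same check can be wrapped in a polling loop with an overall deadline. The sketch below is an illustration only, not the fuel-library implementation: the mysql command line is copied from the error above, while `query_state`, `is_ready`, `wait_for_sync`, and the timeout and interval defaults are assumptions.

```python
import subprocess
import time

# Command copied from the Puppet error above, including the credentials shown there.
MYSQL_CMD = [
    "/usr/bin/mysql", "-uwsrep_sst", "-ppassword", "-Nbe",
    "show status like 'wsrep_local_state_comment'",
]

# Same patterns as the `grep -q -e Synced -e Initialized` in the exec.
ACCEPTED_STATES = ("Synced", "Initialized")


def query_state():
    """Run the status query; return its stdout, or '' if mysql fails."""
    try:
        result = subprocess.run(
            MYSQL_CMD,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=30,  # assumed per-query limit
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout if result.returncode == 0 else ""


def is_ready(output):
    """Substring match, mirroring what grep does on the query output."""
    return any(state in output for state in ACCEPTED_STATES)


def wait_for_sync(query=query_state, timeout=600.0, interval=10.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until the node reports an accepted state or the deadline passes."""
    deadline = clock() + timeout
    while True:
        if is_ready(query()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    print("synced" if wait_for_sync() else "timed out waiting for Galera sync")
```

The query, sleep, and clock functions are injectable so the loop can be exercised without a running MySQL server. A node that stays in a donor or joiner state for the whole window returns `False`, which corresponds to the "returned 1 instead of one of [0]" failure above.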

Anastasia Palkina (apalkina) wrote :
Changed in fuel:
status: Fix Committed → Confirmed
Sergii Golovatiuk (sgolovatiuk) wrote :

After analyzing the environment, we found that the Ubuntu controllers had 2GB of RAM, as stated in https://<email address hidden>/msg01311.html

Anastasia is going to update settings for her environment.

Changed in fuel:
status: Confirmed → Fix Committed
Miroslav Anashkin (manashkin) wrote :

I was able to reproduce this bug on vanilla MOS 5.1 with 3GB of RAM on controllers.

Attached is the output from the top command.

Environment:

Ubuntu + HA mode
Neutron+VLAN
2 controllers
2 computes + Ceph OSD

Ceph Ephemeral volume enabled
Murano enabled
Use qcow images turned off.
Other settings - defaults.

Strangely, this issue does not appear if I do the same deployment on CentOS.

Egor Kotko (ykotko) wrote :

Reproduced on:
{"build_id": "2014-11-18_22-00-23", "ostf_sha": "82465a94eed4eff1fc8d8e1f2fb7e9993c22f068", "build_number": "114", "auth_required": true, "api": "1.0", "nailgun_sha": "b0add09c4361fee8fc70637c9a6ef42fbe738abe", "production": "docker", "fuelmain_sha": "e556f0e1b00c30ec5c4b374ca2878c047c8686c2", "astute_sha": "65eb911c38afc0e23d187772f9a05f703c685896", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-11-18_22-00-23", "ostf_sha": "82465a94eed4eff1fc8d8e1f2fb7e9993c22f068", "build_number": "114", "api": "1.0", "nailgun_sha": "b0add09c4361fee8fc70637c9a6ef42fbe738abe", "production": "docker", "fuelmain_sha": "e556f0e1b00c30ec5c4b374ca2878c047c8686c2", "astute_sha": "65eb911c38afc0e23d187772f9a05f703c685896", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "5a5275370b33ab3b9a403728a1c7ad173289e4a0"}}}, "fuellib_sha": "5a5275370b33ab3b9a403728a1c7ad173289e4a0"}
