Galera is not syncing on the slaves sometimes

Bug #1354479 reported by Kirill Omelchenko on 2014-08-08
This bug affects 7 people
Affects             Status  Importance  Assigned to        Milestone
Fuel for OpenStack          Critical    Sergii Golovatiuk
6.0.x                       Critical    Sergii Golovatiuk

Bug Description

http://jenkins-product.srt.mirantis.net:8080/view/0_master_swarm/job/master_fuelmain.system_test.ubuntu.thread_4/127/testReport/%28root%29/deploy_neutron_gre_ha/deploy_neutron_gre_ha/

Deployment failed on one of the controllers with the following errors in puppet.log:

root@node-3:/var/log# grep -ri '(err)' puppet.log
Thu Aug 07 15:13:02 +0000 2014 Puppet (err): Command exceeded timeout
Thu Aug 07 15:13:02 +0000 2014 /Stage[main]/Galera/Exec[wait-for-synced-state]/returns (err): change from notrun to 0 failed: Command exceeded timeout
Thu Aug 07 15:18:58 +0000 2014 /Stage[main]/Glance::Registry/Service[glance-registry] (err): Failed to call refresh: Could not find init script or upstart conf file for 'glance-registry'
Thu Aug 07 15:18:58 +0000 2014 /Stage[main]/Glance::Registry/Service[glance-registry] (err): Could not find init script or upstart conf file for 'glance-registry'
Thu Aug 07 15:19:11 +0000 2014 /Stage[main]/Neutron::Agents::L3/Service[neutron-l3] (err): Failed to call refresh: undefined method `attributes' for nil:NilClass
Thu Aug 07 15:19:11 +0000 2014 /Stage[main]/Neutron::Agents::L3/Service[neutron-l3] (err): undefined method `attributes' for nil:NilClass

Kirill Omelchenko (komelchenko) wrote :
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
Bogdan Dobrelya (bogdando) wrote :

I can see the following errors from the log: http://pastebin.com/1dkjhttr
And nothing about glance. Wrong snapshot?

Changed in fuel:
status: New → Incomplete
Kirill Omelchenko (komelchenko) wrote :

Could be. This one should be the right one.

Bogdan Dobrelya (bogdando) wrote :

That one looks wrong as well

Kirill Omelchenko (komelchenko) wrote :

Strange thing. Ok, I'll try to reproduce the bug on the latest build and give you an update.

Changed in fuel:
status: Incomplete → Confirmed
assignee: Fuel Library Team (fuel-library) → Bogdan Dobrelya (bogdando)
summary: - HA Could not find init script or upstart conf file for 'glance-registry'
+ [Ubuntu] HA Could not find init script or upstart conf file for 'glance-registry'

I found the same error in the logs from https://bugs.launchpad.net/fuel/+bug/1352964

Changed in fuel:
assignee: Bogdan Dobrelya (bogdando) → Fuel Library Team (fuel-library)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Stanislaw Bogatkin (sbogatkin)
Kirill Omelchenko (komelchenko) wrote :

Here's the Diagnostic snapshot from an env with the same error.

Stanislaw Bogatkin (sbogatkin) wrote :

Unable to reproduce; I am going to investigate it later.

Stanislaw Bogatkin (sbogatkin) wrote :

It seems the problem is a MySQL timeout.

summary: - [Ubuntu] HA Could not find init script or upstart conf file for 'glance-registry'
+ Galera is not syncing on the slaves sometimes
Changed in fuel:
importance: High → Critical
Changed in fuel:
assignee: Stanislaw Bogatkin (sbogatkin) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/109606
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=989a64693aa330ebf5bdc59cc363e053ecb60a16
Submitter: Jenkins
Branch: master

commit 989a64693aa330ebf5bdc59cc363e053ecb60a16
Author: Sergii Golovatiuk <email address hidden>
Date: Fri Jul 25 14:18:08 2014 +0000

    Enable xtrabackup methods for Galera SST

    - Enable xtrabackup as SST method for Galera
    - Turn off perfomance_schema on
    - Remove old openssl packet
    - Slightly decrease RAM for innodb_buffer_pool_size
    - Set 1M buffer for send/recieve socket for socat to
      speed up SST between nodes

    Change-Id: I9b0b5eee6deab366324f1f808e32f64e290b444e
    Implements: bp/galera-improvements
    Closes-Bug: 1354479

Changed in fuel:
status: In Progress → Fix Committed
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #506

"build_id": "2014-09-10_00-01-12", "ostf_sha": "1de6ed1c0b72f6687ffb4bebc2c939b135a88e34", "build_number": "506", "auth_required": true, "api": "1.0", "nailgun_sha": "82091e0d61f252619a0842d0f8debb6b602a61fe", "production": "docker", "fuelmain_sha": "ca1b4839a70a10041f8eaf8b9ac995c8b0d4521a", "astute_sha": "b622d9b36dbdd1e03b282b9ee5b7435ba649e711", "feature_groups": ["mirantis"], "release": "5.1", "release_versions": {"2014.1.1-5.1": {"VERSION": {"build_id": "2014-09-10_00-01-12", "ostf_sha": "1de6ed1c0b72f6687ffb4bebc2c939b135a88e34", "build_number": "506", "api": "1.0", "nailgun_sha": "82091e0d61f252619a0842d0f8debb6b602a61fe", "production": "docker", "fuelmain_sha": "ca1b4839a70a10041f8eaf8b9ac995c8b0d4521a", "astute_sha": "b622d9b36dbdd1e03b282b9ee5b7435ba649e711", "feature_groups": ["mirantis"], "release": "5.1", "fuellib_sha": "e3f947cc4142210499a282bc5f183c333552fa23"}}}, "fuellib_sha": "e3f947cc4142210499a282bc5f183c333552fa23"

1. Create new environment (Ubuntu, HA mode)
2. Choose GRE segmentation
3. Add 3 controllers+cinder, 1 compute
4. Start deployment. The deployment fails.

Controllers - node-1,2,3

Error on second controller (node-2):

2014-09-10 12:50:22 ERR (/Stage[main]/Galera/Exec[wait-initial-sync]) Failed to call refresh: /usr/bin/mysql -uwsrep_sst -ppassword -Nbe "show status like 'wsrep_local_state_comment'" | /bin/grep -q -e Synced -e Initialized && sleep 10 returned 1 instead of one of [0]
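
For anyone checking this by hand on a controller, the failing check above can be approximated as a polling loop with an explicit deadline. This is only a sketch and not the code from fuel-library: the function names `get_wsrep_state`, `is_ready_state` and `wait_for_synced` are hypothetical, and the credentials are the placeholders shown in the log line. It accepts the same two states as the original grep (Synced, Initialized) and assumes the mysql client prints tab-separated output when piped.

```shell
#!/bin/sh
# Sketch: poll Galera's wsrep_local_state_comment until it reports a
# ready state or a deadline passes. All function names are hypothetical.

# Print the current value of wsrep_local_state_comment.
# Credentials are the placeholders from the log line above.
get_wsrep_state() {
    mysql -uwsrep_sst -ppassword -Nbe \
        "show status like 'wsrep_local_state_comment'" 2>/dev/null \
        | cut -f2
}

# Return 0 if the state starts with one of the values the original
# check greps for. Prefix match, in case the server appends extra text.
is_ready_state() {
    case "$1" in
        Synced*|Initialized*) return 0 ;;
        *) return 1 ;;
    esac
}

# wait_for_synced TIMEOUT_SECONDS [INTERVAL_SECONDS]
# Polls until the node is ready (returns 0) or the timeout has
# elapsed (returns 1 and reports the last state seen on stderr).
wait_for_synced() {
    timeout=$1
    interval=${2:-5}
    elapsed=0
    while :; do
        state=$(get_wsrep_state)
        if is_ready_state "$state"; then
            echo "ready: $state"
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out after ${elapsed}s, last state: ${state:-unknown}" >&2
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}
```

Running `wait_for_synced 600 10` would poll every 10 seconds for roughly 10 minutes. Both values are illustrative and are not taken from the manifests. Printing the last state on timeout shows whether the node was still joining or never connected at all.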

Anastasia Palkina (apalkina) wrote :
Changed in fuel:
status: Fix Committed → Confirmed

After analyzing the environment, it was found that the Ubuntu controllers had 2GB of RAM, as stated in https://<email address hidden>/msg01311.html

Anastasia is going to update the settings for her environment.

Changed in fuel:
status: Confirmed → Fix Committed
Miroslav Anashkin (manashkin) wrote :

I was able to reproduce this bug on vanilla MOS 5.1 with 3GB of RAM on the controllers.

Attached is the output of the top command.

Environment:

Ubuntu + HA mode
Neutron+VLAN
2 controllers
2 computes + Ceph OSD

Ceph Ephemeral volume enabled
Murano enabled
Use qcow images turned off.
Other settings - defaults.

Strangely, this issue does not appear if I do the same deployment on CentOS.

Egor Kotko (ykotko) wrote :

Reproduced on:
{"build_id": "2014-11-18_22-00-23", "ostf_sha": "82465a94eed4eff1fc8d8e1f2fb7e9993c22f068", "build_number": "114", "auth_required": true, "api": "1.0", "nailgun_sha": "b0add09c4361fee8fc70637c9a6ef42fbe738abe", "production": "docker", "fuelmain_sha": "e556f0e1b00c30ec5c4b374ca2878c047c8686c2", "astute_sha": "65eb911c38afc0e23d187772f9a05f703c685896", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-11-18_22-00-23", "ostf_sha": "82465a94eed4eff1fc8d8e1f2fb7e9993c22f068", "build_number": "114", "api": "1.0", "nailgun_sha": "b0add09c4361fee8fc70637c9a6ef42fbe738abe", "production": "docker", "fuelmain_sha": "e556f0e1b00c30ec5c4b374ca2878c047c8686c2", "astute_sha": "65eb911c38afc0e23d187772f9a05f703c685896", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "5a5275370b33ab3b9a403728a1c7ad173289e4a0"}}}, "fuellib_sha": "5a5275370b33ab3b9a403728a1c7ad173289e4a0"}
