Galera goes down after powering off the primary controller

Bug #1370401 reported by Anastasia Palkina
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Sergii Golovatiuk

Bug Description

"build_id": "2014-09-15_00-01-46",
"ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
"build_number": "8",
"auth_required": true,
"api": "1.0",
"nailgun_sha": "b8d8189cc37d6d1b26f4479be6be7313beefb1c8",
"production": "docker",
"fuelmain_sha": "d7ed7973034bde73d3f42c000984423b59b2312b",
"astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13",
"feature_groups": ["experimental"],
"release": "5.1",
"release_versions": {"2014.1.1-5.1": {"VERSION": {"build_id": "2014-09-15_00-01-46", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "8", "api": "1.0", "nailgun_sha": "b8d8189cc37d6d1b26f4479be6be7313beefb1c8", "production": "docker", "fuelmain_sha": "d7ed7973034bde73d3f42c000984423b59b2312b", "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13", "feature_groups": ["experimental"], "release": "5.1", "fuellib_sha": "395fd9d20a003603cc9ad26e16cb13c1c45e24e6"}}}, "fuellib_sha": "395fd9d20a003603cc9ad26e16cb13c1c45e24e6"

1. Create new environment (CentOS, HA mode)
2. Choose GRE segmentation
3. Choose Ceph for images and Ceph Rados
4. Choose Sahara, Murano, Ceilometer
5. Add 3 controllers, 1 compute, 1 cinder+mongo, 3 ceph, 2 mongo
6. Start deployment. It was successful
7. Start OSTF tests. It was successful
8. Power off second controller
9. Start OSTF tests. It was successful
10. Power on second controller
11. Power off primary controller
12. Start OSTF tests. It failed with the error: "Keystone client is not available. Please, refer to OpenStack logs to fix this problem"
13. Power on primary controller.
14. Start OSTF tests. It failed with the same error.
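
The timing between steps 10 and 11 may matter, so a scripted reproduction could make the wait explicit instead of relying on a fixed delay. The following is a minimal sketch, not part of the original report: `wait_until` and its `check` callback are hypothetical names, and the caller decides what "healthy" means (for example, the restarted controller reporting that it has rejoined the cluster).

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or `timeout` seconds elapse.

    Returns True if the check passed in time, False otherwise.
    `clock` and `sleep` are injectable so the helper can be tested
    without real delays.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A reproduction script could call `wait_until(node_has_rejoined, timeout=600)` after step 10 and abort if it returns False, where `node_has_rejoined` is whatever health probe the tester trusts.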

[root@node-31 ~]# keystone tenant-list
Authorization Failed: An unexpected error prevented the server from fulfilling your request. (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None (HTTP 500)
[root@node-31 ~]# neutron net-list
{"error": {"message": "An unexpected error prevented the server from fulfilling your request. (OperationalError) (2013, \"Lost connection to MySQL server at 'reading initial communication packet', system error: 0\") None None", "code": 500, "title": "Internal Server Error"}}
[root@node-31 ~]# mysql -e 'show status like wsrep%'
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)

[root@node-32 ~]# mysql -e 'show status like wsrep%'
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
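
Once mysqld accepts connections again, the wsrep status variables can be checked programmatically. Note that the `LIKE` pattern has to be a quoted string, for example `mysql -N -B -e "SHOW STATUS LIKE 'wsrep%'"`; in the transcript above the client fails to connect before the statement is ever sent. The sketch below parses the name/value output of such a command. The function names and the sample text are illustrative assumptions, not output captured from this environment.

```python
def parse_wsrep_status(text: str) -> dict:
    """Parse 'name<whitespace>value' lines (mysql -N -B output) into a dict."""
    status = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].startswith("wsrep_"):
            status[parts[0]] = parts[1].strip()
    return status


def node_is_synced(status: dict) -> bool:
    """True if the node reports a Primary component, Synced state and wsrep_ready=ON."""
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_local_state_comment") == "Synced"
        and status.get("wsrep_ready") == "ON"
    )


# Hypothetical sample of a healthy node, for illustration only.
SAMPLE_HEALTHY = (
    "wsrep_cluster_size\t3\n"
    "wsrep_cluster_status\tPrimary\n"
    "wsrep_local_state_comment\tSynced\n"
    "wsrep_ready\tON\n"
)
```

An empty result, which is what a node that refuses connections effectively gives, is treated as not synced.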

[root@node-32 ~]# pcs status
Cluster name:
Last updated: Wed Sep 17 08:49:53 2014
Last change: Wed Sep 17 08:48:56 2014 via crm_attribute on node-33.domain.tld
Stack: classic openais (with plugin)
Current DC: node-33.domain.tld - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
3 Nodes configured, 3 expected votes
22 Resources configured

Online: [ node-31.domain.tld node-32.domain.tld node-33.domain.tld ]

Full list of resources:

 vip__management_old (ocf::mirantis:ns_IPaddr2): Started node-32.domain.tld
 vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-33.domain.tld
 p_openstack-ceilometer-central (ocf::mirantis:ceilometer-agent-central): Started node-32.domain.tld
 p_openstack-ceilometer-alarm-evaluator (ocf::mirantis:ceilometer-alarm-evaluator): Started node-31.domain.tld
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-31.domain.tld node-32.domain.tld node-33.domain.tld ]
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Masters: [ node-31.domain.tld ]
     Slaves: [ node-32.domain.tld node-33.domain.tld ]
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-31.domain.tld node-32.domain.tld node-33.domain.tld ]
 p_openstack-heat-engine (ocf::mirantis:openstack-heat-engine): Started node-32.domain.tld
 Clone Set: clone_p_neutron-openvswitch-agent [p_neutron-openvswitch-agent]
     Started: [ node-31.domain.tld node-32.domain.tld node-33.domain.tld ]
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-31.domain.tld node-32.domain.tld node-33.domain.tld ]
 p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started node-32.domain.tld
 p_neutron-l3-agent (ocf::mirantis:neutron-agent-l3): Started node-33.domain.tld

Failed actions:
    p_mysql_start_0 on node-31.domain.tld 'unknown error' (1): call=98, status=Timed Out, last-rc-change='Wed Sep 17 08:36:00 2014', queued=475002ms, exec=0ms
    p_mysql_monitor_120000 on node-33.domain.tld 'unknown error' (1): call=333, status=complete, last-rc-change='Wed Sep 17 08:43:57 2014', queued=544ms, exec=1ms
    p_mysql_start_0 on node-32.domain.tld 'unknown error' (1): call=131, status=Timed Out, last-rc-change='Wed Sep 17 08:36:00 2014', queued=475180ms, exec=2ms
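
The failed actions above can be turned into structured records, which makes it easier to see that both `p_mysql_start_0` failures timed out at the same moment with roughly 475 seconds queued. This is a minimal sketch that assumes the line layout shown in this particular `pcs status` output; other Pacemaker versions may format the section differently, and the names `FAILED_ACTION` and `parse_failed_actions` are made up for the example.

```python
import re

# Matches one entry of the "Failed actions:" section as printed above.
FAILED_ACTION = re.compile(
    r"^\s*(?P<action>\S+) on (?P<node>\S+) '(?P<reason>[^']*)' \((?P<rc>\d+)\): "
    r"call=(?P<call>\d+), status=(?P<status>[^,]+), "
    r"last-rc-change='(?P<changed>[^']*)', "
    r"queued=(?P<queued_ms>\d+)ms, exec=(?P<exec_ms>\d+)ms\s*$"
)


def parse_failed_actions(text):
    """Return a list of dicts, one per failed action line found in `text`."""
    records = []
    for line in text.splitlines():
        match = FAILED_ACTION.match(line)
        if match:
            record = match.groupdict()
            for key in ("rc", "call", "queued_ms", "exec_ms"):
                record[key] = int(record[key])
            records.append(record)
    return records


# The three entries from the report, copied verbatim.
PCS_FAILED = """\
Failed actions:
    p_mysql_start_0 on node-31.domain.tld 'unknown error' (1): call=98, status=Timed Out, last-rc-change='Wed Sep 17 08:36:00 2014', queued=475002ms, exec=0ms
    p_mysql_monitor_120000 on node-33.domain.tld 'unknown error' (1): call=333, status=complete, last-rc-change='Wed Sep 17 08:43:57 2014', queued=544ms, exec=1ms
    p_mysql_start_0 on node-32.domain.tld 'unknown error' (1): call=131, status=Timed Out, last-rc-change='Wed Sep 17 08:36:00 2014', queued=475180ms, exec=2ms
"""

timed_out = [r["node"] for r in parse_failed_actions(PCS_FAILED) if r["status"] == "Timed Out"]
```

Here `timed_out` lists the nodes whose MySQL start operation hit the timeout, which is the part of the output most relevant to the connection errors shown earlier.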

Anastasia Palkina (apalkina) wrote :
Bogdan Dobrelya (bogdando) wrote :

Please elaborate on these two steps:
10. Power on second controller
11. Power off primary controller

1) What was the time delay between them?
2) Did you wait for the HA health check to pass after step 10?

Changed in mos:
status: New → Incomplete
Anastasia Palkina (apalkina) wrote :

1) About 5 minutes, I think
2) Yes, I waited

Tomasz 'Zen' Napierala (tzn) wrote :

Can we please have an update on this bug? Is it still Incomplete?

Sergii Golovatiuk (sgolovatiuk) wrote :

It's incomplete.

Dmitry Mescheryakov (dmitrymex) wrote :

Moving the bug to Fuel project, since it is about Galera

affects: mos → fuel
Changed in fuel:
milestone: 5.1 → none
milestone: none → 5.1.2
Changed in fuel:
status: Incomplete → Invalid