[library] After locking DB access from primary controller cluster is unable to work

Bug #1326829 reported by Egor Kotko
This bug affects 1 person
Affects            | Status    | Importance | Assigned to       | Milestone
Fuel for OpenStack | Invalid   | High       | Sergii Golovatiuk |
  4.1.x            | Won't Fix | High       | Sergii Golovatiuk |
  5.0.x            | Won't Fix | High       | Sergii Golovatiuk |

Bug Description

{"build_id": "2014-06-04_09-16-08", "mirantis": "yes", "build_number": "341", "nailgun_sha": "a828d6b7610f872980d5a2113774f1cda6f6810b", "ostf_sha": "c959aa55f83fe2555cf2d382559271c7a9b17467", "fuelmain_sha": "7ed0f85acc0bab4b9157703a618b8cc9fd7de3e1", "astute_sha": "55df06b2e84fa5d71a1cc0e78dbccab5db29d968", "release": "4.1B", "fuellib_sha": "0e96fc5a340cd57f75c454ea8536471379299494"}

Steps to reproduce:
1. Deploy a cluster: CentOS, HA, Neutron VLAN, 3 controllers, 1 compute.
2. On the primary controller, emulate MySQL non-responsiveness by blocking the MySQL ports via iptables.
3. Check the Galera status:
   - on the node with blocked ports: http://paste.openstack.org/show/82966/
   - on another node: http://paste.openstack.org/show/82965/

Expected result:
The cluster keeps working with 2 controllers.

Actual result:
Cluster functionality is inaccessible.

Egor Kotko (ykotko) wrote :
Vladimir Kuklin (vkuklin) wrote :

This bug is going to be targeted for the 5.1 release.

Changed in fuel:
milestone: 4.1.1 → 4.1.2
importance: Undecided → High
assignee: nobody → Fuel Library Team (fuel-library)
milestone: 4.1.2 → 5.1
status: New → Confirmed
no longer affects: fuel/5.1.x
tags: added: ha
tags: removed: nailgun
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Sergii Golovatiuk (sgolovatiuk)
Sergii Golovatiuk (sgolovatiuk) wrote :

There should be a proper health-check script that lets HAProxy shut down a backend when "wsrep_ready" = "OFF". This bug will be addressed in a blueprint.
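The health check proposed above can be sketched as a small HTTP responder in the spirit of clustercheck. It answers 200 only while the local node reports wsrep_ready = ON and wsrep_local_state = 4 (Synced), and 503 otherwise, so HAProxy can mark the backend down. This is a minimal sketch under stated assumptions, not the script Fuel ships: the mysql CLI is assumed to reach the local server without a password, port 49000 is borrowed from a later comment in this thread, and all function and class names are hypothetical.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

SYNCED = "4"  # wsrep_local_state value reported by a Synced Galera node


def parse_status(output: str) -> dict:
    """Parse `mysql -N -B` output (name<TAB>value per row) into a dict."""
    status = {}
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) == 2:
            status[parts[0]] = parts[1]
    return status


def health(status: dict) -> tuple:
    """Map wsrep status variables to an (HTTP code, message) pair."""
    ready = status.get("wsrep_ready", "OFF").upper() == "ON"
    synced = status.get("wsrep_local_state") == SYNCED
    if ready and synced:
        return 200, "Galera node is synced and ready.\n"
    return 503, "Galera node is not ready.\n"


def read_wsrep_status() -> dict:
    """Query the local server (mysql uses the unix socket by default)."""
    result = subprocess.run(
        ["mysql", "-N", "-B", "-e", "SHOW GLOBAL STATUS LIKE 'wsrep_%'"],
        capture_output=True, text=True, timeout=5, check=False,
    )
    # Empty output (e.g. auth failure) yields {} and therefore a 503.
    return parse_status(result.stdout)


class CheckHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            code, msg = health(read_wsrep_status())
        except (OSError, subprocess.TimeoutExpired):
            code, msg = 503, "Cannot query local MySQL.\n"
        body = msg.encode()
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    # `option httpchk` without arguments sends OPTIONS / HTTP/1.0.
    do_OPTIONS = do_GET


def serve(port: int = 49000) -> None:
    """Run the responder until interrupted."""
    HTTPServer(("", port), CheckHandler).serve_forever()
```

On the HAProxy side, a responder like this would be polled with `option httpchk` and `check port 49000` on each Galera `server` line; adjust to the HAProxy version in use. Failing closed (503 when MySQL cannot be queried) is deliberate: a node that cannot answer is taken out of rotation.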

Bogdan Dobrelya (bogdando) wrote :
Changed in fuel:
status: Confirmed → Invalid
Sergii Golovatiuk (sgolovatiuk) wrote :

This test is very synthetic, though I confirm that the issue is present.

Changed in fuel:
status: Invalid → Confirmed
importance: High → Medium
Egor Kotko (ykotko) wrote :

Reproduced on:
{"build_id": "2014-07-10_00-39-56", "mirantis": "yes", "build_number": "112", "ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f", "nailgun_sha": "f5ff82558f99bb6ca7d5e1617eddddf7142fe857", "production": "docker", "api": "1.0", "fuelmain_sha": "293015843304222ead899270449495af91b06aed", "astute_sha": "5df009e8eab611750309a4c5b5c9b0f7b9d85806", "release": "5.0.1", "fuellib_sha": "364dee37435cbdc85d6b814a61f57800b83bf22d"}

Egor Kotko (ykotko) wrote :
Dmitry Ilyin (idv1985)
summary:
- After locking DB access from primary controller cluster is unable to work
+ [library] After locking DB access from primary controller cluster is unable to work
Mike Scherbakov (mihgen)
tags: added: release-notes
Nastya Urlapova (aurlapova) wrote :

Reproduced on:
{"build_id": "2014-07-17_11-18-10", "mirantis": "yes", "build_number": "135", "ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f", "nailgun_sha": "1d08d6f80b6514085dd8c0af4d437ef5d37e2802", "production": "docker", "api": "1.0", "fuelmain_sha": "c8e13df4c7de3ce3504c2bcb6d51a165b9aae0b6", "astute_sha": "9a74b788be9a7c5682f1c52a892df36e4766ce3f", "release": "5.0.1", "fuellib_sha": "e8c2bb726be6b78c3a34f75c84337a3a5662bb35"}

It looks like 40-50% of environments are affected by this issue after failover.

Changed in fuel:
importance: Medium → High
Vladimir Kuklin (vkuklin) wrote :

This requires additional HA checks. Maybe it is worth moving it to 6.0.

Sergii Golovatiuk (sgolovatiuk) wrote :

This test should be considered invalid right now. If port 3307 is blocked, Galera still gets updates from its neighbors. Even in this case the unix socket remains responsive, so many services are still able to access MySQL (the clustercheck script is a good example).
To perform a proper test:
1. Block the Galera port in the INPUT/OUTPUT chains of the filter table:
   iptables -I OUTPUT 1 -p tcp --dport 4567 -j DROP
   iptables -I INPUT 1 -p tcp --dport 4567 -j DROP
2. Check whether Galera/MySQL is in sync:
   /usr/local/bin/clustercheck or telnet localhost 49000
3. Try to create a new database in the mysql client. Run the mysql client without any parameters; in this case it connects to the local MySQL server.
4. Try to connect through the HAProxy interface:
   mysql -h192.168.0.1 -P3306 -uUSER -pPASSWORD
5. Unblock port 4567. Check whether the created database appeared on the local MySQL server, then try to delete that database.
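Steps 1, 2, 3 and 5 of this procedure can be wrapped in a small dry-run helper, assuming it runs as root on the controller under test. It prints each command and only executes it when execute=True. The database name failover_test and all function names are hypothetical, and step 4 is left out because it needs real HAProxy credentials (USER/PASSWORD above are placeholders).

```python
import http.client
import shlex
import subprocess

GALERA_PORT = 4567
TEST_DB = "failover_test"  # hypothetical database name


def iptables_cmds(port: int, block: bool) -> list:
    """Build the OUTPUT/INPUT DROP rules (-I to block, -D to unblock)."""
    cmds = []
    for chain in ("OUTPUT", "INPUT"):
        head = ["iptables", "-I", chain, "1"] if block else ["iptables", "-D", chain]
        cmds.append(head + ["-p", "tcp", "--dport", str(port), "-j", "DROP"])
    return cmds


def clustercheck_status(host: str = "localhost", port: int = 49000) -> int:
    """Return the HTTP status from clustercheck (200 = synced, 503 = not)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status
    finally:
        conn.close()


def run(cmds: list, execute: bool) -> list:
    """Print each command, run it if execute=True, return the printed lines."""
    lines = []
    for cmd in cmds:
        line = shlex.join(cmd)
        print(line)
        lines.append(line)
        if execute:
            rc = subprocess.run(cmd, check=False).returncode
            if rc:
                print(f"  -> exit code {rc}")
    return lines


def partition_test(execute: bool = False) -> list:
    """Steps 1-3 and 5 of the procedure; step 4 needs real credentials."""
    done = run(iptables_cmds(GALERA_PORT, block=True), execute)
    try:
        if execute:
            print("clustercheck HTTP status:", clustercheck_status())
        done += run([["mysql", "-e", f"CREATE DATABASE {TEST_DB}"]], execute)
    finally:
        # Always remove the DROP rules, even if a step above raised.
        done += run(iptables_cmds(GALERA_PORT, block=False), execute)
    done += run([["mysql", "-e", f"SHOW DATABASES LIKE '{TEST_DB}'"],
                 ["mysql", "-e", f"DROP DATABASE {TEST_DB}"]], execute)
    return done
```

The try/finally block removes the DROP rules even if a step fails, so the node is not left partitioned. mysql failures (for example a statement the isolated node may refuse) are reported by exit code rather than aborting, since they are part of what the test observes.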

Changed in fuel:
status: Confirmed → Invalid