Slow failover recovery on primary node restart
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack-Ansible | Fix Released | Medium | Kevin Carter |
Bug Description
This is a finding from OSIC failover testing. The way haproxy is configured, a single galera node is treated as primary. If that primary node goes down, we see hard failures in applications for 2-5 minutes, and connection failures in the haproxy logs continue for up to 20 minutes or so. The galera node itself takes only around 10 seconds to come back online and become accessible; it seems to take much longer for the pooled connections going through haproxy to recover. Most of the failed connections are keystone- and nova-conductor-related, but not all.
#######
# Connection Failure Counts Pulled from HaProxy Logs
#######
Connection failures were seen for roughly 10 seconds before the service was marked DOWN (09:50:28 - 09:50:33).
The service was down for 12 seconds before being marked UP (09:50:33 - 09:50:45) - 340 connection errors in total from the start to the time it was marked up.
Errors continued AFTER the service came back up and tapered off each minute over the next hour. A script pulled the per-minute passing/failing connection counts from haproxy during the time of the restart:
root@infra-
Time: bad good
-------------------
09:50: 515 71
09:51: 482 20
09:52: 197 13
09:53: 95 18
09:54: 62 119
09:55: 53 216
09:56: 44 93
09:57: 39 90
09:58: 35 122
09:59: 28 145
10:00: 21 106
10:01: 21 123
10:02: 14 95
10:03: 18 136
10:04: 12 156
10:05: 8 181
10:06: 7 142
10:07: 7 132
10:08: 5 167
10:09: 1 181
10:10: 3 129
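The actual script used is not shown in the report, but the per-minute tally above could be produced with a short awk sketch along these lines. One assumption about the tcplog layout: the haproxy termination-state field is "--" for a clean session and something else (e.g. "SD") for a failed one; adjust the pattern for your log format.

```shell
# Tally per-minute good/bad connection counts from haproxy syslog lines
# read on stdin. "Bad" means the two-letter termination state is not "--";
# this is an assumption about the tcplog format in use.
tally() {
    awk '
    {
        minute = substr($3, 1, 5)              # "09:50" from "09:50:28"
        if ($0 ~ / --( |$)/) good[minute]++; else bad[minute]++
    }
    END {
        for (m in good) seen[m] = 1
        for (m in bad)  seen[m] = 1
        for (m in seen) printf "%s: bad=%d good=%d\n", m, bad[m] + 0, good[m] + 0
    }'
}
```

Usage: `tally < /var/log/haproxy.log | sort`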
#######
# Haproxy Settings
#######
root@infra-
backend galera-back
mode tcp
balance leastconn
timeout server 5000s
stick store-request src
stick-table type ip size 256k expire 30m
option tcplog
option mysql-check user monitoring
option log-health-checks
server infra-2_
server infra-3_
server infra-1_
#######
# Galera Log During Restart
#######
root@infra-
170103 9:50:57 [Note] /usr/sbin/mysqld: Normal shutdown
170103 9:50:57 [Note] WSREP: Stop replication
170103 9:50:57 [Note] WSREP: Closing send monitor...
170103 9:50:57 [Note] WSREP: Closed send monitor.
170103 9:50:57 [Note] WSREP: gcomm: terminating thread
170103 9:50:57 [Note] WSREP: gcomm: joining thread
170103 9:50:57 [Note] WSREP: gcomm: closing backend
170103 9:50:57 [Note] WSREP: view(view_
170103 9:50:57 [Note] WSREP: view((empty))
170103 9:50:57 [Note] WSREP: gcomm: closed
170103 9:50:57 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
170103 9:50:57 [Note] WSREP: Flow-control interval: [16, 16]
170103 9:50:57 [Note] WSREP: Received NON-PRIMARY.
170103 9:50:57 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 16523229)
170103 9:50:57 [Note] WSREP: Received self-leave message.
170103 9:50:57 [Note] WSREP: Flow-control interval: [0, 0]
170103 9:50:57 [Note] WSREP: Received SELF-LEAVE. Closing connection.
170103 9:50:57 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 16523229)
170103 9:50:57 [Note] WSREP: RECV thread exiting 0: Success
170103 9:50:57 [Note] WSREP: New cluster view: global state: f460653b-
170103 9:50:57 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170103 9:50:57 [Note] WSREP: New cluster view: global state: f460653b-
170103 9:50:57 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170103 9:50:57 [Note] WSREP: applier thread exiting (code:0)
170103 9:50:57 [Note] WSREP: recv_thread() joined.
170103 9:50:57 [Note] WSREP: Closing replication queue.
170103 9:50:57 [Note] WSREP: Closing slave action queue.
170103 9:50:57 [Note] WSREP: applier thread exiting (code:6)
[... "applier thread exiting (code:6)" repeated 46 more times, one per applier thread ...]
170103 9:50:59 [Note] WSREP: rollbacker thread exiting
170103 9:50:59 [Note] Event Scheduler: Purging the queue. 0 events
170103 9:50:59 [Note] WSREP: dtor state: CLOSED
170103 9:50:59 [Note] WSREP: mon: entered 701548 oooe fraction 0 oool fraction 4.13372e-05
170103 9:50:59 [Note] WSREP: mon: entered 701548 oooe fraction 0.011697 oool fraction 4.13372e-05
170103 9:50:59 [Note] WSREP: mon: entered 717634 oooe fraction 0 oool fraction 1.39347e-06
170103 9:50:59 [Note] WSREP: cert index usage at exit 0
170103 9:50:59 [Note] WSREP: cert trx map usage at exit 103
170103 9:50:59 [Note] WSREP: deps set usage at exit 0
170103 9:50:59 [Note] WSREP: avg deps dist 62.7248
170103 9:50:59 [Note] WSREP: avg cert interval 0.0200944
170103 9:50:59 [Note] WSREP: cert index size 125
170103 9:50:59 [Note] WSREP: Service thread queue flushed.
170103 9:50:59 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0
170103 9:50:59 [Note] WSREP: MemPool(
170103 9:50:59 [Note] WSREP: MemPool(
170103 9:50:59 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 16523229)
170103 9:50:59 [Note] WSREP: Flushing memory map to disk...
170103 9:50:59 [Note] InnoDB: FTS optimize thread exiting.
170103 9:50:59 [Note] InnoDB: Starting shutdown...
170103 9:51:00 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
170103 9:51:01 [Note] InnoDB: Shutdown completed; log sequence number 21750856990
170103 9:51:01 [Note] /usr/sbin/mysqld: Shutdown complete
170103 9:51:02 [Note] /usr/sbin/mysqld (mysqld 10.0.28-
170103 9:51:05 [Note] /usr/sbin/mysqld (mysqld 10.0.28-
170103 9:51:05 [Note] WSREP: Read nil XID from storage engines, skipping position init
170103 9:51:05 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/
170103 9:51:05 [Note] WSREP: wsrep_load(): Galera 25.3.18(r3632) by Codership Oy <email address hidden> loaded successfully.
170103 9:51:05 [Note] WSREP: CRC-32C: using hardware acceleration.
170103 9:51:05 [Note] WSREP: Found saved state: f460653b-
170103 9:51:05 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.22.1.186; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_
170103 9:51:05 [Note] WSREP: Service thread queue flushed.
170103 9:51:05 [Note] WSREP: Assign initial position for certification: 16523229, protocol version: -1
170103 9:51:05 [Note] WSREP: wsrep_sst_grab()
170103 9:51:05 [Note] WSREP: Start replication
170103 9:51:05 [Note] WSREP: Setting initial position to f460653b-
170103 9:51:05 [Note] WSREP: protonet asio version 0
170103 9:51:05 [Note] WSREP: Using CRC-32C for message checksums.
170103 9:51:05 [Note] WSREP: backend: asio
170103 9:51:05 [Note] WSREP: gcomm thread scheduling priority set to other:0
170103 9:51:05 [Warning] WSREP: access file(/var/
170103 9:51:05 [Note] WSREP: restore pc from disk failed
170103 9:51:05 [Note] WSREP: GMCast version 0
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Note] WSREP: EVS version 0
170103 9:51:05 [Note] WSREP: gcomm: connecting to group 'openstack_
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Warning] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:06 [Note] WSREP: declaring 42f36b93 at tcp://172.
170103 9:51:06 [Note] WSREP: declaring c31a22d6 at tcp://172.
170103 9:51:06 [Note] WSREP: Node 42f36b93 state prim
170103 9:51:06 [Note] WSREP: view(view_
170103 9:51:06 [Note] WSREP: save pc into disk
170103 9:51:06 [Note] WSREP: gcomm: connected
170103 9:51:06 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
170103 9:51:06 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
170103 9:51:06 [Note] WSREP: Opened channel 'openstack_
170103 9:51:06 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
170103 9:51:06 [Note] WSREP: Waiting for SST to complete.
170103 9:51:06 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 25429412-
170103 9:51:06 [Note] WSREP: STATE EXCHANGE: sent state msg: 25429412-
170103 9:51:06 [Note] WSREP: STATE EXCHANGE: got state msg: 25429412-
170103 9:51:06 [Note] WSREP: STATE EXCHANGE: got state msg: 25429412-
170103 9:51:06 [Note] WSREP: STATE EXCHANGE: got state msg: 25429412-
170103 9:51:06 [Note] WSREP: Quorum results:
170103 9:51:06 [Note] WSREP: Flow-control interval: [28, 28]
170103 9:51:06 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 16523231)
170103 9:51:06 [Note] WSREP: State transfer required:
170103 9:51:06 [Note] WSREP: New cluster view: global state: f460653b-
170103 9:51:06 [Warning] WSREP: Gap in state sequence. Need state transfer.
170103 9:51:06 [Note] WSREP: Running: 'wsrep_
170103 9:51:06 [Note] WSREP: Prepared SST request: xtrabackup-
170103 9:51:06 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170103 9:51:06 [Note] WSREP: REPL Protocols: 7 (3, 2)
170103 9:51:06 [Note] WSREP: Service thread queue flushed.
170103 9:51:06 [Note] WSREP: Assign initial position for certification: 16523231, protocol version: 3
170103 9:51:06 [Note] WSREP: Service thread queue flushed.
170103 9:51:06 [Note] WSREP: IST receiver addr using tcp://172.
170103 9:51:06 [Note] WSREP: Prepared IST receiver, listening at: tcp://172.
170103 9:51:06 [Note] WSREP: Member 0.0 (infra-
170103 9:51:06 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 16523231)
170103 9:51:06 [Note] WSREP: Requesting state transfer: success, donor: 1
170103 9:51:07 [Note] WSREP: 1.0 (infra-
170103 9:51:07 [Note] WSREP: Member 1.0 (infra-
170103 9:51:07 [Note] WSREP: SST complete, seqno: 16523229
170103 9:51:07 [Note] InnoDB: Using mutexes to ref count buffer pool pages
170103 9:51:07 [Note] InnoDB: The InnoDB memory heap is disabled
170103 9:51:07 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
170103 9:51:07 [Note] InnoDB: GCC builtin __atomic_
170103 9:51:07 [Note] InnoDB: Compressed tables use zlib 1.2.8
170103 9:51:07 [Note] InnoDB: Using Linux native AIO
170103 9:51:07 [Note] InnoDB: Using CPU crc32 instructions
170103 9:51:07 [Note] InnoDB: Initializing buffer pool, size = 4.0G
170103 9:51:07 [Note] InnoDB: Completed initialization of buffer pool
170103 9:51:07 [Note] InnoDB: Highest supported file format is Barracuda.
170103 9:51:07 [Note] InnoDB: 128 rollback segment(s) are active.
170103 9:51:07 [Note] InnoDB: Waiting for purge to start
170103 9:51:07 [Note] InnoDB: Percona XtraDB (http://
170103 9:51:07 [Note] Plugin 'FEEDBACK' is disabled.
170103 9:51:07 [Note] Server socket created on IP: '::'.
170103 9:51:07 [Note] WSREP: Signalling provider to continue.
170103 9:51:07 [Note] WSREP: SST received: f460653b-
170103 9:51:07 [Note] WSREP: Receiving IST: 2 writesets, seqnos 16523229-16523231
170103 9:51:07 [Note] WSREP: IST received: f460653b-
170103 9:51:07 [Note] WSREP: 0.0 (infra-
170103 9:51:07 [Note] WSREP: Shifting JOINER -> JOINED (TO: 16523233)
170103 9:51:07 [Note] WSREP: Member 0.0 (infra-
170103 9:51:07 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 16523233)
170103 9:51:07 [Note] WSREP: Synchronized with group, ready for connections
170103 9:51:07 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170103 9:51:07 [Note] /usr/sbin/mysqld: ready for connections.
170103 9:51:08 [Note] WSREP: (24f5eb32, 'tcp://
#######
# Snippet of the HAProxy Failure logs
#######
Jan 3 09:10:50 localhost haproxy[45498]: 172.22.3.123:45230 [03/Jan/
Jan 3 09:12:27 localhost haproxy[45498]: 172.22.2.250:45698 [03/Jan/
Jan 3 09:14:52 localhost haproxy[45498]: 172.22.2.250:42328 [03/Jan/
Jan 3 09:14:52 localhost haproxy[45498]: 172.22.2.250:42316 [03/Jan/
Jan 3 09:15:41 localhost haproxy[45498]: 172.22.2.95:41536 [03/Jan/
Jan 3 09:16:10 localhost haproxy[45498]: 172.22.2.95:57052 [03/Jan/
Jan 3 09:25:47 localhost haproxy[45498]: 172.22.2.75:55940 [03/Jan/
Jan 3 09:25:50 localhost haproxy[45498]: 172.22.2.95:45446 [03/Jan/
Jan 3 09:26:06 localhost haproxy[45498]: 172.22.2.75:34970 [03/Jan/
Jan 3 09:27:11 localhost haproxy[45498]: 172.22.2.75:37542 [03/Jan/
Jan 3 09:28:31 localhost haproxy[45498]: 172.22.2.95:54652 [03/Jan/
Jan 3 09:28:37 localhost haproxy[45498]: 172.22.1.251:46704 [03/Jan/
Jan 3 09:29:38 localhost haproxy[45498]: 172.22.2.95:41366 [03/Jan/
Jan 3 09:30:13 localhost haproxy[45498]: 172.22.2.250:57126 [03/Jan/
Jan 3 09:30:58 localhost haproxy[45498]: 172.22.2.95:42686 [03/Jan/
Jan 3 09:31:01 localhost haproxy[45498]: 172.22.2.27:34718 [03/Jan/
Jan 3 09:31:35 localhost haproxy[45498]: 172.22.3.155:60622 [03/Jan/
Jan 3 09:31:48 localhost haproxy[45498]: 172.22.3.155:39644 [03/Jan/
Jan 3 09:32:18 localhost haproxy[45498]: 172.22.1.251:42498 [03/Jan/
Jan 3 09:32:36 localhost haproxy[45498]: 172.22.3.155:46772 [03/Jan/
Changed in openstack-ansible:
assignee: nobody → Jean-Philippe Evrard (jean-philippe-evrard)
Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Wishlist
assignee: Jean-Philippe Evrard (jean-philippe-evrard) → nobody
Changed in openstack-ansible:
assignee: nobody → Kevin Carter (kevin-carter)
Changed in openstack-ansible:
importance: Wishlist → Medium
One thing I've noticed is that when I manually place the active galera endpoint in DRAIN/MAINT and force-kill all of the connections to that backend in haproxy, the OpenStack services reconnect much faster. I'd be curious to hear whether you can test that and verify the results.
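For anyone wanting to reproduce the manual drain/kill described above, it can be driven through the haproxy admin socket. A sketch follows: the backend name is taken from the config earlier in this report, but the server name and socket path are placeholders (check the "stats socket" line in your haproxy.cfg), and `set server ... state drain` needs haproxy >= 1.6.

```shell
# Drain a galera server, then force-kill its pooled connections via the
# haproxy admin socket. Arguments: socket path, backend/server name --
# both placeholders here, set them for your deployment.
drain_and_kill() {
    SOCK="${1:-/var/run/haproxy.stat}"   # admin socket path (assumption)
    SRV="${2:-galera-back/infra-2}"      # hypothetical backend/server name
    for cmd in "set server $SRV state drain" "shutdown sessions server $SRV"; do
        if [ -S "$SOCK" ]; then
            echo "$cmd" | socat stdio "$SOCK"
        else
            echo "would send: $cmd"      # dry run when no admin socket exists
        fi
    done
}
```

Usage: `drain_and_kill /var/run/haproxy.stat galera-back/<server>`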
Tuning the haproxy health-check timers might help with this, but when a health check detects that the endpoint is down, I wonder whether haproxy force-kills all of the connections or lets them time out gracefully (or maybe there's an option we should set).
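On the "maybe there's an option" point: haproxy does have a per-server option, `on-marked-down shutdown-sessions`, which force-closes established connections as soon as a health check marks the server DOWN instead of letting them linger until they time out. A hedged sketch of how it might look on the (truncated) server lines in the backend above; the server name, address, and check parameters are placeholders, not taken from this report:

```
backend galera-back
    # ... existing mode/balance/stick/check options as above ...
    server <galera-server> <addr>:3306 check on-marked-down shutdown-sessions
```

Whether this closes the specific recovery gap seen here would need to be verified in a failover test.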