Slow failover recovery on primary node restart
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack-Ansible | Fix Released | Medium | Kevin Carter |
Bug Description
This is a finding from OSIC failover testing. The way haproxy is configured, a single galera node is treated as primary. If that primary node goes down, we see hard failures in applications for 2-5 minutes, and connection failures in the haproxy logs continue for up to 20 minutes or so. The galera node itself takes only around 10 seconds to come back online and become accessible; it seems to take much longer for the pooled connections going through haproxy to recover. Most of the failed connections are keystone- and nova-conductor-related, but not all.
#######
# Connection Failure Counts Pulled from HaProxy Logs
#######
Connection failures were seen for roughly 10 seconds before the service was marked DOWN (09:50:28 - 09:50:33).
The service was down for 12 seconds before being marked UP (09:50:33 - 09:50:45) - 340 connection errors in total from the start to the time it was marked up.
Errors continued AFTER the service came back up and tapered off each minute over the next hour. A script pulled the per-minute passing/failing connection counts from haproxy during the time of the restart:
root@infra-
Time: bad good
-------------------
09:50: 515 71
09:51: 482 20
09:52: 197 13
09:53: 95 18
09:54: 62 119
09:55: 53 216
09:56: 44 93
09:57: 39 90
09:58: 35 122
09:59: 28 145
10:00: 21 106
10:01: 21 123
10:02: 14 95
10:03: 18 136
10:04: 12 156
10:05: 8 181
10:06: 7 142
10:07: 7 132
10:08: 5 167
10:09: 1 181
10:10: 3 129
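The actual script used is not shown in the report, but the per-minute tally above could be produced with a short awk sketch along these lines. One assumption about the tcplog layout: the haproxy termination-state field is "--" for a clean session and something else (e.g. "SD") for a failed one; adjust the pattern for your log format.

```shell
# Tally per-minute good/bad connection counts from haproxy syslog lines
# read on stdin. "Bad" means the two-letter termination state is not "--";
# this is an assumption about the tcplog format in use.
tally() {
    awk '
    {
        minute = substr($3, 1, 5)              # "09:50" from "09:50:28"
        if ($0 ~ / --( |$)/) good[minute]++; else bad[minute]++
    }
    END {
        for (m in good) seen[m] = 1
        for (m in bad)  seen[m] = 1
        for (m in seen) printf "%s: bad=%d good=%d\n", m, bad[m] + 0, good[m] + 0
    }'
}
```

Usage: `tally < /var/log/haproxy.log | sort`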
#######
# Haproxy Settings
#######
root@infra-
backend galera-back
mode tcp
balance leastconn
timeout server 5000s
stick store-request src
stick-table type ip size 256k expire 30m
option tcplog
option mysql-check user monitoring
option log-health-checks
server infra-2_
server infra-3_
server infra-1_
#######
# Galera Log During Restart
#######
root@infra-
170103 9:50:57 [Note] /usr/sbin/mysqld: Normal shutdown
170103 9:50:57 [Note] WSREP: Stop replication
170103 9:50:57 [Note] WSREP: Closing send monitor...
170103 9:50:57 [Note] WSREP: Closed send monitor.
170103 9:50:57 [Note] WSREP: gcomm: terminating thread
170103 9:50:57 [Note] WSREP: gcomm: joining thread
170103 9:50:57 [Note] WSREP: gcomm: closing backend
170103 9:50:57 [Note] WSREP: view(view_
170103 9:50:57 [Note] WSREP: view((empty))
170103 9:50:57 [Note] WSREP: gcomm: closed
170103 9:50:57 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
170103 9:50:57 [Note] WSREP: Flow-control interval: [16, 16]
170103 9:50:57 [Note] WSREP: Received NON-PRIMARY.
170103 9:50:57 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 16523229)
170103 9:50:57 [Note] WSREP: Received self-leave message.
170103 9:50:57 [Note] WSREP: Flow-control interval: [0, 0]
170103 9:50:57 [Note] WSREP: Received SELF-LEAVE. Closing connection.
170103 9:50:57 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 16523229)
170103 9:50:57 [Note] WSREP: RECV thread exiting 0: Success
170103 9:50:57 [Note] WSREP: New cluster view: global state: f460653b-
170103 9:50:57 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170103 9:50:57 [Note] WSREP: New cluster view: global state: f460653b-
170103 9:50:57 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170103 9:50:57 [Note] WSREP: applier thread exiting (code:0)
170103 9:50:57 [Note] WSREP: recv_thread() joined.
170103 9:50:57 [Note] WSREP: Closing replication queue.
170103 9:50:57 [Note] WSREP: Closing slave action queue.
170103 9:50:57 [Note] WSREP: applier thread exiting (code:6)
[... "applier thread exiting (code:6)" repeated 46 more times, one per applier thread ...]
170103 9:50:59 [Note] WSREP: rollbacker thread exiting
170103 9:50:59 [Note] Event Scheduler: Purging the queue. 0 events
170103 9:50:59 [Note] WSREP: dtor state: CLOSED
170103 9:50:59 [Note] WSREP: mon: entered 701548 oooe fraction 0 oool fraction 4.13372e-05
170103 9:50:59 [Note] WSREP: mon: entered 701548 oooe fraction 0.011697 oool fraction 4.13372e-05
170103 9:50:59 [Note] WSREP: mon: entered 717634 oooe fraction 0 oool fraction 1.39347e-06
170103 9:50:59 [Note] WSREP: cert index usage at exit 0
170103 9:50:59 [Note] WSREP: cert trx map usage at exit 103
170103 9:50:59 [Note] WSREP: deps set usage at exit 0
170103 9:50:59 [Note] WSREP: avg deps dist 62.7248
170103 9:50:59 [Note] WSREP: avg cert interval 0.0200944
170103 9:50:59 [Note] WSREP: cert index size 125
170103 9:50:59 [Note] WSREP: Service thread queue flushed.
170103 9:50:59 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0
170103 9:50:59 [Note] WSREP: MemPool(
170103 9:50:59 [Note] WSREP: MemPool(
170103 9:50:59 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 16523229)
170103 9:50:59 [Note] WSREP: Flushing memory map to disk...
170103 9:50:59 [Note] InnoDB: FTS optimize thread exiting.
170103 9:50:59 [Note] InnoDB: Starting shutdown...
170103 9:51:00 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
170103 9:51:01 [Note] InnoDB: Shutdown completed; log sequence number 21750856990
170103 9:51:01 [Note] /usr/sbin/mysqld: Shutdown complete
170103 9:51:02 [Note] /usr/sbin/mysqld (mysqld 10.0.28-
170103 9:51:05 [Note] /usr/sbin/mysqld (mysqld 10.0.28-
170103 9:51:05 [Note] WSREP: Read nil XID from storage engines, skipping position init
170103 9:51:05 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/
170103 9:51:05 [Note] WSREP: wsrep_load(): Galera 25.3.18(r3632) by Codership Oy <email address hidden> loaded successfully.
170103 9:51:05 [Note] WSREP: CRC-32C: using hardware acceleration.
170103 9:51:05 [Note] WSREP: Found saved state: f460653b-
170103 9:51:05 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.22.1.186; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_
170103 9:51:05 [Note] WSREP: Service thread queue flushed.
170103 9:51:05 [Note] WSREP: Assign initial position for certification: 16523229, protocol version: -1
170103 9:51:05 [Note] WSREP: wsrep_sst_grab()
170103 9:51:05 [Note] WSREP: Start replication
170103 9:51:05 [Note] WSREP: Setting initial position to f460653b-
170103 9:51:05 [Note] WSREP: protonet asio version 0
170103 9:51:05 [Note] WSREP: Using CRC-32C for message checksums.
170103 9:51:05 [Note] WSREP: backend: asio
170103 9:51:05 [Note] WSREP: gcomm thread scheduling priority set to other:0
170103 9:51:05 [Warning] WSREP: access file(/var/
170103 9:51:05 [Note] WSREP: restore pc from disk failed
170103 9:51:05 [Note] WSREP: GMCast version 0
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Note] WSREP: EVS version 0
170103 9:51:05 [Note] WSREP: gcomm: connecting to group 'openstack_
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Warning] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:05 [Note] WSREP: (24f5eb32, 'tcp://
170103 9:51:06 [Note] WSREP: declaring 42f36b93 at tcp://172.
170103 9:51:06 [Note] WSREP: declaring c31a22d6 at tcp://172.
170103 9:51:06 [Note] WSREP: Node 42f36b93 state prim
170103 9:51:06 [Note] WSREP: view(view_
170103 9:51:06 [Note] WSREP: save pc into disk
170103 9:51:06 [Note] WSREP: gcomm: connected
170103 9:51:06 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
170103 9:51:06 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
170103 9:51:06 [Note] WSREP: Opened channel 'openstack_
170103 9:51:06 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
170103 9:51:06 [Note] WSREP: Waiting for SST to complete.
170103 9:51:06 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 25429412-
170103 9:51:06 [Note] WSREP: STATE EXCHANGE: sent state msg: 25429412-
170103 9:51:06 [Note] WSREP: STATE EXCHANGE: got state msg: 25429412-
170103 9:51:06 [Note] WSREP: STATE EXCHANGE: got state msg: 25429412-
170103 9:51:06 [Note] WSREP: STATE EXCHANGE: got state msg: 25429412-
170103 9:51:06 [Note] WSREP: Quorum results:
170103 9:51:06 [Note] WSREP: Flow-control interval: [28, 28]
170103 9:51:06 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 16523231)
170103 9:51:06 [Note] WSREP: State transfer required:
170103 9:51:06 [Note] WSREP: New cluster view: global state: f460653b-
170103 9:51:06 [Warning] WSREP: Gap in state sequence. Need state transfer.
170103 9:51:06 [Note] WSREP: Running: 'wsrep_
170103 9:51:06 [Note] WSREP: Prepared SST request: xtrabackup-
170103 9:51:06 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170103 9:51:06 [Note] WSREP: REPL Protocols: 7 (3, 2)
170103 9:51:06 [Note] WSREP: Service thread queue flushed.
170103 9:51:06 [Note] WSREP: Assign initial position for certification: 16523231, protocol version: 3
170103 9:51:06 [Note] WSREP: Service thread queue flushed.
170103 9:51:06 [Note] WSREP: IST receiver addr using tcp://172.
170103 9:51:06 [Note] WSREP: Prepared IST receiver, listening at: tcp://172.
170103 9:51:06 [Note] WSREP: Member 0.0 (infra-
170103 9:51:06 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 16523231)
170103 9:51:06 [Note] WSREP: Requesting state transfer: success, donor: 1
170103 9:51:07 [Note] WSREP: 1.0 (infra-
170103 9:51:07 [Note] WSREP: Member 1.0 (infra-
170103 9:51:07 [Note] WSREP: SST complete, seqno: 16523229
170103 9:51:07 [Note] InnoDB: Using mutexes to ref count buffer pool pages
170103 9:51:07 [Note] InnoDB: The InnoDB memory heap is disabled
170103 9:51:07 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
170103 9:51:07 [Note] InnoDB: GCC builtin __atomic_
170103 9:51:07 [Note] InnoDB: Compressed tables use zlib 1.2.8
170103 9:51:07 [Note] InnoDB: Using Linux native AIO
170103 9:51:07 [Note] InnoDB: Using CPU crc32 instructions
170103 9:51:07 [Note] InnoDB: Initializing buffer pool, size = 4.0G
170103 9:51:07 [Note] InnoDB: Completed initialization of buffer pool
170103 9:51:07 [Note] InnoDB: Highest supported file format is Barracuda.
170103 9:51:07 [Note] InnoDB: 128 rollback segment(s) are active.
170103 9:51:07 [Note] InnoDB: Waiting for purge to start
170103 9:51:07 [Note] InnoDB: Percona XtraDB (http://
170103 9:51:07 [Note] Plugin 'FEEDBACK' is disabled.
170103 9:51:07 [Note] Server socket created on IP: '::'.
170103 9:51:07 [Note] WSREP: Signalling provider to continue.
170103 9:51:07 [Note] WSREP: SST received: f460653b-
170103 9:51:07 [Note] WSREP: Receiving IST: 2 writesets, seqnos 16523229-16523231
170103 9:51:07 [Note] WSREP: IST received: f460653b-
170103 9:51:07 [Note] WSREP: 0.0 (infra-
170103 9:51:07 [Note] WSREP: Shifting JOINER -> JOINED (TO: 16523233)
170103 9:51:07 [Note] WSREP: Member 0.0 (infra-
170103 9:51:07 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 16523233)
170103 9:51:07 [Note] WSREP: Synchronized with group, ready for connections
170103 9:51:07 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170103 9:51:07 [Note] /usr/sbin/mysqld: ready for connections.
170103 9:51:08 [Note] WSREP: (24f5eb32, 'tcp://
#######
# Snippet of the HAProxy Failure logs
#######
Jan 3 09:10:50 localhost haproxy[45498]: 172.22.3.123:45230 [03/Jan/
Jan 3 09:12:27 localhost haproxy[45498]: 172.22.2.250:45698 [03/Jan/
Jan 3 09:14:52 localhost haproxy[45498]: 172.22.2.250:42328 [03/Jan/
Jan 3 09:14:52 localhost haproxy[45498]: 172.22.2.250:42316 [03/Jan/
Jan 3 09:15:41 localhost haproxy[45498]: 172.22.2.95:41536 [03/Jan/
Jan 3 09:16:10 localhost haproxy[45498]: 172.22.2.95:57052 [03/Jan/
Jan 3 09:25:47 localhost haproxy[45498]: 172.22.2.75:55940 [03/Jan/
Jan 3 09:25:50 localhost haproxy[45498]: 172.22.2.95:45446 [03/Jan/
Jan 3 09:26:06 localhost haproxy[45498]: 172.22.2.75:34970 [03/Jan/
Jan 3 09:27:11 localhost haproxy[45498]: 172.22.2.75:37542 [03/Jan/
Jan 3 09:28:31 localhost haproxy[45498]: 172.22.2.95:54652 [03/Jan/
Jan 3 09:28:37 localhost haproxy[45498]: 172.22.1.251:46704 [03/Jan/
Jan 3 09:29:38 localhost haproxy[45498]: 172.22.2.95:41366 [03/Jan/
Jan 3 09:30:13 localhost haproxy[45498]: 172.22.2.250:57126 [03/Jan/
Jan 3 09:30:58 localhost haproxy[45498]: 172.22.2.95:42686 [03/Jan/
Jan 3 09:31:01 localhost haproxy[45498]: 172.22.2.27:34718 [03/Jan/
Jan 3 09:31:35 localhost haproxy[45498]: 172.22.3.155:60622 [03/Jan/
Jan 3 09:31:48 localhost haproxy[45498]: 172.22.3.155:39644 [03/Jan/
Jan 3 09:32:18 localhost haproxy[45498]: 172.22.1.251:42498 [03/Jan/
Jan 3 09:32:36 localhost haproxy[45498]: 172.22.3.155:46772 [03/Jan/
Changed in openstack-ansible:
assignee: nobody → Jean-Philippe Evrard (jean-philippe-evrard)
Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Wishlist
assignee: Jean-Philippe Evrard (jean-philippe-evrard) → nobody
Changed in openstack-ansible:
assignee: nobody → Kevin Carter (kevin-carter)
Changed in openstack-ansible:
importance: Wishlist → Medium
One thing I've noticed is that when I manually place the active galera endpoint in DRAIN/MAINT and force-kill all of the connections to that backend in haproxy, the OpenStack services reconnect much faster. I'd be curious to hear whether you can test that and verify the results.
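For anyone wanting to reproduce the manual drain/kill described above, it can be driven through the haproxy admin socket. A sketch follows: the backend name is taken from the config earlier in this report, but the server name and socket path are placeholders (check the "stats socket" line in your haproxy.cfg), and `set server ... state drain` needs haproxy >= 1.6.

```shell
# Drain a galera server, then force-kill its pooled connections via the
# haproxy admin socket. Arguments: socket path, backend/server name --
# both placeholders here, set them for your deployment.
drain_and_kill() {
    SOCK="${1:-/var/run/haproxy.stat}"   # admin socket path (assumption)
    SRV="${2:-galera-back/infra-2}"      # hypothetical backend/server name
    for cmd in "set server $SRV state drain" "shutdown sessions server $SRV"; do
        if [ -S "$SOCK" ]; then
            echo "$cmd" | socat stdio "$SOCK"
        else
            echo "would send: $cmd"      # dry run when no admin socket exists
        fi
    done
}
```

Usage: `drain_and_kill /var/run/haproxy.stat galera-back/<server>`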
Tuning the haproxy health-check timers might help with this, but when a health check detects that the endpoint is down, I wonder whether haproxy force-kills all of the connections or lets them time out gracefully (or maybe there's an option we should set).
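On the "maybe there's an option" point: haproxy does have a per-server option, `on-marked-down shutdown-sessions`, which force-closes established connections as soon as a health check marks the server DOWN instead of letting them linger until they time out. A hedged sketch of how it might look on the (truncated) server lines in the backend above; the server name, address, and check parameters are placeholders, not taken from this report:

```
backend galera-back
    # ... existing mode/balance/stick/check options as above ...
    server <galera-server> <addr>:3306 check on-marked-down shutdown-sessions
```

Whether this closes the specific recovery gap seen here would need to be verified in a failover test.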