MySQL patches by Codership

replaying failure with sysbench (small DB)

Reported by Seppo Jaakola on 2010-01-03
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
High
Seppo Jaakola

Bug Description

Running sysbench with 3 node cluster with following configuration:

 # load balancer choice
load_balancer=multihost

# gcs backend choice
gcs=vsbes

# sqlgen configuration
user=root
password=rootpass
host=127.0.0.1
port=3306
users=6
duration=40000
requests=0
rows=1000
mode=complex
#mode=simple,nontrx,sp
readonly=off

...resulted in one node having an issue with replaying.
Replaying gives following error:

100103 0:22:25 [ERROR] Slave SQL: Could not execute Update_rows event on table test.sbtest; Can't find record in 'sbtest', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 204, Error_code: 1032

Problem reproduces with debug logging, and following log messages show how problem escalates:

100103 0:22:25 [Note] WSREP: BF kill, with seqno: 312545
100103 0:22:25 [Note] WSREP: Aborting query: COMMIT
100103 0:22:25 [Note] WSREP: kill trx QUERY_COMMITTING for 31279080
100103 0:22:25 [Note] [Debug] WSREP: local.c:read_next_block():625: could not r
ead full cache block, last:145393 cur:145393
100103 0:22:25 [Note] [Debug] WSREP: local.c:read_next_block():625: could not r
ead full cache block, last:145393 cur:145393
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_abort_pre_commit():1
455: abort_pre_commit trx state: 1 seqno: 9223372036854775807
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_pre_commit():1897: l
ocal trx certified, seqno: 27609850 312546 last_seen_trx: 27609847
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_abort_pre_commit():1
482: victim trx is replicating: 9223372036854775807
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_pre_commit():1897: l
ocal trx certified, seqno: 27609851 312547 last_seen_trx: 27609848
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_abort_pre_commit():1
499: victim trx has replicated: 312547
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_abort_pre_commit():1
510: interrupting trx commit: trx_id 31279080 seqno 312547
100103 0:22:25 [Note] [Debug] WSREP: gu_to.c:gu_to_interrupt():434: trying to i
nterrupt used seqno: cancel seqno = 312547, TO seqno = 312548
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_abort_pre_commit():1
534: trx interupt fail in cert_queue: -34
100103 0:22:25 [Note] [Debug] WSREP: gu_to.c:gu_to_interrupt():422: signaling t
o interrupt wait seqno: seqno = 312547, TO seqno = 312544
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_pre_commit():1945: i
nterrupted in commit queue for 312547
100103 0:22:25 [Note] WSREP: commit failed for reason: 4
100103 0:22:25 [Note] WSREP: post rollback for: 31279080
100103 0:22:25 [Note] WSREP replay trx
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:process_query_write_set_applyi
ng():914: GALERA ws commit for: 27609849 312545
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_replay_trx():2313: trx_replay for: 312547 27609851 state: 4, rbr len: 346
100103 0:22:25 [Note] [Debug] WSREP: galera_job_queue.c:job_queue_new_worker():99: new job queue worker, type: 1, id: 1, workers: 2
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_replay_trx():2332: replaying applier in to grab: 1, seqno: 27609851 312547
100103 0:22:25 [Note] [Debug] WSREP: galera_job_queue.c:job_queue_end_job():208: job: 0 complete
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_recv():1369: worker: 0 with seqno: (27609852 - 312548) type: GCS_ACT_TORDERED recvd
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:process_query_write_set():976: remote trx seqno: 27609852 312548 last_seen_trx: 27609851 27609852, cert: 0
100103 0:22:25 [Note] [Debug] WSREP: galera_job_queue.c:job_queue_start_job():162: job: 0 starting
100103 0:22:25 [Note] WSREP: BF kill, with seqno: 312548
100103 0:22:25 [Note] WSREP: Aborting query: void
100103 0:22:25 [Note] WSREP: kill IDLE for 31279086
100103 0:22:25 [Note] WSREP: WSREP signalling aborter
100103 0:22:25 [Warning] WSREP rollback thread wakes for signal
100103 0:22:25 [Note] WSREP: post rollback for: 31279086
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_post_rollback():1633: trx state: 0 at post_rollback for: 9223372036854775807
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_replay_trx():2346: replaying applier starting
100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:ws_conflict_check():183: conflict check for: job1: 312547 type: 1 job2: 312548 type 0
100103 0:22:25 [Note] [Debug] WSREP: galera_job_queue.c:job_queue_start_job():162: job: 1 starting
100103 0:22:25 [Warning] WSREP rollbacker aborted thd: 140720888076624
100103 0:22:25 [ERROR] Slave SQL: Could not execute Update_rows event on table test.sbtest; Can't find record in 'sbtest', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 204, Error_code: 1032

Changed in codership-mysql:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Seppo Jaakola (seppo-jaakola)
milestone: none → 0.7.1
Seppo Jaakola (seppo-jaakola) wrote :

Log analysis reveals following:

100103 0:22:25 [Note] WSREP: BF kill, with seqno: 312545
100103 0:22:25 [Note] WSREP: kill trx QUERY_COMMITTING for 31279080

1. Slave applier with seqno 312545, needs to abort local trx which is in committing phase

100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_pre_commit():1945: i
nterrupted in commit queue for 312547

2. victim has already replicated and is in commit queue, victim's seqno is 312547

100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:mm_galera_replay_trx():2332: replaying applier in to grab: 1, seqno: 27609851 312547

3. victim starts replaying and enters commit queue again

100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:process_query_write_set():976: remote trx seqno: 27609852 312548 last_seen_trx: 27609851 27609852, cert: 0
100103 0:22:25 [Note] [Debug] WSREP: galera_job_queue.c:job_queue_start_job():162: job: 0 starting

4. new WS arrives with seqno 312548 . Certification is OK and slave applier starts applying. Note, that replaying victim has passed cert queue, so this later remote WS can proceed. Note also, that this remote trx has seen our victim trx, and therefore does not conflict with it.

 100103 0:22:25 [Note] [Debug] WSREP: mm_galera.c:ws_conflict_check():183: conflict check for: job1: 312547 type: 1 job2: 312548 type 0
100103 0:22:25 [Note] [Debug] WSREP: galera_job_queue.c:job_queue_start_job():162: job: 1 starting

5. 312546 must have left commit TO (although not visible here), and replaying victim, can grab commit TO. job queue has our victim and slave applier 312548, running

100103 0:22:25 [ERROR] Slave SQL: Could not execute Update_rows event on table test.sbtest; Can't find record in 'sbtest', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 204, Error_code: 1032

6. ...and then this error. This means that 312548 has deleted the row, that 312547 tries to update. This can happen, because 312548 has seen the effects of 312547, and passes certification

Seppo Jaakola (seppo-jaakola) wrote :

There are a few alternatives to fix this issue. A simple solution (which I try first), is to register replaying Ws in job queue immediately, and prevent later slave appliers to start before replaying transaction is complete.

This reduces concurrency in replying phase, a little.

Seppo Jaakola (seppo-jaakola) wrote :

Fix went in Galera library, no need to change wsrep part. Fix seems effective also.

Seppo Jaakola (seppo-jaakola) wrote :

Applying of replaying transaction is now fully serialized with slave transactions. This limits concurrency in repaying phase, and can harm performance. However, this is simple and safe solution for the time being.

Changed in codership-mysql:
status: In Progress → Fix Committed
Changed in codership-mysql:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers