Comment 2 for bug 1269842

Ales Perme (ales-perme) wrote :

Pranams Raghavendra Prabhu!

Yes, it looks like it is a problem in Galera. I did check "B", but since there is no Percona XtraDB Cluster based on 5.6 yet, I can't check whether it is the same bug there. I'd gladly give it a try.

I first ran with the default XtraDB Cluster settings:
wsrep_max_ws_rows = 128K
wsrep_max_ws_size = 1024M

But I hit the same bug. Because I thought it might be a transaction-size issue, I raised the limits, but that did not help. I also increased gcache.size in wsrep_provider_options from 1G to 16G, again without any effect.
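In case it helps anyone reproduce this, the limits a node is actually running with can be checked with standard statements like these (just a sketch, not pasted from my session):

-- show the write-set limits and the full provider options string
SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws_%';
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';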

Here are all the options: "base_host = ae-02; base_port = 4567; cert.log_conflicts = no; evs.causal_keepalive_period = PT1S; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT1S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /data/percona/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/percona/mysql//galera.cache; gcache.page_size = 512M; gcache.size = 16G; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = ae-02; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.version = 0; pc.weight = 1; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3"

I increased the values because I calculated that at somewhere around 20 million records I would hit a 1 GB transaction size: the average row length is about 58 bytes, so 20,000,000 x 58 bytes ≈ 1.08 GB (give or take a couple of thousand records, since I can't get the exact row length). So I did some guessing about whether it is a problem with buffer sizes, log sizes, etc., but without success. Alas! :-)
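For what it's worth, the same estimate can be pulled straight from the data dictionary; something along these lines (the table name is from my schema, and avg_row_length is only an approximation maintained by the storage engine):

-- rough write-set size estimate: rows x average row length, in GiB
SELECT table_rows,
       avg_row_length,
       table_rows * avg_row_length / POW(1024, 3) AS approx_gib
FROM information_schema.tables
WHERE table_name = 'docStatsDetail'
  AND table_schema = DATABASE();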

Ad. d. Yes, this happens above a certain limit. I can delete 15 million records without a problem. I just can't pin down the exact number, since that would take a lot of time: it takes me about an hour to restore the cluster, so narrowing down the exact threshold would be a lengthy operation... You can understand.

I can successfully perform deletes or INSERT INTO xxx SELECT * FROM docStatsDetail ... using LIMIT offset, count in smaller chunks. The crash only happens with large transactions, which is why I thought it might be a problem with the sizes...
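To be concrete, the chunked variant that does work for me looks roughly like this (target table is the same "xxx" placeholder as above; the ordering column and batch size here are only illustrative):

-- works: copy in bounded batches instead of one huge transaction
INSERT INTO xxx
SELECT * FROM docStatsDetail
ORDER BY id            -- illustrative ordering column from my schema
LIMIT 0, 500000;       -- repeat, advancing the offset by the batch size

-- deletes can be batched the same way (DELETE accepts LIMIT row_count)
DELETE FROM docStatsDetail
LIMIT 500000;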

I would understand if MySQL simply reported an error, but a crash that takes down the whole cluster?!? That is a nasty thing... As a "quick" fix, I usually bootstrap from a healthy cluster node and then perform an SST on the others. But in a production environment this is unacceptable... :-(

What should I do? And thank you again for all the help.

Kind regards,
Ales