[Warning] WSREP: Unsupported protocol downgrade: incremental data collection disabled. Expect abort.

Bug #1379204 reported by Philip Stoev
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
MySQL patches by Codership | New | Undecided | Unassigned |
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6)
  5.5 | Confirmed | Low | Unassigned |
  5.6 | Fix Committed | Low | Unassigned |

Bug Description

Running a full series of MTR tests produces this warning on a test that unsets and then sets wsrep_provider. Running that same test in isolation does not reproduce the problem.

Philip Stoev (philip-stoev-f) wrote :

Stack trace after adding an abort() immediately after the warning:
#4 0x0000003f2d836f78 in abort () from /lib64/libc.so.6
#5 0x00000000005b2a8c in wsrep_view_handler_cb (app_ctx=<optimized out>, recv_ctx=<optimized out>, view=0x7f314005a6d0, state=<optimized out>, state_len=<optimized out>, sst_req=0x7f31702cce00, sst_req_len=0x7f31702cce10) at /home/philips/git/codership-mysql/sql/wsrep_mysqld.cc:375
#6 0x00007f317b1c53f4 in galera::ReplicatorSMM::process_conf_change (this=0x7f313c811e60, recv_ctx=0x7f31400895b0, view_info=..., repl_proto=6, next_state=galera::Replicator::S_CONNECTED, seqno_l=<optimized out>) at galera/src/replicator_smm.cpp:1372
#7 0x00007f317b1a23ec in galera::GcsActionSource::dispatch (this=this@entry=0x7f313c8124b8, recv_ctx=recv_ctx@entry=0x7f31400895b0, act=..., exit_loop=@0x7f31702cd2e0: false) at galera/src/gcs_action_source.cpp:138
#8 0x00007f317b1a380c in galera::GcsActionSource::process (this=0x7f313c8124b8, recv_ctx=0x7f31400895b0, exit_loop=@0x7f31702cd2e0: false) at galera/src/gcs_action_source.cpp:180
#9 0x00007f317b1c4dab in galera::ReplicatorSMM::async_recv (this=0x7f313c811e60, recv_ctx=0x7f31400895b0) at galera/src/replicator_smm.cpp:354
#10 0x00007f317b1d3878 in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at galera/src/wsrep_provider.cpp:231
#11 0x00000000005be7c0 in wsrep_replication_process (thd=0x7f31400895b0) at /home/philips/git/codership-mysql/sql/wsrep_thd.cc:309
#12 0x00000000005a7c2b in start_wsrep_THD (arg=0x5be770 <wsrep_replication_process(THD*)>) at /home/philips/git/codership-mysql/sql/mysqld.cc:5374
#13 0x0000003f2e007f35 in start_thread () from /lib64/libpthread.so.0
#14 0x0000003f2d8f4c3d in clone () from /lib64/libc.so.6

Philip Stoev (philip-stoev-f) wrote :

Attaching core and binary.

Philip Stoev (philip-stoev-f) wrote :

Please enable the galera.galera_wsrep_provider_unset_set test after this is fixed.

Teemu Ollakka (teemu-ollakka) wrote :

Was unable to inspect the core file because of a naming clash between the 'struct wsrep' type and the 'wsrep' variable. The type 'struct wsrep' needs to be renamed to something else, and this issue needs to be reproduced.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Philip,

Is this bug reported on GitHub anywhere?

Philip Stoev (philip-stoev-f) wrote :

@raghavendra-prabhu I do not think so.

Krunal Bauskar (krunal-bauskar) wrote :

The warning is seen only when the test case is run back to back with other tests as part of the MTR suite.
If the test case is run independently, the warning is not observed. It seems that some stale state is being left behind.

Kenn Takara (kenn-takara) wrote :

I investigated this (since I was seeing it in my local MTR runs).

To repro, run the galera.galera_suspend_slave test followed by the galera.pxc-421 test (or any test that resets the wsrep_provider variable).

The bug occurs because the suspend_slave test runs long enough to establish the protocol version and pxc-421 uses the same running servers.

When we stop the wsrep provider, the information is not being reset, so a warning is issued because of the downgrade in capabilities.

    Galera defaults repl_proto_ver to -1.
    We then request the capabilities while the version is still -1 (wsrep_view_handler_cb() in wsrep_mysqld.cc).
    We receive the v4 capabilities (the default), with the incremental data collection bit set to 0.
    At some point the real protocol version is established (v7), so we get the v5 set of capabilities, with the incremental data collection bit set to 1 (galera_capabilities() in wsrep_provider.cpp).
    Now the pxc-421 test restarts the wsrep_provider, so Galera has a version of -1 again (but wsrep still has incremental data collection enabled).
    Since the provider has been restarted, the capabilities are requested again, and since the version is -1, the incremental data collection bit is set to 0.
    This triggers the warning message, because WSREP has the bit enabled and it is now being disabled.
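
The sequence above can be modelled in a few lines of C++. This is only an illustrative sketch, not code from the server or from Galera: the names CAP_INCREMENTAL, provider_capabilities(), ServerState and replay() are hypothetical, and the rule that versions 5 and above advertise the bit is an assumption based on the description above. The on_provider_unload() reset shows one possible remedy and is not necessarily the fix that was committed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical capability bit; the real flags are defined in the wsrep API.
constexpr std::uint32_t CAP_INCREMENTAL = 1u << 0;

// Hypothetical stand-in for galera_capabilities().
// Assumption: while the protocol version is unknown (-1) only the default
// (v4) capability set is reported; from version 5 on the bit is advertised.
std::uint32_t provider_capabilities(int repl_proto_ver) {
    assert(repl_proto_ver >= -1);
    return repl_proto_ver >= 5 ? CAP_INCREMENTAL : 0u;
}

// Hypothetical model of the state the server keeps across provider restarts.
struct ServerState {
    bool incremental_enabled = false;

    // Models the check in the view handler. Returns true when the
    // "Unsupported protocol downgrade" warning would be issued.
    bool on_view(std::uint32_t caps) {
        const bool offered = (caps & CAP_INCREMENTAL) != 0;
        const bool downgrade = incremental_enabled && !offered;
        incremental_enabled = offered;
        return downgrade;
    }

    // Possible remedy (assumption): forget the cached flag on provider unload.
    void on_provider_unload() { incremental_enabled = false; }
};

// Replays the steps described above; returns true if the warning fired.
bool replay(bool reset_on_unload) {
    ServerState server;
    const bool w1 = server.on_view(provider_capabilities(-1)); // version not yet known
    const bool w2 = server.on_view(provider_capabilities(7));  // real version established
    if (reset_on_unload) {
        server.on_provider_unload();                           // provider stopped
    }
    const bool w3 = server.on_view(provider_capabilities(-1)); // fresh provider, -1 again
    return w1 || w2 || w3;
}
```

In this model, replay(false) reports the warning because the cached flag outlives the provider, while replay(true) does not.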

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-875
