Attempt to match against an empty key | Crash in UNIV_DEBUG

Bug #1274199 reported by Jay Janssen on 2014-01-29
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Galera
Undecided
Alex Yurchenko
Percona XtraDB Cluster
Status tracked in 5.6
5.5
Undecided
Unassigned
5.6
Undecided
Unassigned

Bug Description

[root@ip-10-29-254-10 mysql]# rpm -qa | grep -i percona
Percona-XtraDB-Cluster-shared-55-5.5.34-25.9.607.rhel6.x86_64
Percona-XtraDB-Cluster-client-55-5.5.34-25.9.607.rhel6.x86_64
Percona-XtraDB-Cluster-galera-3-3.2-1.196.rhel6.x86_64
percona-toolkit-2.2.6-1.noarch
Percona-Server-shared-51-5.1.73-rel14.11.603.rhel6.x86_64
percona-xtrabackup-2.1.7-721.rhel6.x86_64
Percona-XtraDB-Cluster-server-55-5.5.34-25.9.607.rhel6.x86_64

Scenario is 2 node cluster.

node2 is wsrep_desync=ON and FLUSH TABLES WITH READ LOCK;

node1 runs: sysbench --db-driver=mysql --test=sysbench_tests/db/update_non_index.lua --rand-type=uniform --rand-init --mysql-user=test --mysql-password=test --mysql-db=test --oltp-table-size=250000 --max-requests=1000000 --num-threads=8 --max-time=0 --report-interval=1 run

Once sysbench finishes on node1, execute UNLOCK TABLES on node2. Node2 gets this error:

Version: '5.5.34-55' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), wsrep_25.9.r3928
140129 15:25:09 [Note] WSREP: Service thread queue flushed.
140129 15:25:09 [Note] WSREP: Synchronized with group, ready for connections
140129 15:25:09 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140129 15:25:22 [Note] WSREP: Node 1 (ip-10-29-254-10) desyncs itself from group
140129 15:25:22 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 2424911)
140129 15:26:36 [Note] WSREP: Provider paused at 3f186ea4-88ee-11e3-b496-2271cf23d665:2524911
140129 15:27:16 [Note] WSREP: Provider resumed.
140129 16:52:06 [Note] WSREP: Provider paused at 3f186ea4-88ee-11e3-b496-2271cf23d665:2624911
140129 16:53:18 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000000 of size 134217728 bytes
140129 16:54:26 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000001 of size 134217728 bytes
140129 16:55:35 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000002 of size 134217728 bytes
140129 16:56:42 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000003 of size 134217728 bytes
140129 17:07:50 [Note] WSREP: Provider resumed.
140129 17:07:51 [ERROR] WSREP: Attempt to match against an empty key (1,0): 22 (Invalid argument)
         at galera/src/key_set.cpp:throw_match_empty_key():135
140129 17:07:51 [Note] WSREP: applier thread exiting (code:7)
140129 17:07:51 [ERROR] WSREP: node consistency compromised, aborting
140129 17:07:51 [Note] WSREP: starting shutdown
140129 17:07:51 [Note] /usr/sbin/mysqld: Normal shutdown

140129 17:07:51 [Note] WSREP: Stop replication
140129 17:07:51 [Note] WSREP: Closing send monitor...
140129 17:07:51 [Note] WSREP: Closed send monitor.
140129 17:07:51 [Note] WSREP: gcomm: terminating thread
140129 17:07:51 [Note] WSREP: gcomm: joining thread
140129 17:07:51 [Note] WSREP: gcomm: closing backend
140129 17:07:52 [Note] WSREP: view(view_id(NON_PRIM,825f9682-88f9-11e3-9ed8-b732f283d4db,2) memb {
        887a4798-88f9-11e3-9c26-2ecbbc9449b7,0
} joined {
} left {
} partitioned {
        825f9682-88f9-11e3-9ed8-b732f283d4db,0
})
140129 17:07:52 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
140129 17:07:52 [Note] WSREP: view((empty))
140129 17:07:52 [Note] WSREP: gcomm: closed
140129 17:07:52 [Note] WSREP: Flow-control interval: [1000000, 1000000]
140129 17:07:52 [Note] WSREP: Received NON-PRIMARY.
140129 17:07:52 [Note] WSREP: Shifting DONOR/DESYNCED -> OPEN (TO: 3624911)
140129 17:07:52 [Note] WSREP: Received self-leave message.
140129 17:07:52 [Note] WSREP: Flow-control interval: [0, 0]
140129 17:07:52 [Note] WSREP: Received SELF-LEAVE. Closing connection.
140129 17:07:52 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 3624911)
140129 17:07:52 [Note] WSREP: RECV thread exiting 0: Success
140129 17:07:52 [Note] WSREP: recv_thread() joined.
140129 17:07:52 [Note] WSREP: Closing replication queue.
140129 17:07:52 [Note] WSREP: Closing slave action queue.
140129 17:07:54 [Note] WSREP: killing local connection: 18

Note that node2 is also hung at this point: mysqld is there, but it does not accept connections. I'm unsure if this will happen consistently or not.

Related branches

lp:galera
David Bennett: Pending requested 2014-07-25

Tested this with 5.6 and sysbench, couldn't replicate it there. Need to check with 5.5.

@Jay,
One more thing, I see 5.5 has been used with 3.x, were you able to replicate this with 5.5 + 2.x as well?

Alex Yurchenko (ayurchen) wrote :

Raghu, this is exclusively Galera-3 issue.

Download full text (4.3 KiB)

Actually with UNIV_DEBUG build it crashes inside galera: (ie. 5.5 with galera3)

(gdb) bt
#0 0x00007ffff637c389 in raise () from /usr/lib/libc.so.6
#1 0x00007ffff637d788 in abort () from /usr/lib/libc.so.6
#2 0x00007ffff5486265 in base_size (buf=<optimized out>, size=4294967295, ver=<optimized out>) at galera/src/key_set.hpp:267
#3 serial_size (size=4294967295, buf=<optimized out>, ver=<optimized out>) at galera/src/key_set.hpp:282
#4 serial_size (size=<optimized out>, buf=<optimized out>) at galera/src/key_set.hpp:208
#5 serial_size (this=<optimized out>) at galera/src/key_set.hpp:212
#6 next_base<galera::KeySet::KeyPart> (this=0x7fff886d9e68) at galerautils/src/gu_rset.hpp:346
#7 next (this=0x7fff886d9e68) at galerautils/src/gu_rset.hpp:420
#8 next (this=0x7fff886d9e68) at galera/src/key_set.hpp:713
#9 galera::Certification::purge_for_trx_v3 (this=0x11ccd18, trx=0x7fff886d9cd0) at galera/src/certification.cpp:120
#10 0x00007ffff5486315 in galera::Certification::purge_for_trx (this=<optimized out>, trx=trx@entry=0x7fff886d9cd0) at galera/src/certification.cpp:147
#11 0x00007ffff548c982 in galera::Certification::PurgeAndDiscard::operator() (this=this@entry=0x7fffe410c630, vt=...) at galera/src/certification.hpp:140
#12 0x00007ffff54863ac in for_each<std::_Rb_tree_iterator<std::pair<long const, galera::TrxHandle*> >, galera::Certification::PurgeAndDiscard> (__f=..., __last=..., __first=...)
    at /usr/include/c++/4.8.2/bits/stl_algo.h:4417
#13 galera::Certification::purge_trxs_upto_ (this=this@entry=0x11ccd18, seqno=seqno@entry=119040, handle_gcache=handle_gcache@entry=true) at galera/src/certification.cpp:937
#14 0x00007ffff548b65d in galera::Certification::append_trx (this=this@entry=0x11ccd18, trx=trx@entry=0x7fff89b82740) at galera/src/certification.cpp:1003
#15 0x00007ffff54b5189 in galera::ReplicatorSMM::cert (this=this@entry=0x11cc480, trx=trx@entry=0x7fff89b82740) at galera/src/replicator_smm.cpp:1667
#16 0x00007ffff54b567b in galera::ReplicatorSMM::pre_commit (this=0x11cc480, trx=0x7fff89b82740, meta=<optimized out>) at galera/src/replicator_smm.cpp:715
#17 0x00007ffff54c51c0 in galera_pre_commit (gh=<optimized out>, conn_id=22, trx_handle=<optimized out>, flags=<optimized out>, meta=0x18f6758) at galera/src/wsrep_provider.cpp:505
#18 0x000000000066474a in wsrep_run_wsrep_commit (thd=thd@entry=0x18f4b20, hton=hton@entry=0x11f5250, all=all@entry=true) at /media/Oort/ncode/percona-xtradb-cluster/trunk-25/sql/wsrep_hton.cc:429
#19 0x0000000000665467 in wsrep_prepare (hton=0x11f5250, thd=0x18f4b20, all=<optimized out>) at /media/Oort/ncode/percona-xtradb-cluster/trunk-25/sql/wsrep_hton.cc:163
#20 0x00000000006bb7cf in ha_commit_trans (thd=thd@entry=0x18f4b20, all=all@entry=true) at /media/Oort/ncode/percona-xtradb-cluster/trunk-25/sql/handler.cc:1384
#21 0x000000000064d6a8 in trans_commit (thd=thd@entry=0x18f4b20) at /media/Oort/ncode/percona-xtradb-cluster/trunk-25/sql/transaction.cc:199
#22 0x00000000005902cc in mysql_execute_command (thd=thd@entry=0x18f4b20) at /media/Oort/ncode/percona-xtradb-cluster/trunk-25/sql/sql_parse.cc:4371
#23 0x0000000000593471 in mysql_parse (thd=thd@entry=0x18f4b20, rawbuf=rawbuf@entry=0x7fff8802...

Read more...

bt full and thread apply all bt full.

It crashes here:

        static size_t
        base_size (Version const ver,
                   const gu::byte_t* const buf, size_t const size)
        {
            switch (ver)
            {
            case FLAT16:
            case FLAT16A:
                return 16;
            case FLAT8:
            case FLAT8A:
                return 8;
            case EMPTY: assert(0);
            }

            abort();
        }

summary: - Attempt to match against an empty key
+ Attempt to match against an empty key | Crash in UNIV_DEBUG
Download full text (76.1 KiB)

With debug=1 + log_info that I added:

140308 22:17:06 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 203856
140308 22:17:06 [Note] [Debug] WSREP: gcs/src/gcs_group.c:gcs_group_handle_last_msg():629: New COMMIT CUT 203837 after 203856 from 1
140308 22:17:06 [Note] [Debug] WSREP: galera/src/replicator_smm.cpp:process_commit_cut():1250: Got commit cut from GCS: 203837
WSREP: Deadlock detected
WSREP: Deadlock detected
140308 22:17:06 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 203905
WSREP: Deadlock detected
140308 22:17:06 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 203954
WSREP: Deadlock detected
140308 22:17:07 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204002
WSREP: Deadlock detected
140308 22:17:07 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204054
WSREP: Deadlock detected
WSREP: Deadlock detected
140308 22:17:07 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204090
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
140308 22:17:07 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204137
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
140308 22:17:08 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204188
140308 22:17:08 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204218
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
140308 22:17:08 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204268
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
140308 22:17:09 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204317
WSREP: Deadlock detected
140308 22:17:09 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204346
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
140308 22:17:09 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204394
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
140308 22:17:09 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204444
WSREP: Deadlock detected
WSREP: Deadlock detected
140308 22:17:09 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204470
140308 22:17:09 [Note] [Debug] WSREP: galera/src/galera_service_thd.cpp:thd_func():67: Reported last committed: 204471
WSREP: Deadlock detected
WSREP: Deadlock detected
WSREP: Deadlock detected
140308 22:17:10 [Note]...

From log_info I added:

=== modified file 'galera/src/key_set.hpp'
--- galera/src/key_set.hpp 2013-12-01 16:29:42 +0000
+++ galera/src/key_set.hpp 2014-03-08 16:31:07 +0000
@@ -264,6 +264,7 @@
             case EMPTY: assert(0);
             }

+ log_info << "Version value:" << ver;
             abort();
         }

140308 22:18:46 [Note] [Debug] WSREP: galera/src/certification.cpp:append_trx():986: trx map size: 16451 - check if status.last_committed is incrementing
140308 22:18:46 [Note] WSREP: galera/src/key_set.hpp:base_size():267: Version value:0

Here:

             case FLAT8A:
                return 8;
            case EMPTY: assert(0);
            }

            abort();

Is it missing a 'return 0' here? unless you want to assert (when NDEBUG) and abort as well.

From node2 as per the reporter I see:

140308 22:28:32 [Note] WSREP: Skipping empty log_xid: (null)
140308 22:28:32 [Note] WSREP: Skipping empty log_xid: (null)
140308 22:28:32 [Note] WSREP: Skipping empty log_xid: (null)
140308 22:28:32 [ERROR] WSREP: Attempt to match against an empty key (1,0): 22 (Invalid argument)
         at galera/src/key_set.cpp:throw_match_empty_key():135
140308 22:28:32 [Note] WSREP: applier thread exiting (code:7)
140308 22:28:32 [ERROR] WSREP: node consistency compromised, aborting
140308 22:28:32 [Note] WSREP: starting shutdown
140308 22:28:32 [Note] WSREP: closing applier 7
140308 22:28:32 [Note] WSREP: wsrep running threads now: 8
140308 22:28:32 [Note] WSREP: Skipping empty log_xid: (null)
1

Download full text (21.5 KiB)

Adding 'return 0' :

On node1:
==============
140308 22:36:47 [ERROR] WSREP: galera/src/wsrep_provider.cpp:galera_pre_commit():513: Attempt to match against an empty key (1,0): 22 (Invalid argument)
         at galera/src/key_set.cpp:throw_match_empty_key():135
140308 22:36:47 [ERROR] WSREP: unknown connection failure
140308 22:38:48 [Note] [Debug] WSREP: gcomm/src/evs_proto.cpp:handle_gap():2968: evs::proto(667205c1-a6e3-11e3-bf0b-b683d5ad1a93, GATHER, view_id(REG,667205c1-a6e3-11e3-bf0b-b683d5ad1a93,2)) commit gap from 667205c1-a6e3-11e3-bf0b-b683d5ad1a93
140308 22:38:48 [Note] [Debug] WSREP: gcomm/src/evs_proto.cpp:handle_gap():3000: evs::proto(667205c1-a6e3-11e3-bf0b-b683d5ad1a93, INSTALL, view_id(REG,667205c1-a6e3-11e3-bf0b-b683d5ad1a93,2)): install gap evs::msg{version=0,type=3,user_type=255,order=1,seq=-1,seq_range=-1,aru_seq=-1,flags=0,source=667205c1-a6e3-11e3-bf0b-b683d5ad1a93,source_view_id=view_id(REG,667205c1-a6e3-11e3-bf0b-b683d5ad1a93,3),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=35686,node_list=()
}
140308 22:38:48 [Note] [Debug] WSREP: gcomm/src/pc_proto.cpp:handle_view():538: 667205c1-a6e3-11e3-bf0b-b683d5ad1a93 view(view_id(TRANS,667205c1-a6e3-11e3-bf0b-b683d5ad1a93,2) memb {
        667205c1-a6e3-11e3-bf0b-b683d5ad1a93,0
} joined {
} left {
        6ce84e85-a6e3-11e3-bfce-6f762c1fafc3,0
} partitioned {
})
140308 22:38:48 [Note] [Debug] WSREP: gcomm/src/pc_proto.cpp:handle_trans():451: 667205c1-a6e3-11e3-bf0b-b683d5ad1a93

 current view view(view_id(REG,667205c1-a6e3-11e3-bf0b-b683d5ad1a93,2) memb {
        667205c1-a6e3-11e3-bf0b-b683d5ad1a93,0
        6ce84e85-a6e3-11e3-bfce-6f762c1fafc3,0
} joined {
        6ce84e85-a6e3-11e3-bfce-6f762c1fafc3,0
} left {
} partitioned {
})

 next view view(view_id(TRANS,667205c1-a6e3-11e3-bf0b-b683d5ad1a93,2) memb {
        667205c1-a6e3-11e3-bf0b-b683d5ad1a93,0
} joined {
} left {
        6ce84e85-a6e3-11e3-bfce-6f762c1fafc3,0
} partitioned {
})

 pc view view(view_id(PRIM,667205c1-a6e3-11e3-bf0b-b683d5ad1a93,2) memb {
        667205c1-a6e3-11e3-bf0b-b683d5ad1a93,0
        6ce84e85-a6e3-11e3-bfce-6f762c1fafc3,0
} joined {
} left {
} partitioned {
})
140308 22:38:48 [Note] [Debug] WSREP: gcomm/src/pc_proto.cpp:handle_trans():483: 667205c1-a6e3-11e3-bf0b-b683d5ad1a93 quorum ok
140308 22:38:48 [Note] [Debug] WSREP: gcomm/src/pc_proto.cpp:shift_to():321: 667205c1-a6e3-11e3-bf0b-b683d5ad1a93 shift_to: PRIM -> TRANS prim 1 last prim view_id(PRIM,667205c1-a6e3-11e3-bf0b-b683d5ad1a93,2) to_seq 18696
140308 22:38:48 [Note] [Debug] WSREP: gcomm/src/evs_input_map2.cpp:clear():227: discarding 12 messages from recovery index
140308 22:38:48 [Note] [Debug] WSREP: gcomm/src/evs_input_map2.cpp:reset():153: size 0
140308 22:38:48 [Note] [Debug] WSREP: gcomm/src/evs_input_map2.cpp:reset():159: node: {idx=0,range=[0,-1],safe_seq=-1} size 1
140308 22:38:48 [Note] [Debug] WSREP: gcomm/src/gmcast.cpp:handle_stable_view():1439: GMCast::handle_stable_view: view(view_id(REG,667205c1-a6e3-11e3-bf0b-b683d5ad1a93,3) memb {
        667205c1-a6e3-11e3-bf0b-b683d5ad1a93,0
} joined {
} left {
        6ce84e85-a6e3-11e3-bfce-6f762c1fafc3,0
} partitioned {
})
140308 22:38:48 [Note] [Debug]...

Regarding previous comment,

 140308 22:36:47 [ERROR] WSREP: galera/src/wsrep_provider.cpp:galera_pre_commit():513: Attempt to match against an empty key (1,0): 22 (Invalid argument)
         at galera/src/key_set.cpp:throw_match_empty_key():135
140308 22:36:47 [ERROR] WSREP: unknown connection failure

After this the node1 hangs, probably due to queue being full, unlock tables on node2 continues it.

Alex Yurchenko (ayurchen) wrote :

Looks like it was off-by-one error in certification cleanup.
Fix pushed in http://bazaar.launchpad.net/~codership/galera/3.x/revision/178

Changed in galera:
assignee: nobody → Alex Yurchenko (ayurchen)
milestone: none → 25.3.5
status: New → Fix Committed
Changed in galera:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments