during cluster startup, node asserts for: wsrep_last_committed <= 0

Bug #411814 reported by Seppo Jaakola
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Invalid
Low
Alex Yurchenko
Trunk
Invalid
Low
Alex Yurchenko

Bug Description

If node has previous state stored in state file, it can happen that during the startup an assertion fails for wsrep_last_committed value.

This assertion is in: wsrep_mysqld.cc::wsrep_view_handler_cb()

Related branches

Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :
Download full text (6.3 KiB)

Here is the error log from crashing node:

090810 23:08:05 [Note] wsrep_load(): loading provider library '/home/galera/mysq
l-5.1.36-2859/galera/lib/libmmgalera.so'
090810 23:08:05 [Note] wsrep_load(): provider loaded succesfully.
090810 23:08:05 [Note] mm_galera.c:mm_galera_init():271: setting debug level log
ging
090810 23:08:05 [Note] DEBUG: gu_log.c:gu_conf_debug_on():58: Turning debug on
090810 23:08:05 [Note] mm_galera.c:mm_galera_init():290: Found stored state: 343
3a26a-1dd2-11b2-0800-096518d1cdc0:972
090810 23:08:05 [Note] mm_galera.c:mm_galera_init():301: Configured state: 343
3a26a-1dd2-11b2-0800-096518d1cdc0:972
090810 23:08:05 [Note] DEBUG: gcs.c:gcs_create():135: Requesting recv queue len:
 600868
090810 23:08:05 [Note] DEBUG: gu_fifo.c:gu_fifo_create():103: Creating FIFO buff
er of 1048576 elements, memory min used: 4272, max used: 29364400
090810 23:08:05 [Note] DEBUG: gu_log.c:gu_conf_set_log_callback():122: Logging f
unction changed by application
090810 23:08:05 [Note] DEBUG: gcs_core.c:gcs_core_open():146: Initializing backe
nd IO layer
090810 23:08:05 [Note] DEBUG: gcs_vs.cpp:long int gcs_vs_create(gcs_backend_t*,
const char*)():359: Opening connection to 'tcp:10.2.0.101:26000'
Logger init: (stderr,2)
090810 23:08:05 [Note] DEBUG: gcs_core.c:gcs_core_set_pkt_size():993: Changing m
aximum message size 0 -> 1048576
090810 23:08:05 [Note] DEBUG: gcs_core.c:gcs_core_set_pkt_size():1004: Message p
ayload (action fragment size): 1048556
090810 23:08:05 [Note] gcs.c:gcs_open():591: Opened channel 'my_wsrep_cluster'
090810 23:08:05 [Note] mm_galera.c:mm_galera_connect():469: Successfully opened
GCS connection to my_wsrep_cluster
090810 23:08:05 [Note] gcs_group.c:gcs_group_handle_comp_msg():243: New COMPONEN
T: primary = yes, my_id = 1, memb_num = 2
090810 23:08:05 [Note] gcs_core.c:core_handle_comp_msg():612: STATE EXCHANGE: Wa
iting for state UUID.
090810 23:08:05 [Note] gcs_core.c:core_handle_uuid_msg():689: STATE EXCHANGE: se
nt state msg: 83cdebd2-1dd2-11b2-0800-ae8f50462c88
090810 23:08:05 [Note] gcs_group.c:gcs_group_handle_state_msg():404: STATE EXCHA
NGE: got state msg: 83cdebd2-1dd2-11b2-0800-ae8f50462c88 from 0 (ardennes)
090810 23:08:05 [Note] DEBUG: gcs_group.c:group_print_state_debug():386:
        Protocols : 0 - 0
        Status : Synced
        Global seqno : 0
        Configuration: 1
        State UUID : 83cdebd2-1dd2-11b2-0800-ae8f50462c88
        Group UUID : f556ad20-1dd1-11b2-0800-ec3172bd8292
        Name : 'ardennes'
        Incoming addr: '10.0.0.105:3306'

090810 23:08:05 [Note] gcs_group.c:gcs_group_handle_state_msg():404: STATE EXCHA
NGE: got state msg: 83cdebd2-1dd2-11b2-0800-ae8f50462c88 from 1 (batak)
090810 23:08:05 [Note] DEBUG: gcs_group.c:group_print_state_debug():386:
        Protocols : 0 - 0
        Status : Non-primary
        Global seqno : 972
        Configuration: -1
        State UUID : 83cdebd2-1dd2-11b2-0800-ae8f50462c88
        Group UUID : 3433a26a-1dd2-11b2-0800-096518d1cdc0
        Name : 'batak'
        Incoming addr: '10.0.0.106:3306'

090810 23:08:05 [Note] DEBUG: gcs_group.c:group_post_state_exchange():184: STATE
 EXCHANGE: 83cd...

Read more...

Changed in codership-mysql:
importance: Undecided → High
Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

To fix this, wsrep provider (galera library) should return the state to mysqld.
This can happen through a separate callback or as return value from wsrep_init() call.

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

The reason for this bug is that Galera saves/restores application state metadata independently of the application, without application being aware of it. This is ugly for several reasons:
1) Ultimately the entity that keeps the state (application) should be responsible for keeping state metadata.
2) This Galera capability is not explicitly exposed in the wsrep API, so it is sort of a "proprietary hack" which may not be present in other wsrep providers.

Of course, modifying wsrep API would solve problem 2), but 1) won't go anywhere.

So I won't call it a "solution", but rather a "hack".

Of the proposed hacks it is not clear which one is uglier:
- another callback will make execution flow even more convoluted and broken: wsrep::init() accepts application provided UUID as one of its const arguments and then calls a callback to modify it in the application while still using it inside wsrep::init(). That can be quite a nightmare.
- modifying wsrep_init() might be more correct, but still make its semantics ugly - we're supposed to provide initialization values to wsrep::init(), not get them back.

More appropriate change would be to have explicit store/restore calls in wsrep API for saving and recovering state metadata. And explicitly call them in application, not in Galera. However, it is only about saving few bytes on a persistent storage. This has nothing to do with replication per se. So there is no reason for these calls to be a part of wsrep API.

So I would like to avoid modifying wsrep API for this purpose as any modification seems inappropriate. However fixing this is urgent so I might have to do that.

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

r2860 is a (fairly clean and transparent) workaround for #411814, it should not have any negative effects on correctness of the code. The result is that wsrep_group_uuid, an internal static variable, won't be updated until state transfer completes or if it is the first node in the cluster. This may result in fake warnings about the need for state transfer, but those can be ignored for now.

Since the real solution might involve wsrep API changes I'd like to delay it until we get a consensus on that.

Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

There is still one more issue with not updating wsrep_group_uuid variable. Following log shows that
one cluster node assumes that SST is needed, because local and global group ids differ:

090812 10:05:12 [Note] WSREP: New cluster view: group UUID: abe9d918-1dd1-11b2-0
800-c025207f70fc, conf# 2: PRIMARY, number of nodes: 2, my index: 1, first seqno
: 1
090812 10:05:12 [Warning] WSREP: Gap in state sequence. Need state transfer.
090812 10:05:12 [Note] /home/galera/mysql-5.1.36-2859/mysql/libexec/mysqld: read
y for connections.
Version: '5.1.36-debug' socket: '/home/galera/mysql-5.1.36-2859/mysql/var/mysql
d.sock' port: 3306 wsrep_0.6
090812 10:05:12 [Warning] WSREP: Group state: abe9d918-1dd1-11b2-0800-c025207f70
fc:0
090812 10:05:12 [Warning] WSREP: Local state: 00000000-0000-0000-0000-0000000000
00:0

However, SST will be never initiated and this nodes remains closed for application access.

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

Well, my first take was wrong. r2862 introduces a workaround roughly based on wsrep::init() proposal: after return struct wsrep_args contains updated state values. I still view it as a temporary workaround and would like to discuss this again when Teemu is back.

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

This is another attempt to reflect on the problem.

It all boils down to one thing: mysqld needs to know when its state is or gets outdated in order to set wsrep_ready status correctly.

In view info structure wsrep provider tells mysql the cluster situation. mysql then compares its state with the cluster state and decides what should be the value of wsrep_ready.

The problem was that with the current implementation, when state metadata is saved and loaded by galera library independently of mysql, it does not know its initial state and cannot do that comparison when it receives its first view info.

My reasoning was that if mysql stored and restored this information, the problem would have been solved.

Now this breaks into two problems:

1) where to keep state metadata: state uuid and seqno? Ultimately they should be a part of the application state and therefore saved and loaded by application. Saving them in a separate state independent location is error prone, and if that's going to happen on wsrep provider side, this will never be fixed.

2) In view of https://bugs.launchpad.net/wsrep/+bug/418641 last applied seqno will no longer be set by the callback, So both state uuid and seqno will have to be queried via wsrep::status_get(). On one hand this is no big deal, as on start-up it can be queried between wsrep::init() and wsrep::connect(). On the other wsrep::status_get() can't be called from the wsrep_view callback as it would lead to a deadlock since both need to enter the same critical section. Thus the gap in state sequence cannot be detected during view change.

However wsrep provider already does this and if state transfer is needed it will call wsrep_sst_prepare_cb. This the flag indicating the gap in state sequence can be passed in view_info structure. There are 2 advantages with that:
1) state transfer decision is made only in one place - consistency.
2) wsrep_sst_prepare_cb and wsrep_view_cb callback can be actually merged.
Result: total win.

What does this mean in this bug? It is broken into two rather independent issues:
• state metadata storage issue
• using sst decision from wsrep provider to set wsrep_ready status variable. Basically this boil down to wsrep_ready = !view->state_gap

Revision history for this message
Alex Yurchenko (ayurchen) wrote :
Revision history for this message
Alex Yurchenko (ayurchen) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.