Percona XtraDB Cluster - HA scalable solution for MySQL

PXC innodb_buffer_pool_restore_at_startup and wsrep_recovery

Reported by Ilias Bertsimas on 2013-01-28
This bug affects 1 person
Affects: Percona XtraDB Cluster (status tracked in Trunk)
  5.6:   Importance: Undecided; Assigned to: Unassigned
  Trunk: Importance: Undecided; Assigned to: Raghavendra D Prabhu

Bug Description

We have noticed an issue on PXC with a 30 GB buffer pool and innodb_buffer_pool_restore_at_startup enabled so that the buffer pool is warmed up quickly during startup.
When the node joins the cluster (state JOINER) before the buffer pool restore has finished reading all the pages it had loaded last time, cluster flow control pauses and the whole cluster stalls, because the newly joined node cannot keep up: its I/O is being consumed by the buffer pool restore operation. With our workload it may stay stalled for 4-5 minutes.

This may not be a bug, but it would be best if the buffer pool restore operation could complete first and then wsrep continue joining the cluster, or the other way around. Currently Percona Server's innodb_buffer_pool_restore_at_startup and wsrep are oblivious to each other, which causes the issue described above.
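For reference, the two features that interact here can be sketched as a my.cnf fragment (values are illustrative, mirroring the 30 GB pool described above; see the Percona Server manual for the exact value semantics of the restore option):

```ini
# Illustrative my.cnf fragment, not a recommendation
[mysqld]
innodb_buffer_pool_size               = 30G
# Warm the buffer pool from the ib_lru_dump file at startup
innodb_buffer_pool_restore_at_startup = 1
# wsrep/Galera is configured independently and is unaware of the restore
wsrep_provider                        = /usr/lib64/libgalera_smm.so
wsrep_cluster_name                    = PXC
```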

Vadim Tkachenko (vadim-tk) wrote :

You can use innodb_blocking_buffer_pool_restore=ON
to make sure that the restore operation is complete first.

Changed in percona-xtradb-cluster:
status: New → Invalid
Ilias Bertsimas (d-ilias) wrote :

Will this be respected by wsrep as well, so that the node does not join the cluster until the operation is done?

Vadim Tkachenko (vadim-tk) wrote :

I am actually not sure.
If wsrep does not respect that, it should be fixed.

But I think it should be respected for xtrabackup SST.


Here is the log:

sudo mysqld-debug --user=mysql --debug
130219 4:09:08 [Warning] The syntax '--log-slow-queries' is deprecated and will be removed in a future release. Please use '--slow-query-log'/'--slow-query-log-file' instead.
130219 4:09:08 [Note] Flashcache bypass: disabled
130219 4:09:08 [Note] Flashcache setup error is : ioctl failed

130219 4:09:08 [Note] WSREP: Read nil XID from storage engines, skipping position init
130219 4:09:08 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
130219 4:09:08 [Note] WSREP: wsrep_load(): Galera 2.2(r114) by Codership Oy <email address hidden> loaded succesfully.
130219 4:09:08 [Note] WSREP: Found saved state: 11264cec-06e6-11e2-0800-61616b1fc754:2669
130219 4:09:08 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
130219 4:09:08 [Note] WSREP: Passing config to GCS: base_host = 10.0.2.154; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
130219 4:09:08 [Note] WSREP: Assign initial position for certification: 2669, protocol version: -1
130219 4:09:08 [Note] WSREP: wsrep_sst_grab()
130219 4:09:08 [Note] WSREP: Start replication
130219 4:09:08 [Note] WSREP: Setting initial position to 11264cec-06e6-11e2-0800-61616b1fc754:2669
130219 4:09:08 [Note] WSREP: protonet asio version 0
130219 4:09:08 [Note] WSREP: backend: asio
130219 4:09:08 [Note] WSREP: GMCast version 0
130219 4:09:08 [Note] WSREP: (01b640b8-7a1c-11e2-0800-ecbf0dd5f72b, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
130219 4:09:08 [Note] WSREP: (01b640b8-7a1c-11e2-0800-ecbf0dd5f72b, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
130219 4:09:08 [Note] WSREP: EVS version 0
130219 4:09:08 [Note] WSREP: PC version 0
130219 4:09:08 [Note] WSREP: gcomm: connecting to group 'PXC', peer 'Pxc1:'
130219 4:09:08 [Note] WSREP: declaring db89bc0c-7a18-11e2-0800-517d29aa6526 stable
130219 4:09:08 [Note] WSREP: view(view_id(PRIM,01b640b8-7a1c-11e2-0800-ecbf0dd5f72b,10) memb {
        01b640b8-7a1c-11e2-0800-ecbf0dd5f72b,
        db89bc0c-7a18-11e2-0800-517d29aa6526,
} joined {
} left {
} partitioned {
})
130219 4:09:09 [Note] WSREP: gcomm: connected
130219 4:09:09 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
130219 4:09:09 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
130219 4:09:09 [Note] WSREP: Opened channel 'PXC'
130219 4:09:09 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
130219 4:09:09 [Note] WSREP: Waiting for SST to complete.
130219 4:09:09 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 02501b41-7a1c-11e2-0800-fa53e208c983
130219 4:09:09 [Note] WSREP: STATE EXCHANGE: sent state msg: 02501b41-7a1c-11e2-0800-fa53e208c983
130219 4:09:09 [Note] WSREP: STATE EXCHANGE: got state msg: 0250...


Changed in percona-xtradb-cluster:
status: Invalid → New
Ilias Bertsimas (d-ilias) wrote :

I can confirm that innodb_blocking_buffer_pool_restore=ON solves the issue, but with a weird side effect: Galera tries to restore the wsrep position, and with that option enabled it spends a lot of time on that step:

130205 13:10:38 mysqld_safe Starting mysqld daemon with databases from /data/mysql
130205 13:10:38 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.nci5tZ9xZF
130205 13:16:37 mysqld_safe WSREP: Recovered position 402367df-5fd0-11e2-0800-58b321ec9eec:518510084

Almost six minutes are spent there, whereas without innodb_blocking_buffer_pool_restore=ON it is practically instant.
It is not restoring the buffer pool pages at that point, as the log shows the restore happening later:

130205 13:16:45 InnoDB: Restoring buffer pool pages from ib_lru_dump
130205 13:22:52 InnoDB: Completed reading buffer pool pages (requested: 3178478, read: 3178076)

What do you think?

So, the delay between 13:10:38 and 13:16:37 is present only when innodb_blocking_buffer_pool_restore=ON?

Also, the delay between 13:16 and 13:22 is justified since it has to read 3178076 pages (about 48 GB) in a blocking way.

Can you provide that whole log?
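As a sanity check on the "about 48 GB" figure above: with InnoDB's default 16 KiB page size, the reported read count does work out to roughly 48 GiB. A throwaway calculation, not part of the bug itself:

```shell
#!/bin/sh
# Back-of-the-envelope check: pages read during the LRU restore, times the
# default 16 KiB InnoDB page size, gives the volume of blocking I/O.
pages=3178076
page_size=16384                      # bytes; InnoDB default page size
bytes=$((pages * page_size))
gib=$((bytes / 1024 / 1024 / 1024))
echo "${gib} GiB"                    # matches the ~48 GB quoted above
```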

========================

Here is the full log of tests which I ran:

 http://sprunge.us/MfPM

 http://sprunge.us/LcVW

As you can see it takes longer with the LRU dump, but that alone can't be used for comparison here (due to caching effects of the galera cache, etc.).

Ignore the timing in http://sprunge.us/MfPM (it was skewed by the verbose logging from --debug); here is the log for

 http://sprunge.us/hPGU (for innodb_buffer_pool_restore_at_startup=true)

The buffer pool is clearly getting restored in both places:

On Feb 18, 2013, at 6:24 PM, Ilias Bertsimas <email address hidden> wrote:

> I can confirm that innodb_blocking_buffer_pool_restore=ON solves the
> issue but with a weird side-effect. Galera tries to restore the
> wsrep_position and with that option enabled it spends a lot of time on
> that part:
>
> 130205 13:10:38 mysqld_safe Starting mysqld daemon with databases from /data/mysql
> 130205 13:10:38 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.nci5tZ9xZF
> 130205 13:16:37 mysqld_safe WSREP: Recovered position 402367df-5fd0-11e2-0800-58b321ec9eec:518510084

The --wsrep_recover pass starts InnoDB using our standard my.cnf, so it allocates and loads the full buffer pool (apparently without logging it).

>
> Almost 6 minutes spent above when without the innodb_blocking_buffer_pool_restore=ON it is pretty much instant.
> It is not that it restores the buffer pool pages at that point as the log indicates later on:
>
> 130205 13:16:45 InnoDB: Restoring buffer pool pages from ib_lru_dump
> 130205 13:22:52 InnoDB: Completed reading buffer pool pages (requested: 3178478, read: 3178076)
>
> What do you think ?

Again, InnoDB blocks until the buffer pool is reloaded, incurring the six-minute penalty a second time (which happens to be the time it takes your LRU dump to reload). This is what I would expect with innodb_blocking_buffer_pool_restore=ON.

So, here are the issues as I see them:
- The mandatory wsrep_recover step has some interesting side effects. Why do we even force-run it? If we must, can we ignore certain variables in the config? (I've noticed that huge buffer pools slow this step down too.)
- LRU restore obviously has some side effect on cluster operation, but it's not clear why that should be the case. This is probably the real bug here.

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen
Percona Live in Santa Clara, CA April 22nd-25th 2013
http://www.percona.com/live/mysql-conference-2013/

Jay,

The LRU restore process should be expected to have an impact on the cluster. LRU restore causes a lot of I/O activity, especially for large buffer pools. This puts excessive load on one node and can inadvertently cause problems for the whole cluster, because that node may not be able to keep up with cluster replication while the LRU restore is in progress.

Jay Janssen (jay-janssen) wrote :

Ovais,
  I agree with you, but it depends on what symptoms are actually being seen. If the "cluster stalls" are really just flow control, then I agree this isn't technically a bug. If they are something worse, then there may be a larger problem.

 Ilias, can you confirm the "stalls" you see? Are they just flow control because the node doing the LRU restore is busy? I would expect that node to increase wsrep_flow_control_sent during this window. Or do you see other symptoms on the cluster at this time?
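The counter check Jay describes can be done by sampling SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%' before and after the restore window and diffing the values. A minimal sketch of the diffing step, using made-up sample values in place of live `mysql -e` output:

```shell
#!/bin/sh
# Sketch: diff wsrep_flow_control_sent between two SHOW GLOBAL STATUS
# samples. The sample strings below are made-up values for illustration;
# in practice capture the output of
#   mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%'"
fc_sent() {
    # extract the wsrep_flow_control_sent value from "name value" output
    printf '%s\n' "$1" | awk '$1 == "wsrep_flow_control_sent" { print $2 }'
}

before="wsrep_flow_control_sent 12"   # sample taken before the LRU restore
after="wsrep_flow_control_sent 57"    # sample taken after it finishes
delta=$(( $(fc_sent "$after") - $(fc_sent "$before") ))
echo "flow control pauses sent during window: $delta"
```

A steadily climbing delta on the restoring node, with the rest of the cluster healthy, would point at flow control rather than a deeper fault.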

Ilias Bertsimas (d-ilias) wrote :

Jay,

Yes, it is the flow control pause that stalls the cluster; that can be solved with innodb_blocking_buffer_pool_restore=ON.
But the side effect I mentioned is that the buffer pool pages are restored twice.

Once when wsrep tries to recover the uuid:gtid in
 WSREP: Running position recovery with --log_error=/tmp/tmp.nci5tZ9xZF
and a second time at the point where the buffer pool pages would be restored even if the node were not part of a Galera cluster.

I suspect wsrep tries to read the uuid:gtid from the InnoDB tablespace, and with blocking in effect this triggers the buffer pool restore before allowing that read to happen.

On Feb 28, 2013, at 8:37 AM, Ilias Bertsimas <email address hidden> wrote:

> Yes it is flow control pause that stalls the cluster. That can be solved with innodb_blocking_buffer_pool_restore=ON

Ok, great. This bit is "not a bug" technically.

> But the side effect I mentioned is the restoration of the buffer pool pages twice.
>
> Once when wsrep tries to recover uuid:gtid in
> WSREP: Running position recovery with --log_error=/tmp/tmp.nci5tZ9xZF
> And the second time where it should recover the buffer pool pages if the node was not part of a galera cluster.

Yes, precisely. The wsrep_recover pass should not do an LRU reload. It's not technically a bug, but it's superfluous and obviously adds extra delay when bringing up the node.


Jay,

Yes, it is something of an obstruction: on one of our nodes with almost 200 GB of buffer pool, it takes almost an hour to bring the node up because the buffer pool restore runs twice.

Ovais Tariq (ovais-tariq) wrote :

Ilias,

> I suspect wsrep tries to read the uuid:gtid from the innodb tablespace and the blocking is in effect and triggers the buffer pool pages restore before allowing that to happen.

Technically, before doing so wsrep should wait for the InnoDB storage engine to report that its initialization is complete, which of course does not happen until the LRU restore is finished when innodb_blocking_buffer_pool_restore=ON.

This can be fixed by checking the wsrep_recovery variable and not starting the LRU thread when it is set, so that the LRU thread doesn't run twice.
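The shape of that fix can be sketched as a simple guard (shell pseudo-form with illustrative names; the actual change lives in the XtraDB startup code, not in a script):

```shell
#!/bin/sh
# Sketch of the guard described above: skip the LRU restore thread when the
# server is running the --wsrep-recover pass. Function and variable names
# are illustrative, not the real server symbols.
should_start_lru_restore() {
    wsrep_recovery="$1"    # 1 when mysqld was started with --wsrep-recover
    restore_enabled="$2"   # innodb_buffer_pool_restore_at_startup enabled?
    if [ "$wsrep_recovery" -eq 1 ]; then
        echo no            # recovery pass: read the position only, no warm-up
    elif [ "$restore_enabled" -eq 1 ]; then
        echo yes           # normal startup: restore the dumped pages once
    else
        echo no
    fi
}

should_start_lru_restore 1 1   # recovery pass
should_start_lru_restore 0 1   # real startup
```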

Changed in percona-xtradb-cluster:
milestone: none → 5.5.30-24.8
assignee: nobody → Raghavendra D Prabhu (raghavendra-prabhu)
status: New → Triaged
summary: - PXC innodb_buffer_pool_restore_at_startup and node joining cluster io
- resources race condition
+ PXC innodb_buffer_pool_restore_at_startup and wsrep_recovery
tags: added: lru restore wsrep
Changed in percona-xtradb-cluster:
status: Triaged → Fix Released
status: Fix Released → Fix Committed
Jay Janssen (jay-janssen) wrote :

I wonder if this should be expanded to skip other things that are unnecessary for wsrep_recover and cause it to be really slow. I'm thinking about large buffer pools and innodb_buffer_pool_populate, for example.

@Jay,

Yes, I have already added it for buffer_pool_populate as well -
you can review the diff here -- https://bazaar.launchpad.net/~raghavendra-prabhu/percona-xtradb-cluster/bug-1108035/revision/398

Let me know of any other use cases where this may be needed.

If you are hitting this regression with innodb_buffer_pool_populate, you can apply this change to mysqld_safe:

--- mysqld_safe 2013-06-06 16:05:26.000000000 +0530
+++ /tmp/mysqld_safe 2013-06-07 15:27:54.660401912 +0530
@@ -222,7 +222,7 @@
   log_notice "WSREP: Running position recovery with --log_error=$wr_logfile \
                                 --pid-file="$DATADIR/`hostname`-recover.pid""

- eval_log_error "$mysqld_cmd --log_error=$wr_logfile --wsrep-recover \
+ eval_log_error "$mysqld_cmd --log_error=$wr_logfile --innodb_buffer_pool_populate=0 --wsrep-recover \
                             --pid-file="$DATADIR/`hostname`-recover.pid""

   local rp="$(grep 'WSREP: Recovered position:' $wr_logfile)"

Note: due to a bug, mysqld --verbose --help won't show innodb_buffer_pool_populate, but the option is there.

Can a mysqld startup with 144 GB of buffer pool and innodb_buffer_pool_populate set, but wsrep_recover NOT set, be affected by this bug (or fixed by the patch) when IST ends up with:

...
130607 1:09:43 [ERROR] WSREP: Local state seqno (2249918697) is greater than group seqno (2249916946): states diverged. Aborting to avoid potential data loss. Remove '/data/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)
at galera/src/replicator_str.cpp:state_transfer_required():34

and the node can join the cluster only after a full state transfer? Should I report a separate bug for this?

Should the manual mention this problem for PXC when describing the innodb_buffer_pool_populate variable?

@Valerii, that doesn't seem to be related to this. Have you been able to repeat it with innodb_buffer_pool_populate ON but not with it OFF? It looks like an entirely different issue; yes, please report a separate bug for it.

Alex Yurchenko (ayurchen) wrote :

Valerii, Raghu, this does not even look like a bug. First I'd make sure that there really was no state divergence.

Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released

Yes, we had the problem with innodb_buffer_pool_populate=ON at every node restart after a successful full state transfer. As soon as we set it to OFF, the problem went away.
