Percona XtraDB Cluster - HA scalable solution for MySQL

GU_AVPHYS_SIZE can report more available memory than can be addressed on 32-bit systems

Reported by Raghavendra D Prabhu on 2013-07-23
This bug affects 2 people
Affects                  Status        Importance  Assigned to     Milestone
Galera                   Fix Released  Medium      Alex Yurchenko  23.2.7
Percona XtraDB Cluster   Fix Released  Undecided   Unassigned      5.5.33-23.7.6

Bug Description

lp:1181347 is regressing on centos6-32

http://jenkins.percona.com/job/percona-xtrabackup-2.1-param/393/BUILD_TYPE=release,Host=centos6-32,xtrabackuptarget=galera55/testReport/junit/(root)/t_xb_galera_sst/sh/

=============================================
130723 15:14:13 [Note] WSREP: Some threads may fail to exit.
130723 15:14:13 [Note] WSREP: Setting wsrep_ready to 0
130723 15:14:13 [Note] WSREP: Read nil XID from storage engines, skipping position init
130723 15:14:13 [Note] WSREP: wsrep_load(): loading provider library '/home/jenkins/workspace/percona-xtrabackup-2.1-param/BUILD_TYPE/release/Host/centos6-32/xtrabackuptarget/galera55/test/server/lib/libgalera_smm.so'
130723 15:14:13 [Note] WSREP: wsrep_load(): Galera 2.6(r152) by Codership Oy <email address hidden> loaded succesfully.
130723 15:14:13 [Warning] WSREP: Could not open saved state file for reading: /home/jenkins/workspace/percona-xtrabackup-2.1-param/BUILD_TYPE/release/Host/centos6-32/xtrabackuptarget/galera55/test/var/w9/var1/data//grastate.dat
130723 15:14:13 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
130723 15:14:13 [Note] WSREP: Preallocating 134219040/134219040 bytes in '/home/jenkins/workspace/percona-xtrabackup-2.1-param/BUILD_TYPE/release/Host/centos6-32/xtrabackuptarget/galera55/test/var/w9/var1/data//galera.cache'...
130723 15:14:13 [ERROR] WSREP: galerautils/src/gu_fifo.c:gu_fifo_create():102: Maximum FIFO size 9663938748 exceeds size_t range 4294967295
130723 15:14:13 [ERROR] WSREP: gcs/src/gcs.c:gcs_create():264: Failed to create recv_q.
130723 15:14:13 [ERROR] WSREP: gcs/src/gcs.c:gcs_create():310: Failed to create GCS connection handle.
130723 15:14:13 [Note] WSREP: Passing config to GCS: base_host = 127.0.0.1; base_port = 4567; cert.log_conflicts = no; debug = 1; gcache.dir = /home/jenkins/workspace/percona-xtrabackup-2.1-param/BUILD_TYPE/release/Host/centos6-32/xtrabackuptarget/galera55/test/var/w9/var1/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /home/jenkins/workspace/percona-xtrabackup-2.1-param/BUILD_TYPE/release/Host/centos6-32/xtrabackuptarget/galera55/test/var/w9/var1/data//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 2147483647; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://127.0.0.1:31241; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
============================================

Note that wsrep-debug=1 and debug=1 (in wsrep_provider_options) are set, as requested.


Alex Yurchenko (ayurchen) wrote:

I would not call it "regressing": in lp:1181347 there was an undetected overflow that resulted in a wrong available-memory estimate. Here it is most likely a bad initial length estimate in gcs.c rather than a bug in gu_fifo.c. What does getconf -a | grep PAGES say on that system?

Output of getconf -a on that host: https://gist.github.com/3ede88972f3719a96b50

Alex Yurchenko (ayurchen) wrote:

So you have 32 GB of RAM on a machine running a 32-bit OS?
Reporting more *available* memory than can be addressed is, IMO, a bug in libc. But sure, we'll add a workaround.

Changed in galera:
assignee: nobody → Alex Yurchenko (ayurchen)
importance: Undecided → Medium
milestone: none → 23.2.7
status: New → Confirmed
summary: - lp:1181347 regresses on 32 bit builds
+ GU_AVPHYS_SIZE can report more available memory than can be addressed on 32-bit systems

@#3,

Yes, it looks like that:

             total       used       free     shared    buffers     cached
Mem:         32032       1236      30796          0          0       1236
-/+ buffers/cache:          0      32032
Swap:          512          2        509
Total:       32544       1238      31306

Actually, GU_AVPHYS_SIZE may not be ideal here.

a) It doesn't take the filesystem cache into account.

b) On PAE systems, the kernel can use more than 4 GB of physical memory, but each process still has a 32-bit address space and size_t is still 32 bits, so the value will still overflow.

http://compgroups.net/comp.unix.programmer/available-physical-memory/536419 has a related discussion.

There is get_avphys_pages(); however, it is a GNU extension (Linux/glibc only).

OTOH, if the problem only occurs on Linux/glibc, then this should do (I have tested it): https://bazaar.launchpad.net/~percona-dev/percona-xtradb-cluster/galera-2.x/revision/126

Alex Yurchenko (ayurchen) wrote:

Raghu,
1) I don't think this is a Linux-specific problem; the link you provided above explicitly mentions 32-bit Solaris supporting 64 GB of RAM.
2) Your patch works, and will keep working for a while, only because it uses the memory size in pages instead of in bytes, which is not what you want. I'd just use min(GU_AVPHYS_SIZE, size_t(-1)).

Regarding #2,

it was meant to be https://bazaar.launchpad.net/~percona-dev/percona-xtradb-cluster/galera-2.x/revision/126 rather than using the pages directly. However, I am not sure whether this fixes the problem (testing on Jenkins).

So, using get_avphys_pages also fails; I will test min(GU_AVPHYS_SIZE, size_t(-1)) on Jenkins.

The test with min(GU_AVPHYS_SIZE, size_t(-1)) works fine on Jenkins.

Changed in galera:
status: Confirmed → In Progress
Alex Yurchenko (ayurchen) wrote:
Changed in galera:
status: In Progress → Fix Committed
Changed in percona-xtradb-cluster:
milestone: none → 5.5.33-23.7.6
status: New → Fix Committed
Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released
Changed in galera:
status: Fix Committed → Fix Released