MySQL-5.7: default log-tc-size too small on POWER. 3 * ( 64K page size) minimum needed

Bug #1706291 reported by bugproxy on 2017-07-25
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
MySQL Server | Unknown | Unknown | |
The Ubuntu-power-systems project | | High | David Britton |
mysql-5.7 (Ubuntu) | | High | Robie Basak |

Bug Description

Steps-to-reproduce:
  sudo vim /etc/apt/sources.list
  sudo apt update
  sudo apt install ubuntu-dev-tools build-essential
  sudo apt-get build-dep mysql-5.7
  apt source mysql-5.7
  cd mysql-5.7-5.7.19/
  vim debian/rules
    # enable -DENABLE_DOWNLOADS=1 \
  DEB_BUILD_OPTIONS=parallel=160 dpkg-buildpackage -us -uc -nc
  cd builddir
  make gunit_large
  ./unittest/gunit/merge_large_tests-t --gtest_filter=TCLogMMapTest.TClogCommit
  (crash)

== Comment: #0 - Daniel Black
---Problem Description---
MySQL-5.6/5.7: default log-tc-size too small on POWER - 3 * ( 64K page size) minimum needed

---uname output---
Linux p87 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:09:19 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = Any P8

---Steps to Reproduce---

Perform a large number of XA transactions without a binary log configured. It's usually triggered in the MTR test run.

As Power's page size is 64K by default, the assumption that 3*8K was sufficient applies to x86 but not Power.

Userspace tool common name: MySQL-5.7

The userspace tool has the following bit modes: both

Userspace rpm: MySQL-5.7

Userspace tool obtained from project website: na

Oracle have corrected this in 8.0 https://bugs.mysql.com/bug.php?id=80818 (https://github.com/mysql/mysql-server/commit/62b80f7d9db06d0edecf5a277e6a7fc489d806d5). A 5.7 backport is here https://bugs.mysql.com/bug.php?id=87175.

Alternately (and more minimally) Alexey's patch in https://bugs.mysql.com/bug.php?id=80818 is sufficient (though replace 65535 -> 65536 in patch).

---Workarounds---

Set log-tc-size=196k or larger in my.cnf at startup.
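
The 3-page minimum from the bug title can be checked against the page size of the machine at hand. The following is a minimal sketch, assuming Python 3 on Linux; the helper names `min_log_tc_size` and `needs_override` are hypothetical, 24576 is the default `opt_tc_log_size` printed in the gdb session in this report, and placing the option under `[mysqld]` in my.cnf is an assumption about the local configuration layout.

```python
import os

# Default log-tc-size observed in the gdb session in this report (24K).
DEFAULT_TC_LOG_SIZE = 24576
# Minimum number of pages, taken from the bug title ("3 * page size").
MIN_PAGES = 3


def min_log_tc_size(page_size: int) -> int:
    """Smallest log-tc-size in bytes that still holds MIN_PAGES pages."""
    return MIN_PAGES * page_size


def needs_override(page_size: int, configured: int = DEFAULT_TC_LOG_SIZE) -> bool:
    """True when the configured size is below the 3-page minimum."""
    return configured < min_log_tc_size(page_size)


if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")  # 65536 on a default Power kernel
    print(f"page size: {page_size} bytes")
    if needs_override(page_size):
        # Assumed my.cnf placement; adjust to the local configuration.
        print("[mysqld]")
        print(f"log-tc-size={min_log_tc_size(page_size)}")
    else:
        print("default log-tc-size already covers 3 pages")
```

With 64K pages the computed minimum is 196608 bytes, so the 196k suggested above (200704 bytes if k is read as 1024) clears it.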

bugproxy (bugproxy) on 2017-07-25
tags: added: architecture-ppc64le bugnameltc-156947 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → dbf2mysql (Ubuntu)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Server Team (canonical-server)
Changed in ubuntu-power-systems:
importance: Undecided → High
affects: dbf2mysql (Ubuntu) → mysql-5.7 (Ubuntu)
Manoj Iyer (manjo) on 2017-07-31
tags: added: triage-g
Manoj Iyer (manjo) on 2017-08-21
Changed in mysql-5.7 (Ubuntu):
importance: Undecided → High
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → David Britton (davidpbritton)
Manoj Iyer (manjo) on 2017-09-11
tags: added: triage-r
removed: triage-g
Changed in ubuntu-power-systems:
assignee: Canonical Server Team (canonical-server) → David Britton (davidpbritton)
David Britton (davidpbritton) wrote :

Hello -- please include a report of what Ubuntu environment this has been seen on.

Changed in mysql-5.7 (Ubuntu):
status: New → Incomplete
Changed in ubuntu-power-systems:
status: New → Incomplete
Daniel Black (daniel-black) wrote :

On xenial. I used ENABLE_DOWNLOADS to fetch the google-mock/test suite so that the unit test TCLogMMapTest.TClogCommit compiles.

$ sudo apt-get build-dep mysql-5.7
$ apt source mysql-5.7
$ cd mysql-5.7-5.7.19/
$ vi debian/rules
Add to the cmake options so that google-mock/test is built (getting it to use a packaged google-mock requires too much editing):

                -DENABLE_DOWNLOADS=1 \

$ DEB_BUILD_OPTIONS=parallel=160 dpkg-buildpackage -us -uc -nc 2>&1 | tee ~/mysql-5.7-package-build.log

$ cd builddir
$ make gunit_large

$ gdb unittest/gunit/merge_large_tests-t
(gdb) set args --gtest_filter=TCLogMMapTest.TClogCommit
(gdb) break TC_LOG_MMAP::open
(gdb) run
..
Breakpoint 1, 0x000000001086b938 in TC_LOG_MMAP::open (this=0x11b321d0, opt_name=0x7fffffffea08 "tc_log_mmap_test_105828") at /home/danielgb/mysql-5.7-5.7.19/sql/tc_log.cc:92
92 {
(gdb) p opt_tc_log_size
$6 = 24576
(gdb) n
n
n
n
n
(gdb)
106 fn_format(logname,opt_name,mysql_data_home,"",MY_UNPACK_FILENAME);
(gdb) p tc_log_page_size
$8 = 65536

n
117 file_length= opt_tc_log_size;
(gdb)
n
n
n
137 data= (uchar *)my_mmap(0, (size_t)file_length, PROT_READ|PROT_WRITE,
(gdb)
139 if (data == MAP_FAILED)
(gdb)
138 MAP_NOSYNC|MAP_SHARED, fd, 0);
(gdb) p file_length
$9 = 24576
(gdb) n
139 if (data == MAP_FAILED)
(gdb)
146 npages=(uint)file_length/tc_log_page_size;
(gdb) n
148 if (!(pages=(PAGE *)my_malloc(key_memory_TC_LOG_MMAP_pages,
                                        npages*sizeof(PAGE), MYF(MY_WME|MY_ZEROFILL))))

(gdb) n
(gdb)
152 for (pg=pages, i=0; i < npages; i++, pg++)
(gdb) p npages
$16 = 0
(gdb) p pages
$17 = (TC_LOG_MMAP::st_page *) 0x11b324f0
(gdb) p key_memory_TC_LOG_MMAP_pages
$18 = 0

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x000000001086bcec in TC_LOG_MMAP::open (this=0x11b321d0, opt_name=<optimised out>) at /home/danielgb/mysql-5.7-5.7.19/sql/tc_log.cc:166
166 izeof(tc_l...


So [2] is the fix, but [1] with s/65535/65536/ would be the minimal SRU here.
One would have to test whether the steps in comment #4 can serve as steps to reproduce and as verification. I guess using them is better than needing a gigantic DB, but one has to confirm they are useful as verification steps.
Trying to verify with these steps on artful now.

@David : overall are you managing that case to eventually have Lars or Robie include a patch?

[1]: https://bugs.mysql.com/file.php?id=24112&bug_id=80818
[2]: https://bugs.mysql.com/file.php?id=25648&bug_id=87175

Changed in mysql-5.7 (Ubuntu):
status: Incomplete → New
Changed in ubuntu-power-systems:
status: Incomplete → New
Changed in mysql-5.7 (Ubuntu):
assignee: David Britton (davidpbritton) → Robie Basak (racb)

Hi Daniel,
I tried to use your gdb example to verify the issue, which would then serve as verification for the SRU as well.
I don't need the gdb part to do so, but along the way I found that everything works fine until this step:
$ make gunit_large

That was in artful, so maybe the target was renamed in later versions?
Or is more than ENABLE_DOWNLOADS needed?
Trying the last version from Xenial (which also is 5.7.19-0ubuntu0.16.04.1) ended up the same way.

OTOH ENABLE_DOWNLOADS sounded suspicious, as I usually only have connectivity to a reduced set of Canonical systems, so I set up proxies and re-executed the steps.
That was it; the run ended in:
$ ./unittest/gunit/merge_large_tests-t --gtest_filter=TCLogMMapTest.TClogCommit
# Running 1 test from 1 test case
1..1
# Global test environment set-up
# Run 1 TCLogMMapTest.TClogCommit
15:04:56 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=151
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 59998 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x1001bc465f0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fffcc268aa0 thread_stack 0x40000
./unittest/gunit/merge_large_tests-t[0x10b328f0]
./unittest/gunit/merge_large_tests-t[0x10901354]
[0x74ed428504d8]
./unittest/gunit/merge_large_tests-t[0x108c4b48]
./unittest/gunit/merge_large_tests-t[0x108c4a34]
./unittest/gunit/merge_large_tests-t[0x1021e274]
./unittest/gunit/merge_large_tests-t[0x110a4134]
./unittest/gunit/merge_large_tests-t[0x110949f0]
./unittest/gunit/merge_large_tests-t[0x11094d78]
./unittest/gunit/merge_large_tests-t[0x110951bc]
./unittest/gunit/merge_large_tests-t[0x11095be0]
./unittest/gunit/merge_large_tests-t[0x11095e5c]
./unittest/gunit/merge_large_tests-t[0x10061ff0]
/lib/powerpc64le-linux-gnu/libc.so.6(+0x2291c)[0x74ed4211291c]
/lib/powerpc64le-linux-gnu/libc.so.6(__libc_start_main+0xb8)[0x74ed42112b18]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): Connection ID (thread ID): 1
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

So this can be used as steps-to-reproduce

description: updated

Added summarized steps to the description so they can be reused in SRU templates later on.
Open for Robie/Lars to take the patch through all the extra verifications they usually run.

Daniel Black (daniel-black) wrote :

An alternative to ENABLE_DOWNLOADS:

apt install googletest (needs 1.8.0 - so zesty or artful)

sudo ln -s /usr/src/googletest /usr/src/googletest-release-1.8.0

and -DDOWNLOAD_ROOT=/usr/src in debian/rules instead of -DENABLE_DOWNLOADS=1

The unit tests will then be built.

Daniel Black (daniel-black) wrote :

FWIW the gdb session was to show that 0 memory pages were allocated because file_length, aka opt_tc_log_size (24k, the minimum and default log-tc-size setting), is less than tc_log_page_size (64k, the Power page size).

(gdb)
146 npages=(uint)file_length/tc_log_page_size;
(gdb) p npages
$16 = 0
(gdb) p pages
$17 = (TC_LOG_MMAP::st_page *) 0x11b324f0
(gdb) p key_memory_TC_LOG_MMAP_pages
$18 = 0
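
The arithmetic behind `npages = 0` can be replayed outside the debugger. This is a small sketch, assuming Python 3; `tc_log_pages` is a hypothetical helper that only mirrors the integer division on line 146 of tc_log.cc shown above, and the 4096-byte page size used for comparison is an assumption about a typical x86 system rather than something measured in this report.

```python
def tc_log_pages(file_length: int, page_size: int) -> int:
    """Whole pages that fit in the log file.

    Mirrors npages = (uint)file_length / tc_log_page_size, where the
    division truncates toward zero.
    """
    return file_length // page_size


# Values from the gdb session: opt_tc_log_size and tc_log_page_size.
OPT_TC_LOG_SIZE = 24576
POWER_PAGE_SIZE = 65536
# Assumed typical x86 page size, for comparison only.
X86_PAGE_SIZE = 4096

print(tc_log_pages(OPT_TC_LOG_SIZE, POWER_PAGE_SIZE))      # 0 pages on Power
print(tc_log_pages(OPT_TC_LOG_SIZE, X86_PAGE_SIZE))        # 6 pages with 4K pages
print(tc_log_pages(3 * POWER_PAGE_SIZE, POWER_PAGE_SIZE))  # 3 pages with the workaround size
```

With zero pages the page array is empty, which is consistent with the crash seen shortly afterwards in TC_LOG_MMAP::open.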

Robie Basak (racb) wrote :

I've asked upstream to backport their fix from 8.0 to 5.7 as David linked at https://bugs.mysql.com/bug.php?id=80818. In time, this should mean that the fix will arrive for 5.7 and be updated in all supported releases using 5.7 - either directly or together with a security update, which seems likely.

In the meantime, we could use the minimal fix Christian identified in comment 5 in both development and stable releases. That feels like the most frictionless way to fix this for me. The patch is trivial enough that it should be much easier for us to maintain it until upstream have a backport integrated. In the meantime, if there were any future difficulty, I appreciate the long backport already prepared is linked to from here, so the more full backport could always be used at the time of a rebase if necessary.

I'd like to bundle this with a fix for bug 1706281 also. Daniel, have you made any progress there please?

If a fix for this is more urgent than a fix for that other bug, then I'm not opposed to landing this first. We can only do one at a time though, and expect the process to take two to four weeks, perhaps more, for a fix to land in the stable release. Please let me know if you'd like to get this patch in first.

Separately it is likely that there will be a security update included in Oracle's next quarterly security roundup which is due on the 17th, as it is unusual for MySQL to not receive security updates on their cadence. We can test and land a development fix now, as that will be relatively quick. An SRU would best be coordinated against the likely security update.

Robie Basak (racb) on 2017-10-17
Changed in mysql-5.7 (Ubuntu):
status: New → Incomplete
Daniel Black (daniel-black) wrote :

I've updated bug 1706281. No reproducible test case yet. 1706281 is more important than this one (if I can reproduce the fault).

If I don't get a confirmation for #1706281 in the next few days can you please process this one separately.

I've also checked and the latest 5.7.20 release doesn't include a fix for this bug.

Daniel Black (daniel-black) wrote :

https://bugs.mysql.com/bug.php?id=87995 coming in 5.7.21

Thanks Robie for getting traction there.

Changed in ubuntu-power-systems:
status: New → Incomplete
tags: added: triage-a
removed: triage-r

------- Comment From <email address hidden> 2018-01-29 00:39 EDT-------
After discussing with Daniel, where he said -
"5.7.21 hasn't been released yet; however, since it's coming in a few months I think it can be closed."

Closing this bugzilla..

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-31 01:11 EDT-------
Hi Canonical,
Before closing..
Can you tell us which release will come with this 5.7.21 package ?

Dimitri John Ledkov (xnox) wrote :

https://launchpad.net/ubuntu/+source/mysql-5.7

5.7.21-0ubuntu0.16.04.1 shipped in xenial on 2018-01-22

5.7.21-0ubuntu0.17.10.1 shipped in artful on 2018-01-22

Yet to be included in development release bionic.

Robie Basak (racb) wrote :

> Yet to be included in development release bionic.

Lars is working on it. I expect the merge to land before Bionic is released.

Manoj Iyer (manjo) on 2018-02-12
Changed in ubuntu-power-systems:
status: Incomplete → Won't Fix
Changed in mysql-5.7 (Ubuntu):
status: Incomplete → Won't Fix
status: Won't Fix → Incomplete
Changed in ubuntu-power-systems:
status: Won't Fix → Incomplete
Changed in mysql-5.7 (Ubuntu):
status: Incomplete → In Progress
Changed in ubuntu-power-systems:
status: Incomplete → In Progress
tags: added: triage-g
removed: triage-a
Robie Basak (racb) wrote :

5.7.21 is now in Bionic, so I believe this bug is fixed on all supported Ubuntu releases that have MySQL 5.7 (Xenial onwards).

Changed in mysql-5.7 (Ubuntu):
status: In Progress → Fix Released
Changed in ubuntu-power-systems:
status: In Progress → Fix Released