mysql-8.0 regressed on riscv64 due to new glibc

Bug #1915275 reported by Gianfranco Costamagna
This bug affects 1 person
Affects              Status         Importance   Assigned to   Milestone
MySQL Server         Unknown        Unknown
glibc (Ubuntu)       Invalid        Low          Unassigned
mysql-8.0 (Ubuntu)   Fix Released   Critical     Robie Basak
php7.4 (Ubuntu)      Invalid        High         Unassigned

Bug Description

Hello, it looks like mysql-8.0 can no longer run its test suite on riscv64, probably due to a bug in glibc.
See e.g.
https://launchpad.net/ubuntu/+source/mysql-8.0/8.0.23-1ubuntu1/+build/20983618

I'm currently building on bileto against only the release pocket to get a fast migration, because this is blocking some reverse-dependencies, such as boinc, from building correctly on riscv64.
make[1]: Entering directory '/<<PKGBUILDDIR>>'
RULES.override_dh_auto_test
touch builddir/mysql-test/skiplist
# Tests that are known to be unstable on all platforms are skipped
# http://bugs.mysql.com/bug.php?id=83340
echo "main.xa_prepared_binlog_off : BUG#00000 - unstable test" >> builddir/mysql-test/skiplist
echo "main.mysql_client_test : BUG#100274 - unstable test" >> builddir/mysql-test/skiplist
echo "main.type_float : BUG#92375 - fails on ppc64el. Ref https://bugs.mysql.com/bug.php?id=92375" >> builddir/mysql-test/skiplist
echo "main.type_newdecimal : BUG#92375 - Same as above" >> builddir/mysql-test/skiplist
echo "main.type_ranges : BUG#92375 - Same as above" >> builddir/mysql-test/skiplist
# https://bugs.mysql.com/bug.php?id=86608
echo "main.mysqlpump_basic : BUG#00000 - needs openssl with zlib" >> builddir/mysql-test/skiplist
# Test is broken for 32bit. Fixed upstream, so remove in 8.0.12+
echo "main.window_functions_explain : BUG#00000 - broken on i386" >> builddir/mysql-test/skiplist
# Skip replication tests since they are timing sensitive and may
# result in false positives.
cd builddir/mysql-test && ./mtr --report-unstable-tests --parallel=8 --skip-rpl --suite=main --force --skip-test-list=./skiplist || true ;
Logging: /<<PKGBUILDDIR>>/mysql-test/mysql-test-run.pl --report-unstable-tests --parallel=8 --skip-rpl --suite=main --force --skip-test-list=./skiplist
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
18:50:20 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x46000
/<<PKGBUILDDIR>>/builddir/runtime_output_directory/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x62) [0x2aac5bb7d0]
/<<PKGBUILDDIR>>/builddir/runtime_output_directory/mysqld(handle_fatal_signal+0x270) [0x2aab942e64]
linux-vdso.so.1(__vdso_rt_sigreturn+0) [0x3ff7fe0800]
/lib/riscv64-linux-gnu/libc.so.6(gsignal+0xa2) [0x3ff7420fec]
/lib/riscv64-linux-gnu/libc.so.6(abort+0xb4) [0x3ff741198c]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
mysql-test-run: *** ERROR: Could not find version of MySQL
cd builddir/mysql-test && ./mtr --report-unstable-tests --force innodb_fts.mecab_utf8
Logging: /<<PKGBUILDDIR>>/mysql-test/mysql-test-run.pl --report-unstable-tests --force innodb_fts.mecab_utf8
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
18:50:24 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x46000
/<<PKGBUILDDIR>>/builddir/runtime_output_directory/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x62) [0x2aac5bb7d0]
/<<PKGBUILDDIR>>/builddir/runtime_output_directory/mysqld(handle_fatal_signal+0x270) [0x2aab942e64]
linux-vdso.so.1(__vdso_rt_sigreturn+0) [0x3ff7fe0800]
/lib/riscv64-linux-gnu/libc.so.6(gsignal+0xa2) [0x3ff7420fec]
/lib/riscv64-linux-gnu/libc.so.6(abort+0xb4) [0x3ff741198c]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
mysql-test-run: *** ERROR: Could not find version of MySQL
make[1]: *** [debian/rules:154: override_dh_auto_test] Error 1
make[1]: Leaving directory '/<<PKGBUILDDIR>>'
make: *** [debian/rules:226: build-arch] Error 2
dpkg-buildpackage: error: debian/rules build-arch subprocess returned exit status 2

Balint Reczey (rbalint) wrote :

Build-time tests are disabled on riscv64 by default: LP: #1891686.
This may be a real regression, but in itself this should not be enough to hold back glibc from migrating.

Please disable the test on riscv64 if needed in mysql-8.0 and we can triage this separately.

Changed in glibc (Ubuntu):
importance: Undecided → Low
tags: added: update-excuse
removed: block-proposed
Dimitri John Ledkov (xnox) wrote :

I wonder whether this also fails on more recent riscv64 kernels, e.g. on 5.8.

However, our builders are still on v5.4.

Robie Basak (racb) wrote :

src:mysql-8.0 is supposed to honour nocheck already, I think, via debhelper (it only overrides dh_auto_test). Did you get the test run because the riscv64 nocheck disablement doesn't apply to bileto?

Robie Basak (racb) wrote :

> ...because the riscv64 nocheck disablement doesn't apply to bileto?

To be clear, I'm not asserting that; I'm asking.

Gianfranco Costamagna (costamagnagianfranco) wrote :

The only difference between bileto and the archive build is that bileto has been run against only release pocket, nothing else.

So tests seem to be run on all of release/proposed/bileto.
I'm OK with disabling the test run, but in any case the bug looks real and breaks mysql entirely on riscv64.

Gianfranco Costamagna (costamagnagianfranco) wrote :

I'm going to disable the tests for mysql-8.0 on riscv64, but this is a real regression, and riscv64 will now have a broken mysql-8.0.

Balint Reczey (rbalint) wrote :

It is a real regression, but needs triaging and should not hold back glibc's landing.

tags: added: server-next
Sebastien Bacher (seb128) wrote :

That's now making php7.4 fail to build on riscv64, which is needed to complete the libzip transition.

Changed in mysql-8.0 (Ubuntu):
importance: Undecided → High
tags: added: rls-hh-incoming
Changed in php7.4 (Ubuntu):
importance: Undecided → High
Christian Ehrhardt  (paelzer) wrote :

I was able to reproduce this in a riscv64 qemu emulation.
IMHO one central issue in handling the bug so far was the assumption that this "only" breaks the tests. It does not; it makes mysql-server totally unusable/uninstallable.
And because it is uninstallable, it also causes an FTBFS for everything listed by
  $ reverse-depends --release hirsute --build-depends src:mysql-8.0

An immediate search for an upstream report of this didn't show anything. But OTOH I don't yet know enough about what to search for, since std::bad_alloc is a very generic error in C++.
We'll need to at least find what happens underneath and if possible what in glibc changed to trigger the issue.

Changed in mysql-8.0 (Ubuntu):
status: New → Confirmed
importance: High → Critical
Christian Ehrhardt  (paelzer) wrote :

Note for everyone that wants to retry, follow https://wiki.ubuntu.com/RISC-V
which works fine in e.g. a hirsute LXD container without messing anything up in your main system.

After eliminating all of the postinst and config this can be triggered by the test-for-startability call like:

$ mysqld --verbose --help --innodb-read-only
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
07:55:23 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x46000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x62) [0x2ae71187d0]
/usr/sbin/mysqld(handle_fatal_signal+0x270) [0x2ae649fe64]
linux-vdso.so.1(__vdso_rt_sigreturn+0) [0x3fe1fba800]
/lib/riscv64-linux-gnu/libc.so.6(gsignal+0xa2) [0x3fe13f9fec]
/lib/riscv64-linux-gnu/libc.so.6(abort+0xb4) [0x3fe13ea98c]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash

Christian Ehrhardt  (paelzer) wrote :

I was following:
- https://dev.mysql.com/doc/refman/8.0/en/using-gdb-on-mysqld.html
- https://dev.mysql.com/doc/refman/8.0/en/crashing.html

Which mostly comes down to:
$ apt source mysql-server-8.0
$ apt install gdb mysql-server-core-8.0-dbgsym libc6-dbg libstdc++6-dbgsym
$ sudo gdb /usr/sbin/mysqld
(gdb) run --skip-stack-trace --gdb --core-file --general-log --general-log-file --verbose --innodb-read-only --help --debug

To be clear - in a normal working environment that would not even start the server for real - it would initialize a bit and then report on the used and possible config options.

For comparison in a similar environment I'll also use 8.0.23-0ubuntu0.20.04.1 in Focal.
Hoping that this might help to differentiate the noise from data.
But since things work there I'll need to find which function to break on ...

I'll attach an initial backtrace here ...

Christian Ehrhardt  (paelzer) wrote :

From the backtrace, at first we see:

At the start of mysql it calls init_server_components -> delegates_init.
That then initializes a new object
  alignas(Trans_delegate) static char place_trans_mem[sizeof(Trans_delegate)];
  ...
  transaction_delegate = new (place_trans_mem) Trans_delegate;

That then triggers the allocation of a lock::Shared_spin_lock::Shared_spin_lock
Which itself uses the default constructor:
  Shared_spin_lock() = default;

That then goes into memory::Aligned_atomic<long>::Aligned_atomic

There (at least in my naive view) it seems to go terribly wrong with its size assumptions, because if sz is the size in bytes then
  #6 0x0000003ff76389b8 in operator new (sz=18446744073709551615)
clearly is too much.

...
#6 0x0000003ff76389b8 in operator new (sz=18446744073709551615)
    at ../../../../src/libstdc++-v3/libsupc++/new_op.cc:54
        handler = <optimized out>
        p = <optimized out>
#7 0x0000003ff76389d6 in operator new[] (sz=<optimized out>)
    at ../../../../src/libstdc++-v3/libsupc++/new_opv.cc:32
No locals.
#8 0x0000002aac352ed8 in memory::Aligned_atomic<long>::Aligned_atomic (
    this=0x2aae2463d0 <delegates_init()::place_trans_mem+112>)
    at ./sql/memory/aligned_atomic.h:282
No locals.
#9 memory::Aligned_atomic<long>::Aligned_atomic (value=0,
    this=0x2aae2463d0 <delegates_init()::place_trans_mem+112>)
    at ./sql/memory/aligned_atomic.h:287
...
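
The call chain above can be sketched outside MySQL. This is only an illustration under stated assumptions: `Padded_value`, `Delegate`, `place_mem`, `init_delegate` and `destroy_delegate` are hypothetical stand-ins, not the MySQL types. It shows why a std::bad_alloc can come from an object that lives in static storage: placement new itself allocates nothing, but a member's constructor may still call operator new[] with whatever size it was given.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for memory::Aligned_atomic: the object itself is
// small, but its constructor heap-allocates separate storage.
struct Padded_value {
    explicit Padded_value(std::size_t storage_size)
        : storage(new unsigned char[storage_size]()), size(storage_size) {}
    ~Padded_value() { delete[] storage; }
    Padded_value(const Padded_value&) = delete;
    Padded_value& operator=(const Padded_value&) = delete;

    unsigned char* storage;
    std::size_t size;
};

// Hypothetical stand-in for Trans_delegate, which owns such a member.
struct Delegate {
    explicit Delegate(std::size_t line_size) : lock_state(line_size) {}
    Padded_value lock_state;
};

// Static, suitably aligned storage that the object is constructed into.
alignas(Delegate) static unsigned char place_mem[sizeof(Delegate)];

// Placement new: no allocation for Delegate itself, but the member
// constructor above still goes through operator new[].
Delegate* init_delegate(std::size_t line_size) {
    return new (place_mem) Delegate(line_size);
}

// Objects created with placement new need an explicit destructor call.
void destroy_delegate(Delegate* d) {
    assert(d != nullptr);
    d->~Delegate();
}
```

Calling `init_delegate(64)` returns a pointer into `place_mem`; passing an absurdly large size instead would make the member allocation fail while the outer object is being constructed.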

Christian Ehrhardt  (paelzer) wrote :

In my comparison on Focal (older libc / stdc++) all calls to "operator new[]" and "operator new" had sz=0 like:
  Breakpoint 5, operator new (sz=0) at ../../../../src/libstdc++-v3/libsupc++/new_op.cc:47

The first operator new that didn't have sz=0 in this good case was a totally different backtrace at:
Breakpoint 5, operator new (sz=sz@entry=72)
    at ../../../../src/libstdc++-v3/libsupc++/new_op.cc:47
47 in ../../../../src/libstdc++-v3/libsupc++/new_op.cc
(gdb) bt
#0 operator new (sz=sz@entry=72) at ../../../../src/libstdc++-v3/libsupc++/new_op.cc:47
#1 0x0000002add4f7104 in gtid_server_init () at ./sql/mysqld.cc:2372
#2 0x0000002add502c20 in init_server_components () at ./sql/mysqld.cc:5872
#3 0x0000002add509054 in mysqld_main (argc=<optimized out>, argv=<optimized out>)
    at ./sql/mysqld.cc:7097
#4 0x0000003fbc0d3204 in __libc_start_main (main=0x2add4c4084 <main(int, char**)>,
    argc=<optimized out>, argv=0x3fffe67528, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=<optimized out>) at libc-start.c:308
#5 0x0000002add4f18cc in _start () at ./include/my_sys.h:864

Christian Ehrhardt  (paelzer) wrote :

OK, I found in the libstdc++ implementation of "new" that sz is indeed passed to malloc:
  void *ptr = std::malloc(sz);
So it really does mean bytes, and that means ~18 exabytes, which is slightly more than most systems can offer :-)

The question now becomes, where is this value created/derived from and can we reproduce this maybe even outside of the complexity of mysql. To me it starts to seem more like an issue in the lib, but to know that for sure we will need to track down where this value really comes from.

Christian Ehrhardt  (paelzer) wrote :

I have separated the problem from all the mysql code.
This file here works on x86 and on older riscv64.
But when run on current hirsute riscv64 it breaks just as mysql-server-8.0 does.

I further simplified it and attach it here for debugging.
It seems that the detection of the cache-line-size is failing and from there things go south.

riscv64 @ Hirsute
CL 18446744073709551615
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
Aborted (core dumped)

riscv64 @ Focal
CL 0
static 32@0x3fff86f2c0

x86 @ Hirsute
CL 64
static 32@0x7fff300363a0

So it seems what is broken is "sysconf(_SC_LEVEL1_DCACHE_LINESIZE);"
On some platforms that returns good values (e.g. x86) and on others it used to return "0".
Mysql had code to cover the "0" case, but with the new libc on riscv64 we get an absurdly high value, and that causes all the breakage we see.
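
A minimal sketch of that failure mode follows. It is not the attached reproducer and not MySQL code: `effective_line_size` and `can_allocate_line_buffer` are hypothetical names, and the fallback of 64 for the "0" case is an assumption made for illustration. The detected value is passed in as a parameter so the behaviour can be checked on any architecture.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical stand-in for code that only handles "0 means unknown".
std::size_t effective_line_size(long detected) {
    if (detected == 0) return 64;               // assumed fallback value
    return static_cast<std::size_t>(detected);  // a negative value wraps to a huge size
}

// Returns true if a buffer of the resulting size could be allocated.
bool can_allocate_line_buffer(long detected) {
    const std::size_t size = effective_line_size(detected);
    assert(size != 0);  // zero was mapped to the fallback above
    try {
        std::unique_ptr<unsigned char[]> buf(new unsigned char[size]);
        // Touch the buffer through a volatile pointer so the allocation
        // cannot be optimized away.
        volatile unsigned char* p = buf.get();
        p[0] = 1;
        return true;
    } catch (const std::bad_alloc&) {
        // This is the exception that terminates mysqld when left uncaught.
        return false;
    }
}
```

With a detected value of 64 or 0 the allocation succeeds; with a negative value the requested size is far beyond what any allocator can satisfy and the function reports failure instead of aborting.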

Christian Ehrhardt  (paelzer) wrote :

And that awkward number, when consumed directly as a signed value, is just "-1":

$ cat test-sysconf.cpp
#include <iostream>
#include <unistd.h>

int main() {
    std::cout << "_SC_LEVEL1_DCACHE_LINESIZE = " << sysconf(_SC_LEVEL1_DCACHE_LINESIZE) << "\n";
    return 0;
}

$ rm test-sysconf; g++ -Wall -o test-sysconf test-sysconf.cpp && ./test-sysconf

riscv64 @ Hirsute
_SC_LEVEL1_DCACHE_LINESIZE = -1

riscv64 @ Focal
_SC_LEVEL1_DCACHE_LINESIZE = 0

So it changed the "I can't get this" result from 0 to -1 and that is what broke us.
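
The link between "-1" and the sz=18446744073709551615 seen in the backtrace is plain integer conversion. The sketch below assumes a 64-bit size_t (as on riscv64 and x86-64); `as_allocation_size` and `demonstrate_wraparound` are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: what a signed value turns into when it is handed to
// an API that takes std::size_t, as operator new does.
std::size_t as_allocation_size(long value) {
    return static_cast<std::size_t>(value);
}

// Self-check: -1 wraps around to the largest representable size, which is
// 2^64 - 1 when size_t is 64 bits wide.
void demonstrate_wraparound() {
    assert(as_allocation_size(-1) == std::numeric_limits<std::size_t>::max());
}
```

So an "unsupported" result of -1 that is stored in a size_t without a check becomes a request for the maximum possible allocation.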

The glibc manual does not describe the proper failure return code :-/
https://www.gnu.org/software/libc/manual/html_node/Constants-for-Sysconf.html

Man page mentions it:
RETURN VALUE
       The return value of sysconf() is one of the following:

       * On error, -1 is returned and errno is set to indicate the cause of the error (for example, EINVAL, indicating that name is invalid).

       * If name corresponds to an option, a positive value is returned if the option is supported, and -1 is returned if the option is not supported.

So I guess we need to teach mysql to handle that well.
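
One way such handling could look is sketched below. This is not the patch that was uploaded; the function names and the fallback of 64 bytes are assumptions chosen for illustration. The idea is simply to treat every non-positive result (the old 0 and the new -1) as "unknown" before the value is used as a size.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

// Assumed fallback when the platform cannot report a line size.
constexpr std::size_t kFallbackLineSize = 64;

// Pure helper so the policy can be checked with arbitrary inputs:
// anything that is not a positive size is treated as "unknown".
std::size_t sanitize_line_size(long reported) {
    return reported > 0 ? static_cast<std::size_t>(reported)
                        : kFallbackLineSize;
}

// Queries the L1 data cache line size and never returns 0 or a wrapped -1.
std::size_t l1_dcache_line_size() {
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    const std::size_t result =
        sanitize_line_size(sysconf(_SC_LEVEL1_DCACHE_LINESIZE));
#else
    // The constant is a glibc extension and may be missing elsewhere.
    const std::size_t result = kFallbackLineSize;
#endif
    assert(result > 0);
    return result;
}
```

Keeping the check in a separate pure function means the -1 and 0 cases can be exercised on any machine, not only on one where sysconf actually returns them.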

Changed in mysql-8.0 (Ubuntu):
status: Confirmed → Triaged
Christian Ehrhardt  (paelzer) wrote :

Submitted to Debian (for an upload there into experimental and into Hirsute)
=> https://salsa.debian.org/mariadb-team/mysql/-/merge_requests/46
Test PPA:
=> https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4487/+packages

Balint Reczey (rbalint)
Changed in glibc (Ubuntu):
status: New → Invalid
Bryce Harrington (bryce) wrote :

Excellent work tracking this down Christian; I appreciated being able to follow along with your detailed analysis.

Marking this not a php7.4 bug specifically, but will make sure php7.4 rebuilds once this is in.

Changed in php7.4 (Ubuntu):
status: New → Invalid
Christian Ehrhardt  (paelzer) wrote :

FYI Builds completed just fine for all arches in https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4487/+packages

Furthermore, using that PPA, installation now really works:

ubuntu@ubuntu:~$ sudo apt dist-upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages will be REMOVED:
  mysql-server-core-8.0-dbgsym
The following NEW packages will be installed:
  libmd0
The following packages will be upgraded:
  libbsd0 mysql-client-8.0 mysql-client-core-8.0 mysql-server mysql-server-8.0 mysql-server-core-8.0
6 upgraded, 1 newly installed, 1 to remove and 0 not upgraded.
2 not fully installed or removed.
Need to get 20.8 MB/25.1 MB of archives.
After this operation, 277 MB disk space will be freed.
Do you want to continue? [Y/n] Y
Get:1 http://ppa.launchpad.net/ci-train-ppa-service/4487/ubuntu hirsute/main riscv64 mysql-server-8.0 riscv64 8.0.23-3ubuntu1~ppa1 [1185 kB]
Get:2 http://ppa.launchpad.net/ci-train-ppa-service/4487/ubuntu hirsute/main riscv64 mysql-server-core-8.0 riscv64 8.0.23-3ubuntu1~ppa1 [19.6 MB]
Fetched 20.8 MB in 6s (3499 kB/s)
Preconfiguring packages ...
(Reading database ... 76902 files and directories currently installed.)
Preparing to unpack .../mysql-client-core-8.0_8.0.23-3ubuntu1~ppa1_riscv64.deb ...
Unpacking mysql-client-core-8.0 (8.0.23-3ubuntu1~ppa1) over (8.0.23-3build1) ...
Preparing to unpack .../mysql-client-8.0_8.0.23-3ubuntu1~ppa1_riscv64.deb ...
Unpacking mysql-client-8.0 (8.0.23-3ubuntu1~ppa1) over (8.0.23-3build1) ...
(Reading database ... 76902 files and directories currently installed.)
Removing mysql-server-core-8.0-dbgsym (8.0.23-3build1) ...
(Reading database ... 76827 files and directories currently installed.)
Preparing to unpack .../mysql-server-8.0_8.0.23-3ubuntu1~ppa1_riscv64.deb ...
Unpacking mysql-server-8.0 (8.0.23-3ubuntu1~ppa1) over (8.0.23-3build1) ...
Preparing to unpack .../mysql-server-core-8.0_8.0.23-3ubuntu1~ppa1_riscv64.deb ...
Unpacking mysql-server-core-8.0 (8.0.23-3ubuntu1~ppa1) over (8.0.23-3build1) ...
Preparing to unpack .../mysql-server_8.0.23-3ubuntu1~ppa1_all.deb ...
Unpacking mysql-server (8.0.23-3ubuntu1~ppa1) over (8.0.23-3build1) ...
Selecting previously unselected package libmd0:riscv64.
Preparing to unpack .../libmd0_1.0.3-3build1_riscv64.deb ...
Unpacking libmd0:riscv64 (1.0.3-3build1) ...
Preparing to unpack .../libbsd0_0.11.3-1build1_riscv64.deb ...
Unpacking libbsd0:riscv64 (0.11.3-1build1) over (0.10.0-1) ...
Setting up mysql-client-core-8.0 (8.0.23-3ubuntu1~ppa1) ...
Setting up mysql-server-core-8.0 (8.0.23-3ubuntu1~ppa1) ...
Setting up libmd0:riscv64 (1.0.3-3build1) ...
Setting up mysql-client-8.0 (8.0.23-3ubuntu1~ppa1) ...
Setting up libbsd0:riscv64 (0.11.3-1build1) ...
Setting up mysql-server-8.0 (8.0.23-3ubuntu1~ppa1) ...
Renaming removed key_buffer and myisam-recover options (if present)
mysqld will log errors to /var/log/mysql/error.log
mysqld is running as pid 2684
Created symlink /etc/systemd/system/multi-user.target.wants/mysql.service → /lib/systemd/system/mysql.service.
Setting up mysql-server (8.0.23-3ubuntu1~ppa1) ...
Processing trigge...


Changed in mysql-8.0 (Ubuntu):
assignee: nobody → Robie Basak (racb)
Robie Basak (racb)
Changed in mysql-8.0 (Ubuntu):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mysql-8.0 - 8.0.23-3ubuntu1

---------------
mysql-8.0 (8.0.23-3ubuntu1) hirsute; urgency=medium

  [ Frans Spiesschaert ]
  * Update nl.po translation file (Closes: #970039)

  [ Helge Kreutzmann ]
  * Update de.po translation file (Closes: #968847)

  [ Christian Ehrhardt ]
  * d/p/lp-1915275-fix-handling-of-SC_LEVEL1_DCACHE_LINESIZE.patch: unbreak
    mysql on riscv64 (LP: #1915275)

 -- Robie Basak <email address hidden> Sat, 13 Mar 2021 02:13:37 +0000

Changed in mysql-8.0 (Ubuntu):
status: Fix Committed → Fix Released