FTBFS: mariadb fails to start due to low MEMLOCK limit
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
mariadb-10.6 (Ubuntu) | Fix Released | Undecided | Andreas Hasenack |
Jammy | Fix Released | High | Andreas Hasenack |
systemd (Ubuntu) | Fix Released | Undecided | Unassigned |
Jammy | Won't Fix | High | Unassigned |
Bug Description
[Impact]
Jammy's MariaDB will fail to build, and also fail to start, if the underlying kernel is 5.4.x (focal's) and if it's running in an unprivileged container (lxd, docker). It will also fail to build in launchpad builders.
Common scenarios where this combination exists are a focal host running an unprivileged jammy container (lxd or docker), or even a chroot (the launchpad builders).
Jammy's MariaDB was built with io_uring support, and it tries to enable it at runtime if it deems it's running on a supported kernel. It checks the running kernel version: of interest for this SRU, the Focal (5.4.x) kernel passes that check, so io_uring will be enabled. The jammy kernel (5.15) does not pass it, so io_uring will not be enabled there, and thus the bug is not manifested in that case.
If io_uring is enabled, a higher MEMLOCK limit is required than what is set by default in focal or jammy (64 KiB is what we get, 256 KiB or more is required).
The systemd unit file for mariadb tries to raise this limit, but in an unprivileged container this won't work.
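To make the limit discussion concrete, here is a minimal Python sketch that classifies the current MEMLOCK limit. The function name `memlock_status` and its return labels are hypothetical, and the 256 KiB threshold is simply the figure quoted in this report, not a value taken from MariaDB's source.

```python
import resource

# Threshold quoted in this bug report (assumption: `ulimit -l` units, i.e. KiB).
REQUIRED_KIB = 256


def memlock_status(soft: int, hard: int, required_kib: int = REQUIRED_KIB) -> str:
    """Classify a (soft, hard) RLIMIT_MEMLOCK pair, both given in bytes."""
    required = required_kib * 1024
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited or soft >= required:
        return "ok"
    if hard == unlimited or hard >= required:
        # An unprivileged process may raise its soft limit up to the hard limit.
        return "raisable"
    # Raising the hard limit itself needs privilege, which an
    # unprivileged container does not have.
    return "too-low"


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print(f"soft={soft} hard={hard} -> {memlock_status(soft, hard)}")
```

With both limits at 65536 bytes, as reported for the builders, this sketch returns "too-low", which corresponds to the case where the unit file's attempt to raise the limit cannot succeed.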
MariaDB has checks in place to catch when MEMLOCK is too low, in which case it will not use io_uring, but these checks are failing because of the LTO build flags that were used in the jammy build of mariadb. It's unclear if it's a bug in gcc or something else. There is more information in comment #1, comment #5 and later.
The suggested fix here is to disable LTO for the jammy build. This has been done for kinetic already, and is also applied to the debian packaging (inside a distro-specific conditional).
[Test Plan]
The test plan is to launch mariadb in a jammy lxd container running on a focal host.
lxc launch ubuntu:focal f --vm
lxc shell f
lxd init # just hit enter for all questions
lxc launch ubuntu:jammy j
lxc shell j
ulimit -l # confirm it's less than 256
apt update && apt install mariadb-server -y
After the installation, mariadb will not be running, and this can be checked with:
systemctl status mariadb.service
or
journalctl -u mariadb.service
You will see something like this:
Jun 17 18:32:01 jammy-mariadb systemd[1]: Starting MariaDB 10.6.7 database server...
Jun 17 18:32:01 jammy-mariadb mariadbd[1864]: 2022-06-17 18:32:01 0 [Note] /usr/sbin/mariadbd (server 10.6.7-
Jun 17 18:32:01 jammy-mariadb mariadbd[1864]: 2022-06-17 18:32:01 0 [Note] InnoDB: The first data file './ibdata1' did not exist. A new tablespace will be created!
Jun 17 18:32:01 jammy-mariadb mariadbd[1864]: 2022-06-17 18:32:01 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
Jun 17 18:32:01 jammy-mariadb mariadbd[1864]: 2022-06-17 18:32:01 0 [Note] InnoDB: Using transactional memory
Jun 17 18:32:01 jammy-mariadb mariadbd[1864]: 2022-06-17 18:32:01 0 [Note] InnoDB: Number of pools: 1
Jun 17 18:32:01 jammy-mariadb mariadbd[1864]: 2022-06-17 18:32:01 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
Jun 17 18:32:01 jammy-mariadb mariadbd[1864]: 2022-06-17 18:32:01 0 [Warning] mariadbd: io_uring_
Jun 17 18:32:01 jammy-mariadb mariadbd[1864]: 220617 18:32:01 [ERROR] mysqld got signal 6 ;
And a crash dump.
With the fixed version, the service will be running normally after installation.
[Where problems could occur]
The proposed fix is not a surgical strike. It's unfortunate that we didn't get to the bottom of why LTO is causing this behavior. Reverting it is still the quickest and least risky change at the moment, though. This gets us on par with upstream binary builds and debian builds, which also have wide test coverage and an ample user base.
The other regression possibility is that this is a rebuild of mariadb in the current jammy environment, and the package that is currently in jammy was built on March 10th, 2022. Most likely the reverse dependencies were updated in jammy since then.
It's unclear how 10.6.7-2ubuntu1 migrated in jammy. I checked build logs and dep8 logs, and can't tell why the tests passed. At least the build log in jammy shows the host kernel was 5.4.x, so it should have been affected. My only explanation is that at that time, the MEMLOCK limit was higher in that environment for some reason, and didn't trigger this bug. Then at some point later, launchpad builders changed, and MEMLOCK was reduced to 64 KiB. https:/
[Other Info]
Not at this time.
[Original Description]
<rbasak> ahasenack: IIRC, originally Launchpad was FTBFSing on mariadb that included io_uring support because upstream were doing a build time test for io_uring (and I think still are), which is wrong because it should be done at runtime since the lack of io_uring availability at build time doesn't tell us about its availability at runtime.
<rbasak> But then the Launchpad builders got updated to a newer release and therefore a newer kernel that supported it.
<rbasak> AIUI, that's how we ended up with a successful build in the Jammy release pocket (of 10.6).
<ahasenack> I think the lp builders are using the focal hwe kernel
<ahasenack> 5.4.0-something
<ahasenack> let me check that build log
<rbasak> But then something changed that caused this current FTBFS, and I haven't tracked down what that is.
<ahasenack> hm, both are 10.6.7
<ahasenack> release and proposed
<rbasak> What puzzles me is that if the root cause is a memlock rlimit issue then why did it work before?
<rbasak> So since there's a contradiction somewhere, maybe one or more of my "facts" above is wrong.
<ahasenack> this is the current failure
<ahasenack> 2022-04-14 8:11:49 0 [Warning] mariadbd: io_uring_
<ahasenack> and ulimit -l confirms that the limit is lower
<ahasenack> Max locked memory 65536 65536 bytes
<ahasenack> just 64kbytes
<rbasak> Yeah but then how did the release pocket build work?
<ahasenack> either the limit was different back then
<ahasenack> or ... stuff
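The "Max locked memory" line quoted in the log above looks like a row from `/proc/<pid>/limits` (an assumption; the log does not say where it came from). A minimal Python sketch for reading such a row and converting it to KiB follows; `parse_limits_row` is a hypothetical helper, and it only handles rows that carry a units column.

```python
from typing import Optional, Tuple


def _to_int(field: str) -> Optional[int]:
    """Return None for 'unlimited', otherwise the integer value."""
    return None if field == "unlimited" else int(field)


def parse_limits_row(row: str) -> Tuple[str, Optional[int], Optional[int], str]:
    """Split a limits row into (name, soft, hard, units).

    Assumes the last three whitespace-separated fields are soft limit,
    hard limit and units; everything before them is the limit name.
    """
    parts = row.split()
    if len(parts) < 4:
        raise ValueError(f"unexpected limits row: {row!r}")
    name = " ".join(parts[:-3])
    return name, _to_int(parts[-3]), _to_int(parts[-2]), parts[-1]


if __name__ == "__main__":
    name, soft, hard, units = parse_limits_row("Max locked memory 65536 65536 bytes")
    print(f"{name}: soft={soft // 1024} KiB, hard={hard // 1024} KiB")
```

For the quoted row, 65536 bytes works out to 64 KiB, matching the "just 64kbytes" remark in the log.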
Related branches
- git-ubuntu bot: Approve
- Sergio Durigan Junior (community): Approve
- Canonical Server Reporter: Pending requested
- Canonical Server Reporter: Pending requested
- Canonical Server Reporter: Pending requested
- Canonical Server: Pending requested
- Diff: 32 lines (+13/-0), 2 files modified: debian/changelog (+7/-0), debian/rules (+6/-0)
- Sergio Durigan Junior (community): Approve
- Canonical Server: Pending requested
- Diff: 43 lines (+12/-1), 3 files modified: debian/changelog (+6/-0), debian/control (+2/-1), debian/rules (+4/-0)
tags: added: server-todo
Changed in mariadb-10.6 (Ubuntu Jammy):
assignee: nobody → Andreas Hasenack (ahasenack)
status: New → Triaged
importance: Undecided → High
Changed in systemd (Ubuntu Jammy):
assignee: nobody → Andreas Hasenack (ahasenack)
status: New → In Progress
importance: Undecided → High
description: updated
Changed in systemd (Ubuntu Jammy):
status: In Progress → New
assignee: Andreas Hasenack (ahasenack) → nobody
Changed in mariadb-10.6 (Ubuntu Jammy):
status: Triaged → In Progress
description: updated
Ok, after some experimenting and code inspection, this is the summary.
This is the logic that decides if mariadb 10.6.7 will try to use uring or not, AT RUNTIME:

bool innodb_use_native_aio_default()
{
#ifdef HAVE_URING
  utsname &u= uname_for_io_uring;
  if (!uname(&u) && u.release[0] == '5' && u.release[1] == '.' &&
      u.release[2] == '1' && u.release[3] >= '1' && u.release[3] <= '5' &&
      u.release[4] == '.')
  {
    if (u.release[3] == '5') {
      const char *s= strstr(u.version, "5.15.");
      if (s || (s= strstr(u.release, "5.15.")))
        if ((s[5] >= '3' || s[6] >= '0'))
          return true; /* 5.15.3 and later should be fine */
    }
    io_uring_may_be_unsafe= u.release;
    return false; /* working around io_uring hangs (MDEV-26674) */
  }
#endif
  return true;
}
As we can see, it depends on the RUNNING kernel, and details are in https://jira.mariadb.org/browse/MDEV-26674
The jammy kernel is 5.15.0, which makes the above return false, meaning io_uring is NOT used.
The focal kernel is 5.4.0, which makes the above return true, meaning io_uring IS USED.
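To check the two outcomes above without building MariaDB, the version test can be ported to Python. This is only a sketch under assumptions: `use_native_aio_default` and `_at` are hypothetical names, the `release` and `version` arguments stand in for the `utsname` fields of the same names, and the case where `uname()` itself fails is not modelled.

```python
def _at(s: str, i: int) -> str:
    """Mimic C string indexing: reading past the end yields NUL."""
    return s[i] if i < len(s) else "\0"


def use_native_aio_default(release: str, version: str = "") -> bool:
    """Port of the kernel-version check quoted above (illustrative only)."""
    in_5_1x = (
        _at(release, 0) == "5"
        and _at(release, 1) == "."
        and _at(release, 2) == "1"
        and "1" <= _at(release, 3) <= "5"
        and _at(release, 4) == "."
    )
    if not in_5_1x:
        return True  # outside 5.11..5.15: io_uring is attempted

    if _at(release, 3) == "5":
        # Like the C code, prefer a match in `version`, else fall back to `release`.
        for text in (version, release):
            pos = text.find("5.15.")
            if pos != -1:
                s = text[pos:]
                if _at(s, 5) >= "3" or _at(s, 6) >= "0":
                    return True  # 5.15.3 and later
                break
    return False  # workaround for MDEV-26674


if __name__ == "__main__":
    for rel in ("5.4.0-120-generic", "5.15.0-25-generic"):
        print(rel, use_native_aio_default(rel))
```

Run against a focal-style release string such as `5.4.0-120-generic` it returns True, and against a jammy-style `5.15.0-25-generic` it returns False, which matches the behaviour described above. The release strings are illustrative examples, not values taken from the builders.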
That's why running the jammy build of mariadb on a focal kernel attempts to use io_uring. And this is what the current launchpad builders are: focal kernel, with chroots for the ubuntu release they are building for. And mariadbd is run in this env as part of the build-time tests.
Note the above is for RUNTIME decision: HAVE_URING is decided at build time, and is not related to this bug.
If uring is used, then the memlock limit comes into play: it indeed needs to be higher. stracing jammy's mariadb on the focal kernel shows that io_uring_setup() failed with ENOMEM.
Quick reproducer:
- create a focal VM
- confirm ulimit -l value in that VM. I've been getting the value "64", but anything lower than 256 will trigger the bug.
- inside that VM, create a jammy LXD container
- inside that LXD container, install mariadb-server. It will fail.