Ceph-osd fails to start with "error while loading shared libraries: cannot make segment writable for relocation: Permission denied"

Bug #1917414 reported by Insanemal
This bug affects 4 people
Affects          Status         Importance   Assigned to   Milestone
ceph (Ubuntu)    Fix Released   Undecided    Unassigned
Focal            Fix Released   Undecided    Unassigned
Groovy           Fix Released   Undecided    Unassigned
Hirsute          Fix Released   Undecided    Unassigned

Bug Description

[Impact]
Ceph daemons will not start on arm64.

[Test Case]
Install ceph on arm64 based servers
Daemons will fail to startup with the error message as recorded in the original bug report

[What might go wrong]
Use of the ISA-L erasure coding library was enabled for ARM64 in an Octopus point release (15.2.8). The Ceph daemons set MemoryDenyWriteExecute=true in their systemd units, and the ARM64 ISA-L code uses text relocations, which fail to load under that setting. The fix was cherry-picked from the upstream ISA-L code base.
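
A quick way to check whether a given shared library carries text relocations is to look for a TEXTREL entry in its ELF dynamic section, as printed by readelf -d. The sketch below is only an illustration; the function names are hypothetical and it assumes binutils is installed.

```shell
# Hypothetical helper: reads `readelf -d` output on stdin and succeeds
# if it mentions TEXTREL (either the TEXTREL tag or the TEXTREL flag).
dynamic_has_textrel() {
    grep -q 'TEXTREL'
}

# Hypothetical wrapper: inspect a shared object on disk.
# $1 = path to the library
lib_has_textrel() {
    readelf -d "$1" 2>/dev/null | dynamic_has_textrel
}
```

For example, running `lib_has_textrel /usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_isa.so && echo "text relocations present"` on an affected node should show whether the installed library is one that needs them.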

[Original bug report]
OSDs fail to start with "error while loading shared libraries: cannot make segment writable for relocation: Permission denied"

The specific library is libec_isa.so

And it appears to be while the OSD is starting the Jerasure modules.

I'm going to assume it's compiled without PIC (position-independent code), which might be a hold-over from previous releases?

Hardware is a Raspberry Pi 4 (4 GB)
Ubuntu is 20.04 LTS, downloaded on 1 March 2021 (1/3/2021, or 3/1/2021 in US date format)

Package version is: 15.2.8-0ubuntu0.20.04.1

The bug reporting tool wouldn't let me select https://launchpad.net/ubuntu/focal/arm64/ceph-osd/15.2.8-0ubuntu0.20.04.1 as the package.

Revision history for this message
Insanemal (insanemal) wrote :

Hey, can I provide any more info or anything else to get this looked at sooner? I've got a mixed x86_64/arm64 cluster and I currently can't use any of my arm nodes. I'm happy to help however I can.

Revision history for this message
Yash (ya5h-linux) wrote :

I am facing the same error with ceph-mon while running the Ubuntu Focal (20.04) pre-installed riscv64 server image on QEMU.
I am deploying Ceph manually by following the steps on the official site[0].

When executing this command:
$ sudo systemctl start ceph-mon@node1
I am getting the below error:

Mar 04 06:35:21 ubuntu systemd[1]: Started Ceph cluster monitor daemon.
Mar 04 06:35:21 ubuntu ceph-mon[5357]: /usr/bin/ceph-mon: error while loading shared libraries: cannot make segment writable for relocation: Operation not permitted
Mar 04 06:35:21 ubuntu systemd[1]: <email address hidden>: Main process exited, code=exited, status=127/n/a
Mar 04 06:35:21 ubuntu systemd[1]: <email address hidden>: Failed with result 'exit-code'.

How were you able to track down that the problem is with "libec_isa.so" in your case?

[0]: https://docs.ceph.com/en/latest/install/manual-deployment/

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ceph (Ubuntu):
status: New → Confirmed
Revision history for this message
Insanemal (insanemal) wrote :

Hi,

In my case it was in the ceph-osd.0.log

I should paste the log line. Let me just log into the rpi, assuming it's still on.

2021-03-01T15:01:44.708+0000 ffffa248a040 -1 load: jerasure load: lrc load dlopen(/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_isa.so): /usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_isa.so: cannot make segment writable for relocation: Operation not permitted

Sorry, I should have posted the full log line. I did assume it would cause issues with other libs that were compiled without PIC, assuming that is the issue, and it looks like I might have been right.

I didn't need a mon as this is an OSD node; my mons are currently located on faster x86_64 nodes. But it's interesting to see that it causes issues with both MON and OSD services.

Revision history for this message
Insanemal (insanemal) wrote :

Also, if you check your ceph-mon.X.log under /var/log/ceph, it should have a more detailed error message about what it was doing when it hit the permission error, unless it's actually ceph-mon itself and not an external lib causing the issue.

Anyway it's worth a look.
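
The log search suggested here can be sketched as a small grep wrapper. The function name is hypothetical, and the log file names follow the ones mentioned in this report.

```shell
# Hypothetical helper: print every log line that mentions the relocation
# failure, prefixed with the file name and line number.
# Arguments: one or more log files, e.g. /var/log/ceph/ceph-osd.0.log
find_reloc_errors() {
    grep -Hn 'cannot make segment writable for relocation' "$@"
}
```

Called as `find_reloc_errors /var/log/ceph/*.log`, it should list the dlopen line naming the library that failed to load, if the daemon got far enough to log one.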

Revision history for this message
AC (azurecomet) wrote :

I'm running into the same issue as well. It was working a few days ago, even after I restarted my Raspberry Pi 4 (8 GB), which also runs Ubuntu 20.04.
I upgraded today, and ceph-osd was among the upgraded packages. The OSD stopped working after a reboot. The last time the Pi was upgraded was 2 weeks ago.
This is still just a test cluster, so I'm more than happy to do any testing to figure out what's going on here.

2021-03-05T21:15:56.786-0800 ffff8d008040 0 set uid:gid to 64045:167 (ceph:ceph)
2021-03-05T21:15:56.786-0800 ffff8d008040 0 ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process ceph-osd, pid 3300
2021-03-05T21:15:56.786-0800 ffff8d008040 0 pidfile_write: ignore empty --pid-file
2021-03-05T21:15:56.790-0800 ffff8d008040 1 bdev create path /var/lib/ceph/osd/ceph-3/block type kernel
2021-03-05T21:15:56.790-0800 ffff8d008040 1 bdev(0xaaab170f4000 /var/lib/ceph/osd/ceph-3/block) open path /var/lib/ceph/osd/ceph-3/block
2021-03-05T21:15:56.790-0800 ffff8d008040 1 bdev(0xaaab170f4000 /var/lib/ceph/osd/ceph-3/block) open size 8001561821184 (0x74702400000, 7.3 TiB) block_size 4096 (4 KiB) rotational discard not supported
2021-03-05T21:15:56.790-0800 ffff8d008040 1 bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
2021-03-05T21:15:56.790-0800 ffff8d008040 1 bdev create path /var/lib/ceph/osd/ceph-3/block type kernel
2021-03-05T21:15:56.790-0800 ffff8d008040 1 bdev(0xaaab170f4380 /var/lib/ceph/osd/ceph-3/block) open path /var/lib/ceph/osd/ceph-3/block
2021-03-05T21:15:56.794-0800 ffff8d008040 1 bdev(0xaaab170f4380 /var/lib/ceph/osd/ceph-3/block) open size 8001561821184 (0x74702400000, 7.3 TiB) block_size 4096 (4 KiB) rotational discard not supported
2021-03-05T21:15:56.794-0800 ffff8d008040 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-3/block size 7.3 TiB
2021-03-05T21:15:56.794-0800 ffff8d008040 1 bdev(0xaaab170f4380 /var/lib/ceph/osd/ceph-3/block) close
2021-03-05T21:15:57.082-0800 ffff8d008040 1 bdev(0xaaab170f4000 /var/lib/ceph/osd/ceph-3/block) close
2021-03-05T21:15:57.330-0800 ffff8d008040 0 starting osd.3 osd_data /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
2021-03-05T21:15:57.330-0800 ffff8d008040 -1 Falling back to public interface
2021-03-05T21:15:57.350-0800 ffff8d008040 -1 load: jerasure load: lrc load dlopen(/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_isa.so): /usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_isa.so: cannot make segment writable for relocation: Operation not permitted

Revision history for this message
AC (azurecomet) wrote :

A quick update:

For the hell of it I tried downgrading to 15.2.7-0ubuntu0.20.04.2 instead. The OSDs all came back online once it was downgraded. I tested a reboot and it was fine as well.

Revision history for this message
Insanemal (insanemal) wrote :

Can confirm downgrade to 15.2.7-0ubuntu0.20.04.2 has got my node working.

Thanks for the heads up.

Obviously this is only a band-aid.

Revision history for this message
Yash (ya5h-linux) wrote :

I was able to get past this issue.
In my case, ceph-mon was invoked through a systemd service file, so I had to change the property below under the "[Service]" section of the file /lib/systemd/system/ceph-mon@.service

- MemoryDenyWriteExecute=true
+ MemoryDenyWriteExecute=false

As per the systemd manpage[0]:

MemoryDenyWriteExecute=
Takes a boolean argument. If set, attempts to create memory mappings that are writable and executable at the same time, or to change existing memory mappings to become executable, or mapping shared memory segments as executable are prohibited. Specifically, a system call filter is added that rejects mmap(2) system calls with both PROT_EXEC and PROT_WRITE set, mprotect(2) or pkey_mprotect(2) system calls with PROT_EXEC set and shmat(2) system calls with SHM_EXEC set. Note that this option is incompatible with programs and libraries that generate program code dynamically at runtime, including JIT execution engines, executable stacks, and code "trampoline" feature of various C compilers.

This particular property was preventing the ceph-mon executable from running because of the above prohibitions.
After making the above change, I had to run
$ systemctl daemon-reload
and then restarting the service worked for me!

[0]: https://www.freedesktop.org/software/systemd/man/systemd.exec.html
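
A file under /lib/systemd/system belongs to the package and is normally replaced on upgrade, so the same workaround can also be expressed as a systemd drop-in. The following is a minimal sketch, not a recommended permanent setting: the function name, the drop-in file name and the second parameter are assumptions made for illustration, and turning the option off removes a hardening measure.

```shell
# Hypothetical helper: write a drop-in that turns off MemoryDenyWriteExecute
# for one unit without editing the packaged file in /lib/systemd/system.
# $1 = unit name, e.g. ceph-mon@.service
# $2 = systemd config root (assumed default: /etc/systemd/system)
write_mdwe_dropin() {
    unit=$1
    root=${2:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    # The file name is arbitrary; systemd reads every *.conf in the directory.
    printf '[Service]\nMemoryDenyWriteExecute=false\n' > "$dir/10-allow-textrel.conf"
}
```

Run as root, `write_mdwe_dropin ceph-mon@.service` followed by `systemctl daemon-reload` and a restart of the service would have the same effect as the edit above. Deleting the drop-in and reloading restores the packaged setting once a fixed package is installed.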

Revision history for this message
James Page (james-page) wrote :

15.2.8 appears to have enabled some EC features under ARM:

https://paste.ubuntu.com/p/YWYZrSGPhq/

However, one of the architectures reported here is RISC-V (riscv64), so that might not be the cause.

Revision history for this message
James Page (james-page) wrote :
Revision history for this message
James Page (james-page) wrote :
Revision history for this message
James Page (james-page) wrote :

I've cherry-picked the upstream isa-l fix into the ceph packages for focal. This probably also affects the current snapshot Ubuntu has in hirsute development.

Test packages will take a day or so to build here:

  https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3535

Revision history for this message
James Page (james-page) wrote :

Packages have finished building for all architectures in the PPA referenced in #13. I would appreciate it if one of the bug reporters impacted by this issue on ARM64 could test and confirm whether this resolves the issue they encountered on upgrade to 15.2.8.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 16.1.0-0ubuntu3

---------------
ceph (16.1.0-0ubuntu3) hirsute; urgency=medium

  * d/p/issue49494.patch: Cherry pick fix for issue with preprocessor
    logic which causes backport failures to focal.
  * d/p/bug1917414.patch: Cherry pick fix to isa-l to remove use of text
    relocation calls which cause ceph-osd and ceph-mon daemons to fail
    to start (LP: #1917414).

 -- James Page <email address hidden> Mon, 15 Mar 2021 08:26:01 +0000

Changed in ceph (Ubuntu Hirsute):
status: Confirmed → Fix Released
James Page (james-page)
description: updated
Revision history for this message
Andreas Elvers (itsafire1) wrote :

I have the same problem: my upgraded ARM OSDs no longer start. I am on Ubuntu Bionic, and the Ceph packages are at 15.2.10. The packages in #13 are too old to try.

Revision history for this message
James Page (james-page) wrote :

@itsafire1 - I'm guessing you are using the packages published by the upstream Ceph project. They will have this issue, as the isa-l submodule has not been updated to pick up the required fixes.

Revision history for this message
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Insanemal, or anyone else affected,

Accepted ceph into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/15.2.11-0ubuntu0.20.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ceph (Ubuntu Groovy):
status: New → Fix Committed
tags: added: verification-needed verification-needed-groovy
Revision history for this message
Timo Aaltonen (tjaalton) wrote :

Hello Insanemal, or anyone else affected,

Accepted ceph into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/15.2.11-0ubuntu0.20.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ceph (Ubuntu Focal):
status: New → Fix Committed
tags: added: verification-needed-focal
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

I confirm that focal-proposed is fixed. I've validated this by deploying a Juju OpenStack bundle on focal arm64 using `distro-proposed` as source/openstack-origin. The issue I was seeing in the past (`/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_isa.so` that could not be loaded, leading to the ceph-mon charm being stuck 'executing') has now vanished. That setup is running ceph-common 15.2.11-0ubuntu0.20.04.1

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Insanemal, or anyone else affected,

Accepted ceph into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/15.2.11-0ubuntu0.20.10.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: verification-needed-focal
removed: verification-done-focal
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Insanemal, or anyone else affected,

Accepted ceph into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/15.2.11-0ubuntu0.20.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

I validated focal already, see comment #20

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

Oh I see, I validated 15.2.11-0ubuntu0.20.04.1 but now I need to validate 15.2.11-0ubuntu0.20.04.2, on it.

tags: added: verification-needed-focal
removed: verification-done-focal
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

I confirm that focal-proposed is fixed. I've validated this by deploying a Juju OpenStack bundle on focal arm64 using `distro-proposed` as source/openstack-origin. The issue I was seeing in the past (`/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_isa.so` that could not be loaded, leading to the ceph-mon charm being stuck 'executing') has now vanished. That setup is running ceph-common 15.2.11-0ubuntu0.20.04.2

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

I confirm that groovy-proposed is fixed. I've validated this by deploying a Juju OpenStack bundle on focal arm64 using `distro-proposed` as source/openstack-origin. The issue I was seeing in the past (`/usr/lib/aarch64-linux-gnu/ceph/erasure-code/libec_isa.so` that could not be loaded, leading to the ceph-mon charm being stuck 'executing') has now vanished. That setup is running ceph-common 15.2.11-0ubuntu0.20.10.2

tags: added: verification-done-groovy
removed: verification-needed-groovy
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 15.2.11-0ubuntu0.20.10.2

---------------
ceph (15.2.11-0ubuntu0.20.10.2) groovy; urgency=medium

  * d/p/bug1914584.patch: Drop as this patch does not fix the
    actual issue.

ceph (15.2.11-0ubuntu0.20.10.1) groovy; urgency=high

  [ James Page ]
  * d/p/bug1917414.patch: Cherry pick fix to isa-l to remove use of text
    relocation calls which cause ceph-osd and ceph-mon daemons to fail
    to start (LP: #1917414).

  [ Chris MacNaughton ]
  * d/p/bug1914584.patch: Improve rgw diagnostic when reusing email
    (LP: #1914584).

  [ James Page ]
  * SECURITY UPDATE: New upstream stable point release (LP: #1921349).
    - CVE-2021-20288
    - d/p/bug1911900-fix-scrub-blocking-balancer.patch:
      Drop, included in release.
    - d/p/32bit-fixes.patch: Update for mismatched size_t/uint64_t on
      armhf causing compilation failure.

 -- James Page <email address hidden> Fri, 30 Apr 2021 12:10:45 +0100

Changed in ceph (Ubuntu Groovy):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for ceph has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 15.2.11-0ubuntu0.20.04.2

---------------
ceph (15.2.11-0ubuntu0.20.04.2) focal; urgency=medium

  * d/p/bug1914584.patch: Drop as this patch does not fix the actual
    issue.

ceph (15.2.11-0ubuntu0.20.04.1) focal; urgency=medium

  [ James Page ]
  * d/p/bug1917414.patch: Cherry pick fix to isa-l to remove use of
    text relocation calls which cause ceph-osd and ceph-mon daemons to
    fail to start on aarch64 (LP: #1917414).

  [ Chris MacNaughton ]
  * d/p/bug1914584.patch: Improve rgw diagnostic when reusing email
    (LP: #1914584).

  [ James Page ]
  * SECURITY UPDATE: New upstream stable point release (LP: #1921349):
    - CVE-2021-20288
    - d/p/bug1911900-fix-scrub-blocking-balancer.patch:
      Drop, included in release.
    - d/p/32bit-fixes.patch: Resolve compilation failure on armhf due to
      mismatched size_t/uint64_t types.

 -- James Page <email address hidden> Fri, 30 Apr 2021 12:13:27 +0100

Changed in ceph (Ubuntu Focal):
status: Fix Committed → Fix Released
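
To see whether a node already runs a package at or above the version named in this changelog, the installed version can be compared using Debian version ordering. This is a sketch under the assumption that dpkg and dpkg-query are available; both function names are hypothetical.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2,
# using Debian version ordering as implemented by dpkg.
version_at_least() {
    dpkg --compare-versions "$1" ge "$2"
}

# Hypothetical helper: print the installed version of package $1, if any.
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null
}
```

On focal, `version_at_least "$(installed_version ceph-osd)" 15.2.11-0ubuntu0.20.04.2 && echo "fixed package installed"` would report whether the released fix is present.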
Revision history for this message
Daniel WIer (greatwhitedan) wrote :

I have run into this exact same issue with 16.2.6 and 16.2.7 on Ubuntu.

Changing MemoryDenyWriteExecute to false allows the OSD to start on rPi 4 ARM64 systems.

Kernel version 5.4.0-1045-raspi
Ubuntu 20.04.1 LTS

Just wanted to let you know that those packages have not been patched.
