How to express memfd_create usage for anonymous files for huge pages correctly in apparmor?

Bug #2073214 reported by Tim Richardson
This bug affects 1 person
Affects            Status      Importance  Assigned to  Milestone
apparmor (Ubuntu)  New         Undecided   Unassigned
libvirt (Ubuntu)   Incomplete  Undecided   Unassigned

Bug Description

Today I upgraded from 22.04 to 24.04

10.0.0-2ubuntu8.2

I have a VM with 16GB of hugepages allocated to it.

This no longer works.

I cannot get past this virtual machine startup error (from virt-manager):

Error starting domain: internal error: QEMU unexpectedly closed the monitor (vm='ubuntu24.04'): 2024-07-16T04:54:51.443751Z qemu-system-x86_64: failed to resize memfd to 17179869184: Permission denied

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 72, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 108, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 57, in newfn
    ret = fn(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/virt-manager/virtManager/object/domain.py", line 1402, in startup
    self._backend.create()
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1379, in create
    raise libvirtError('virDomainCreate() failed')
libvirt.libvirtError: internal error: QEMU unexpectedly closed the monitor (vm='ubuntu24.04'): 2024-07-16T04:54:51.443751Z qemu-system-x86_64: failed to resize memfd to 17179869184: Permission denied

I have done this:

sudo aa-complain /etc/apparmor.d/usr.lib.libvirt.virt-aa-helper
sudo aa-complain /etc/apparmor.d/usr.sbin.libvirtd

While I did not have to take any such steps with 22.04, I have mounted

sudo mount -t hugetlbfs -o mode=1770,gid=kvm none /dev/hugepages

and these permissions are observed:

root@black:/etc/apparmor.d# ls -ld /dev/hugepages
drwxrwx--t 3 root kvm 0 Jul 16 14:47 /dev/hugepages

root@black:/etc/apparmor.d# id libvirt-qemu
uid=64055(libvirt-qemu) gid=109(kvm) groups=109(kvm),139(libvirt),64055(libvirt-qemu),1002(hugetlb)

I have edited /etc/libvirt/qemu.conf to uncomment the mount point of the hugepages, even though this was not necessary with 22.04

log entries don't add anything (for me):

2024-07-16T04:54:51.443751Z qemu-system-x86_64: failed to resize memfd to 17179869184: Permission denied
2024-07-16 04:54:51.477+0000: shutting down, reason=failed

kernel stuff:

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-6.8.0-38-generic root=UUID=756dca1f-0eb5-440e-a84e-dfa56a27e355 ro quiet splash zswap.enabled=1 zswap.compressor=lz4 zswap.zpool=z3fold mitigations=off vt.handoff=7

summary: - hugespages causes permissions error
+ hugepages causes permissions error
Tim Richardson (tim-richardson) wrote : Re: hugepages causes permissions error

After a reboot I see a directory libvirt and, inside it, libvirt/qemu:

tim@black:/dev/hugepages$ tree -pug
[drwxrwx--t root hugetlb ] .
└── [drwxr-xr-x root root ] libvirt
    └── [drwxr-xr-x root root ] qemu

3 directories, 0 files

Christian Ehrhardt  (paelzer) wrote :

Hi Tim,
you are right that neither of the workaround steps was necessary on 22.04, and they should not be necessary on 24.04 either.

### Topic 1 - changes that should not be needed

Ubuntu 24.04 by default has

1. hugetlbfs mounted (was that no longer the case for you after the upgrade?)
$ mount | grep huge
hugetlbfs on /dev/hugepages type hugetlbfs (rw,nosuid,nodev,relatime,pagesize=2M)

2. uncommenting hugetlbfs_mount in /etc/libvirt/qemu.conf changes nothing
default is
#hugetlbfs_mount = "/dev/hugepages"
Changing that to
hugetlbfs_mount = "/dev/hugepages"
will not change anything.

I assume #1 and #2 were only things you tried while debugging what is wrong, which is fine.
But let us know if it is otherwise.

### Topic 2 - was there an apparmor denial?

Now back to your actual issue, you mentioned that you set apparmor to complain mode.
Was there an indication that apparmor is the issue, such as a related denial in the syslog when you start your guest? If so, could you please share that denial?

### Topic 3 - repro seems to not trigger this

I picked a guest which before had:
  <memory unit='MiB'>1024</memory>
  <currentMemory unit='MiB'>1024</currentMemory>

And changed it to use huge pages by adding:
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB'/>
    </hugepages>
  </memoryBacking>

To confirm it tries to use the right thing I started it without allocating huge pages and got:
=> qemu-system-x86_64: unable to map backing store for guest RAM: Cannot allocate memory

Then I allocated some huge pages in the system and started it again.
$ echo 512 | sudo tee -a /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ virsh start --console test

Worked right away.
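
As a cross-check on the number written to nr_hugepages above, here is a minimal Python sketch that derives the page count from the guest XML. The function name, the unit table and the <domain> wrapper around the fragment are illustrative assumptions, as is treating a missing unit attribute as KiB; this is not something libvirt ships.

```python
import xml.etree.ElementTree as ET

# Size units as they appear in this thread, expressed in KiB.
UNIT_TO_KIB = {"KiB": 1, "MiB": 1024, "GiB": 1024 * 1024}


def huge_pages_for_guest(domain_xml):
    """Return how many huge pages the guest's <memory> element needs."""
    root = ET.fromstring(domain_xml)
    memory = root.find("memory")
    page = root.find("memoryBacking/hugepages/page")
    # Assumption: a missing unit attribute means KiB.
    memory_kib = int(memory.text) * UNIT_TO_KIB[memory.get("unit", "KiB")]
    page_kib = int(page.get("size")) * UNIT_TO_KIB[page.get("unit", "KiB")]
    return -(-memory_kib // page_kib)  # round up to whole pages


# The fragment from the repro above, wrapped in a <domain> root for parsing.
TEST_GUEST = """
<domain>
  <memory unit='MiB'>1024</memory>
  <currentMemory unit='MiB'>1024</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB'/>
    </hugepages>
  </memoryBacking>
</domain>
"""

print(huge_pages_for_guest(TEST_GUEST))  # 1024 MiB / 2048 KiB = 512
```

For a 1024 MiB guest on 2048 KiB pages this gives 512, which is the value echoed into nr_hugepages in the repro.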

I double checked in /proc/meminfo and saw HugePages_Free changing when the guest was running.
So I think something is configured in an unexpected way.

Could you share
- your full guest-config in regard to memory
- hugeadm --list-all-mounts
- journalctl -f while starting the guest
- grep -i huge /proc/meminfo

Changed in libvirt (Ubuntu):
status: New → Incomplete
Tim Richardson (tim-richardson) wrote : Re: [Bug 2073214] Re: hugepages causes permissions error

This is an invalid bug report.

But it might still be interesting, because the problem seems to be that
something changed after the upgrade and now reserves part of the exactly
16 GiB of hugepages I set aside via sysctl.conf.
I did not see this in 22.04.

So I think the problem was that the amount of memory I requested exceeded
the free pages, because something else had reserved part of the hugepage pool.

Firstly ...

#Topic 1: The unnecessary fstab entry and the /etc/libvirt/qemu.conf edit are
now reverted.
#Topic 2: There is no indication of apparmor problems; I was just
eliminating it as a cause.

guest config which fails to start: "FailConfig":

  <memory unit="KiB">16777216</memory>
  <currentMemory unit="KiB">16777216</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size="2048" unit="KiB"/>
    </hugepages>
    <source type="memfd"/>
    <access mode="shared"/>
  </memoryBacking>

guest config which worked "SucceedConfig":

  <memory unit="KiB">8388608</memory>
  <currentMemory unit="KiB">8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size="2048" unit="KiB"/>
    </hugepages>
    <source type="memfd"/>
    <access mode="shared"/>
  </memoryBacking>

tim@black:~$ hugeadm --list-all-mounts
Mount Point Options
/dev/hugepages rw,relatime,gid=1002,mode=1770,pagesize=2M

Prior to starting the VM:

tim@black:~$ grep -i huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 8192
HugePages_Free: 8180
HugePages_Rsvd: 61
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 16777216 kB

(I do not remember noticing the reservation of 61 pages in 22.04)

After starting with SucceedConfig:

tim@black:~$ grep -i huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 8192
HugePages_Free: 4084
HugePages_Rsvd: 61
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 16777216 kB

So, back to FailConfig, but first increase the number of pages:

tim@black:~$ echo 8500 | sudo tee -a /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
8500
tim@black:~$ grep -i huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 8500
HugePages_Free: 8488
HugePages_Rsvd: 61
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 17408000 kB

and start with "FailConfig"

Result: Boot is successful!

tim@black:~$ grep -i huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 8500
HugePages_Free: 296
HugePages_Rsvd: 61
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 17408000 kB
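
The numbers above can be checked with a small Python sketch. FailConfig asks for 16777216 KiB, which is 8192 pages of 2048 KiB, while the first snapshot shows only 8180 free pages, 61 of them already reserved. Treating HugePages_Free minus HugePages_Rsvd as the usable count is my reading of those counters rather than something stated in this thread, and the function names are made up for illustration.

```python
import re


def parse_hugepage_counters(meminfo_text):
    """Collect the HugePages_* counters and Hugepagesize (kB) as integers."""
    counters = {}
    for line in meminfo_text.splitlines():
        match = re.match(r"(HugePages_\w+|Hugepagesize):\s+(\d+)", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def pages_available(counters):
    """Free pages that are not already promised to an existing mapping."""
    return counters["HugePages_Free"] - counters["HugePages_Rsvd"]


def request_fits(requested_pages, counters):
    return requested_pages <= pages_available(counters)


# Snapshot taken before starting the VM with the 8192-page pool.
BEFORE = """\
AnonHugePages: 0 kB
HugePages_Total: 8192
HugePages_Free: 8180
HugePages_Rsvd: 61
HugePages_Surp: 0
Hugepagesize: 2048 kB
"""

counters = parse_hugepage_counters(BEFORE)
requested = 16777216 // counters["Hugepagesize"]  # FailConfig: 8192 pages
print(requested, pages_available(counters), request_fits(requested, counters))
```

With the pool grown to 8500 pages the same check passes, and the final snapshot agrees with the arithmetic: 8488 free pages minus 8192 used by the guest leaves 296.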

Thank you for helping me. I am new to hugepages and without your excellent
reply I would not have had confidence to keep trying to fix the problem.
Hopefully this can serve as some documentation for other beginners.


Changed in libvirt (Ubuntu):
status: Incomplete → Invalid
summary: - hugepages causes permissions error
+ hugepages causes permissions error [invalid, page pool too small]
Christian Ehrhardt  (paelzer) wrote : Re: hugepages causes permissions error [invalid, page pool too small]

Thank you for your reply and I'm glad I could help by providing some cross-checks and context.

You can try to see what allocates the huge pages (as I'm also curious now).
See https://unix.stackexchange.com/questions/167451/how-to-monitor-use-of-huge-pages-per-process

That will list the processes that map hugepages.
The path will include the PID, which you can map back to the program using it.
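
One way to do such a scan is sketched below in Python. It assumes the approach is to look for the "huge" marker in /proc/<pid>/numa_maps; the linked answer is not reproduced here, so treat this as an illustration rather than its exact method. The sample text is invented for the example, and reading other users' numa_maps files normally requires root.

```python
import glob


def huge_ranges(numa_maps_text):
    """Return the lines of one numa_maps file that are marked as huge."""
    return [line for line in numa_maps_text.splitlines()
            if "huge" in line.split()]


def processes_mapping_hugepages():
    """Map PID -> number of huge ranges, for every readable numa_maps file."""
    found = {}
    for path in glob.glob("/proc/[0-9]*/numa_maps"):
        try:
            with open(path) as handle:
                ranges = huge_ranges(handle.read())
        except OSError:
            continue  # process exited, or we lack permission to read it
        if ranges:
            pid = int(path.split("/")[2])  # the path is /proc/<pid>/numa_maps
            found[pid] = len(ranges)
    return found


# Invented sample in the general shape of numa_maps output.
SAMPLE = (
    "7f0000000000 default file=/anon_hugepage huge dirty=2 N0=2 kernelpagesize_kB=2048\n"
    "7f1234560000 default anon=3 dirty=3 N0=3 kernelpagesize_kB=4\n"
)

print(len(huge_ranges(SAMPLE)))
print(processes_mapping_hugepages())
```

Each PID found this way can then be looked up with ps to identify the program.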

Tim Richardson (tim-richardson) wrote :

I can't work out the process owning the reserved pages.

Changed in libvirt (Ubuntu):
status: Invalid → New
Tim Richardson (tim-richardson) wrote :

I want to reopen this. I think there is an apparmor problem.

Here is how to reproduce:

I installed a fresh Ubuntu 24.04 desktop onto an external drive on my AMD laptop.
I applied all suggested updates, set up virt-manager and installed hugeadm.

Then I downloaded the 24.04 server image and installed it in a VM with 2 GB of RAM.
I verified that it boots with the default memory settings.

Using hugeadm, I created 4000 huge pages of 2 MiB each.

Then I edited the XML of the VM as above, including shared memory and <hugepages/>.

It fails with the same permission error as reported initially.

Then I

sudo systemctl disable apparmor

and restart

Now the VM starts fine.

Enabling apparmor and restarting results again in a VM that will not start with hugepages.

Tim Richardson (tim-richardson) wrote :

(earlier I had disabled apparmor while investigating another problem but I forgot that I did this, sorry)

Tim Richardson (tim-richardson) wrote (last edit ): Re: hugepages causes permissions error [apparmor profile]

One workaround is to do

aa-complain /etc/apparmor.d/libvirt/libvirt-<UUID>

You may need to
touch /etc/apparmor.d/libvirt/libvirt-<UUID>.files

because the .files file may not be present; libvirt creates and removes it dynamically.

Another workaround is to (accidentally) break the apparmor profile so it can't be correctly parsed. I believe that in this case, libvirt launches the VM anyway, but with no apparmor profile ... this is a bit sneaky.

So if you want to investigate apparmor, you have to see the libvirt-<UUID> profile in aa-status. It defaults to enforce. If it's not there, fix that problem first.

With aa-enforce on, vm launch fails but there is no logging anywhere I can find of a DENIED message.
So as an absolute apparmor beginner, I have no clues.

The best I can do is run strace on the libvirtd process:

root@elecgear:/home/tim# strace -f -p 4818 2>&1 | grep memfd
[pid 11307] memfd_create("test", MFD_CLOEXEC|MFD_ALLOW_SEALING) = 3
[pid 11307] memfd_create("test", MFD_CLOEXEC|MFD_HUGETLB) = 3
[pid 11307] memfd_create("memory-backend-memfd", MFD_CLOEXEC|MFD_ALLOW_SEALING|MFD_HUGETLB|21<<MFD_HUGE_SHIFT) = 20
[pid 11307] write(2, "failed to resize memfd to 214748"..., 55) = 55
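
The flag word in the last memfd_create call can be decoded by hand. The sketch below does it in Python; the numeric constants are copied as literals on the assumption that they match the kernel's memfd header, so check them against your own headers. The term 21<<MFD_HUGE_SHIFT stores log2 of the page size, and 2^21 bytes is 2 MiB, which matches the page size requested in the guest XML.

```python
# Flag values as literals (assumed to match the kernel's memfd definitions).
MFD_CLOEXEC = 0x0001
MFD_ALLOW_SEALING = 0x0002
MFD_HUGETLB = 0x0004
MFD_HUGE_SHIFT = 26
MFD_HUGE_MASK = 0x3F

NAMED_BITS = (
    ("MFD_CLOEXEC", MFD_CLOEXEC),
    ("MFD_ALLOW_SEALING", MFD_ALLOW_SEALING),
    ("MFD_HUGETLB", MFD_HUGETLB),
)


def decode_memfd_flags(flags):
    """Return (names of the set flag bits, huge page size in bytes or None)."""
    names = [name for name, bit in NAMED_BITS if flags & bit]
    log2_size = (flags >> MFD_HUGE_SHIFT) & MFD_HUGE_MASK
    page_bytes = (1 << log2_size) if log2_size else None
    return names, page_bytes


# Flags of the "memory-backend-memfd" call in the strace output above.
flags = MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | (21 << MFD_HUGE_SHIFT)
print(decode_memfd_flags(flags))
```

A page size of None means no explicit size was encoded, in which case the kernel falls back to the default huge page size.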

summary: - hugepages causes permissions error [invalid, page pool too small]
+ hugepages causes permissions error [apparmor profile]
Tim Richardson (tim-richardson) wrote (last edit ):

after enabling audit.log I get this message when the profile is in enforce mode

type=AVC msg=audit(1721533808.319:1238): apparmor="DENIED" operation="truncate" class="file" profile="libvirt-465a6629-3167-43fc-93da-4c7ef837d863" name="/" pid=37291 comm="qemu-system-x86" requested_mask="w" denied_mask="w" fsuid=64055 ouid=64055 FSUID="libvirt-qemu" OUID="libvirt-qemu"

Therefore, adding

/ rw,

to the apparmor profile allows it to work.
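
For anyone else reading such a denial, here is a minimal Python sketch that splits the record into its key=value fields, so the operation, the name and the denied mask are easy to pick out. The sample is the record above, trimmed to the kernel-generated fields (the trailing FSUID=/OUID= enrichment is left out). The helper name is made up, and the regular expression is an assumption that only covers the simple quoting seen in this record.

```python
import re

# key=value pairs; values are either "quoted" or a run of non-space characters.
FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_audit_fields(record):
    """Return the key=value fields of one audit record as a dict of strings."""
    return {key: quoted or bare for key, quoted, bare in FIELD.findall(record)}


RECORD = (
    'type=AVC msg=audit(1721533808.319:1238): apparmor="DENIED" '
    'operation="truncate" class="file" '
    'profile="libvirt-465a6629-3167-43fc-93da-4c7ef837d863" name="/" '
    'pid=37291 comm="qemu-system-x86" requested_mask="w" denied_mask="w" '
    'fsuid=64055 ouid=64055'
)

fields = parse_audit_fields(RECORD)
print(fields["operation"], fields["name"], fields["denied_mask"])
```

Here the three fields are truncate, / and w: the denied operation is a write-class truncate on an object that apparmor can only name as "/".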

Christian Ehrhardt  (paelzer) wrote :

Thank you Tim for continuing on this.

Thanks for all your efforts in comment #6.
I wonder how you can create a new 24.04 install and trigger this while I cannot.
At some point we need a third person to decide which of us has the uncommon case.

Thanks for tracking down the denial that you are seeing.
This is all tied to huge page operations, and your strace shows it happening around the memfd_create calls.

Those are in qemu.git/util/memfd.c and inspired by it I've written a test program

$ cat /home/paelzer/work/qemu/lp-2073214-hugepage-noble/test.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

// Create an anonymous file in memory with huge pages
int main() {
    off_t reqsize = (2*1024*1024);

    // Test flags one by one
    int mfd = memfd_create("test", MFD_CLOEXEC|MFD_ALLOW_SEALING);
    if (mfd >= 0)
        close(mfd);
    else
        return -1;
    mfd = memfd_create("test", MFD_CLOEXEC|MFD_HUGETLB);
    if (mfd >= 0)
        close(mfd);
    else
        return -1;

    int fd = memfd_create("memory-backend-memfd", MFD_CLOEXEC | MFD_HUGETLB | MFD_ALLOW_SEALING);
    if (fd == -1) {
        perror("memfd_create");
        exit(EXIT_FAILURE);
    }
    printf("Created\n");

    // Truncate it to the desired length
    if (ftruncate(fd, reqsize) == -1) {
        perror("truncate");
        goto err;
    }
    printf("Truncated\n");

    // Add seals (F_SEAL_SEAL has the value 1: no further seals may be added)
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_SEAL) == -1) {
        perror("seal");
        goto err;
    }
    printf("Added seal\n");

    // Close the file descriptor
    close(fd);
    printf("Closed\n");

    return 0;

err:
    close(fd);
    exit(EXIT_FAILURE);
}

Just like your more complex test with qemu, without an apparmor profile this works just fine.
(Interestingly even without allocating huge pages).

$ ./test
Created
Truncated
Added seal
Closed

Running it with the default profile for guests (no custom paths needed) should trigger the same failure.
So I use the following profile (one might need to adapt the paths accordingly).
It is inspired by the files we usually generate for the guests:

$ sudo cat /etc/apparmor.d/home.paelzer.work.qemu.lp-2073214-hugepage-noble.test
# Last Modified: Mon Jul 22 10:23:53 2024
abi <abi/3.0>,

include <tunables/global>

/home/paelzer/work/qemu/lp-2073214-hugepage-noble/test flags=(attach_disconnected) {
  include <abstractions/libvirt-qemu>

  /home/paelzer/work/qemu/lp-2073214-hugepage-noble/test mr,

}

$ sudo apparmor_parser --replace /etc/apparmor.d/home.paelzer.work.qemu.lp-2073214-hugepage-noble.test

With that in place I can recreate the problem and the denial matches yours:

$ ./test
Created
truncate: Permission denied

[5365065.306262] audit: type=1400 audit(1721636961.319:9592): apparmor="DENIED" operation="truncate" class="file" profile="/home/paelzer/work/qemu/lp-2073214-hugepage-noble/test" name="/" pid=3113938 comm="test" requested_mask="w" denied_mask="w" fsuid=1000 ouid=1000

I think apparmor struggles to detect and mediate that path accordingly.
name="/" is only there due to a lack of something better.
If I'd not use attach_disconnected it would...


Changed in libvirt (Ubuntu):
status: New → Incomplete
summary: - hugepages causes permissions error [apparmor profile]
+ How to express memfd_create usage for anonymous files for huge pages
+ correctly in apparmor?
Tim Richardson (tim-richardson) wrote : Re: [Bug 2073214] Re: hugepages causes permissions error [apparmor profile]

This works too, which is not quite as bad

owner / rw,
