libvirt profile is blocking global setrlimit despite having no rlimit rule

Bug #1679704 reported by Christian Ehrhardt 
This bug affects 4 people
Affects                            Status        Importance  Assigned to              Milestone
The Ubuntu-power-systems project   Fix Released  Critical    Canonical Security Team
apparmor (Ubuntu)                  Fix Released  Critical    John Johansen

Bug Description

Hi,
while debugging bug 1678322 I ran into apparmor issues.
Thanks to jjohansen we debugged some of it and eventually I was asked to report a bug.

Symptom:
[ 8976.950635] audit: type=1400 audit(1491310016.224:48): apparmor="DENIED" operation="setrlimit" profile="/usr/sbin/libvirtd" pid=10034 comm="libvirtd" rlimit=memlock value=1610612736

But none of the profiles has any rlimit statement in it:
$ grep -Hirn limit /etc/apparmor*
/etc/apparmor.d/sbin.dhclient:58: # such, if the dhclient3 daemon is subverted, this effectively limits it to
/etc/apparmor.d/abstractions/ubuntu-helpers:16:# Limitations:
/etc/apparmor.d/abstractions/ubuntu-helpers:64: # in limited libraries so glibc's secure execution should be enough to not
/etc/apparmor.d/cache/.features:13:rlimit {mask {cpu fsize data stack core rss nproc nofile memlock as locks sigpending msgqueue nice rtprio rttime

The profile contains a child profile which makes reading the dumps a bit painful, but I'll attach them anyway for you to take a look.
To "recreate" if needed check out bug 1678322 - TL;DR hot-add some VFs via libvirt.
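For anyone triaging similar denials, the audit line from the symptom above can be split into fields with a few lines of Python. This is only a sketch: the name parse_apparmor_audit is made up for illustration, and the regular expression assumes the key=value / key="value" layout seen in the log lines quoted in this report, nothing more general.

```python
import re

# Matches key=value and key="quoted value" pairs as they appear in the
# audit lines quoted in this report (assumed layout, not a full audit parser).
_FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_apparmor_audit(line):
    """Return the key/value fields of one AppArmor audit line as a dict."""
    fields = {}
    for key, quoted, bare in _FIELD_RE.findall(line):
        # Exactly one of the two value groups is filled per match.
        fields[key] = quoted if bare == "" else bare
    return fields


line = ('[ 8976.950635] audit: type=1400 audit(1491310016.224:48): '
        'apparmor="DENIED" operation="setrlimit" profile="/usr/sbin/libvirtd" '
        'pid=10034 comm="libvirtd" rlimit=memlock value=1610612736')
fields = parse_apparmor_audit(line)
# The requested value is in bytes; show it in MiB for readability.
print(fields["operation"], fields["rlimit"], int(fields["value"]) // 2**20, "MiB")
```

For the line above the requested memlock value works out to 1536 MiB.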

Christian Ehrhardt  (paelzer) wrote :
Christian Ehrhardt  (paelzer) wrote :
Christian Ehrhardt  (paelzer) wrote :

The profiles and all the rest of the system are default zesty without modifications.

Christian Ehrhardt  (paelzer) wrote :

Error in iLO links to http://h17007.www1.hpe.com/docs/enterprise/servers/gen9/tsg/244937.htm
But since multiple systems trigger it I'd not say "hardware is physically damaged".

Seth Arnold (seth-arnold) wrote :

Christian, which architecture is this? ISTR some arch having troubles with rlimit and I can't recall details now.

Thanks

Christian Ehrhardt  (paelzer) wrote :

Hi Seth,
so far confirmed on ppc64el and x86.
I haven't tried more, but usually once two architectures show it, all of them do.

Christian Ehrhardt  (paelzer) wrote :

Carried over from the original bug, where this came up while debugging:

As a workaround for the case reported a user might set memtune options for the guest like this:
  <memtune>
    <hard_limit unit='KiB'>16961536</hard_limit>
    <soft_limit unit='KiB'>16961536</soft_limit>
  </memtune>

Needed numbers may vary depending on the case.
Ugly but a workaround at least.
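If the workaround has to be applied to several guests, the memtune block can be added to a domain XML programmatically. The following is a minimal Python sketch, assuming the XML was obtained beforehand (for example with virsh dumpxml) and is fed back afterwards (for example with virsh define). The helper name add_memtune and the tiny example domain are made up for illustration; the limit value is the one from the workaround above and will differ per case.

```python
import xml.etree.ElementTree as ET


def add_memtune(domain_xml, limit_kib):
    """Return domain XML with <memtune> hard/soft limits set to limit_kib KiB."""
    root = ET.fromstring(domain_xml)
    # Drop any existing <memtune> so the result holds exactly one.
    for old in root.findall("memtune"):
        root.remove(old)
    memtune = ET.SubElement(root, "memtune")
    for tag in ("hard_limit", "soft_limit"):
        elem = ET.SubElement(memtune, tag, {"unit": "KiB"})
        elem.text = str(limit_kib)
    return ET.tostring(root, encoding="unicode")


# Minimal stand-in for real guest XML (assumption: a real one has many more elements).
example = "<domain type='kvm'><name>guest</name></domain>"
print(add_memtune(example, 16961536))
```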

This is still really awkward, at least we need to understand why it is even blocking when it should not.
If there is no fix that makes it "just work" I'm fine SRUing something into the libvirt/qemu profiles but we'd need to know what and so far we don't.

Changed in apparmor (Ubuntu):
importance: Undecided → High
Christian Ehrhardt  (paelzer) wrote :

For documentation purposes, here is an update.
I found that the last thing libvirt calls is "prlimit".

In glibc that is implemented as the prlimit64 syscall.
On x86_64 that in turn is:
#define __NR_prlimit64 302

According to the doc of prlimit it needs a capability:
"To set or get the resources of a process other than itself, the caller must have the CAP_SYS_RESOURCE capability, or the real, effective, and saved set user IDs of the target process must match the real user ID of the caller and the real, effective, and saved set group IDs of the target process must match the real group ID of the caller."

But the profile already holds that with a suspicious comment above it matching my testcase:
  # Needed for vfio
  capability sys_resource,

Did something get more strict, maybe a mismatch on prlimit/setrlimit/syscall mapping here?
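To poke at the prlimit path outside of libvirt, the same syscall can be reached from Python's resource module on Linux. This is a sketch for experimenting, not what libvirt does internally (libvirt is C and goes through glibc). The helper names get_memlock and set_memlock are made up; here they only touch the calling process, where re-applying the current values should need no extra privilege. Pointing them at another pid is where the CAP_SYS_RESOURCE rule quoted above comes into play.

```python
import os
import resource


def get_memlock(pid):
    """Read (soft, hard) RLIMIT_MEMLOCK of pid via prlimit(2)."""
    return resource.prlimit(pid, resource.RLIMIT_MEMLOCK)


def set_memlock(pid, soft, hard):
    """Set RLIMIT_MEMLOCK of pid; returns the previous (soft, hard) pair.

    Raises PermissionError when the kernel (or an LSM) refuses the change.
    """
    return resource.prlimit(pid, resource.RLIMIT_MEMLOCK, (soft, hard))


pid = os.getpid()
soft, hard = get_memlock(pid)
# Re-applying the current values needs no extra privilege.
set_memlock(pid, soft, hard)
print("memlock soft/hard:", soft, hard)
```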

Christian Ehrhardt  (paelzer) wrote :

Also, even when setting the profile to complain mode (aa-complain), I see:
[14406.210381] audit: type=1400 audit(1491482071.335:67): apparmor="ALLOWED" operation="setrlimit" profile="/usr/sbin/libvirtd" pid=7674 comm="libvirtd" rlimit=memlock value=2164260864

So far so good, but still the value is not raised.
As if the action never happened.

So on an ALLOWED setrlimit to pid 7674 the value afterwards is not the value set in the call.
Hrm - puzzled ...
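One way to confirm that the value really was not raised is to read the limit back from /proc/<pid>/limits and compare it with the value= field of the audit line. A small Python sketch follows; the helper names are made up, and the sample text uses hypothetical numbers in the usual layout of that file rather than output captured from the affected system.

```python
def memlock_from_limits(text):
    """Extract (soft, hard) 'Max locked memory' from /proc/<pid>/limits content.

    Values are returned as strings because they may read 'unlimited'.
    """
    prefix = "Max locked memory"
    for line in text.splitlines():
        if line.startswith(prefix):
            soft, hard = line[len(prefix):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max locked memory' line found")


def read_memlock(pid):
    """Read the memlock limits of a live process from procfs."""
    with open(f"/proc/{pid}/limits") as f:
        return memlock_from_limits(f.read())


# Hypothetical sample, not taken from the affected system.
sample = (
    "Limit                     Soft Limit           Hard Limit           Units     \n"
    "Max locked memory         65536                65536                bytes     \n"
)
requested = "2164260864"  # value= from the ALLOWED line above
soft, hard = memlock_from_limits(sample)
print("limit applied" if soft == requested else "limit NOT applied", soft, hard)
```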

Christian Ehrhardt  (paelzer) wrote :

Very interesting: disabling the profile completely via
 $ sudo aa-disable /usr/sbin/libvirtd

makes it work, so apparmor is involved in some way.
I'm still puzzled that even the ALLOWED case ends up as a no-op.

Anyway waiting for your reply - thanks a lot already jjohansen for the IRC discussions!

Christian Ehrhardt  (paelzer) wrote :

Ok, by the recent insight this bug IS blocking the final resolution of bug 1678322.
I'll work on the other bits of that bug and we will see how this one here turns out.

John Johansen (jjohansen) wrote :

I have placed amd64 test kernels at
http://people.canonical.com/~jj/lp1679704/

It fixes the complain issue, which should let you proceed without removing the profile. I am also working on a regression test to add to the test suite.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-04-13 21:07 EDT-------
Please, reverse mirror LP1679704 (libvirt profile is blocking global setrlimit despite having no rlimit rule).

tags: added: architecture-ppc64le bugnameltc-153457 severity-high targetmilestone-inin1704
Christian Ehrhardt  (paelzer) wrote :

@JJohansen - for testing I'd need that for ppc64el if possible.
My x86 machines often go down due to FW bugs when testing these cases.
Any chance to build a test kernel for that arch?

Since you have a test kernel it seems you have found the issue.
What is the way of delivery for this - normal kernel updates or anything more special?
Also do you know what Ubuntu releases are affected?

Christian Ehrhardt  (paelzer) wrote :

Also updating the bug status to match current work.

Changed in apparmor (Ubuntu):
assignee: nobody → John Johansen (jjohansen)
status: New → In Progress
John Johansen (jjohansen) wrote :

Every release that supports prlimit is at least partially affected. However, the xenial, yakkety, and zesty releases, which carry the stacking support code, compound the issue.

I'll look into the ppc64el build. I'm sure it's possible; it's just one that I have never built a test kernel for, so I will have to learn the hoops for it.

The fix would be delivered via the normal kernel updates. Once I submit the patch it will have to wait for the start of the next cycle and then go through the 3 week SRU cycle without causing an issue that results in a revert.

Christian Ehrhardt  (paelzer) wrote :

FYI - the remaining related rules that were blocking us are now SRU'ed.
For now I verified by manually increasing the limit via prlimit, and things worked; therefore I assume that this bug is the remaining one for the overall case that was initially reported.

So I dupped the other bug onto this one to let the reporter be notified.

P.S. If we had known in advance, it would just have been another task on the other bug, but well, we are always smarter after the fact.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-06-20 20:06 EDT-------
*** Bug 151486 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-08-04 13:56 EDT-------
Hi Christian,

Do you have any updates on this one?

Christian Ehrhardt  (paelzer) wrote :

Hi Lagarcia,
I came across this again during another activity, but we have to ask @JJohansen what the status of this is.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-10-05 17:29 EDT-------
@JJohansen, any update on this one?

Christian Ehrhardt  (paelzer) wrote :

In testing a newer virt stack I still hit this and need the workarounds to get it to work :-/
Any update and/or ETA on this?

Frank Heimes (fheimes)
tags: added: ppc64el
Christian Ehrhardt  (paelzer) wrote :

We have another hit of this via memory hotplug (when memory gets locked, I assume).
I asked the reporters to chime in here.

But even for the former case, given how long we have already waited, I want to bump the priority.
This is really important to some use cases.

Changed in apparmor (Ubuntu):
importance: High → Critical
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
importance: Undecided → Critical
status: New → In Progress
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Security Team (canonical-security)
bugproxy (bugproxy)
tags: added: severity-critical
removed: severity-high
Christian Ehrhardt  (paelzer) wrote :

FYI: Test case of the mem hotplug in https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1755153/comments/7

Only triggers on powerpc as they lock some memory while doing so (x86 does not).

bugproxy (bugproxy) wrote : /sys/kernel/security/apparmor/policy/profiles/usr.sbin.libvirtd.13/raw_data

Default Comment by Bridge

Christian Ehrhardt  (paelzer) wrote :

Example Deny:
[ 774.341606] audit: type=1400 audit(1522915593.238:42): apparmor="DENIED" operation="setrlimit" info="cap_sys_resource" error=-13 profile="/usr/sbin/libvirtd" pid=8376 comm="libvirtd" rlimit=memlock value=96468992 peer="libvirt-70a586a2-ef34-4954-91ea-9a6ecab52da3"

Source: libvirt
Target: qemu process libvirt-70a586a2-ef34-4954-91ea-9a6ecab52da3
Action: change rlimits

TL;DR to re-summarize:
- certain actions let libvirt change the rlimit of the qemu guest
  - such actions are memory hotplug on ppc
  - pci hotplug of some devices
- libvirtd apparmor profile allows cap_sys_resource
- there is no rlimit rule restricting that in the profile
- a bug in the kernel part of apparmor blocks this and breaks the use-case
- as prechecked by jjohansen he seems to have an idea how to fix (see comment #16)
  - but for as yet unknown reasons activity has been silent for a few months
- finding that mem hotplug is also affected bumps the priority

Manoj Iyer (manjo)
tags: added: triage-a
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-04-09 00:43 EDT-------
Any updates here? On bionic we are blocked on memory hotplug testing with this bug.

John Johansen (jjohansen) wrote :

So I have been looking at this again, and have found a couple issues.

1. Where prlimit is concerned, AppArmor adds an additional restriction on when cap sys_resource is required. The CAP_SYS_RESOURCE capability is required if the target process's label does not match that of the caller.

Hence why libvirtd requires
  capability sys_resource,

in its profile.

The apparmor check should be broken down further, like other IPC checks, but that should not have an effect here based on the peer= field. For the cap_sys_resource check the profile does not need an rlimit rule.

2. If stacking is used, the denial message could be misleading, as a denied message will be generated for each profile in the stack even if it was not the profile causing the denial.

In this case we should see duplicates of the above denial, except with the profile= field changing: one with libvirtd and another with some other profile. With the currently available information this does not seem to be the problem.

3. The rawdata for the libvirtd profile does show the profile has CAP_SYS_RESOURCE permissions.

4. It is the CAP_SYS_RESOURCE check that is failing. This check will only be triggered when using prlimit with a target having a different confinement than the setting task, which is exactly what we see in the audit message.

There is a logic inversion bug in this path.

I have a test kernel building and will update when it's ready.
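To illustrate how a logic inversion of this kind can turn "has the capability" into a denial, here is a toy model in Python. It is not the kernel code: the report does not show the actual patch, so the exact shape of the inversion below (treating an errno-style return of 0 as failure) is an assumption, and all function names are made up.

```python
EACCES = 13


def capable(profile_caps, cap):
    """Errno-style helper: 0 means allowed, -EACCES means denied."""
    return 0 if cap in profile_caps else -EACCES


def check_intended(caller_label, target_label, caller_caps):
    """Rule from point 1: the capability is only needed when the labels differ."""
    if caller_label != target_label and capable(caller_caps, "sys_resource") != 0:
        return -EACCES
    return 0


def check_inverted(caller_label, target_label, caller_caps):
    """Assumed bug shape: reads the errno-style 0 ('allowed') as a failure."""
    if caller_label != target_label and not capable(caller_caps, "sys_resource"):
        return -EACCES
    return 0


caps = {"sys_resource"}
print(check_intended("/usr/sbin/libvirtd", "libvirt-guest", caps))
print(check_inverted("/usr/sbin/libvirtd", "libvirt-guest", caps))
```

With the capability present and differing labels, the intended check lets the call through while the inverted one returns -13, which would be consistent with the error=-13 info="cap_sys_resource" denial quoted earlier, where a peer= field is present.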

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-16 05:49 EDT-------
Any updates on this one? When are we getting this fixed?

Frank Heimes (fheimes) wrote :

A merge proposal to incl. the fixes was sent to the kernel-team.

Christian Ehrhardt  (paelzer) wrote :

Is there a test kernel somewhere that supports PPC64?

Christian Ehrhardt  (paelzer) wrote :

Tested the interim version from [1]
TL;DR: with that it is working

base: 4.15.0-13
proposed fix: 4.15.0.16.17

## Base ##
$ virsh attach-device cpaelzer-bionic hp512.xml
error: Failed to attach device from hp512.xml
error: cannot limit locked memory of process 10121 to 96468992: Permission denied

DMESG:
[1031564.759963] audit: type=1400 audit(1523946413.082:15731): apparmor="DENIED" operation="setrlimit" info="cap_sys_resource" error=-13 profile="/usr/sbin/libvirtd" pid=8376 comm="libvirtd" rlimit=memlock value=96468992 peer="libvirt-70a586a2-ef34-4954-91ea-9a6ecab52da3"
[1031564.760010] audit: type=1400 audit(1523946413.082:15732): apparmor="DENIED" operation="setrlimit" info="cap_sys_resource" error=-13 profile="/usr/sbin/libvirtd" pid=8376 comm="libvirtd" rlimit=memlock value=96468992 peer="libvirt-70a586a2-ef34-4954-91ea-9a6ecab52da3"

## Proposed fixed kernel ##
$ virsh attach-device cpaelzer-bionic hp512.xml
Device attached successfully

No denies in log.
Guest log on attach:
[ 48.652358] pseries-hotplug-mem: Attempting to hot-add 2 LMB(s) at index 80000008
[ 48.652996] lpar: Attempting to resize HPT to shift 21
[ 48.771485] lpar: Hash collision while resizing HPT
[ 48.771491] Unable to resize hash page table to target order 21: -28
[ 48.785406] Built 1 zonelists, mobility grouping on. Total pages: 28174
[ 48.785409] Policy zone: Normal
[ 48.785951] lpar: Attempting to resize HPT to shift 21
[ 48.898213] lpar: Hash collision while resizing HPT
[ 48.898218] Unable to resize hash page table to target order 21: -28
[ 48.906304] pseries-hotplug-mem: Memory at 80000000 (drc index 80000008) was hot-added
[ 48.906305] pseries-hotplug-mem: Memory at 90000000 (drc index 80000009) was hot-added

[1]: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable/+packages

Frank Heimes (fheimes)
Changed in apparmor (Ubuntu):
status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Christian Ehrhardt  (paelzer) wrote :

Per bug 1763427 this is Fix Released since 4.15.0-18.19.

Changed in apparmor (Ubuntu):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released