apparmor_parser hangs indefinitely when called by multiple threads

Bug #1645037 reported by Christian Brauner
This bug affects 1 person
Affects                 Status        Importance  Assigned to    Milestone
linux (Ubuntu)          Fix Released  Undecided   John Johansen
  Yakkety               Won't Fix     Undecided   John Johansen
  Zesty                 Fix Released  Undecided   John Johansen

Bug Description

This bug surfaced when starting ~50 LXC containers in parallel with LXD, multiple times:

# Create the containers
for c in c foo{1..50}; do lxc launch images:ubuntu/xenial $c; done

# Execute this loop multiple times until you observe errors.
for c in c foo{1..50}; do lxc restart $c & done
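A minimal sketch of the restart loop as a reusable function, assuming a POSIX shell and the `lxc` client on PATH. The function name `restart_rounds` and the round count are illustrative and not part of the original report. Unlike re-running the loop by hand, it waits for each round to finish before starting the next one.

```shell
#!/bin/sh
# restart_rounds ROUNDS NAME...
# Hypothetical helper: restart all named containers concurrently, ROUNDS times in a row.
restart_rounds() {
  local rounds=$1 round=1 c
  shift
  while [ "$round" -le "$rounds" ]; do
    for c in "$@"; do
      lxc restart "$c" &   # one background restart per container
    done
    wait                   # let this round finish before starting the next
    echo "round $round finished"
    round=$((round + 1))
  done
}
```

For example, `restart_rounds 10 foo1 foo2 foo3`, or `restart_rounds 10 foo{1..50}` in bash. Removing the `wait` lets rounds overlap, which adds concurrency at the cost of a less predictable load.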

Afterwards you can run

ps aux | grep apparmor

and you should see output similar to:

root 19774 0.0 0.0 12524 1116 pts/1 S+ 20:14 0:00 apparmor_parser -RWL /var/lib/lxd/security/apparmor/cache /var/lib/lxd/security/apparmor/profiles/lxd-foo30
root 19775 0.0 0.0 12524 1208 pts/1 S+ 20:14 0:00 apparmor_parser -RWL /var/lib/lxd/security/apparmor/cache /var/lib/lxd/security/apparmor/profiles/lxd-foo26
root 19776 0.0 0.0 13592 3224 pts/1 D+ 20:14 0:00 apparmor_parser -RWL /var/lib/lxd/security/apparmor/cache /var/lib/lxd/security/apparmor/profiles/lxd-foo30
root 19778 0.0 0.0 13592 3384 pts/1 D+ 20:14 0:00 apparmor_parser -RWL /var/lib/lxd/security/apparmor/cache /var/lib/lxd/security/apparmor/profiles/lxd-foo26
root 19780 0.0 0.0 12524 1208 pts/1 S+ 20:14 0:00 apparmor_parser -RWL /var/lib/lxd/security/apparmor/cache /var/lib/lxd/security/apparmor/profiles/lxd-foo43
root 19782 0.0 0.0 12524 1208 pts/1 S+ 20:14 0:00 apparmor_parser -RWL /var/lib/lxd/security/apparmor/cache /var/lib/lxd/security/apparmor/profiles/lxd-foo34
root 19783 0.0 0.0 13592 3388 pts/1 D+ 20:14 0:00 apparmor_parser -RWL /var/lib/lxd/security/apparmor/cache /var/lib/lxd/security/apparmor/profiles/lxd-foo43
root 19784 0.0 0.0 13592 3252 pts/1 D+ 20:14 0:00 apparmor_parser -RWL /var/lib/lxd/security/apparmor/cache /var/lib/lxd/security/apparmor/profiles/lxd-foo34
root 19794 0.0 0.0 12524 1208 pts/1 S+ 20:14 0:00 apparmor_parser -RWL /var/lib/lxd/security/apparmor/cache /var/lib/lxd/security/apparmor/profiles/lxd-foo25
root 19795 0.0 0.0 13592 3256 pts/1 D+ 20:14 0:00 apparmor_parser -RWL /var/lib/lxd/security/apparmor/cache /var/lib/lxd/security/apparmor/profiles/lxd-foo25

apparmor_parser remains stuck even after all LXC/LXD commands have exited.

dmesg output yields lines like:

[41902.815174] audit: type=1400 audit(1480191089.678:43): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-foo30_</var/lib/lxd>" pid=12545 comm="apparmor_parser"

and cat /proc/12545/stack shows:

[<ffffffff8c9b9378>] aa_remove_profiles+0x88/0x270
[<ffffffff8c9ac3e4>] profile_remove+0x144/0x2e0
[<ffffffff8c8319b8>] __vfs_write+0x18/0x40
[<ffffffff8c832108>] vfs_write+0xb8/0x1b0
[<ffffffff8c833565>] SyS_write+0x55/0xc0
[<ffffffff8ce952f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[<ffffffffffffffff>] 0xffffffffffffffff
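To collect the same information for every stuck parser at once, something along these lines could be used. It is a sketch assuming procps `ps` and root access (needed to read `/proc/PID/stack`); the helper names are hypothetical. Processes in state `D` (uninterruptible sleep) are the ones blocked inside the kernel, like the `D+` entries in the `ps` listing above.

```shell
#!/bin/sh
# list_stuck_parsers: print the PIDs of apparmor_parser processes whose state
# starts with "D" (uninterruptible sleep, i.e. blocked in the kernel).
list_stuck_parsers() {
  ps -eo pid=,stat=,comm= | awk '$2 ~ /^D/ && $3 == "apparmor_parser" { print $1 }'
}

# dump_stuck_stacks: for every stuck parser, print its command line followed by
# its kernel stack from /proc/PID/stack.
dump_stuck_stacks() {
  for pid in $(list_stuck_parsers); do
    echo "== PID $pid: $(cat "/proc/$pid/cmdline" 2>/dev/null | tr '\0' ' ')"
    cat "/proc/$pid/stack" 2>/dev/null || echo "   (cannot read /proc/$pid/stack)"
  done
}
```

Running `dump_stuck_stacks` as root after a failed restart round should show stacks like the one above if the parsers are blocked in `aa_remove_profiles`.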

This looks like a potential kernel bug.

Christian Brauner (cbrauner) wrote :
description: updated
Christian Brauner (cbrauner) wrote :

Note that, because of a race condition in low-level LXC that deadlocks when restarting many containers in parallel, you should install the development version of LXC from the PPA when trying to reproduce this bug with LXD.

Stéphane Graber (stgraber) wrote :

This has been confirmed to affect both the 4.4 and 4.8 kernels.

affects: apparmor → apparmor (Ubuntu)
Changed in linux (Ubuntu Xenial):
status: New → Triaged
Changed in linux (Ubuntu Yakkety):
status: New → Triaged
Changed in linux (Ubuntu Zesty):
status: New → Triaged
no longer affects: apparmor (Ubuntu Xenial)
no longer affects: apparmor (Ubuntu Yakkety)
no longer affects: apparmor (Ubuntu Zesty)
Changed in apparmor (Ubuntu):
status: New → Triaged
assignee: nobody → John Johansen (jjohansen)
Stéphane Graber (stgraber) wrote :

Christian will test 4.4.0-45 to see whether we hit this issue with a pre-AppArmor-stacking kernel.

Christian Brauner (cbrauner) wrote :

This does not seem to be reproducible on a 4.4.0-45 kernel without AppArmor stacking support.

Changed in linux (Ubuntu Xenial):
assignee: nobody → John Johansen (jjohansen)
Changed in linux (Ubuntu Yakkety):
assignee: nobody → John Johansen (jjohansen)
Changed in linux (Ubuntu Zesty):
assignee: nobody → John Johansen (jjohansen)
status: Triaged → In Progress
Changed in linux (Ubuntu Yakkety):
status: Triaged → In Progress
Changed in linux (Ubuntu Xenial):
status: Triaged → In Progress
John Johansen (jjohansen) wrote :

How reliable/repeatable is this for you?

I have been hammering a machine for multiple days and have not been able to trip this once.

I have been using the 4.8 Ubuntu kernel with the ubuntu-lxc/daily and ubuntu-lxc/stable PPAs. Can you provide any more info?

Christian Brauner (cbrauner) wrote : Re: [Bug 1645037] Re: apparmor_parser hangs indefinitely when called by multiple threads

On Sat, Dec 03, 2016 at 12:58:54PM -0000, John Johansen wrote:
> How reliable/repeatable is this for you?
>
> I have been hammering a machine for multiple days and not been able to
> trip this once.
>
> I have been using the 4.8 ubuntu kernel the ubuntu-lxc/daily and the
> ubuntu-lxc/stable ppas. Any more info you can provide?

I could reproduce it quite reliably. The trick is to have concurrent restarts,
just as in the loop I showed. I'll try to verify again in the next few days.
Did you observe any hanging lxc restart commands or multiple hanging LXD
processes?

John Johansen (jjohansen) wrote :

No, I haven't. I have been following the instructions you provided without success. I have started some tests that call replace and reload directly at a lower level, so that I can get even more concurrency.

John Johansen (jjohansen) wrote :

I think I may have replicated it, in that I got log entries reporting tasks blocked for more than 120 seconds, very similar to the logs above. Running ps on the system did show several apparmor_parser instances waiting. However, nothing crashed, and the apparmor_parser instances did not hang forever; it all eventually cleared up.

To replicate it, I overloaded the system by spawning 1000 apparmor_parsers loading/replacing profiles and 1000 apparmor_parsers removing profiles. This made every parser compete for the policy load mutex, which serializes all loads and replacements. With the system under very high load, several processes, even after obtaining the policy mutex, would sleep waiting on the memory subsystem and the OOM killer.

This isn't an exact parallel, as I didn't make it create namespaces etc. I am now planning to do that in another round of testing.
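The overload test described above might look roughly like this. Assumptions: a POSIX shell, root privileges, and a throwaway profile; the profile name `lp1645037-test`, its file path, and the helper names are hypothetical and not the script actually used in this report.

```shell
#!/bin/sh
# write_test_profile PATH: write a small throwaway profile (hypothetical name).
write_test_profile() {
  cat > "$1" <<'EOF'
profile lp1645037-test {
  /tmp/** r,
}
EOF
}

# stress_load_remove N PROFILE_FILE: start N concurrent replace (-r) parsers and
# N concurrent remove (-R) parsers for the same profile, then wait for all.
# Every one of them has to take the kernel's policy mutex in turn.
stress_load_remove() {
  local n=$1 profile=$2 i=0
  while [ "$i" -lt "$n" ]; do
    apparmor_parser -r "$profile" >/dev/null 2>&1 &   # load or replace
    apparmor_parser -R "$profile" >/dev/null 2>&1 &   # remove
    i=$((i + 1))
  done
  wait
}
```

For example, `write_test_profile /tmp/lp1645037-test && stress_load_remove 1000 /tmp/lp1645037-test` approximates the 1000 + 1000 parser run; expect the machine to become very slow at that scale.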

John Johansen (jjohansen) wrote :

I have fully replicated this with just apparmor_parser and bash. It requires using both the filesystem-based mkdir/rmdir namespace interface and regular profile replacement/removal at the same time.
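A sketch of such a reproducer, assuming that a stacking-enabled kernel exposes policy namespaces under `/sys/kernel/security/apparmor/policy/namespaces`, where `mkdir` creates a namespace and `rmdir` removes it. The namespace names and helper functions are hypothetical; this is not the reproducer used in this report.

```shell
#!/bin/sh
# Assumed location of the filesystem-based namespace interface.
NS_DIR=/sys/kernel/security/apparmor/policy/namespaces

# ns_churn N: create and immediately remove a namespace N times.
ns_churn() {
  local n=$1 i=0
  while [ "$i" -lt "$n" ]; do
    mkdir "$NS_DIR/lp1645037-ns$i" && rmdir "$NS_DIR/lp1645037-ns$i"
    i=$((i + 1))
  done
}

# profile_churn N PROFILE_FILE: replace and then remove a regular profile N times.
profile_churn() {
  local n=$1 profile=$2 i=0
  while [ "$i" -lt "$n" ]; do
    apparmor_parser -r "$profile" >/dev/null 2>&1
    apparmor_parser -R "$profile" >/dev/null 2>&1
    i=$((i + 1))
  done
}

# race_ns_and_profiles N PROFILE_FILE: run both churn loops at the same time.
race_ns_and_profiles() {
  ns_churn "$1" &
  profile_churn "$1" "$2" &
  wait
}
```

Starting several `race_ns_and_profiles 1000 /path/to/profile &` instances at once increases the contention on the policy lock.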

John Johansen (jjohansen) wrote :

Christian,

Could you please try my test kernel? It fixes the issue with my local reproducer.

The packages are in
http://people.canonical.com/~jj/linux+jj/

You can probably get away with installing just linux-image-4.8.0-30-generic_4.8.0-30.32+lp1645037_amd64.deb, but the other packages are available if needed.
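A sketch of fetching and installing the test kernel image, assuming the .deb sits directly in the directory linked above and that `wget` and `sudo` are available. The helper names are hypothetical.

```shell
#!/bin/sh
# Directory and file name as given in the comment above.
BASE_URL=http://people.canonical.com/~jj/linux+jj
DEB=linux-image-4.8.0-30-generic_4.8.0-30.32+lp1645037_amd64.deb

deb_url() {
  echo "$BASE_URL/$DEB"
}

# Download the image package and install it (needs root); reboot afterwards.
install_test_kernel() {
  wget "$(deb_url)" && sudo dpkg -i "$DEB"
}

# uname -r prints 4.8.0-30-generic for both the stock and the test build, so
# check the installed package version for the +lp1645037 suffix instead.
test_kernel_installed() {
  dpkg-query -W -f='${Version}\n' linux-image-4.8.0-30-generic | grep -q '+lp1645037'
}
```

`test_kernel_installed` only confirms the package is installed; reboot into the new kernel before re-running the reproducer.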

Christian Brauner (cbrauner) wrote :

On Thu, Dec 08, 2016 at 11:37:52AM -0000, John Johansen wrote:
> Christian,
>
> could you please try against my test kernel? It has fixed the issue with
> my local reproducer

Sure, I'm currently testing!

Thanks!
Christian

Christian Brauner (cbrauner) wrote :

On Thu, Dec 08, 2016 at 03:28:46PM +0100, Christian Brauner wrote:
> On Thu, Dec 08, 2016 at 11:37:52AM -0000, John Johansen wrote:
> > Christian,
> >
> > could you please try against my test kernel? It has fixed the issue with
> > my local reproducer
>
> Sure, I'm currently testing!

Hi John,

this looks good. So far I was not able to reproduce hanging AppArmor parsers.

Christian

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-8.10

---------------
linux (4.10.0-8.10) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1664217

  * [Hyper-V] Bug fixes for storvsc (tagged queuing, error conditions)
    (LP: #1663687)
    - scsi: storvsc: Enable tracking of queue depth
    - scsi: storvsc: Remove the restriction on max segment size
    - scsi: storvsc: Enable multi-queue support
    - scsi: storvsc: use tagged SRB requests if supported by the device
    - scsi: storvsc: properly handle SRB_ERROR when sense message is present
    - scsi: storvsc: properly set residual data length on errors

  * Ubuntu16.10-KVM:Big configuration with multiple guests running SRIOV VFs
    caused KVM host hung and all KVM guests down. (LP: #1651248)
    - KVM: PPC: Book 3S: XICS cleanup: remove XICS_RM_REJECT
    - KVM: PPC: Book 3S: XICS: correct the real mode ICP rejecting counter
    - KVM: PPC: Book 3S: XICS: Fix potential issue with duplicate IRQ resends
    - KVM: PPC: Book 3S: XICS: Implement ICS P/Q states
    - KVM: PPC: Book 3S: XICS: Don't lock twice when checking for resend

  * overlay: mkdir fails if directory exists in lowerdir in a user namespace
    (LP: #1531747)
    - SAUCE: overlayfs: Skip permission checking for trusted.overlayfs.* xattrs

  * CVE-2016-1575 (LP: #1534961)
    - SAUCE: overlayfs: Skip permission checking for trusted.overlayfs.* xattrs

  * CVE-2016-1576 (LP: #1535150)
    - SAUCE: overlayfs: Skip permission checking for trusted.overlayfs.* xattrs

  * Miscellaneous Ubuntu changes
    - SAUCE: md/raid6 algorithms: scale test duration for speedier boots
    - SAUCE: Import aufs driver
    - d-i: Build message-modules udeb for arm64
    - rebase to v4.10-rc8

  * Miscellaneous upstream changes
    - Revert "UBUNTU: SAUCE: aufs -- remove .readlink assignment"
    - Revert "UBUNTU: SAUCE: (no-up) aufs: for v4.9-rc1, support setattr_prepare()"
    - Revert "UBUNTU: SAUCE: aufs -- Add flags argument to aufs_rename()"
    - Revert "UBUNTU: SAUCE: aufs -- Convert to use xattr handlers"
    - Revert "UBUNTU: SAUCE: Import aufs driver"

  [ Upstream Kernel Changes ]

  * rebase to v4.10-rc8

 -- Tim Gardner <email address hidden> Mon, 06 Feb 2017 08:34:24 -0700

Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Released
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!
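For testers, enabling -proposed boils down to adding one APT source. This is a sketch based on our reading of the EnableProposed wiki page, which remains the authoritative procedure; it assumes an amd64 or i386 machine using archive.ubuntu.com, and the helper names are hypothetical.

```shell
#!/bin/sh
# proposed_source_line CODENAME: print the APT source line for the -proposed
# pocket of the given Ubuntu release.
proposed_source_line() {
  echo "deb http://archive.ubuntu.com/ubuntu/ $1-proposed restricted main multiverse universe"
}

# enable_proposed: write that line to a sources.list.d file and refresh APT.
enable_proposed() {
  local codename
  codename=$(lsb_release -cs)
  proposed_source_line "$codename" |
    sudo tee "/etc/apt/sources.list.d/ubuntu-$codename-proposed.list" >/dev/null
  sudo apt-get update
}
```

After `enable_proposed`, install the kernel under test from the -proposed pocket (apt-get accepts the `package/release` form, e.g. with `xenial-proposed`), reboot, and re-run the reproducer before updating the verification tag.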

tags: added: verification-needed-xenial
tags: added: verification-needed-yakkety
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-40.43

---------------
linux (4.8.0-40.43) yakkety; urgency=low

  * linux: 4.8.0-40.43 -proposed tracker (LP: #1667066)

  [ Andy Whitcroft ]
  * NFS client : permission denied when trying to access subshare, since kernel
    4.4.0-31 (LP: #1649292)
    - fs: Better permission checking for submounts

  * shaking screen (LP: #1651981)
    - drm/radeon: drop verde dpm quirks

  * [0bda:0328] Card reader failed after S3 (LP: #1664809)
    - usb: hub: Wait for connection to be reestablished after port reset

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * In Ubuntu 17.04 : after reboot getting message in console like Unable to
    open file: /etc/keys/x509_ima.der (-2) (LP: #1656908)
    - SAUCE: ima: Downgrade error to warning

  * 16.04.2: Extra patches for POWER9 (LP: #1664564)
    - powerpc/mm: Fix no execute fault handling on pre-POWER5
    - powerpc/mm: Fix spurrious segfaults on radix with autonuma

  * ibmvscsis: Add SGL LIMIT (LP: #1662551)
    - ibmvscsis: Add SGL limit

  * [Hyper-V] Bug fixes for storvsc (tagged queuing, error conditions)
    (LP: #1663687)
    - scsi: storvsc: Enable tracking of queue depth
    - scsi: storvsc: Remove the restriction on max segment size
    - scsi: storvsc: Enable multi-queue support
    - scsi: storvsc: use tagged SRB requests if supported by the device
    - scsi: storvsc: properly handle SRB_ERROR when sense message is present
    - scsi: storvsc: properly set residual data length on errors

  * Ubuntu16.10-KVM:Big configuration with multiple guests running SRIOV VFs
    caused KVM host hung and all KVM guests down. (LP: #1651248)
    - KVM: PPC: Book 3S: XICS cleanup: remove XICS_RM_REJECT
    - KVM: PPC: Book 3S: XICS: correct the real mode ICP rejecting counter
    - KVM: PPC: Book 3S: XICS: Fix potential issue with duplicate IRQ resends
    - KVM: PPC: Book 3S: XICS: Implement ICS P/Q states
    - KVM: PPC: Book 3S: XICS: Don't lock twice when checking for resend

  * ISST-LTE:pNV: ppc64_cpu command is hung w HDs, SSDs and NVMe (LP: #1662666)
    - blk-mq: Avoid memory reclaim when remapping queues
    - blk-mq: Fix failed allocation path when mapping queues
    - blk-mq: Always schedule hctx->next_cpu

  * systemd-udevd hung in blk_mq_freeze_queue_wait testing unpartitioned NVMe
    drive (LP: #1662673)
    - percpu-refcount: fix reference leak during percpu-atomic transition

  * [Yakkety SRU] Enable KEXEC support in ARM64 kernel (LP: #1662554)
    - [Config] Enable KEXEC support in ARM64.

  * [Hyper-V] Fix ring buffer handling to avoid host throttling (LP: #1661430)
    - Drivers: hv: vmbus: On write cleanup the logic to interrupt the host
    - Drivers: hv: vmbus: On the read path cleanup the logic to interrupt the host
    - Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read()

  * brd module compiled as built-in (LP: #1593293)
    - CONFIG_BLK_DEV_RAM=m

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in compla...


Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-65.86

---------------
linux (4.4.0-65.86) xenial; urgency=low

  * linux: 4.4.0-65.86 -proposed tracker (LP: #1667052)

  [ Stefan Bader ]
  * Upgrade Redpine RS9113 driver to support AP mode (LP: #1665211)
    - SAUCE: Redpine driver to support Host AP mode

  * NFS client : permission denied when trying to access subshare, since kernel
    4.4.0-31 (LP: #1649292)
    - fs: Better permission checking for submounts

  * [Hyper-V] SAUCE: pci-hyperv fixes for SR-IOV on Azure (LP: #1665097)
    - SAUCE: PCI: hv: Fix wslot_to_devfn() to fix warnings on device removal
    - SAUCE: pci-hyperv: properly handle pci bus remove
    - SAUCE: pci-hyperv: lock pci bus on device eject

  * [Hyper-V/Azure] Please include Mellanox OFED drivers in Azure kernel and
    image (LP: #1650058)
    - net/mlx4_en: Fix bad WQE issue
    - net/mlx4_core: Fix racy CQ (Completion Queue) free
    - net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT
      transitions
    - net/mlx4_core: Avoid command timeouts during VF driver device shutdown

  * Xenial update to v4.4.49 stable release (LP: #1664960)
    - ARC: [arcompact] brown paper bag bug in unaligned access delay slot fixup
    - selinux: fix off-by-one in setprocattr
    - Revert "x86/ioapic: Restore IO-APIC irq_chip retrigger callback"
    - cpumask: use nr_cpumask_bits for parsing functions
    - hns: avoid stack overflow with CONFIG_KASAN
    - ARM: 8643/3: arm/ptrace: Preserve previous registers for short regset write
    - target: Don't BUG_ON during NodeACL dynamic -> explicit conversion
    - target: Use correct SCSI status during EXTENDED_COPY exception
    - target: Fix early transport_generic_handle_tmr abort scenario
    - target: Fix COMPARE_AND_WRITE ref leak for non GOOD status
    - ARM: 8642/1: LPAE: catch pending imprecise abort on unmask
    - mac80211: Fix adding of mesh vendor IEs
    - netvsc: Set maximum GSO size in the right place
    - scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed
      send
    - scsi: aacraid: Fix INTx/MSI-x issue with older controllers
    - scsi: mpt3sas: disable ASPM for MPI2 controllers
    - xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()
    - ALSA: seq: Fix race at creating a queue
    - ALSA: seq: Don't handle loop timeout at snd_seq_pool_done()
    - drm/i915: fix use-after-free in page_flip_completed()
    - Linux 4.4.49

  * NFS client : kernel 4.4.0-57 crash with nfsv4 enries in /etc/fstab
    (LP: #1650336)
    - SUNRPC: fix refcounting problems with auth_gss messages.

  * [0bda:0328] Card reader failed after S3 (LP: #1664809)
    - usb: hub: Wait for connection to be reestablished after port reset

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * ibmvscsis: Add SGL LIMIT (LP: #1662551)
    - ibmvscsis: Add SGL limit

  * [Hyper-V] Bug fixes for storvsc (tagged queuing, error conditions)
    (LP: #1663687)
    - scsi: storvsc: Enable tracking of queue depth
    - scsi: storvsc: Remove the ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Stefan Bader (smb) wrote :

Not fixed because we had to revert the commits due to various regressions.

Changed in linux (Ubuntu Xenial):
status: Fix Released → Triaged
Changed in linux (Ubuntu Yakkety):
status: Fix Released → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-45.48

---------------
linux (4.8.0-45.48) yakkety; urgency=low

  * CVE-2017-7184
    - xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
    - xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder

 -- Stefan Bader <email address hidden> Fri, 24 Mar 2017 12:03:39 +0100

Changed in linux (Ubuntu Yakkety):
status: Triaged → Fix Released
Stefan Bader (smb)
Changed in linux (Ubuntu Yakkety):
status: Fix Released → Triaged
Andy Whitcroft (apw) wrote : Closing unsupported series nomination.

This bug was nominated against a series that is no longer supported, i.e. yakkety. The bug task representing the yakkety nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Yakkety):
status: Triaged → Won't Fix
no longer affects: apparmor (Ubuntu)
no longer affects: linux (Ubuntu Xenial)