ubuntu_qrt_apparmor will hang with Bionic kernel on KVM / Azure instance (mmap test timeout)

Bug #1783922 reported by Po-Hsu Lin on 2018-07-27
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QA Regression Testing
linux (Ubuntu)

Bug Description

This test times out on a KVM instance (1 GB RAM) with the Bionic kernel.

This is the last test executed on this system:
Jul 26 10:10:54 zeppo kernel: [ 2140.591600] audit: type=1400 audit(1532599854.772:275036): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/tmp/testlibJOdhfr/source/bionic/apparmor-2.12/tests/regression/apparmor/mmap" pid=24072 comm="apparmor_parser"

  Run parser regression tests ... (Applying patch 0001-mount-regression-test-convert-mount-test-to-use-MS_N.patch) (Applying patch utils-fix-interpreter-testcase-for-multiple-symlinks.patch) ok
  test_regression_testsuite (__main__.ApparmorTestsuites)
     preparing apparmor_2.12-4ubuntu5.dsc... done
  Timer expired (10800 sec.), nuking pid 20299

$ watch free -m
              total        used        free      shared  buff/cache   available
Mem:            985         445          72           0         467         355
Swap:          1443           7        1436

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-29-generic 4.15.0-29.31
ProcVersionSignature: Ubuntu 4.15.0-29.31-generic 4.15.18
Uname: Linux 4.15.0-29-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jul 26 09:35 seq
 crw-rw---- 1 root audio 116, 33 Jul 26 09:35 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Fri Jul 27 02:28:52 2018
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)


ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-29-generic root=UUID=41f2a2b1-0082-4a56-ad3b-9f99ca574aeb ro
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-29-generic N/A
 linux-backports-modules-4.15.0-29-generic N/A
 linux-firmware 1.173.1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: Ubuntu-1.8.2-1ubuntu1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-xenial
dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-xenial:cvnQEMU:ct1:cvrpc-i440fx-xenial:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-xenial
dmi.sys.vendor: QEMU

Po-Hsu Lin (cypressyew) wrote :
description: updated

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Hi Po-Hsu,

Alas, I'm not able to reproduce this in a KVM guest with only 768 MB of RAM:

Ran 56 tests in 1026.130s

ubuntu@sec-bionic-amd64:~/tests/qrt-test-apparmor$ free -m
              total        used        free      shared  buff/cache   available
Mem:            733         543          59           0         130          84
Swap:           947          65         882
ubuntu@sec-bionic-amd64:~/tests/qrt-test-apparmor$ cat /proc/version_signature
Ubuntu 4.15.0-29.31-generic 4.15.18

Host kernel is also running 4.15.0-29.31-generic.

Is it possible one of the prior tests is running longer? Do the kernel-team's tests still build the entire kernel and modules before starting on the tests?

Po-Hsu Lin (cypressyew) wrote :

Hi Steve,

The node was deployed by our MAAS; we don't build the kernel before testing.

This was tested manually in two ways:
A. with the autotest framework:

  git clone --depth=1 git://kernel.ubuntu.com/ubuntu/autotest-client-tests -b master-next
  git clone --depth=1 git://kernel.ubuntu.com/ubuntu/autotest
  rm -fr autotest/client/tests
  ln -sf ~/autotest-client-tests autotest/client/tests
  AUTOTEST_PATH=/home/ubuntu/autotest sudo -E autotest/client/autotest-local --verbose autotest/client/tests/ubuntu_qrt_apparmor/control

B. by running the Python script directly:

  # Enable deb-src
  sudo apt-get install -y apparmor apparmor-profiles apparmor-utils apport attr devscripts execstack exim4 gawk git libapparmor-dev libapparmor-perl libcap2-bin libcap-dev libdbus-1-dev libgtk2.0-dev libpam-apparmor netcat pyflakes python3 python3-all-dev python-libapparmor python-pexpect quilt sudo gcc
  git clone --depth 1 https://git.launchpad.net/qa-regression-testing
  cd qa-regression-testing/scripts
  sudo ./test-apparmor.py

I tried several times on this and another KVM instance; it's 100% reproducible on a fresh node with either method A or B. (It looks like the test passes on the second run. I'm not sure whether that has anything to do with the test setup process.)

  1. Run the test with either A or B
  2. Reboot

Every time this test hangs, it is stuck at the mmap test with operation="profile_load".
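To check which test the last profile_load event belongs to, the audit line from the description can be split into its key=value fields. This is only a rough sketch: the helper name parse_audit_fields is made up for illustration, and the regex assumes values are either double-quoted or contain no spaces, as in the line quoted above.

```python
import re

# Matches key=value pairs where the value is double-quoted or a bare token.
_FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_audit_fields(line):
    """Return the key=value fields of a kernel audit line as a dict."""
    fields = {}
    for key, value in _FIELD_RE.findall(line):
        fields[key] = value.strip('"')
    return fields


# The audit line from the bug description (timestamp prefix omitted).
line = (
    'audit: type=1400 audit(1532599854.772:275036): apparmor="STATUS" '
    'operation="profile_load" profile="unconfined" '
    'name="/tmp/testlibJOdhfr/source/bionic/apparmor-2.12/tests/regression/apparmor/mmap" '
    'pid=24072 comm="apparmor_parser"'
)
fields = parse_audit_fields(line)

# Print the operation, the test binary name, and the PID of the parser.
print(fields["operation"], fields["name"].rsplit("/", 1)[-1], fields["pid"])
```

Run over the tail of the kernel log, something like this would show at a glance that the last profile loaded was for the mmap test.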

This hang (timeout) issue can be found on the Azure kernel as well:

The zombie process is affecting the automation.
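For what it's worth, the automation could look for leftover defunct processes by checking the STAT column of `ps aux` rather than grepping for a bare "Z", which also matches any command line containing that letter. A minimal sketch, assuming the standard 11-column `ps aux` layout; find_zombies is a hypothetical helper, and the sample rows are the ones from the `ps aux` output in this report:

```python
def find_zombies(ps_aux_output):
    """Return (pid, command) for every process whose STAT starts with 'Z'."""
    zombies = []
    for row in ps_aux_output.splitlines():
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        cols = row.split(None, 10)
        if len(cols) < 11 or not cols[1].isdigit():
            continue  # header line or malformed row
        if cols[7].startswith("Z"):
            zombies.append((int(cols[1]), cols[10]))
    return zombies


sample = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 23787 0.0 0.0 4384 748 ? S 06:21 0:00 /tmp/testlibpIXHkr/source/bionic/apparmor-2.12/tests/regression/apparmor/mmap
ubuntu 2138 0.0 0.0 0 0 ? Z 05:48 0:00 [sh] <defunct>
root 8307 0.0 0.0 0 0 ? Z 10:15 0:00 [unix_socket_cli] <defunct>
"""

zombies = find_zombies(sample)
print(zombies)
```

Note that the stuck mmap process itself is in state S, not Z, so it would not show up here. It has to be found by its command path instead.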

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
Po-Hsu Lin (cypressyew) on 2018-08-17
summary: - ubuntu_qrt_apparmor will hang with Bionic kernel on KVM instance
+ ubuntu_qrt_apparmor will hang with Bionic kernel on KVM instance (mmap
+ test timeout)

It looks like the hang happens because the mmap test times out and then gets killed by the autotest framework.

However, it's not really "killed": you can see that the mmap test process is still there, stuck:

# ps aux
root 23787 0.0 0.0 4384 748 ? S 06:21 0:00 /tmp/testlibpIXHkr/source/bionic/apparmor-2.12/tests/regression/apparmor/mmap

# ps aux |grep Z
ubuntu 2138 0.0 0.0 0 0 ? Z 05:48 0:00 [sh] <defunct>
root 8307 0.0 0.0 0 0 ? Z 10:15 0:00 [unix_socket_cli] <defunct>

In this case, once PID 23787 is killed, the test keeps going and terminates properly on Jenkins.
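One possible way for a harness to avoid leaving such a child behind is to start the test in its own session and, when the timer expires, signal the whole process group instead of a single PID. I have not checked how autotest implements its timer, so the following is only a sketch of the general technique; run_with_group_timeout and the `sh -c` command are hypothetical stand-ins, not the real test invocation.

```python
import os
import signal
import subprocess


def run_with_group_timeout(cmd, timeout):
    """Run cmd in its own session; on timeout, SIGKILL its process group.

    Returns the exit code, or None if the timeout fired.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its PID doubles as the process group ID.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the leader so it does not linger as a zombie
        return None


# Stand-in for the hanging test: a shell that spawns a long-lived child.
result = run_with_group_timeout(["sh", "-c", "sleep 300 & wait"], timeout=1)
print("timed out" if result is None else "exit code %d" % result)
```

This only helps if the stuck process stays in the same process group as the command the harness started, and a process in uninterruptible sleep would still not go away. Here the mmap process is shown in state S, and killing it by hand does work.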

Po-Hsu Lin (cypressyew) on 2018-08-23
summary: - ubuntu_qrt_apparmor will hang with Bionic kernel on KVM instance (mmap
- test timeout)
+ ubuntu_qrt_apparmor will hang with Bionic kernel on KVM / Azure instance
+ (mmap test timeout)
Po-Hsu Lin (cypressyew) wrote :

In this cycle, the issue was spotted on the following Azure nodes with the 4.15 kernel:
* Standard_B4ms
* Standard_D4_v3
* Standard_DS1
* Standard_DS12_v2
* Standard_DS13
* Standard_DS13_v2
* Standard_DS14_v2
* Standard_E32-8s_v3
* Standard_GS5-8

Po-Hsu Lin (cypressyew) wrote :

For the Azure 4.15 kernel on Xenial:
* Standard_B1s
* Standard_B4ms
* Standard_DS12_v2
* Standard_DS14_v2
* Standard_DS14_v2_promo
* Standard_DS1_v2
* Standard_E4_v3
* Standard_D32s_v3
