test_regression_testsuite from ubuntu_qrt_apparmor failed on Focal zVM / B-GCP-5.4

Bug #1876697 reported by Po-Hsu Lin on 2020-05-04
This bug affects 1 person
Affects                  Importance   Assigned to
QA Regression Testing    Undecided    Unassigned
ubuntu-kernel-tests      Undecided    Unassigned
linux (Ubuntu)           Undecided    Unassigned

Bug Description

Issue found on zVM "kernel04" with 5.4.0-29.33

 ======================================================================
 FAIL: test_regression_testsuite (__main__.ApparmorTestsuites)
 Run kernel regression tests
 ----------------------------------------------------------------------
 Traceback (most recent call last):
   File "./test-apparmor.py", line 1746, in test_regression_testsuite
     self.assertEqual(expected, rc, result + report)
 AssertionError: Got exit code 2, expected 0

 running aa_exec

 running access
 xfail: ACCESS file rx (r)
 xfail: ACCESS file rwx (r)
 xfail: ACCESS file r (wx)
 xfail: ACCESS file rx (wx)
 xfail: ACCESS file rwx (wx)
 xfail: ACCESS dir rwx (r)
 xfail: ACCESS dir r (wx)
 xfail: ACCESS dir rx (wx)
 xfail: ACCESS dir rwx (wx)

 running at_secure

 running introspect

 running capabilities
         (ptrace)
         (sethostname)
         (setdomainname)
         (setpriority)
         (setscheduler)
 Error: syscall_setscheduler failed. Test 'syscall_setscheduler -- unconfined' was expected to 'pass'. Reason for failure 'FAIL: Can't set SCHED_RR: Operation not permitted'
 Error: syscall_setscheduler failed. Test 'syscall_setscheduler -- all caps' was expected to 'pass'. Reason for failure 'FAIL: Can't set SCHED_RR: Operation not permitted'
   preparing apparmor_2.13.3-7ubuntu5.dsc... done
 Error: syscall_setscheduler failed. Test 'syscall_setscheduler -- capability sys_nice' was expected to 'pass'. Reason for failure 'FAIL: Can't set SCHED_RR: Operation not permitted'
 Error: changehat_wrapper failed. Test 'syscall_setscheduler changehat -- all caps' was expected to 'pass'. Reason for failure 'FAIL: Can't set SCHED_RR: Operation not permitted'
 Error: changehat_wrapper failed. Test 'syscall_setscheduler changehat -- capability sys_nice' was expected to 'pass'. Reason for failure 'FAIL: Can't set SCHED_RR: Operation not permitted'
         (reboot)
         (chroot)
         (mlockall)
         (net_raw)

Please see the attachment for the complete test log.

Po-Hsu Lin (cypressyew) wrote :

The qrt test suite has not yet been tested on other s390x instances (LPAR / zKVM).

tags: added: s390x sru-20200427 ubuntu-qrt-apparmor
Po-Hsu Lin (cypressyew) wrote :

qrt test suite passed on ARM64 / AMD64 / PowerPC (P8) with this kernel.

P9 has not been tested yet either.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1876697

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: focal

I have seen a similar failure with that specific test when running the tests under virtualbox on x86, though I have not tried it in several years.

If this is the expected behavior going forward on s390x, we can address it in qa-regression-testing.

Thanks.

My guess is that the test might be hitting the following code from kernel/sched/core.c:

#ifdef CONFIG_RT_GROUP_SCHED
                /*
                 * Do not allow realtime tasks into groups that have no runtime
                 * assigned.
                 */
                if (rt_bandwidth_enabled() && rt_policy(policy) &&
                                task_group(p)->rt_bandwidth.rt_runtime == 0 &&
                                !task_group_is_autogroup(task_group(p))) {
                        retval = -EPERM;
                        goto unlock;
                }
#endif

Which is not really about the capabilities. That is just a guess.
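
To make the guess above easier to check, here is a minimal Python sketch. It only mirrors the quoted condition in user-space terms and offers a small probe for the SCHED_RR call; it is not taken from the apparmor test suite. The function names `rt_group_check_rejects` and `try_sched_rr` are hypothetical, and the numeric fallbacks for the scheduling policies assume Linux values.

```python
import os

# Linux policy numbers; the fallbacks are an assumption for platforms
# where the os module does not export these constants.
SCHED_FIFO = getattr(os, "SCHED_FIFO", 1)
SCHED_RR = getattr(os, "SCHED_RR", 2)


def rt_group_check_rejects(rt_bandwidth_enabled, policy,
                           group_rt_runtime, is_autogroup):
    """Mirror the quoted CONFIG_RT_GROUP_SCHED condition.

    Returns True when the kernel snippet would set retval = -EPERM:
    a realtime policy is requested, RT bandwidth control is enabled,
    the task's group has no RT runtime, and the group is not an autogroup.
    """
    is_rt_policy = policy in (SCHED_FIFO, SCHED_RR)
    return bool(rt_bandwidth_enabled
                and is_rt_policy
                and group_rt_runtime == 0
                and not is_autogroup)


def try_sched_rr(priority=1):
    """Try to switch the calling process to SCHED_RR.

    Returns 0 on success or the errno on failure (EPERM would match
    the "Can't set SCHED_RR: Operation not permitted" lines in the log).
    """
    try:
        os.sched_setscheduler(0, SCHED_RR, os.sched_param(priority))
    except OSError as exc:
        return exc.errno
    return 0
```

If `try_sched_rr()` returns `errno.EPERM` when run as root and unconfined, that would be consistent with the check above rather than with a missing capability, though it does not prove it.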

Po-Hsu Lin (cypressyew) wrote :

This issue can be found on older Focal kernels as well, since 5.4.0-24.28. The first failure was found on zVM; zKVM passed and LPAR was not tested.

With this kernel 5.4.0-29.33, it can be found on LPAR as well (zKVM not tested)
But not with other kernels.

Stefan Bader (smb) wrote :

Reading about SCHED_RR: we had some reports in focal about issues setting real-time priority for processes, which appeared to be caused by enabling CONFIG_RT_GROUP_SCHED (a request from docker). That happened before release and after 5.4.0-21 (see bug #1873315). And Thadeu's comment #6 points to code which is active only when that config is turned on.

So this is something that we released with, and the question is whether it should be "fixed" by turning this off again or by properly handling the special cases.
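
One way to confirm whether a given kernel was built with the option is to look it up in the kernel config file. The following is a rough sketch, assuming the config is installed as /boot/config-<release> as on Ubuntu; the helper names `config_option_state` and `running_kernel_config_path` are made up for illustration.

```python
import platform


def config_option_state(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...).

    Returns None when the option is absent or marked
    '# <option> is not set'.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == "# " + option + " is not set":
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def running_kernel_config_path():
    """Path of the running kernel's config file.

    Assumption: the distribution installs it as /boot/config-<release>.
    """
    return "/boot/config-" + platform.release()
```

Reading the file at `running_kernel_config_path()` and passing its contents with "CONFIG_RT_GROUP_SCHED" as the option would show 'y' on the affected kernels if the explanation above holds.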

Steve Beattie (sbeattie) wrote :

All that about CONFIG_RT_GROUP_SCHED seems sensible, but then I am confused as to why it is only showing up in s390x environments.

The test is trying to exercise CAP_SYS_NICE, and doing so by calling

  setpriority(PRIO_PROCESS, 0, -5)

Does the test need to be put into a cgroup with rt allocations if CONFIG_RT_GROUP_SCHED is set?
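
To see which cgroup the test process lands in and whether that group has any rt allocation, something like the sketch below could be used. It assumes a cgroup v1 layout with the cpu controller mounted under /sys/fs/cgroup/<controllers>, which may not hold on every system; the function names are hypothetical.

```python
import os


def cpu_cgroup_relpath(proc_cgroup_text):
    """Find the cgroup v1 entry that carries the 'cpu' controller.

    Input is the content of /proc/<pid>/cgroup, whose lines look like
    'hierarchy-ID:controller-list:path'. Returns a tuple
    (controller_list, path), or None if no cpu controller is listed.
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        if "cpu" in parts[1].split(","):
            return parts[1], parts[2]
    return None


def rt_runtime_file(proc_cgroup_text, mount_root="/sys/fs/cgroup"):
    """Build the path of cpu.rt_runtime_us for the process's cpu cgroup.

    Assumption: the controller is mounted at <mount_root>/<controller_list>.
    Returns None when the process has no cgroup v1 cpu controller entry.
    """
    found = cpu_cgroup_relpath(proc_cgroup_text)
    if found is None:
        return None
    controller_dir, path = found
    return os.path.join(mount_root, controller_dir,
                        path.lstrip("/"), "cpu.rt_runtime_us")
```

Feeding it the contents of /proc/self/cgroup from inside the test and reading the resulting file would show whether the group's rt runtime is 0, which is the case the kernel check quoted earlier in this report rejects.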

Seth Arnold (seth-arnold) wrote :

This feels related to https://bugs.launchpad.net/ubuntu/+source/rtkit/+bug/1875665 which was filed by amd64 users.

Po-Hsu Lin (cypressyew) on 2020-05-13
tags: added: 5.4
Po-Hsu Lin (cypressyew) wrote :

This issue can be found on B-GCP-5.4
5.4.0-1011.11~18.04.1

tags: added: kqa
tags: added: kqa-blocker
removed: kqa sru-20200427
tags: added: sru-20200518
summary: - test_regression_testsuite from ubuntu_qrt_apparmor failed on Focal zVM
+ test_regression_testsuite from ubuntu_qrt_apparmor failed on Focal zVM /
+ B-GCP-5.4

So, B-GCP-5.4 as well as F-GCP 5.4.0-1011 have RT_GROUP_SCHED on, and it has been turned off in later versions, like F-GCP 5.4.0-1012 and B-GCP-5.4 5.4.0-1016.

@cypressyew, can you confirm that it doesn't happen on such kernels?

Thanks.
Cascardo.
