s390x broken with unknown syscall number on kernels < 5.8

Bug #1895132 reported by Christian Brauner
This bug affects 4 people
Affects                  Status        Importance  Assigned to    Milestone
Ubuntu on IBM z Systems  Fix Released  Undecided   Unassigned
linux (Ubuntu)           Fix Released  Undecided   Unassigned
  Bionic                 Fix Released  Medium      Unassigned
  Focal                  Fix Released  Medium      Dan Streetman

Bug Description

SRU Justification

Note: I marked this as affecting bionic as well, as discovered in bug 1916485.

Impact: On s390x kernels prior to 5.8, when a task is in a traced state (due to audit, ptrace, or seccomp) and issues a syscall that the kernel doesn't know about, the kernel does not return -ENOSYS in r2 but instead returns the syscall number. This breaks userspace all over the place. The following program, compiled on s390x, will output 500 instead of -ENOSYS:

root@test:~# cat test.c
#define _GNU_SOURCE
#include <libgen.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static inline int dummy_inline_asm(void)
{
        register long r1 asm("r1") = 500;
        register long r2 asm("r2") = -1;
        register long r3 asm("r3") = -1;
        register long r4 asm("r4") = -1;
        register long r5 asm("r5") = -1;
        register long __res_r2 asm("r2");
        asm volatile(
            "svc 0\n\t"
             : "=d"(__res_r2)
             : "d"(r1), "0"(r2), "d"(r3), "d"(r4), "d"(r5)
             : "memory");
        return (int) __res_r2;
}

static inline int dummy_syscall(void)
{
        return syscall(500, -1, -1, -1, -1);
}

int main(int argc, char *argv[])
{
        printf("Uhm: %d\n", dummy_inline_asm());
        printf("Uhm: %d\n", dummy_syscall());

        exit(EXIT_SUCCESS);
}

This currently breaks LXD on s390x completely, as well as strace.

Fix: Backport
commit cd29fa798001075a554b978df3a64e6656c25794
Author: Sven Schnelle <email address hidden>
Date: Fri Mar 6 13:18:31 2020 +0100

    s390/ptrace: return -ENOSYS when invalid syscall is supplied

    The current code returns the syscall number which an invalid
    syscall number is supplied and tracing is enabled. This makes
    the strace testsuite fail.

    Signed-off-by: Sven Schnelle <email address hidden>
    Signed-off-by: Vasily Gorbik <email address hidden>

which was released with 5.8. The commit did not Cc stable, and although I've asked Sven to get it included in stable, I'm not sure when or if it will show up there.

Regression Potential: Limited to s390x.

Test Case: The reproducer given above needs to output -ENOSYS instead of 500.

Christian Brauner (cbrauner) wrote :

This needs to be backported to our 5.4 kernels.

Changed in linux (Ubuntu):
status: New → Confirmed
description: updated
Dan Streetman (ddstreet) wrote :

Specifically, this bug was introduced by commit 69ba0dbfabf6c1cffffcd88eabd2ac3959b3ee08, which came in via stable series bug 1885942 and was first included in version Ubuntu-5.4.0-43.47.

Dan Streetman (ddstreet) wrote :

This is also blocking migration of the upstream systemd CI from bionic to focal (on s390x), as the system will hang at boot due to this problem when using upstream systemd code on focal with the latest 5.4 Ubuntu kernel. A test systemd build is available at:
https://launchpad.net/~ddstreet/+archive/ubuntu/systemd-focal-ci

Installing that systemd package (without the patched kernel build from that ppa) will cause systemd to hang when restarting any service and will hang the boot.

Stefan Bader (smb)
Changed in linux (Ubuntu Focal):
assignee: nobody → Dan Streetman (ddstreet)
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Stefan Bader (smb)
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Fix Committed
tags: added: s390x
Christian Ehrhardt  (paelzer) wrote :

I have been facing a whole load of odd issues in recent Hirsute LXD containers on s390x.
Only s390x, only Hirsute: the guests didn't complete systemd initialization, some processes hung around, the journal didn't start ...

ddstreet was kind enough to recognize this on IRC and pointed me to this bug.
I had previously tried all kinds of LXD versions; all behaved the same with regard to this issue.

Since it was mentioned to have been introduced in 5.4.0-43.47, I downgraded the kernel from 5.4.0-65 to 5.4.0-26. Et voilà, my world was colorful and happy again.
So yeah, I seem to be affected by this, and I must say it is a pretty hard-hitting as well as hard-to-debug issue.

+1 for a fast resolution ...

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Dan Streetman (ddstreet) wrote :

Verified: with the 5.4.0-65 kernel, upgrading to the latest upstream systemd hangs when trying to restart services and hangs at boot.

Upgrading to the 5.4.0-66 kernel and then upgrading to the latest upstream systemd does not hang and (re)boots successfully.

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-66.74

---------------
linux (5.4.0-66.74) focal; urgency=medium

  * focal/linux: 5.4.0-66.74 -proposed tracker (LP: #1913152)

  * Add support for selective build of special drivers (LP: #1912789)
    - [Packaging] Add support for ODM drivers
    - [Packaging] Turn on ODM support for amd64

  * Packaging resync (LP: #1786013)
    - update dkms package versions
    - update dkms package versions

  * Introduce the new NVIDIA 460-server series and update the 460 series
    (LP: #1913200)
    - [Config] dkms-versions -- drop NVIDIA 435 455 and 440-server
    - [Config] dkms-versions -- add the 460-server nvidia driver

  * Enable mute and micmute LED on HP EliteBook 850 G7 (LP: #1910102)
    - ALSA: hda/realtek: Enable mute and micmute LED on HP EliteBook 850 G7

  * SYNA30B4:00 06CB:CE09 Mouse on HP EliteBook 850 G7 not working at all
    (LP: #1908992)
    - HID: multitouch: Enable multi-input for Synaptics pointstick/touchpad device

  * HD Audio Device PCI ID for the Intel Cometlake-R platform (LP: #1912427)
    - SAUCE: ALSA: hda: Add Cometlake-R PCI ID

  * switch to an autogenerated nvidia series based core via dkms-versions
    (LP: #1912803)
    - [Packaging] nvidia -- use dkms-versions to define versions built
    - [Packaging] update-version-dkms -- maintain flags fields
    - [Config] dkms-versions -- add transitional/skip information for nvidia
      packages

  * udpgro.sh in net from ubuntu_kernel_selftests seems not reflecting sub-test
    result (LP: #1908499)
    - selftests: fix the return value for UDP GRO test

  * qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP
    tx csum offload (LP: #1909062)
    - qede: fix offload for IPIP tunnel packets

  * Use DCPD to control HP DreamColor panel (LP: #1911001)
    - SAUCE: drm/dp: Another HP DreamColor panel brigntness fix

  * kvm: Windows 2k19 with Hyper-v role gets stuck on pending hypervisor
    requests on cascadelake based kvm hosts (LP: #1911848)
    - KVM: x86: Set KVM_REQ_EVENT if run is canceled with req_immediate_exit set

  * Ubuntu 20.10 four needed fixes to 'Add driver for Mellanox Connect-IB
    adapters' (LP: #1905574)
    - net/mlx5: Fix a race when moving command interface to polling mode

  * Fix right sounds and mute/micmute LEDs for HP ZBook Fury 15/17 G7 Mobile
    Workstation (LP: #1910561)
    - ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machines

  * Ubuntu 20.04 - multicast counter is not increased in ip -s (LP: #1901842)
    - net/mlx5e: Fix multicast counter not up-to-date in "ip -s"

  * eeh-basic.sh in powerpc from ubuntu_kernel_selftests timeout with 5.4 P8 /
    P9 (LP: #1882503)
    - selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic

  * DMI entry syntax fix for Pegatron / ByteSpeed C15B (LP: #1910639)
    - Input: i8042 - unbreak Pegatron C15B

  * CVE-2020-29372
    - mm: check that mm is still valid in madvise()

  * update ENA driver, incl. new ethtool stats (LP: #1910291)
    - net: ena: Change WARN_ON expression in ena_del_napi_in_range()
    - net: ena: ethtool: convert stat_offset to 64 bit resolution
    - net: ena: eth...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Dan Streetman (ddstreet)
description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → Medium
Stefan Bader (smb)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Dan Streetman (ddstreet) wrote :

Using the verification steps from bug 1916485, first on the unfixed 4.15.0-143 kernel and then on the 4.15.0-144 kernel from -proposed:

ubuntu@test-s390x:~$ uname -a
Linux test-s390x 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:16:31 UTC 2021 s390x s390x s390x GNU/Linux
ubuntu@test-s390x:~$ mkdir h
ubuntu@test-s390x:~$ cd h
ubuntu@test-s390x:~/h$ sudo tar xf ../hirsute-server-cloudimg-s390x-root.tar.xz
ubuntu@test-s390x:~/h$ sudo systemd-nspawn
Spawning container h on /home/ubuntu/h.
Press ^] three times within 1s to kill container.
root@h:~# ls -l /usr/bin/gpg
-rwxr-xr-x 1 root root 1361888 Feb 22 09:33 /usr/bin/gpg
root@h:~# test -x /usr/bin/gpg || echo "fail"
fail

ubuntu@test-s390x:~$ uname -a
Linux test-s390x 4.15.0-144-generic #148-Ubuntu SMP Sat May 8 02:31:39 UTC 2021 s390x s390x s390x GNU/Linux
ubuntu@test-s390x:~$ mkdir h
ubuntu@test-s390x:~$ cd h
ubuntu@test-s390x:~/h$ sudo tar xf ../hirsute-server-cloudimg-s390x-root.tar.xz
ubuntu@test-s390x:~/h$ sudo systemd-nspawn
Spawning container h on /home/ubuntu/h.
Press ^] three times within 1s to kill container.
root@h:~# ls -l /usr/bin/gpg
-rwxr-xr-x 1 root root 1361888 Feb 22 09:33 /usr/bin/gpg
root@h:~# test -x /usr/bin/gpg || echo "fail"
root@h:~#

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-144.148

---------------
linux (4.15.0-144.148) bionic; urgency=medium

  * bionic/linux: 4.15.0-144.148 -proposed tracker (LP: #1927648)

  * Introduce the 465 driver series, fabric-manager, and libnvidia-nscq
    (LP: #1925522)
    - debian/dkms-versions -- add NVIDIA 465 and migrate 450 to 460

  * xfrm_policy.sh / pmtu.sh / udpgso_bench.sh from net in
    ubuntu_kernel_selftests will fail if running the whole suite (LP: #1856010)
    - selftests/net: bump timeout to 5 minutes

  * locking/qrwlock: Fix ordering in queued_write_lock_slowpath() (LP: #1926184)
    - locking/barriers: Introduce smp_cond_load_relaxed() and
      atomic_cond_read_relaxed()
    - locking/qrwlock: Fix ordering in queued_write_lock_slowpath()

  * Bionic update: upstream stable patchset 2021-04-30 (LP: #1926808)
    - net: fec: ptp: avoid register access when ipg clock is disabled
    - powerpc/4xx: Fix build errors from mfdcr()
    - atm: eni: dont release is never initialized
    - atm: lanai: dont run lanai_dev_close if not open
    - Revert "r8152: adjust the settings about MAC clock speed down for RTL8153"
    - ixgbe: Fix memleak in ixgbe_configure_clsu32
    - net: tehuti: fix error return code in bdx_probe()
    - sun/niu: fix wrong RXMAC_BC_FRM_CNT_COUNT count
    - gpiolib: acpi: Add missing IRQF_ONESHOT
    - nfs: fix PNFS_FLEXFILE_LAYOUT Kconfig default
    - NFS: Correct size calculation for create reply length
    - net: hisilicon: hns: fix error return code of hns_nic_clear_all_rx_fetch()
    - net: wan: fix error return code of uhdlc_init()
    - atm: uPD98402: fix incorrect allocation
    - atm: idt77252: fix null-ptr-dereference
    - sparc64: Fix opcode filtering in handling of no fault loads
    - u64_stats,lockdep: Fix u64_stats_init() vs lockdep
    - drm/radeon: fix AGP dependency
    - nfs: we don't support removing system.nfs4_acl
    - ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
    - ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
    - squashfs: fix inode lookup sanity checks
    - squashfs: fix xattr id and id lookup sanity checks
    - arm64: dts: ls1046a: mark crypto engine dma coherent
    - arm64: dts: ls1012a: mark crypto engine dma coherent
    - arm64: dts: ls1043a: mark crypto engine dma coherent
    - ARM: dts: at91-sama5d27_som1: fix phy address to 7
    - dm ioctl: fix out of bounds array access when no devices
    - bus: omap_l3_noc: mark l3 irqs as IRQF_NO_THREAD
    - libbpf: Fix INSTALL flag order
    - macvlan: macvlan_count_rx() needs to be aware of preemption
    - net: dsa: bcm_sf2: Qualify phydev->dev_flags based on port
    - e1000e: add rtnl_lock() to e1000_reset_task
    - e1000e: Fix error handling in e1000_set_d0_lplu_state_82571
    - net/qlcnic: Fix a use after free in qlcnic_83xx_get_minidump_template
    - ftgmac100: Restart MAC HW once
    - can: peak_usb: add forgotten supported devices
    - can: c_can_pci: c_can_pci_remove(): fix use-after-free
    - can: c_can: move runtime PM enable/disable to c_can_platform
    - can: m_can: m_can_do_rx_poll(): fix extraneous msg loss warning
    - mac80211: fix rate mask reset
    - net: cdc-pho...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Mathew Hodson (mhodson)
Changed in linux (Ubuntu):
status: Invalid → Fix Released