azure 4.15 kernel: reading sysfs file causing oops

Bug #1789638 reported by Colin Ian King on 2018-08-29
This bug affects 1 person
Affects                  Importance   Assigned to
linux (Ubuntu)           Medium       Unassigned
  Bionic                 Medium       Marcelo Cerri
linux-azure (Ubuntu)     Undecided    Unassigned
  Bionic                 Undecided    Unassigned

Bug Description

Kernel: 4.15.0-1021-azure, in a Xenial VM on Azure.

How to reproduce:

git clone git://kernel.ubuntu.com/cking/stress-ng
cd stress-ng
make
./stress-ng --sysfs 0 -t 120

This produces the following oops:

[ 22.451885] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[ 22.455286] IP: read_avail_show+0x1c/0x40
[ 22.455286] PGD 800000042d59e067 P4D 800000042d59e067 PUD 42eb8c067 PMD 0
[ 22.455286] Oops: 0000 [#1] SMP PTI
[ 22.455286] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack iptable_security ip_tables x_tables serio_raw joydev hv_balloon ib_iser iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd hyperv_fb hid_hyperv pata_acpi cfbfillrect hyperv_keyboard cfbimgblt hid cfbcopyarea hv_netvsc hv_utils
[ 22.455286] CPU: 1 PID: 1670 Comm: cat Not tainted 4.15.0-1021-azure #21~16.04.1-Ubuntu
[ 22.455286] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 22.455286] RIP: 0010:read_avail_show+0x1c/0x40
[ 22.455286] RSP: 0018:ffffafa4c4eafdb0 EFLAGS: 00010286
[ 22.455286] RAX: 0000000000000000 RBX: ffff9db36c93e880 RCX: ffff9db36f136908
[ 22.860062] RDX: 0000000000000000 RSI: ffff9db364548000 RDI: ffff9db364548000
[ 22.888042] RBP: ffffafa4c4eafdb0 R08: ffff9db364548000 R09: ffff9db36c049840
[ 22.920041] R10: ffff9db364548000 R11: 0000000000000000 R12: ffffffff92ae9440
[ 22.948058] R13: ffff9db36c22d200 R14: 0000000000000001 R15: ffff9db36c93e880
[ 22.972043] FS: 00007f67eeec6700(0000) GS:ffff9db37fd00000(0000) knlGS:0000000000000000
[ 23.004046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 23.024016] CR2: 0000000000000004 CR3: 000000042c37a003 CR4: 00000000001606e0
[ 23.048014] Call Trace:
[ 23.060019] vmbus_chan_attr_show+0x21/0x30
[ 23.076018] sysfs_kf_seq_show+0xa2/0x130
[ 23.088030] kernfs_seq_show+0x27/0x30
[ 23.100020] seq_read+0xb7/0x480
[ 23.112014] kernfs_fop_read+0x111/0x190
[ 23.128017] ? security_file_permission+0xa1/0xc0
[ 23.144013] __vfs_read+0x1b/0x40
[ 23.156019] vfs_read+0x93/0x130
[ 23.168013] SyS_read+0x55/0xc0
[ 23.180021] do_syscall_64+0x73/0x130
[ 23.192014] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 23.212022] RIP: 0033:0x7f67ee9d8260
[ 23.224016] RSP: 002b:00007fffdc193ff8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 23.252022] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f67ee9d8260
[ 23.276019] RDX: 0000000000020000 RSI: 00007f67eed0c000 RDI: 0000000000000003
[ 23.300020] RBP: 0000000000020000 R08: ffffffffffffffff R09: 0000000000000000
[ 23.328025] R10: 000000000000037b R11: 0000000000000246 R12: 00007f67eed0c000
[ 23.352036] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000020000
[ 23.376678] Code: fb 3a 17 00 48 98 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 8b 87 38 01 00 00 49 89 f0 8b 97 48 01 00 00 4c 89 c7 48 89 e5 <8b> 48 04 8b 00 29 ca 89 c6 29 ce 01 c2 39 c1 0f 46 d6 48 c7 c6
[ 23.444022] RIP: read_avail_show+0x1c/0x40 RSP: ffffafa4c4eafdb0
[ 23.468021] CR2: 0000000000000004
[ 23.481135] ---[ end trace 348a4b7d5a6747d1 ]---

Narrowed this down to simply reading:

cat /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/99221fa0-24ad-11e2-be98-001aa01bbf6e/channels/4/read_avail

Several VMBus sysfs files trigger this; the fix in comment #5 below addresses all the ones I could find.

Changed in linux (Ubuntu):
importance: Undecided → Medium

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1789638

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Colin Ian King (colin-king) wrote :

In read_avail_show, rbi->ring_buffer is NULL, causing the OOPS.

Colin Ian King (colin-king) wrote :

Tested with today's Linux tip, 4.19.0-rc1 at commit 3f16503b7d2274ac8cbab11163047ac0b4c66cfe; the issue still occurs.

Colin Ian King (colin-king) wrote :

write_avail_show() has the same problem with rbi->ring_buffer.

Colin Ian King (colin-king) wrote :

Workaround fix attached. I suspect there may be a more elegant solution.

description: updated
description: updated
tags: added: patch
Joshua R. Poulson (jrp) wrote :

This has been submitted to 4.19 and stable:

For unsupported device types, the vmbus channel ringbuffer is never
initialized, and therefore reading the sysfs files will return garbage
or cause a kernel OOPS.

Fixes: c2e5df616e1a ("vmbus: add per-channel sysfs info")

Colin Ian King (colin-king) wrote :

The above patch fixes the issue for me. Thanks

Marcelo Cerri (mhcerri) on 2018-09-03
Changed in linux (Ubuntu Bionic):
status: New → In Progress
assignee: nobody → Marcelo Cerri (mhcerri)
importance: Undecided → Medium
Changed in linux-azure (Ubuntu Bionic):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 4.15.0-1025.26~16.04.1

---------------
linux-azure (4.15.0-1025.26~16.04.1) xenial; urgency=medium

  [ Ubuntu: 4.15.0-36.39 ]

  * CVE-2018-14633
    - iscsi target: Use hex2bin instead of a re-implementation
  * CVE-2018-17182
    - mm: get rid of vmacache_flush_all() entirely

linux-azure (4.15.0-1024.25) bionic; urgency=medium

  * linux-azure: 4.15.0-1024.25 -proposed tracker (LP: #1791726)

  * [Regression] kernel crashdump fails on arm64 (LP: #1786878)
    - [config] update configs after rebase

  * azure 4.15 kernel: reading sysfs file causing oops (LP: #1789638)
    - SAUCE: vmbus: don't return values for uninitalized channels

  [ Ubuntu: 4.15.0-35.38 ]

  * linux: 4.15.0-35.38 -proposed tracker (LP: #1791719)
  * device hotplug of vfio devices can lead to deadlock in vfio_pci_release
    (LP: #1792099)
    - SAUCE: vfio -- release device lock before userspace requests
  * L1TF mitigation not effective in some CPU and RAM combinations
    (LP: #1788563)
    - x86/speculation/l1tf: Fix overflow in l1tf_pfn_limit() on 32bit
    - x86/speculation/l1tf: Fix off-by-one error when warning that system has too
      much RAM
    - x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+
  * CVE-2018-15594
    - x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
  * CVE-2017-5715 (Spectre v2 s390x)
    - KVM: s390: implement CPU model only facilities
    - s390: detect etoken facility
    - KVM: s390: add etoken support for guests
    - s390/lib: use expoline for all bcr instructions
    - s390: fix br_r1_trampoline for machines without exrl
    - SAUCE: s390: use expoline thunks for all branches generated by the BPF JIT
  * Ubuntu18.04.1: cpuidle: powernv: Fix promotion from snooze if next state
    disabled (performance) (LP: #1790602)
    - cpuidle: powernv: Fix promotion from snooze if next state disabled
  * Watchdog CPU:19 Hard LOCKUP when kernel crash was triggered (LP: #1790636)
    - powerpc: hard disable irqs in smp_send_stop loop
    - powerpc: Fix deadlock with multiple calls to smp_send_stop
    - powerpc: smp_send_stop do not offline stopped CPUs
    - powerpc/powernv: Fix opal_event_shutdown() called with interrupts disabled
  * Security fix: check if IOMMU page is contained in the pinned physical page
    (LP: #1785675)
    - vfio/spapr: Use IOMMU pageshift rather than pagesize
    - KVM: PPC: Check if IOMMU page is contained in the pinned physical page
  * Missing Intel GPU pci-id's (LP: #1789924)
    - drm/i915/kbl: Add KBL GT2 sku
    - drm/i915/whl: Introducing Whiskey Lake platform
    - drm/i915/aml: Introducing Amber Lake platform
    - drm/i915/cfl: Add a new CFL PCI ID.
  * CVE-2018-15572
    - x86/speculation: Protect against userspace-userspace spectreRSB
  * Support Power Management for Thunderbolt Controller (LP: #1789358)
    - thunderbolt: Handle NULL boot ACL entries properly
    - thunderbolt: Notify userspace when boot_acl is changed
    - thunderbolt: Use 64-bit DMA mask if supported by the platform
    - thunderbolt: Do not unnecessarily call ICM get route
    - thunderbolt: No need to take tb->lock in domain suspend/complete
    - thunderbol...

Changed in linux-azure (Ubuntu):
status: New → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 4.15.0-1025.26

---------------
linux-azure (4.15.0-1025.26) bionic; urgency=medium

  [ Ubuntu: 4.15.0-36.39 ]

  * CVE-2018-14633
    - iscsi target: Use hex2bin instead of a re-implementation
  * CVE-2018-17182
    - mm: get rid of vmacache_flush_all() entirely

linux-azure (4.15.0-1024.25) bionic; urgency=medium

  * linux-azure: 4.15.0-1024.25 -proposed tracker (LP: #1791726)

  * [Regression] kernel crashdump fails on arm64 (LP: #1786878)
    - [config] update configs after rebase

  * azure 4.15 kernel: reading sysfs file causing oops (LP: #1789638)
    - SAUCE: vmbus: don't return values for uninitalized channels

  [ Ubuntu: 4.15.0-35.38 ]

  * linux: 4.15.0-35.38 -proposed tracker (LP: #1791719)
  * device hotplug of vfio devices can lead to deadlock in vfio_pci_release
    (LP: #1792099)
    - SAUCE: vfio -- release device lock before userspace requests
  * L1TF mitigation not effective in some CPU and RAM combinations
    (LP: #1788563)
    - x86/speculation/l1tf: Fix overflow in l1tf_pfn_limit() on 32bit
    - x86/speculation/l1tf: Fix off-by-one error when warning that system has too
      much RAM
    - x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+
  * CVE-2018-15594
    - x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
  * CVE-2017-5715 (Spectre v2 s390x)
    - KVM: s390: implement CPU model only facilities
    - s390: detect etoken facility
    - KVM: s390: add etoken support for guests
    - s390/lib: use expoline for all bcr instructions
    - s390: fix br_r1_trampoline for machines without exrl
    - SAUCE: s390: use expoline thunks for all branches generated by the BPF JIT
  * Ubuntu18.04.1: cpuidle: powernv: Fix promotion from snooze if next state
    disabled (performance) (LP: #1790602)
    - cpuidle: powernv: Fix promotion from snooze if next state disabled
  * Watchdog CPU:19 Hard LOCKUP when kernel crash was triggered (LP: #1790636)
    - powerpc: hard disable irqs in smp_send_stop loop
    - powerpc: Fix deadlock with multiple calls to smp_send_stop
    - powerpc: smp_send_stop do not offline stopped CPUs
    - powerpc/powernv: Fix opal_event_shutdown() called with interrupts disabled
  * Security fix: check if IOMMU page is contained in the pinned physical page
    (LP: #1785675)
    - vfio/spapr: Use IOMMU pageshift rather than pagesize
    - KVM: PPC: Check if IOMMU page is contained in the pinned physical page
  * Missing Intel GPU pci-id's (LP: #1789924)
    - drm/i915/kbl: Add KBL GT2 sku
    - drm/i915/whl: Introducing Whiskey Lake platform
    - drm/i915/aml: Introducing Amber Lake platform
    - drm/i915/cfl: Add a new CFL PCI ID.
  * CVE-2018-15572
    - x86/speculation: Protect against userspace-userspace spectreRSB
  * Support Power Management for Thunderbolt Controller (LP: #1789358)
    - thunderbolt: Handle NULL boot ACL entries properly
    - thunderbolt: Notify userspace when boot_acl is changed
    - thunderbolt: Use 64-bit DMA mask if supported by the platform
    - thunderbolt: Do not unnecessarily call ICM get route
    - thunderbolt: No need to take tb->lock in domain suspend/complete
    - thunderbolt: Use correct I...

Changed in linux-azure (Ubuntu Bionic):
status: Fix Committed → Fix Released