bt_iter() crash due to NULL pointer

Bug #1744300 reported by Guilherme G. Piccoli on 2018-01-19
This bug affects 2 people
Affects          Status   Importance   Assigned to   Milestone
linux (Ubuntu)            Undecided    Unassigned
  Xenial                  Undecided    Unassigned
  Artful                  Undecided    Unassigned

Bug Description

SRU Justification:

[Impact]
The following crash was observed on Ubuntu 16.04 running the linux-gcp 4.13 kernel (specifically 4.13.0-1006.9):

[ 10.972644] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 10.980708] IP: bt_iter+0x31/0x50
[ 10.984310] PGD 0
[ 10.984310] P4D 0
[ 10.986439]
[ 10.990190] Oops: 0000 [#1] SMP PTI
[ 11.016282] Workqueue: kblockd blk_mq_timeout_work
[ 11.021196] task: ffff8e7c2e700000 task.stack: ffffb8d4c67a8000
[ 11.027234] RIP: 0010:bt_iter+0x31/0x50
[ 11.031187] RSP: 0018:ffffb8d4c67abda0 EFLAGS: 00010206
[ 11.037730] RAX: ffffb8d4c67abdd0 RBX: 0000000000000180 RCX: 0000000000000000
[ 11.045172] RDX: ffff8e7c34c8d280 RSI: 0000000000000000 RDI: ffff8e7c32dd8000
[ 11.053321] RBP: ffffb8d4c67abe20 R08: 0000000000000000 R09: 0000200000000100
[ 11.060582] R10: 0000000000000130 R11: 00000000fffee5bf R12: ffff8e7c3572c790
[ 11.068094] R13: ffff8e7c3572c780 R14: 0000000000000008 R15: ffff8e7c35e7c180
[ 11.075522] FS: 0000000000000000(0000) GS:ffff8e7c3a4c0000(0000) knlGS:0000000000000000
[ 11.083721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.089593] CR2: 0000000000000030 CR3: 000000009e20a003 CR4: 00000000001606e0
[ 11.096871] Call Trace:
[ 11.099468] ? blk_mq_queue_tag_busy_iter+0xe2/0x1f0
[ 11.104558] ? blk_mq_rq_timed_out+0x70/0x70
[ 11.109130] ? blk_mq_rq_timed_out+0x70/0x70
[ 11.114933] blk_mq_timeout_work+0xbb/0x170
[ 11.119408] process_one_work+0x156/0x410
[ 11.123641] worker_thread+0x4b/0x460
[ 11.127827] kthread+0x109/0x140
[ 11.131186] ? process_one_work+0x410/0x410
[ 11.135499] ? kthread_create_on_node+0x70/0x70
[ 11.140408] ret_from_fork+0x1f/0x30
[ 11.144110] Code: 89 d0 48 8b 3a 0f b6 48 18 48 8b 97 30 01 00 00 84 c9 75 03 03 72 04 48 8b 92 80 00 00 00 89 f6 48 8b 34 f2 48 8b 97 c0 00 00 00 <48> 39 56 30 74 06 b8 01 00 00 00 c3 55 48 8b 50 10 48 89 e5 ff
[ 11.167573] RIP: bt_iter+0x31/0x50 RSP: ffffb8d4c67abda0
[ 11.173028] CR2: 0000000000000030
[ 11.176515] ---[ end trace 2f8e5b1cf4139fec ]---
[ 11.182589] Kernel panic - not syncing: Fatal exception

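In short, bt_iter() dereferences a NULL pointer. Since blk-mq scheduler support was merged into the Linux kernel, the tags->rqs[] entries are assigned dynamically, which leaves a small window of time in which a tag's bit is already set but the corresponding tags->rqs[] entry has not been assigned yet. If blk_mq_timeout_work() iterates over the busy tags during that window (see the call trace), bt_iter() picks up a NULL request pointer and dereferences it. The register dump is consistent with this: RSI is 0 and the faulting address (CR2) is 0x30, i.e. a field read at a small offset from NULL. This was reported to happen in about 5% of test runs (more details in the [Testcase] section).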

[Fix]
The fix is small, simple, and already upstream. It adds a NULL pointer check to the bt_iter() and bt_tags_iter() functions, so that a tag whose bit is set but whose tags->rqs[] entry is still NULL is skipped instead of dereferenced.

The fix is: 7f5562d5ecc4 ("blk-mq-tag: check for NULL rq when iterating tags"), by Jens Axboe.
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7f5562d5ecc4)

[Testcase]
Because the problem depends on a small, non-deterministic time window, there is no easy reproducer. In our case, it was observed while testing a system with a large number of CPUs and attached disks (>150 cores, >200 disks), with a workload that exercised all CPUs and disks (the disks via quick dd commands). In this scenario, as mentioned above, the issue occurred in about 5% of the runs.

no longer affects: linux-gcp (Ubuntu)

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1744300

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
no longer affects: linux (Ubuntu Bionic)
Changed in linux (Ubuntu Xenial):
status: New → Fix Released
Changed in linux (Ubuntu Artful):
status: New → In Progress
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
tags: added: verification-done-artful
removed: verification-needed-artful
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-36.40

---------------
linux (4.13.0-36.40) artful; urgency=medium

  * linux: 4.13.0-36.40 -proposed tracker (LP: #1750010)

  * Rebuild without "CVE-2017-5754 ARM64 KPTI fixes" patch set

linux (4.13.0-35.39) artful; urgency=medium

  * linux: 4.13.0-35.39 -proposed tracker (LP: #1748743)

  * CVE-2017-5715 (Spectre v2 Intel)
    - Revert "UBUNTU: SAUCE: turn off IBPB when full retpoline is present"
    - SAUCE: turn off IBRS when full retpoline is present
    - [Packaging] retpoline files must be sorted
    - [Packaging] pull in retpoline files

linux (4.13.0-34.37) artful; urgency=medium

  * linux: 4.13.0-34.37 -proposed tracker (LP: #1748475)

  * libata: apply MAX_SEC_1024 to all LITEON EP1 series devices (LP: #1743053)
    - libata: apply MAX_SEC_1024 to all LITEON EP1 series devices

  * KVM patches for s390x to provide facility bits 81 (ppa15) and 82 (bpb)
    (LP: #1747090)
    - KVM: s390: wire up bpb feature

  * artful 4.13 i386 kernels crash after memory hotplug remove (LP: #1747069)
    - Revert "mm, memory_hotplug: do not associate hotadded memory to zones until
      online"

  * CVE-2017-5715 (Spectre v2 Intel)
    - x86/feature: Enable the x86 feature to control Speculation
    - x86/feature: Report presence of IBPB and IBRS control
    - x86/enter: MACROS to set/clear IBRS and set IBPB
    - x86/enter: Use IBRS on syscall and interrupts
    - x86/idle: Disable IBRS entering idle and enable it on wakeup
    - x86/idle: Disable IBRS when offlining cpu and re-enable on wakeup
    - x86/mm: Set IBPB upon context switch
    - x86/mm: Only set IBPB when the new thread cannot ptrace current thread
    - x86/entry: Stuff RSB for entry to kernel for non-SMEP platform
    - x86/kvm: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD to kvm
    - x86/kvm: Set IBPB when switching VM
    - x86/kvm: Toggle IBRS on VM entry and exit
    - x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature
    - x86/spec_ctrl: Add lock to serialize changes to ibrs and ibpb control
    - x86/cpu/AMD: Add speculative control support for AMD
    - x86/microcode: Extend post microcode reload to support IBPB feature
    - KVM: SVM: Do not intercept new speculative control MSRs
    - x86/svm: Set IBRS value on VM entry and exit
    - x86/svm: Set IBPB when running a different VCPU
    - KVM: x86: Add speculative control CPUID support for guests
    - SAUCE: turn off IBPB when full retpoline is present

  * Artful 4.13 fixes for tun (LP: #1748846)
    - tun: call dev_get_valid_name() before register_netdevice()
    - tun: allow positive return values on dev_get_valid_name() call
    - tun/tap: sanitize TUNSETSNDBUF input

  * boot failure on AMD Raven + WestonXT (LP: #1742759)
    - SAUCE: drm/amdgpu: add atpx quirk handling (v2)

linux (4.13.0-33.36) artful; urgency=low

  * linux: 4.13.0-33.36 -proposed tracker (LP: #1746903)

  [ Stefan Bader ]
  * starting VMs causing retpoline4 to reboot (LP: #1747507) // CVE-2017-5715
    (Spectre v2 retpoline)
    - x86/retpoline: Fill RSB on context switch for affected CPUs
    - x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
    - x86/retpol...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released