OOPS on wily+ for Haswell-ULT and Broadwell

Bug #1577748 reported by nick black on 2016-05-03
This bug affects 5 people
Affects          Status         Importance   Assigned to     Milestone
linux (Ubuntu)   Fix Released   Undecided    Kamal Mostafa
Vivid            Fix Released   Undecided    Kamal Mostafa
Wily             Fix Released   Undecided    Kamal Mostafa
Xenial           Fix Released   Undecided    Kamal Mostafa

Bug Description

After upgrading to the wily 4.2 kernel, we see early-boot kernel warnings (OOPS-style stack traces) similar to the following on Haswell-ULT and Broadwell processors:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 1 at /build/linux-lts-wily-MVxxD8/linux-lts-wily-4.2.0/arch/x86/mm/ioremap.c:198 __ioremap_caller+0x2a2/0x360()
Info: mapping multiple BARs. Your kernel is fine.
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.2.0-34-generic #39~14.04.1-Ubuntu
Hardware name: Hewlett-Packard HP EliteBook Folio 1040 G1/213E, BIOS L83 Ver. 01.07 03/28/2014
 ffffffff81a91c80 ffff88023522bad8 ffffffff817b5fed 0000000000000001
 ffff88023522bb28 ffff88023522bb18 ffffffff8107799a ffffc90000cbe000
 ffffc90000cb8000 ffffc90000cb8000 0000000000006000 ffff880230297040
Call Trace:
 [<ffffffff817b5fed>] dump_stack+0x45/0x57
 [<ffffffff8107799a>] warn_slowpath_common+0x8a/0xc0
 [<ffffffff81077a16>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff81064942>] __ioremap_caller+0x2a2/0x360
 [<ffffffff81064a17>] ioremap_nocache+0x17/0x20
 [<ffffffff810393c5>] snb_uncore_imc_init_box+0x65/0x90
 [<ffffffff81037955>] uncore_pci_probe+0xd5/0x1c0
 [<ffffffff813f3a75>] local_pci_probe+0x45/0xa0
 [<ffffffff813f4c34>] ? pci_match_device+0xf4/0x120
 [<ffffffff813f4d6a>] pci_device_probe+0xca/0x110
 [<ffffffff814f6293>] driver_probe_device+0x213/0x460
 [<ffffffff814f6570>] __driver_attach+0x90/0xa0
 [<ffffffff814f64e0>] ? driver_probe_device+0x460/0x460
 [<ffffffff814f409d>] bus_for_each_dev+0x5d/0x90
 [<ffffffff814f5c0e>] driver_attach+0x1e/0x20
 [<ffffffff814f5760>] bus_add_driver+0x1d0/0x290
 [<ffffffff81d606cf>] ? uncore_cpu_setup+0x12/0x12
 [<ffffffff814f6f30>] driver_register+0x60/0xe0
 [<ffffffff813f33dc>] __pci_register_driver+0x4c/0x50
 [<ffffffff81d607b5>] intel_uncore_init+0xe6/0x2fc
 [<ffffffff81d606cf>] ? uncore_cpu_setup+0x12/0x12
 [<ffffffff8100213d>] do_one_initcall+0xcd/0x1f0
 [<ffffffff81094e00>] ? parse_args+0x120/0x470
 [<ffffffff810b7528>] ? __wake_up+0x48/0x60
 [<ffffffff81d512c5>] kernel_init_freeable+0x199/0x226
 [<ffffffff81d509a1>] ? initcall_blacklist+0xc0/0xc0
 [<ffffffff817a5f50>] ? rest_init+0x80/0x80
 [<ffffffff817a5f5e>] kernel_init+0xe/0xe0
 [<ffffffff817bd81f>] ret_from_fork+0x3f/0x70
 [<ffffffff817a5f50>] ? rest_init+0x80/0x80
---[ end trace 5f5beae5b9d2dc45 ]---
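Reading the trace shows which initcall path led to the warning. As a rough aid, the following Python sketch pulls the symbol names out of such a trace and skips the `?` entries, which the kernel marks as unreliable. It is not part of the original report: the `reliable_frames` helper and the regular expression are illustrative assumptions about the trace format shown above.

```python
import re

# Matches lines such as " [<ffffffff81064942>] __ioremap_caller+0x2a2/0x360".
# A "?" before the symbol marks a frame the unwinder is not sure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace_text):
    """Return symbol names of reliable frames, innermost first."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m and m.group(1) is None:
            frames.append(m.group(2))
    return frames


# A few lines copied from the trace in this report.
SAMPLE = """\
 [<ffffffff81064942>] __ioremap_caller+0x2a2/0x360
 [<ffffffff81064a17>] ioremap_nocache+0x17/0x20
 [<ffffffff810393c5>] snb_uncore_imc_init_box+0x65/0x90
 [<ffffffff81037955>] uncore_pci_probe+0xd5/0x1c0
 [<ffffffff813f4c34>] ? pci_match_device+0xf4/0x120
"""

if __name__ == "__main__":
    for name in reliable_frames(SAMPLE):
        print(name)
```

On the sample lines this yields `__ioremap_caller`, `ioremap_nocache`, `snb_uncore_imc_init_box` and `uncore_pci_probe`, in that order; the `? pci_match_device` frame is dropped.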

The issue is fixed in upstream kernels via two patches:

https://lkml.org/lkml/2015/12/11/165
http://lkml.iu.edu/hypermail/linux/kernel/1601.3/03815.html

Each is a one-line change to the mch_quirk_devices[] array.
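For readers unfamiliar with a device-ID quirk table, here is a minimal Python model of the idea. It is not the kernel code (which is C), and it does not reproduce the patches: the device IDs and their labels are written from memory of the public PCI ID list and are assumptions to check against the two links above, and `needs_mch_quirk` is a hypothetical helper name.

```python
# Hypothetical model of a host-bridge device-ID quirk table.
# In the kernel this is a C array; the IDs below are assumptions and are
# not copied from the patches referenced in this report.
MCH_QUIRK_DEVICES = {
    0x0154: "Ivy Bridge",
    0x0C00: "Haswell",
}

# The kind of one-line additions the report describes (IDs assumed).
MCH_QUIRK_DEVICES[0x0A04] = "Haswell-ULT"
MCH_QUIRK_DEVICES[0x1604] = "Broadwell"


def needs_mch_quirk(host_bridge_device_id):
    """True if the host bridge's PCI device ID is listed in the table."""
    return host_bridge_device_id in MCH_QUIRK_DEVICES


if __name__ == "__main__":
    for dev_id in (0x0A04, 0x1604, 0x1234):
        print(hex(dev_id), needs_mch_quirk(dev_id))
```

The point of the model is only that a platform missing from the table is not recognised, so adding its ID is enough to extend the existing handling to it.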

[nlb-glaptop](0) $ cat /proc/version_signature
Ubuntu 4.2.0-35.40~14.04.1-generic 4.2.8-ckt5
[nlb-glaptop](0) $

[nlb-glaptop](0) $ lsb_release -rd
Description: Ubuntu 14.04 LTS
Release: 14.04
[nlb-glaptop](0) $

[nlb-glaptop](0) $ apt-cache policy linux-generic-lts-wily
linux-generic-lts-wily:
  Installed: 4.2.0.35.28
  Candidate: 4.2.0.35.28
  Version table:
 *** 4.2.0.35.28 0
        600 https://rapture-prod.corp.google.com/ <email address hidden>/main amd64 Packages
        100 /var/lib/dpkg/status
[nlb-glaptop](0) $

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1577748

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
nick black (dankamongmen) wrote :

apport-collect crashes:

[nlb-glaptop](0) $ apport-collect 1577748
Traceback (most recent call last):
  File "/usr/share/apport/apport-gtk", line 594, in <module>
    app.run_argv()
  File "/usr/lib/python2.7/dist-packages/apport/ui.py", line 653, in run_argv
    return self.run_update_report()
  File "/usr/lib/python2.7/dist-packages/apport/ui.py", line 525, in run_update_report
    pkgs = self.crashdb.get_affected_packages(self.options.update_report)
  File "/usr/lib/python2.7/dist-packages/apport/crashdb_impl/memory.py", line 79, in get_affected_packages
    return [self.reports[id]['report']['SourcePackage']]
IndexError: list index out of range
[nlb-glaptop](1) $
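The traceback suggests that apport on this machine was using an in-memory crash database backend (`crashdb_impl/memory.py`) and that the bug number ended up as an index into a list. That reading is an inference from the traceback alone. The class below is a toy stand-in written for illustration, not apport's code, and only mirrors the shape of the failing line:

```python
class MemoryCrashDBSketch:
    """Toy stand-in for an in-memory crash database (not apport's real class)."""

    def __init__(self):
        # Reports live in a plain list; a report's id is its list index.
        self.reports = []

    def upload(self, report):
        """Store a report and return its id (the list index)."""
        self.reports.append({"report": report})
        return len(self.reports) - 1

    def get_affected_packages(self, report_id):
        # Same shape as the failing line in the traceback.
        return [self.reports[report_id]["report"]["SourcePackage"]]


def try_lookup(db, report_id):
    """Return the affected packages, or the error text if the id is unknown."""
    try:
        return db.get_affected_packages(report_id)
    except IndexError as exc:
        return "IndexError: %s" % exc


if __name__ == "__main__":
    db = MemoryCrashDBSketch()
    local_id = db.upload({"SourcePackage": "linux"})
    print(try_lookup(db, local_id))
    print(try_lookup(db, 1577748))
```

Under that assumption, a Launchpad bug number such as 1577748 can never be a valid index into a locally held list, which would explain the `IndexError`.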

Manually confirming.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
assignee: nobody → Kamal Mostafa (kamalmostafa)
Changed in linux (Ubuntu):
status: Confirmed → In Progress
Changed in linux (Ubuntu Xenial):
status: New → Fix Committed
Changed in linux (Ubuntu Wily):
status: New → Fix Committed
Changed in linux (Ubuntu Vivid):
status: New → Fix Committed
assignee: nobody → Kamal Mostafa (kamalmostafa)
Changed in linux (Ubuntu Wily):
assignee: nobody → Kamal Mostafa (kamalmostafa)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Kamal Mostafa (kamalmostafa)
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-vivid
tags: added: verification-needed-wily
Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-wily' to 'verification-done-wily'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Germán Poo-Caamaño (gpoo) wrote :

I have not been able to replicate the issue with Xenial and the proposed kernel (yet). Tagged it just in case.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-23.41

---------------
linux (4.4.0-23.41) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1582431

  * zfs: disable module checks for zfs when cross-compiling (LP: #1581127)
    - [Packaging] disable zfs module checks when cross-compiling

  * Xenial update to v4.4.10 stable release (LP: #1580754)
    - Revert "UBUNTU: SAUCE: (no-up) ACPICA: Dispatcher: Update thread ID for
      recursive method calls"
    - Revert "UBUNTU: SAUCE: nbd: ratelimit error msgs after socket close"
    - Revert: "powerpc/tm: Check for already reclaimed tasks"
    - RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
    - ipvs: handle ip_vs_fill_iph_skb_off failure
    - ipvs: correct initial offset of Call-ID header search in SIP persistence
      engine
    - ipvs: drop first packet to redirect conntrack
    - mfd: intel-lpss: Remove clock tree on error path
    - nbd: ratelimit error msgs after socket close
    - ata: ahci_xgene: dereferencing uninitialized pointer in probe
    - mwifiex: fix corner case association failure
    - CNS3xxx: Fix PCI cns3xxx_write_config()
    - clk-divider: make sure read-only dividers do not write to their register
    - soc: rockchip: power-domain: fix err handle while probing
    - clk: rockchip: free memory in error cases when registering clock branches
    - clk: meson: Fix meson_clk_register_clks() signature type mismatch
    - clk: qcom: msm8960: fix ce3_core clk enable register
    - clk: versatile: sp810: support reentrance
    - clk: qcom: msm8960: Fix ce3_src register offset
    - lpfc: fix misleading indentation
    - ath9k: ar5008_hw_cmn_spur_mitigate: add missing mask_m & mask_p
      initialisation
    - mac80211: fix statistics leak if dev_alloc_name() fails
    - tracing: Don't display trigger file for events that can't be enabled
    - MD: make bio mergeable
    - Minimal fix-up of bad hashing behavior of hash_64()
    - mm, cma: prevent nr_isolated_* counters from going negative
    - mm/zswap: provide unique zpool name
    - ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
    - ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
    - xen: Fix page <-> pfn conversion on 32 bit systems
    - xen/balloon: Fix crash when ballooning on x86 32 bit PAE
    - xen/evtchn: fix ring resize when binding new events
    - HID: wacom: Add support for DTK-1651
    - HID: Fix boot delay for Creative SB Omni Surround 5.1 with quirk
    - Input: zforce_ts - fix dual touch recognition
    - proc: prevent accessing /proc/<PID>/environ until it's ready
    - mm: update min_free_kbytes from khugepaged after core initialization
    - batman-adv: fix DAT candidate selection (must use vid)
    - batman-adv: Check skb size before using encapsulated ETH+VLAN header
    - batman-adv: Fix broadcast/ogm queue limit on a removed interface
    - batman-adv: Reduce refcnt of removed router when updating route
    - writeback: Fix performance regression in wb_over_bg_thresh()
    - MAINTAINERS: Remove asterisk from EFI directory names
    - x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
    - ARM: cpuidle: Pass on arm_cpuidle_s...

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-61.69

---------------
linux (3.19.0-61.69) vivid; urgency=low

  [ Kamal Mostafa ]

  * CVE-2016-1583 (LP: #1588871)
    - ecryptfs: fix handling of directory opening
    - SAUCE: proc: prevent stacking filesystems on top
    - SAUCE: ecryptfs: forbid opening files without mmap handler
    - SAUCE: sched: panic on corrupted stack end

 -- Andy Whitcroft <email address hidden> Wed, 08 Jun 2016 22:25:58 +0100

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.2.0-38.45

---------------
linux (4.2.0-38.45) wily; urgency=low

  [ Kamal Mostafa ]

  * CVE-2016-1583 (LP: #1588871)
    - ecryptfs: fix handling of directory opening
    - SAUCE: proc: prevent stacking filesystems on top
    - SAUCE: ecryptfs: forbid opening files without mmap handler
    - SAUCE: sched: panic on corrupted stack end

 -- Andy Whitcroft <email address hidden> Wed, 08 Jun 2016 22:10:39 +0100

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-24.43

---------------
linux (4.4.0-24.43) xenial; urgency=low

  [ Kamal Mostafa ]

  * CVE-2016-1583 (LP: #1588871)
    - ecryptfs: fix handling of directory opening
    - SAUCE: proc: prevent stacking filesystems on top
    - SAUCE: ecryptfs: forbid opening files without mmap handler
    - SAUCE: sched: panic on corrupted stack end

  * arm64: statically link rtc-efi (LP: #1583738)
    - [Config] Link rtc-efi statically on arm64

 -- Kamal Mostafa <email address hidden> Fri, 03 Jun 2016 10:02:16 -0700

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
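To check whether a given machine is already on a kernel at or past the versions named in the Launchpad Janitor comments above, something like the following Python sketch could be run against the contents of /proc/version_signature. It is a simplification, not a full Debian version comparison: it assumes the signature format shown in the bug description, compares only the ABI and upload numbers, and assumes that an LTS backport build (the `~14.04.1` suffix) carries the fix at the same numbers as the series kernel. `has_fix` and `FIXED_IN` are hypothetical names.

```python
import re

# Versions the Launchpad Janitor comments name as containing the fix,
# keyed by base kernel version: (ABI number, upload number).
FIXED_IN = {
    "3.19.0": (61, 69),  # vivid
    "4.2.0": (38, 45),   # wily
    "4.4.0": (23, 41),   # xenial
}

# Matches the start of e.g. "Ubuntu 4.2.0-35.40~14.04.1-generic 4.2.8-ckt5".
SIG_RE = re.compile(r"Ubuntu (\d+\.\d+\.\d+)-(\d+)\.(\d+)")


def has_fix(version_signature):
    """Return True/False, or None when the kernel series is not listed."""
    m = SIG_RE.match(version_signature)
    if m is None:
        raise ValueError("unrecognised version signature: %r" % version_signature)
    base = m.group(1)
    build = (int(m.group(2)), int(m.group(3)))
    fixed = FIXED_IN.get(base)
    if fixed is None:
        return None
    # Tuple comparison: ABI number first, then upload number.
    return build >= fixed


if __name__ == "__main__":
    print(has_fix("Ubuntu 4.2.0-35.40~14.04.1-generic 4.2.8-ckt5"))
```

Fed the signature from this report (`Ubuntu 4.2.0-35.40~14.04.1-generic 4.2.8-ckt5`), `has_fix` returns `False`, consistent with the reporter's kernel predating 4.2.0-38.45.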