hisilicon hibmc regression due to ea642c3216cb ("drm/ttm: add io_mem_pfn callback")

Bug #1738334 reported by Daniel Axtens
This bug affects 2 people
Affects          Status         Importance   Assigned to     Milestone
Linux            New            Undecided    Unassigned
linux (Ubuntu)   Fix Released   Undecided    Daniel Axtens
  Artful         Fix Released   Undecided    Unassigned
  Bionic         Fix Released   Undecided    Daniel Axtens

Bug Description

[SRU Justification]

[Impact]
On Artful and Bionic kernels, X fails to start and a kernel splat is printed.

This is because ea642c3216cb ("drm/ttm: add io_mem_pfn callback") is incomplete: the hisilicon hibmc driver does not provide the callback, so the kernel tries to execute code at NULL.

[Fix]

Bionic: There is a generic fix in 4.16 at c67fa6edc8b11afe22c88a23963170bf5f151acf. It is part of a series that applies this generic fix and does a bunch of cleanups; we can safely just pick up the generic fix.
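
The changelog below names the generic fix "drm/ttm: add ttm_bo_io_mem_pfn to check io_mem_pfn". The idea can be sketched in plain C as a wrapper that falls back to a default when a driver leaves the callback unset. This is only a userspace model under stated assumptions: the struct layouts, `base_pfn`, and the function names are hypothetical stand-ins, not the kernel's actual TTM definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TTM types; not the real kernel structs. */
struct buffer_object;

struct bo_driver {
    /* Optional callback: drivers that predate it leave this NULL. */
    unsigned long (*io_mem_pfn)(struct buffer_object *bo,
                                unsigned long page_offset);
};

struct buffer_object {
    const struct bo_driver *driver;
    unsigned long base_pfn; /* assumed: first page frame of the I/O region */
};

/* Default translation: base page frame plus the page offset. */
unsigned long default_io_mem_pfn(struct buffer_object *bo,
                                 unsigned long page_offset)
{
    return bo->base_pfn + page_offset;
}

/* Guarded dispatch: call the driver's callback only if it is set. */
unsigned long bo_io_mem_pfn(struct buffer_object *bo,
                            unsigned long page_offset)
{
    assert(bo != NULL && bo->driver != NULL);

    if (bo->driver->io_mem_pfn)
        return bo->driver->io_mem_pfn(bo, page_offset);

    return default_io_mem_pfn(bo, page_offset);
}
```

With a guard like this in the common code path, a driver that never sets the callback gets the default behaviour instead of a jump to NULL.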

Artful: Rather than a generic fix, I have submitted a very very minimal fix that only touches hibmc.
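
The Artful patch itself is not quoted in this report. As a rough illustration of what a driver-local fix of this kind could look like, the sketch below assumes the driver simply points the missing callback at a default helper. All names here (`demo_bo`, `demo_bo_driver`, `demo_default_io_mem_pfn`, `hibmc_like_driver`) are hypothetical and model the pattern in userspace C rather than the real hibmc code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's TTM or hibmc structures. */
struct demo_bo;

typedef unsigned long (*demo_io_mem_pfn_fn)(struct demo_bo *bo,
                                            unsigned long page_offset);

struct demo_bo_driver {
    const char *name;
    demo_io_mem_pfn_fn io_mem_pfn;
};

struct demo_bo {
    const struct demo_bo_driver *driver;
    unsigned long base_pfn; /* assumed: first page frame of the I/O region */
};

/* Default helper a driver can reuse when it needs nothing special. */
unsigned long demo_default_io_mem_pfn(struct demo_bo *bo,
                                      unsigned long page_offset)
{
    return bo->base_pfn + page_offset;
}

/*
 * Models the regressed core path: the callback is called unconditionally.
 * The assert stands in for the crash; the kernel has no such check and
 * would branch to address 0 if the pointer were NULL.
 */
unsigned long demo_fault_pfn(struct demo_bo *bo, unsigned long page_offset)
{
    assert(bo->driver->io_mem_pfn != NULL);
    return bo->driver->io_mem_pfn(bo, page_offset);
}

/* Driver-local fix: populate the callback in this one driver's table. */
const struct demo_bo_driver hibmc_like_driver = {
    .name = "hibmc-like",
    .io_mem_pfn = demo_default_io_mem_pfn,
};
```

Because only one driver's table changes, other drivers and the common code are left untouched, which is why the regression potential of this approach is described as minimal.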

[Regression Potential]
Artful: Minimal - fix only touches hibmc driver. Tested on D05 board.
Bionic: the fix touches generic DRM code, but is small and easily reviewable.

[Testcase]
Install patched kernel, try to start X. If it succeeds, the fix works. If there's a kernel splat, the fix does not work.

[Notes]
Artful: HiSilicon would really like this fix in Artful in time for the next 16.04 point release, so that the HWE kernel works with Xorg when it ships.

Bionic: no extra notes.

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1738334

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Daniel Axtens (daxtens) wrote :

Confirmed - the symptom is a kernel splat about "Attempting to execute userspace memory" triggered by Xorg with LR in ttm_bo_vm_fault - see attached screenshot (sorry!)

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Zhanglei Mao (zhanglei-mao) wrote :

This patch needs to be merged so the fix is in the 16.04.4 release. The final SRU4 window is 8 Dec. to 6 Jan. 2018.

Zhanglei Mao (zhanglei-mao) wrote :

Notes and some more detail on this bug and patch:

The bug is caused by a DRM architecture change in 4.12, mainline commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.15-rc3&id=ea642c3216cb2a60d1c0e760ae47ee85c9c16447

It causes X to hang at startup; the detailed bug report is https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1698700

The fix patch is here: https://lists.freedesktop.org/archives/dri-devel/2017-November/159002.html

This patch only changes the Huawei driver; it does not affect any other vendors.

Daniel Axtens (daxtens)
description: updated
Zhanglei Mao (zhanglei-mao) wrote :

This patch has been accepted for merging in kernel v4.16. Please refer to: https://lkml.org/lkml/2017/12/25/24

Fred Kimmy (kongzizaixian) wrote :

Will this patchset be merged into Ubuntu 16.04.4 or not? It is important for our D05 board. Can you check, please?

Thank you

Daniel Axtens (daxtens) wrote :

I have talked to the kernel team about this and updated Fred off-line.

Zhanglei Mao (zhanglei-mao) wrote : Re: [Bug 1738334] Re: hisilicon hibmc regression due to ea642c3216cb ("drm/ttm: add io_mem_pfn callback")

It is good that you updated "Fred" (Xin Wei) off-line; thank you very much
for that.

On Tue, Jan 23, 2018 at 8:46 AM, Daniel Axtens <email address hidden>
wrote:

> I have talked to the kernel team about this and updated Fred off-line.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1738334
>
> Title:
> hisilicon hibmc regression due to ea642c3216cb ("drm/ttm: add
> io_mem_pfn callback")
>
> Status in linux package in Ubuntu:
> Confirmed
>
> Bug description:
> [SRU Justification]
>
> [Impact]
> On Artful kernels, X fails to start and a kernel splat is printed.
>
> This is because ea642c3216cb ("drm/ttm: add io_mem_pfn callback") is
> incomplete: the hisilicon hibmc driver does not contain the callback
> and so the kernel tries to execute code at NULL.
>
> [Fix]
> There is a discussion and potential fix at
> https://lists.freedesktop.org/archives/dri-devel/2017-November/159002.html
> The fix hasn't landed yet and it looks like they're going to re-engineer
> the entire section instead.
>
> Rather than wait for that and deal with the massive regression
> potential, the fix I have picked to submit is very very minimal and
> touches only hibmc.
>
> [Regression Potential]
> Minimal - fix only touches hibmc driver. Tested on D05 board.
>
> [Testcase]
> Install patched kernel, try to start X. If it succeeds, the fix works.
> If there's a kernel splat, the fix does not work.
>
> [Notes]
> HiSilicon would really like this fix in Artful in such time so that when
> the next 16.04 point release ships in February, the HWE kernel will work
> with Xorg.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/
> 1738334/+subscriptions
>

--
Zhanglei Mao
Solutions Architect, Sales and Business Development
Canonical Group Ltd.
<email address hidden>
+86-13625010929 (m)
+852-6700 6026 (m)
www.ubuntu.com
www.canonical.com

Changed in linux (Ubuntu Artful):
status: New → Fix Committed
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
Daniel Axtens (daxtens) wrote :

Hi,

I installed 4.13.0-35-generic from artful-proposed. The kernel boots and X starts fine, so this has passed verification.

Regards,
Daniel

tags: added: verification-done-artful
removed: verification-needed-artful
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-36.40

---------------
linux (4.13.0-36.40) artful; urgency=medium

  * linux: 4.13.0-36.40 -proposed tracker (LP: #1750010)

  * Rebuild without "CVE-2017-5754 ARM64 KPTI fixes" patch set

linux (4.13.0-35.39) artful; urgency=medium

  * linux: 4.13.0-35.39 -proposed tracker (LP: #1748743)

  * CVE-2017-5715 (Spectre v2 Intel)
    - Revert "UBUNTU: SAUCE: turn off IBPB when full retpoline is present"
    - SAUCE: turn off IBRS when full retpoline is present
    - [Packaging] retpoline files must be sorted
    - [Packaging] pull in retpoline files

linux (4.13.0-34.37) artful; urgency=medium

  * linux: 4.13.0-34.37 -proposed tracker (LP: #1748475)

  * libata: apply MAX_SEC_1024 to all LITEON EP1 series devices (LP: #1743053)
    - libata: apply MAX_SEC_1024 to all LITEON EP1 series devices

  * KVM patches for s390x to provide facility bits 81 (ppa15) and 82 (bpb)
    (LP: #1747090)
    - KVM: s390: wire up bpb feature

  * artful 4.13 i386 kernels crash after memory hotplug remove (LP: #1747069)
    - Revert "mm, memory_hotplug: do not associate hotadded memory to zones until
      online"

  * CVE-2017-5715 (Spectre v2 Intel)
    - x86/feature: Enable the x86 feature to control Speculation
    - x86/feature: Report presence of IBPB and IBRS control
    - x86/enter: MACROS to set/clear IBRS and set IBPB
    - x86/enter: Use IBRS on syscall and interrupts
    - x86/idle: Disable IBRS entering idle and enable it on wakeup
    - x86/idle: Disable IBRS when offlining cpu and re-enable on wakeup
    - x86/mm: Set IBPB upon context switch
    - x86/mm: Only set IBPB when the new thread cannot ptrace current thread
    - x86/entry: Stuff RSB for entry to kernel for non-SMEP platform
    - x86/kvm: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD to kvm
    - x86/kvm: Set IBPB when switching VM
    - x86/kvm: Toggle IBRS on VM entry and exit
    - x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature
    - x86/spec_ctrl: Add lock to serialize changes to ibrs and ibpb control
    - x86/cpu/AMD: Add speculative control support for AMD
    - x86/microcode: Extend post microcode reload to support IBPB feature
    - KVM: SVM: Do not intercept new speculative control MSRs
    - x86/svm: Set IBRS value on VM entry and exit
    - x86/svm: Set IBPB when running a different VCPU
    - KVM: x86: Add speculative control CPUID support for guests
    - SAUCE: turn off IBPB when full retpoline is present

  * Artful 4.13 fixes for tun (LP: #1748846)
    - tun: call dev_get_valid_name() before register_netdevice()
    - tun: allow positive return values on dev_get_valid_name() call
    - tun/tap: sanitize TUNSETSNDBUF input

  * boot failure on AMD Raven + WestonXT (LP: #1742759)
    - SAUCE: drm/amdgpu: add atpx quirk handling (v2)

linux (4.13.0-33.36) artful; urgency=low

  * linux: 4.13.0-33.36 -proposed tracker (LP: #1746903)

  [ Stefan Bader ]
  * starting VMs causing retpoline4 to reboot (LP: #1747507) // CVE-2017-5715
    (Spectre v2 retpoline)
    - x86/retpoline: Fill RSB on context switch for affected CPUs
    - x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
    - x86/retpol...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Fred Kimmy (kongzizaixian) wrote :

Hi Daniel,

This bug also affects Ubuntu 18.04.0, which uses the Bionic kernel branch (kernel 4.15), because the patchset was only merged into mainline Linux 4.16-rc1 (after kernel 4.15).

I tried to add the bionic tag, but could not. Can you add the bionic tag so that this patchset is merged into the Bionic branch?

Thank you

Daniel Axtens (daxtens) wrote :

Hi Fred,

Thanks for the update. I have tried to nominate the bug for Bionic; I think the kernel team normally does this so we will see if that has worked.

More importantly, I will test and send a patch for Bionic shortly.

Regards,
Daniel

Daniel Axtens (daxtens)
description: updated
Zhanglei Mao (zhanglei-mao) wrote :

The Huawei team just tested and confirmed this bug on the 4.15.0-10-generic kernel on a D05 server.

Zhanglei Mao (zhanglei-mao) wrote :

Call traces for this bug on D05 with kernel 4.15.0-10-generic, tested on 1 March 2018 on an 18.04 daily build.

Seth Forshee (sforshee)
Changed in linux (Ubuntu Bionic):
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-12.13

---------------
linux (4.15.0-12.13) bionic; urgency=medium

  * linux: 4.15.0-12.13 -proposed tracker (LP: #1754059)

  * CONFIG_EFI=y on armhf (LP: #1726362)
    - [Config] CONFIG_EFI=y on armhf, reconcile secureboot EFI settings

  * ppc64el: Support firmware disable of RFI flush (LP: #1751994)
    - powerpc/pseries: Support firmware disable of RFI flush
    - powerpc/powernv: Support firmware disable of RFI flush

  * [Feature] CFL/CNL (PCH:CNP-H): New GPIO Commit added (GPIO Driver needed)
    (LP: #1751714)
    - gpio / ACPI: Drop unnecessary ACPI GPIO to Linux GPIO translation
    - pinctrl: intel: Allow custom GPIO base for pad groups
    - pinctrl: cannonlake: Align GPIO number space with Windows

  * [Feature] Add xHCI debug device support in the driver (LP: #1730832)
    - usb: xhci: Make some static functions global
    - usb: xhci: Add DbC support in xHCI driver
    - [Config] USB_XHCI_DBGCAP=y for commit mainline dfba2174dc42.

  * [SRU] Lenovo E41 Mic mute hotkey is not responding (LP: #1753347)
    - platform/x86: ideapad-laptop: Increase timeout to wait for EC answer

  * headset mic can't be detected on two Dell machines (LP: #1748807)
    - ALSA: hda - Fix a wrong FIXUP for alc289 on Dell machines

  * hisi_sas: Add disk LED support (LP: #1752695)
    - scsi: hisi_sas: directly attached disk LED feature for v2 hw

  * [Feature] [Graphics]Whiskey Lake (Coffelake-U 4+2) new PCI Device ID adds
    (LP: #1742561)
    - drm/i915/cfl: Adding more Coffee Lake PCI IDs.

  * [Bug] [USB Function][CFL-CNL PCH]Stall Error and USB Transaction Error in
    trace, Disable of device-initiated U1/U2 failed and rebind failed: -517
    during suspend/resume with usb storage. (LP: #1730599)
    - usb: Don't print a warning if interface driver rebind is deferred at resume

  * retpoline: ignore %cs:0xNNN constant indirections (LP: #1752655)
    - [Packaging] retpoline -- elide %cs:0xNNNN constants on i386
    - [Config] retpoline -- clean up i386 retpoline files

  * hisilicon hibmc regression due to ea642c3216cb ("drm/ttm: add io_mem_pfn
    callback") (LP: #1738334)
    - drm/ttm: add ttm_bo_io_mem_pfn to check io_mem_pfn

  * [Asus UX360UA] battery status in unity-panel is not changing when battery is
    being charged (LP: #1661876) // AC adapter status not detected on Asus
    ZenBook UX410UAK (LP: #1745032)
    - ACPI / battery: Add quirk for Asus UX360UA and UX410UAK

  * ASUS UX305LA - Battery state not detected correctly (LP: #1482390)
    - ACPI / battery: Add quirk for Asus GL502VSK and UX305LA

  * [18.04 FEAT] Automatically detect layer2 setting in the qeth device driver
    (LP: #1747639)
    - s390/diag: add diag26c support for VNIC info
    - s390/qeth: support early setup for z/VM NICs

  * Bionic update to v4.15.7 stable release (LP: #1752317)
    - netfilter: drop outermost socket lock in getsockopt()
    - arm64: mm: don't write garbage into TTBR1_EL1 register
    - kconfig.h: Include compiler types to avoid missed struct attributes
    - MIPS: boot: Define __ASSEMBLY__ for its.S build
    - xtensa: fix high memory/reserved memory collision
    - scsi: ibmvfc: fix misde...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released