Fix regression in mm/hotplug that prevents the NVIDIA driver from working

Bug #1761104 reported by Colin Ian King on 2018-04-04
This bug affects 1 person
Affects            Status         Importance   Assigned to      Milestone
linux (Ubuntu)     Invalid        High         Unassigned
  Artful           Fix Released   High         Colin Ian King

Bug Description

== SRU Justification, ARTFUL ==

The fix for bug #1747069 breaks the NVIDIA driver on ppc64el platforms. According to Will Davis at NVIDIA:

"- The original patch 3d79a728f9b2e6ddcce4e02c91c4de1076548a4c changed the call to arch_add_memory in mm/memory_hotplug.c to call with the boolean argument set to true instead of false, and inverted the semantics of that argument in the arch layers.

- The revert patch 4fe85d5a7c50f003fe4863a1a87f5d8cc121c75c reverted the semantic change in the arch layers, but didn't revert the change to the arch_add_memory call in mm/memory_hotplug.c"

And also:

"It looks like the problem here is that the online_type is _MOVABLE but
can_online_high_movable(nid=255) is returning false:

        if ((zone_idx(zone) > ZONE_NORMAL ||
            online_type == MMOP_ONLINE_MOVABLE) &&
            !can_online_high_movable(pfn_to_nid(pfn)))

This check was removed by upstream commit
57c0a17238e22395428248c53f8e390c051c88b8, and I've verified that if I apply
that commit (partially) to the 4.13.0-37.42 tree along with the previous
arch_add_memory patch to make the probe work, I can fully online the GPU device
memory as expected.

Commit 57c0a172.. implies that the can_online_high_movable() checks weren't
useful anyway, so in addition to the arch_add_memory fix, does it make sense to
revert the pieces of 4fe85d5a7c50f003fe4863a1a87f5d8cc121c75c that added back
the can_online_high_movable() check?"

== Fix ==

Fix the partial backport from bug #1747069: remove the can_online_high_movable() check and correct the boolean argument passed to arch_add_memory().

== Testing ==

Run the ADT memory hotplug test; it should not regress. Without the fix, the NVIDIA driver on powerpc fails to load because it cannot map the device memory. With the fix, it loads.

== Regression Potential ==

This fixes a regression introduced by the original fix, so the regression potential is the same as that of the previously SRU'd fix for bug #1747069, namely:

"Reverting this commit does remove some functionality, however this does not regress the kernel compared to previous releases and having a working reliable memory hotplug is the preferred option. This fix does touch some memory hotplug, so there is a risk that this may break this functionality that is not covered by the kernel regression testing."

Changed in linux (Ubuntu):
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
status: New → In Progress
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
status: Fix Committed → In Progress
Stefan Bader (smb) on 2018-04-17
Changed in linux (Ubuntu Artful):
importance: Undecided → High
status: New → In Progress
description: updated
Stefan Bader (smb) on 2018-04-19
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Stefan Bader (smb) on 2018-04-20
Changed in linux (Ubuntu):
status: In Progress → Invalid
Changed in linux (Ubuntu Artful):
assignee: nobody → Colin Ian King (colin-king)
Changed in linux (Ubuntu):
assignee: Colin Ian King (colin-king) → nobody
Breno Leitão (breno-leitao) wrote :

I understand there is not a released kernel with this fix yet, right?

Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-43.48

---------------
linux (4.13.0-43.48) artful; urgency=medium

  * CVE-2018-3639 (powerpc)
    - SAUCE: rfi-flush: update H_CPU_* macro names to upstream
    - SAUCE: rfi-flush: update plpar_get_cpu_characteristics() signature to
      upstream
    - SAUCE: update pseries_setup_rfi_flush() capitalization to upstream
    - powerpc/pseries: Support firmware disable of RFI flush
    - powerpc/powernv: Support firmware disable of RFI flush
    - powerpc/64s: Allow control of RFI flush via debugfs
    - powerpc/rfi-flush: Move the logic to avoid a redo into the debugfs code
    - powerpc/rfi-flush: Always enable fallback flush on pseries
    - powerpc/rfi-flush: Differentiate enabled and patched flush types
    - powerpc/pseries: Add new H_GET_CPU_CHARACTERISTICS flags
    - powerpc: Add security feature flags for Spectre/Meltdown
    - powerpc/powernv: Set or clear security feature flags
    - powerpc/pseries: Set or clear security feature flags
    - powerpc/powernv: Use the security flags in pnv_setup_rfi_flush()
    - powerpc/pseries: Use the security flags in pseries_setup_rfi_flush()
    - powerpc/pseries: Fix clearing of security feature flags
    - powerpc: Move default security feature flags
    - powerpc/pseries: Restore default security feature flags on setup
    - powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit

  * CVE-2018-3639 (x86)
    - SAUCE: Add X86_FEATURE_ARCH_CAPABILITIES
    - SAUCE: x86: Add alternative_msr_write
    - x86/nospec: Simplify alternative_msr_write()
    - x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
    - x86/bugs: Concentrate bug detection into a separate function
    - x86/bugs: Concentrate bug reporting into a separate function
    - x86/msr: Add definitions for new speculation control MSRs
    - x86/bugs: Read SPEC_CTRL MSR during boot and re-use reserved bits
    - x86/bugs, KVM: Support the combination of guest and host IBRS
    - x86/bugs: Expose /sys/../spec_store_bypass
    - x86/cpufeatures: Add X86_FEATURE_RDS
    - x86/bugs: Provide boot parameters for the spec_store_bypass_disable
      mitigation
    - x86/bugs/intel: Set proper CPU features and setup RDS
    - x86/bugs: Whitelist allowed SPEC_CTRL MSR values
    - x86/bugs/AMD: Add support to disable RDS on Fam[15,16,17]h if requested
    - x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest
    - x86/speculation: Create spec-ctrl.h to avoid include hell
    - prctl: Add speculation control prctls
    - x86/process: Allow runtime control of Speculative Store Bypass
    - x86/speculation: Add prctl for Speculative Store Bypass mitigation
    - nospec: Allow getting/setting on non-current task
    - proc: Provide details on speculation flaw mitigations
    - seccomp: Enable speculation flaw mitigations
    - SAUCE: x86/bugs: Honour SPEC_CTRL default
    - x86/bugs: Make boot modes __ro_after_init
    - prctl: Add force disable speculation
    - seccomp: Use PR_SPEC_FORCE_DISABLE
    - seccomp: Add filter flag to opt-out of SSB mitigation
    - seccomp: Move speculation migitation control to arch code
    - x86/speculation: Make "seccomp" the...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released