Kdump through NMI SMP and single core not working on Ubuntu16.10

Bug #1630924 reported by Ovidiu Rusu
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Medium
Joseph Salisbury
Xenial
Fix Released
Medium
Joseph Salisbury
Yakkety
Fix Released
Medium
Joseph Salisbury
Zesty
Fix Released
Medium
Joseph Salisbury

Bug Description

During some tests I've encountered an issue with kdump through NMI SMP and single core. After kdump configuration, when I trigger the crash through an NMI call from the host, the VM will panic, however it will not write the vmcore dump files and neither reboot.

REPRO STEPS:

1. configure kdump
 - crashkernel=512M-:384M /boot/grub/grub.cfg
 - USE_KDUMP=1 /etc/default/kdump-tools

2. after configuration reboot the VM

3. check kdump status
 - cat /sys/kernel/kexec_*
 - service kdump-tools status

4. configure NMI
 - sysctl -w kernel.unknown_nmi_panic=1

5. trigger a crash from host
 - Debug-VM -Name $vmName -InjectNonMaskableInterrupt -ComputerName $hvServer -Force

This case also applies to:
    -Ubuntu 16.10 generation 2(kernel version: 4.8.0-17-generic)
    -Ubuntu 16.04.1(kernel: 4.4.0-38-generic)
    -Ubuntu 14.04.5(kernel: 3.19.0-69-generic)

CVE References

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1630924

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Chris Valean (cvalean)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key kernel-hyper-v
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.9 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9-rc8

Revision history for this message
Chris Valean (cvalean) wrote :

Hi Joe,

I did verify this again with the below kernels and results:
- 4.8.0-22 16.10 kernel - kdump through NMI on 4 CPU cores doesn't work - VM gets stuck and there is no crash file nor a reboot.
- 4.9.0-040900RC8 kernel from ubuntu mainline - kdump through NMI works fine when using single core or SMP *with 4 cores*.
Using 8 cores on this kernel, triggering the crash will result in the same bad behavior as from the GA kernel. I don't see any crash messages on the console, and the VM will just hang. Will have to do more debugging on this new pattern.

All tests have been done with crashkernel=512M and 2GB RAM for the VM.

4.9 also just got released, as of 11/12.

Revision history for this message
Chris Valean (cvalean) wrote :

I did re-verify this now with the 4.9 mainline and kdump through NMI works fine.
There might have been additional patches and there is at least one other commit in upstream, however that one is addressing only x86 arch.

Full version info: v4.9 mainline build
v4.9 (69973b830859bc6529a7a0468ba0d80ee5117826)

Changing tag flag kernel-fixed-upstream

tags: added: kernel-fixed-upstream
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Do we happen to know what commits(Other than 69973b8308) are in 4.9 and newer that are also needed in 4.8?

Do you want me to build a Yakkety test kernel with commit 69973b8308?

Revision history for this message
Chris Valean (cvalean) wrote :

Please build a test kernel with these 2 commits:
59107e2f48831daedc46973ce4988605ab066de3
56ef6718a1d8d77745033c5291e025ce18504159

Thank you!

Changed in linux (Ubuntu Vivid):
status: New → Confirmed
Changed in linux (Ubuntu Xenial):
status: New → Confirmed
Changed in linux (Ubuntu Vivid):
importance: Undecided → Medium
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I see commit 59107e2f4 in mainline:
59107e2f4883 x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic

However, I don't see 56ef6718a as of yet. Do you have a patch available?

Revision history for this message
Chris Valean (cvalean) wrote :

Haven't checked mainline, but it is in upstream. Please find it here - http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=56ef6718a

no longer affects: linux (Ubuntu Vivid)
Changed in linux (Ubuntu Vivid):
status: New → In Progress
Changed in linux (Ubuntu Xenial):
status: Confirmed → In Progress
Changed in linux (Ubuntu Yakkety):
status: Confirmed → In Progress
Changed in linux (Ubuntu Zesty):
status: Confirmed → In Progress
Changed in linux (Ubuntu Vivid):
importance: Undecided → Medium
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Zesty):
assignee: nobody → Joseph Salisbury (jsalisbury)
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built test kernels with these two commits. The test kernels can be downloaded from:

Xenial: http://kernel.ubuntu.com/~jsalisbury/lp1630924/xenial/
Yakkey: http://kernel.ubuntu.com/~jsalisbury/lp1630924/yakkety
Zesty: http://kernel.ubuntu.com/~jsalisbury/lp1630924/zesty

The Vivid test kernel for 14.04.5(kernel: 3.19.0-69-generic) needs some backporting, so I'll post that shortly.

Revision history for this message
Chris Valean (cvalean) wrote :

Hi Joe,
The test kernels with the commits are looking fine, I was able to successfully trigger and complete a kdump over NMI.
Tested Xenial and Yakkety on WS2016 and WS2012R2.

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
tags: added: verification-needed-yakkety
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message
Joshua R. Poulson (jrp) wrote :

We will verify this fix is in.

Revision history for this message
Chris Valean (cvalean) wrote :

Verified the following proposed kernels and confirm this as fixed in:
On Gen1VM: 16.04.1 with 4.4.0-63.67
On both Gen1 and Gen2VM: 16.10 with 4.8.0-38.49

Thank you!

tags: added: verification-done-xenial verification-done-yakkety
removed: verification-needed-xenial verification-needed-yakkety
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (23.0 KiB)

This bug was fixed in the package linux - 4.4.0-63.84

---------------
linux (4.4.0-63.84) xenial; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1660704

  * Backport Dirty COW patch to prevent wineserver freeze (LP: #1658270)
    - SAUCE: mm: Respect FOLL_FORCE/FOLL_COW for thp

  * Kdump through NMI SMP and single core not working on Ubuntu16.10
    (LP: #1630924)
    - x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
    - SAUCE: hv: don't reset hv_context.tsc_page on crash

  * [regression 4.8.0-14 -> 4.8.0-17] keyboard and touchscreen lost on Acer
    Chromebook R11 (LP: #1630238)
    - [Config] CONFIG_PINCTRL_CHERRYVIEW=y

  * Call trace when testing fstat stressor on ppc64el with virtual keyboard and
    mouse present (LP: #1652132)
    - SAUCE: HID: usbhid: Quirk a AMI virtual mouse and keyboard with ALWAYS_POLL

  * VLAN SR-IOV regression for IXGBE driver (LP: #1658491)
    - ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths

  * "Out of memory" errors after upgrade to 4.4.0-59 (LP: #1655842)
    - mm, page_alloc: convert alloc_flags to unsigned
    - mm, compaction: change COMPACT_ constants into enum
    - mm, compaction: distinguish COMPACT_DEFERRED from COMPACT_SKIPPED
    - mm, compaction: simplify __alloc_pages_direct_compact feedback interface
    - mm, compaction: distinguish between full and partial COMPACT_COMPLETE
    - mm, compaction: abstract compaction feedback to helpers
    - mm, oom: protect !costly allocations some more
    - mm: consider compaction feedback also for costly allocation
    - mm, oom, compaction: prevent from should_compact_retry looping for ever for
      costly orders
    - mm, oom: protect !costly allocations some more for !CONFIG_COMPACTION
    - mm, oom: prevent premature OOM killer invocation for high order request

  * Backport 3 patches to fix bugs with AIX clients using IBMVSCSI Target Driver
    (LP: #1657194)
    - SAUCE: ibmvscsis: Fix max transfer length
    - SAUCE: ibmvscsis: fix sleeping in interrupt context
    - SAUCE: ibmvscsis: Fix srp_transfer_data fail return code

  * NVMe: adapter is missing after abnormal shutdown followed by quick reboot,
    quirk needed (LP: #1656913)
    - nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too

  * Ubuntu 16.10 KVM SRIOV: if enable sriov while ping flood is running ping
    will stop working (LP: #1625318)
    - PCI: Do any VF BAR updates before enabling the BARs
    - PCI: Ignore BAR updates on virtual functions
    - PCI: Update BARs using property bits appropriate for type
    - PCI: Separate VF BAR updates from standard BAR updates
    - PCI: Don't update VF BARs while VF memory space is enabled
    - PCI: Remove pci_resource_bar() and pci_iov_resource_bar()
    - PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE
    - PCI: Add comments about ROM BAR updating

  * Linux rtc self test fails in a VM under xenial (LP: #1649718)
    - kvm: x86: Convert ioapic->rtc_status.dest_map to a struct
    - kvm: x86: Track irq vectors in ioapic->rtc_status.dest_map
    - kvm: x86: Check dest_map->vector to match eoi signals for rtc

  * Xenial update to v4.4.44 stable releas...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (20.4 KiB)

This bug was fixed in the package linux - 4.8.0-38.41

---------------
linux (4.8.0-38.41) yakkety; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1661232

  * Backport Dirty COW patch to prevent wineserver freeze (LP: #1658270)
    - SAUCE: mm: Respect FOLL_FORCE/FOLL_COW for thp

  * Kdump through NMI SMP and single core not working on Ubuntu16.10
    (LP: #1630924)
    - x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
    - SAUCE: hv: don't reset hv_context.tsc_page on crash

  * Call trace when testing fstat stressor on ppc64el with virtual keyboard and
    mouse present (LP: #1652132)
    - HID: usbhid: Quirk a AMI virtual mouse and keyboard with ALWAYS_POLL

  * regression in linux-libc-dev in yakkety: C++ style comments are not allowed
    in ISO C90 (LP: #1659654)
    - generic syscalls: kill cruft from removed pkey syscalls

  * [16.04.2] POWER9 patches on top of 4.8 (LP: #1650263)
    - powerpc/book3s: Add a cpu table entry for different POWER9 revs
    - powerpc/mm/radix: Use different RTS encoding for different POWER9 revs
    - powerpc/mm/radix: Use different pte update sequence for different POWER9
      revs
    - powerpc/mm: Update the HID bit when switching from radix to hash
    - powerpc/64/kexec: NULL check "clear_all" in kexec_sequence
    - powerpc/64/kexec: Fix MMU cleanup on radix
    - powerpc/mm: Add radix flush all with IS=3
    - powerpc/64/kexec: Copy image with MMU off when possible
    - powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format
    - powerpc/mm/radix: Invalidate ERAT on tlbiel for POWER9 DD1
    - powerpc/mm: Fix missing update of HID register on secondary CPUs
    - powerpc/64: Add some more SPRs and SPR bits for POWER9
    - powerpc/64: Provide functions for accessing POWER9 partition table
    - powerpc/powernv: Define real-mode versions of OPAL XICS accessors
    - powerpc/64: Define new ISA v3.00 logical PVR value and PCR register value
    - mm: update mmu_gather range correctly
    - mm/hugetlb: add tlb_remove_hugetlb_entry for handling hugetlb pages
    - mm: add tlb_remove_check_page_size_change to track page size change
    - powerpc: Revert Load Monitor Register Support
    - powerpc/mm: Correct process and partition table max size
    - powernv: Clear SPRN_PSSCR when a POWER9 CPU comes online
    - powerpc/mm/radix: Setup AMOR in HV mode to allow key 0
    - powerpc/mm: Detect instruction fetch denied and report
    - powerpc/mm/radix: Prevent kernel execution of user space
    - powerpc/mm: Rename hugetlb-radix.h to hugetlb.h
    - powerpc/mm/hugetlb: Handle hugepage size supported by hash config
    - powerpc/mm: Introduce _PAGE_LARGE software pte bits
    - powerpc/mm: Add radix__tlb_flush_pte_p9_dd1()
    - powerpc/mm: update radix__ptep_set_access_flag to not do full mm tlb flush
    - powerpc/mm: update radix__pte_update to not do full mm tlb flush
    - powerpc/mm: Batch tlb flush when invalidating pte entries
    - powerpc/sparse: Make a bunch of things static
    - powerpc/perf: factor out the event format field
    - powerpc/perf: update attribute_group data structure
    - powerpc/perf: power9 raw event format en...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-8.10

---------------
linux (4.10.0-8.10) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1664217

  * [Hyper-V] Bug fixes for storvsc (tagged queuing, error conditions)
    (LP: #1663687)
    - scsi: storvsc: Enable tracking of queue depth
    - scsi: storvsc: Remove the restriction on max segment size
    - scsi: storvsc: Enable multi-queue support
    - scsi: storvsc: use tagged SRB requests if supported by the device
    - scsi: storvsc: properly handle SRB_ERROR when sense message is present
    - scsi: storvsc: properly set residual data length on errors

  * Ubuntu16.10-KVM:Big configuration with multiple guests running SRIOV VFs
    caused KVM host hung and all KVM guests down. (LP: #1651248)
    - KVM: PPC: Book 3S: XICS cleanup: remove XICS_RM_REJECT
    - KVM: PPC: Book 3S: XICS: correct the real mode ICP rejecting counter
    - KVM: PPC: Book 3S: XICS: Fix potential issue with duplicate IRQ resends
    - KVM: PPC: Book 3S: XICS: Implement ICS P/Q states
    - KVM: PPC: Book 3S: XICS: Don't lock twice when checking for resend

  * overlay: mkdir fails if directory exists in lowerdir in a user namespace
    (LP: #1531747)
    - SAUCE: overlayfs: Skip permission checking for trusted.overlayfs.* xattrs

  * CVE-2016-1575 (LP: #1534961)
    - SAUCE: overlayfs: Skip permission checking for trusted.overlayfs.* xattrs

  * CVE-2016-1576 (LP: #1535150)
    - SAUCE: overlayfs: Skip permission checking for trusted.overlayfs.* xattrs

  * Miscellaneous Ubuntu changes
    - SAUCE: md/raid6 algorithms: scale test duration for speedier boots
    - SAUCE: Import aufs driver
    - d-i: Build message-modules udeb for arm64
    - rebase to v4.10-rc8

  * Miscellaneous upstream changes
    - Revert "UBUNTU: SAUCE: aufs -- remove .readlink assignment"
    - Revert "UBUNTU: SAUCE: (no-up) aufs: for v4.9-rc1, support setattr_prepare()"
    - Revert "UBUNTU: SAUCE: aufs -- Add flags argument to aufs_rename()"
    - Revert "UBUNTU: SAUCE: aufs -- Convert to use xattr handlers"
    - Revert "UBUNTU: SAUCE: Import aufs driver"

  [ Upstream Kernel Changes ]

  * rebase to v4.10-rc8

 -- Tim Gardner <email address hidden> Mon, 06 Feb 2017 08:34:24 -0700

Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Released
Revision history for this message
Chris Valean (cvalean) wrote :

Can we get Vivid out of the list please?
I see that it is now EOL.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Hi Chris,

I removed Vivid from the bug.

no longer affects: linux (Ubuntu Vivid)
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.