linux-azure-edge: [Ubuntu-azure-edge-4.13.0-1005.5]: refresh the rescind-handling, hv_sock and vPCI drivers

Bug #1736283 reported by Dexuan Cui on 2017-12-05
This bug affects 2 people
Affects                      Importance   Assigned to
linux-azure (Ubuntu)         Undecided    Marcelo Cerri
  Xenial                     Undecided    Marcelo Cerri
linux-azure-edge (Ubuntu)    Undecided    Marcelo Cerri
  Xenial                     Undecided    Marcelo Cerri

Bug Description

Ubuntu-azure-edge-4.13.0-1005.5 (https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/tag/?h=azure-edge-next&id=Ubuntu-azure-edge-4.13.0-1005.5) has some bugs:

1) After "Disable and re-Enable the Integration Services devices (Time Sync, Heartbeat, Shutdown, etc)", the devices can't come back.

2) For a VM with SR-IOV VF configured, the PCI VF device can't come back after we Pause and Resume the VM.

3) When we assign 7 Mellanox ConnectX-3 VFs to a 32-vCPU VM, one of the VFs may fail to receive interrupts, and the VF driver will time out and fail to create the 7th VF network interface.
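
One way to look for the symptom in issue 3 from inside the guest is to check /proc/interrupts for VF interrupt lines whose counters are all zero. The helper below is only a sketch: the function name `silent_irqs` is hypothetical, and the `mlx4` pattern assumes the ConnectX-3 VFs are driven by the mlx4 driver and show up under that name. A zero count is only a hint (an idle queue also shows zero), not proof of a lost interrupt.

```shell
# silent_irqs PATTERN [FILE]
# Print "IRQ NAME" for every interrupt line that matches PATTERN and whose
# per-CPU counters add up to zero. FILE defaults to /proc/interrupts.
# (Hypothetical helper, not part of any driver or distribution tooling.)
silent_irqs() {
    awk -v pat="$1" '
        NR == 1 { ncpu = NF; next }      # header row: one column per CPU
        $0 ~ pat {
            total = 0
            # Fields 2 .. ncpu+1 hold the per-CPU interrupt counts.
            for (i = 2; i <= ncpu + 1; i++) total += $i
            if (total == 0) {
                irq = $1
                sub(/:$/, "", irq)       # "25:" -> "25"
                print irq, $NF           # last field is the interrupt name
            }
        }
    ' "${2:-/proc/interrupts}"
}
```

Running `silent_irqs mlx4` on the affected VM lists the matching interrupts that have never fired.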

My VM info:
# lsb_release -rd
Description: Ubuntu 16.04.3 LTS
Release: 16.04
(I'm using the Ubuntu-azure-edge-4.13.0-1005.5 kernel.)

Dexuan Cui (decui) wrote :

To resolve the above 3 issues, I created this pull request based on Ubuntu-azure-edge-4.13.0-1005.5 (https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/commit/?h=azure-edge-next&id=ec62f77bbe0697ce128f71fac4fc45c99b6f40d1).

The pull request is hosted in my own git repo:
https://github.com/dcui/linux/compare/ec62f77bbe0697ce128f71fac4fc45c99b6f40d1...dcui:decui/azure-edge-next-Ubuntu-azure-edge-4.13.0-1005.5-20171204
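
For anyone who wants to review the commits locally rather than in the GitHub compare view, something like the following may help. It is a sketch, not the submitter's workflow: the function name `list_pr_commits` is hypothetical, it assumes an existing local git repository, and it assumes the base commit is an ancestor of the fetched branch.

```shell
# list_pr_commits REPO_DIR REMOTE_URL BASE BRANCH
# Fetch BRANCH from REMOTE_URL into the repository at REPO_DIR, then list
# the commits that the branch adds on top of BASE, one per line.
# (Hypothetical helper; uses only `git fetch` and `git log`.)
list_pr_commits() {
    git -C "$1" fetch --quiet "$2" "$4" &&
    git -C "$1" log --format='%h %s' "$3..FETCH_HEAD"
}
```

With the names from the comment above, the call would look like `list_pr_commits ~/src/linux https://github.com/dcui/linux.git ec62f77bbe0697ce128f71fac4fc45c99b6f40d1 decui/azure-edge-next-Ubuntu-azure-edge-4.13.0-1005.5-20171204`, where `~/src/linux` is a placeholder for a local kernel tree and the `.git` clone URL is inferred from the compare link.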

The pull request consists of the below changes:
1) I reverted the old version of the hv_sock driver and cherry-picked the upstream version of the driver, along with the related bug fixes in the hv_sock and vmbus drivers;

2) To further fix the rescind-handling, I cherry-picked a patch from KY, and I wrote the patch "UBUNTU: SAUCE: vmbus: unregister device_obj->channels_kset" (which hasn't been in any upstream repo as of today). With these 2 patches, issues #1 and #2 are fixed.

3) I cherry-picked "PCI: hv: Use effective affinity mask" to fix issue #3.

Dexuan Cui (decui) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-azure (Ubuntu):
status: New → Confirmed

Thanks for the pull request. We'll work on getting a test kernel created and post it here when we have it.

As an aside, @jpoulson mentioned that this pull request should also resolve bug https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1735546

A test kernel is available at the following location. Please test it and let us know your results.

http://kernel.ubuntu.com/~mhcerri/azure/linux-azure-4.13.0-1002.2~lp1736283/

Oddly enough, @decui noted in comment #6 of bug 1735546 that they are unable to reproduce this issue with the linux-azure-edge kernel. This is surprising because, at this point in time, the 4.13-based linux-azure and the 4.13-based linux-azure-edge should be identical. We have examined both repos and see no significant difference. The source code is the same, as are the configs and module inclusion lists. We'll see if we can reproduce.

@jpoulson, were both linux-azure and linux-azure-edge tested in the same way?

Regardless, it would be good to have testing feedback on the test kernel posted in comment #5 here. Thanks.

Joshua R. Poulson (jrp) wrote :

Tested with similar suites, but at different times.

Marcelo Cerri (mhcerri) on 2017-12-13
Changed in linux-azure (Ubuntu):
status: Confirmed → Fix Committed
Changed in linux-azure (Ubuntu Xenial):
status: New → Fix Committed
Changed in linux-azure (Ubuntu):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Xenial):
assignee: nobody → Marcelo Cerri (mhcerri)
Marcelo Cerri (mhcerri) on 2017-12-13
tags: added: kernel-block-proposed
Marcelo Cerri (mhcerri) on 2018-01-07
Changed in linux-azure-edge (Ubuntu):
status: New → Fix Committed
Changed in linux-azure-edge (Ubuntu Xenial):
status: New → Fix Committed
Changed in linux-azure-edge (Ubuntu):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux-azure-edge (Ubuntu Xenial):
assignee: nobody → Marcelo Cerri (mhcerri)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 4.13.0-1005.7

---------------
linux-azure (4.13.0-1005.7) xenial; urgency=low

  * linux-azure: 4.13.0-1005.7 -proposed tracker (LP: #1741957)

  * CVE-2017-5754
    - Revert "UBUNTU: [Config] azure: updateconfigs to enable PTI"
    - [Config] azure: Enable PTI with UNWINDER_FRAME_POINTER

  [ Ubuntu: 4.13.0-25.29 ]

  * linux: 4.13.0-25.29 -proposed tracker (LP: #1741955)
  * CVE-2017-5754
    - Revert "UBUNTU: [Config] updateconfigs to enable PTI"
    - [Config] Enable PTI with UNWINDER_FRAME_POINTER

linux-azure (4.13.0-1004.6) xenial; urgency=low

  * linux-azure: 4.13.0-1004.6 -proposed tracker (LP: #1741747)

  [ Ubuntu: 4.13.0-24.28 ]

  * linux: 4.13.0-24.28 -proposed tracker (LP: #1741745)
  * CVE-2017-5754
    - x86/cpu, x86/pti: Do not enable PTI on AMD processors

linux-azure (4.13.0-1003.5) xenial; urgency=low

  * linux-azure: 4.13.0-1003.5 -proposed tracker (LP: #1741557)

  * CVE-2017-5754
    - [Config] azure: updateconfigs to enable PTI

  [ Ubuntu: 4.13.0-23.27 ]

  * linux: 4.13.0-23.27 -proposed tracker (LP: #1741556)
  * CVE-2017-5754
    - x86/mm: Add the 'nopcid' boot option to turn off PCID
    - x86/mm: Enable CR4.PCIDE on supported systems
    - x86/mm: Document how CR4.PCIDE restore works
    - x86/entry/64: Refactor IRQ stacks and make them NMI-safe
    - x86/entry/64: Initialize the top of the IRQ stack before switching stacks
    - x86/entry/64: Add unwind hint annotations
    - xen/x86: Remove SME feature in PV guests
    - x86/xen/64: Rearrange the SYSCALL entries
    - irq: Make the irqentry text section unconditional
    - x86/xen/64: Fix the reported SS and CS in SYSCALL
    - x86/paravirt/xen: Remove xen_patch()
    - x86/traps: Simplify pagefault tracing logic
    - x86/idt: Unify gate_struct handling for 32/64-bit kernels
    - x86/asm: Replace access to desc_struct:a/b fields
    - x86/xen: Get rid of paravirt op adjust_exception_frame
    - x86/paravirt: Remove no longer used paravirt functions
    - x86/entry: Fix idtentry unwind hint
    - x86/mm/64: Initialize CR4.PCIDE early
    - objtool: Add ORC unwind table generation
    - objtool, x86: Add facility for asm code to provide unwind hints
    - x86/unwind: Add the ORC unwinder
    - x86/kconfig: Consolidate unwinders into multiple choice selection
    - objtool: Upgrade libelf-devel warning to error for CONFIG_ORC_UNWINDER
    - x86/ldt/64: Refresh DS and ES when modify_ldt changes an entry
    - x86/mm: Give each mm TLB flush generation a unique ID
    - x86/mm: Track the TLB's tlb_gen and update the flushing algorithm
    - x86/mm: Rework lazy TLB mode and TLB freshness tracking
    - x86/mm: Implement PCID based optimization: try to preserve old TLB entries
      using PCID
    - x86/mm: Factor out CR3-building code
    - x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
    - x86/mm: Flush more aggressively in lazy TLB mode
    - Revert "x86/mm: Stop calling leave_mm() in idle code"
    - kprobes/x86: Set up frame pointer in kprobe trampoline
    - x86/tracing: Introduce a static key for exception tracing
    - x86/boot: Add early cmdline parsing for options with arguments
    - mm, x86/mm...

Changed in linux-azure (Ubuntu Xenial):
status: Fix Committed → Fix Released