Upstream commits for NUMA node performance in Hyper-V

Bug #1494423 reported by Joshua R. Poulson
This bug affects 1 person
Affects                      Status         Importance  Assigned to        Milestone
linux (Ubuntu)               Fix Released   Medium      Joseph Salisbury
  Trusty                     Won't Fix      High        Joseph Salisbury
  Vivid                      Fix Released   High        Joseph Salisbury
linux-lts-trusty (Ubuntu)    Won't Fix      High        Joseph Salisbury
  Precise                    Won't Fix      High        Joseph Salisbury

Bug Description

Microsoft has investigated networking with NUMA nodes on Hyper-V and found a number of areas where Linux performance could be improved to bring it in line with other virtual machine types.

Please include the following commits and their necessary prerequisites (which have been accepted upstream into the 4.2 kernel series) into the kernels for 15.10, 15.04, 14.04 (including HWE), and 12.04 (including HWE):

8/5 "Drivers: hv: vmbus: Further improve CPU affiliation logic"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3b71107d73b16074afa7658f3f0fcf837aabfe24

8/5 "Drivers: hv: vmbus: Improve the CPU affiliation for channels"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9f01ec53458d9e9b68f1c555e773b5d1a1f66e94

6/12 "Drivers: hv: vmbus: Allocate ring buffer memory in NUMA aware fashion"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=294409d20572e9bcf857328286433f851168d54a

6/1 "Drivers: hv: vmbus: Implement NUMA aware CPU affinity for channels"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1f656ff3fdddc2f59649cc84b633b799908f1f7b

5/31 "hv_netvsc: Allocate the receive buffer from the correct NUMA node"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0a726c2b499e390b1c1fc3092bd789f2192a2d03

5/31 "hv_netvsc: Allocate the sendbuf in a NUMA aware way"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5defde5946676ee23cd6a9d0e1de899410f4a33f
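
For anyone scripting the backport, here is a minimal sketch (not part of the original request) that pulls the commit hashes out of the cgit URLs listed above. The helper names `commit_id` and `short_id` are made up for illustration; the URLs are copied from this report.

```python
from urllib.parse import urlparse, parse_qs

# Commit URLs exactly as listed in this bug report.
COMMIT_URLS = [
    "https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3b71107d73b16074afa7658f3f0fcf837aabfe24",
    "https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9f01ec53458d9e9b68f1c555e773b5d1a1f66e94",
    "https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=294409d20572e9bcf857328286433f851168d54a",
    "https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1f656ff3fdddc2f59649cc84b633b799908f1f7b",
    "https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0a726c2b499e390b1c1fc3092bd789f2192a2d03",
    "https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5defde5946676ee23cd6a9d0e1de899410f4a33f",
]


def commit_id(url: str) -> str:
    """Return the full hash from the ?id= parameter of a cgit commit URL."""
    ids = parse_qs(urlparse(url).query).get("id")
    if not ids:
        raise ValueError(f"no commit id in URL: {url}")
    return ids[0]


def short_id(url: str, length: int = 7) -> str:
    """Return an abbreviated hash (hypothetical helper, 7 characters by default)."""
    return commit_id(url)[:length]


if __name__ == "__main__":
    # Print "short full" pairs, one commit per line.
    for url in COMMIT_URLS:
        print(short_id(url), commit_id(url))
```

The list is in the order given in the report (newest first), so it would likely need reversing before being fed to something like `git cherry-pick`.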

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1494423

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joshua R. Poulson (jrp) wrote :

No log files are required. This is not a crash report but a request to pick up upstream corrections.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Vivid):
status: New → Triaged
Changed in linux (Ubuntu Wily):
status: Confirmed → Triaged
Changed in linux (Ubuntu Trusty):
status: New → Triaged
Changed in linux (Ubuntu Precise):
status: New → Triaged
importance: Undecided → Medium
Changed in linux (Ubuntu Trusty):
importance: Undecided → Medium
Changed in linux (Ubuntu Vivid):
importance: Undecided → Medium
Changed in linux (Ubuntu Wily):
importance: Undecided → Medium
tags: added: kernel-hyper-v
tags: added: precise trusty vivid
tags: added: kernel-da-key
Joshua R. Poulson (jrp) wrote :

In searching for appropriate prerequisites, I have determined the following are needed:

5/31 "hv_netvsc: Properly size the vrss queues"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e01ec2199ef22e2cabd7d6e68a192f3eb728029f

5/13 "hv_netvsc: Use the xmit_more skb flag to optimize signaling the host"
https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=82fa3c776e5abba7ed6e4b4f4983d14731c37d6a

4/14 "hv_netvsc: Implement partial copy into send buffer"
https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=aa0a34be68290aa9aa071c0691fb8b6edda38358

3/29 "hv_netvsc: Implement batching in send buffer"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7c3877f275ee6b479fa828947811c76d431501ca

Brad Figg (brad-figg)
no longer affects: linux-lts-trusty (Ubuntu Trusty)
no longer affects: linux-lts-trusty (Ubuntu Vivid)
no longer affects: linux-lts-trusty (Ubuntu Wily)
Changed in linux (Ubuntu Precise):
status: Triaged → Won't Fix
Changed in linux (Ubuntu Vivid):
importance: Medium → High
Changed in linux (Ubuntu Trusty):
importance: Medium → High
no longer affects: linux (Ubuntu Precise)
Changed in linux (Ubuntu Trusty):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Vivid):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Wily):
importance: Medium → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Trusty):
status: Triaged → In Progress
Changed in linux (Ubuntu Vivid):
status: Triaged → In Progress
Changed in linux (Ubuntu Wily):
status: Triaged → In Progress
Changed in linux-lts-trusty (Ubuntu):
status: New → In Progress
Changed in linux-lts-trusty (Ubuntu Precise):
status: New → In Progress
Changed in linux-lts-trusty (Ubuntu):
importance: Undecided → High
Changed in linux-lts-trusty (Ubuntu Precise):
importance: Undecided → High
Changed in linux-lts-trusty (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux-lts-trusty (Ubuntu Precise):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

Wily only required the following two commits:

9f01ec5 Drivers: hv: vmbus: Improve the CPU affiliation for channels
3b71107 Drivers: hv: vmbus: Further improve CPU affiliation logic

The other commits for this bug are already in wily.

I built a wily test kernel, which can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1494423/wily/

Can you test this kernel and confirm it resolves this bug?

I'll also start working on this bug for vivid and trusty.

Joshua R. Poulson (jrp) wrote :

Joe, is this over and above Bug 1519917? That one may supersede this one. We are performing full functional tests on 1519917 and it's eating a lot of cycles, so to speak.

Joseph Salisbury (jsalisbury) wrote :

Yes, you are correct. The two commits needed by Wily will be pulled in with bug 1519917. I'll mark Wily as invalid in this bug report. I'll focus on Vivid and Trusty in this bug.

Changed in linux (Ubuntu Wily):
status: In Progress → Invalid
no longer affects: linux (Ubuntu Wily)
Joseph Salisbury (jsalisbury) wrote :

I built a Vivid test kernel with the requested commits and needed prereqs. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1494423/vivid

Can you test this kernel and confirm it resolves this bug?

Changed in linux (Ubuntu):
status: Triaged → In Progress
Chris Valean (chvale) wrote :

Hi Joe,
I ran some core tests on the Vivid test kernel provided and the results are looking good. I don't see any errors when using different NUMA topologies. Please include the commits for release.
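
For reference, a minimal sketch of how the NUMA topology seen by a guest could be listed before such a test run. It assumes the standard sysfs layout (/sys/devices/system/node/nodeN/cpulist); the function names are hypothetical and this is not the test suite that was used here.

```python
import glob
import os


def parse_cpulist(text: str) -> list:
    """Expand a kernel cpulist string such as '0-3,8-11' into CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(root: str = "/sys/devices/system/node") -> dict:
    """Map NUMA node number -> list of CPU ids, read from sysfs."""
    topology = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*", "cpulist")):
        node_dir = os.path.basename(os.path.dirname(path))  # e.g. "node0"
        node = int(node_dir[len("node"):])
        with open(path) as handle:
            topology[node] = parse_cpulist(handle.read())
    return topology


if __name__ == "__main__":
    for node, cpus in sorted(numa_topology().items()):
        print(f"node {node}: cpus {cpus}")
```

On a system without NUMA sysfs entries the function simply returns an empty mapping.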

Brad Figg (brad-figg)
Changed in linux (Ubuntu Vivid):
status: In Progress → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-vivid
Joseph Salisbury (jsalisbury) wrote :

Hi Chris or Josh,

Can you verify the kernel in -proposed as requested in #9? That way the fix can move into -updates.

Thanks in advance!

Joshua R. Poulson (jrp) wrote :

We're looking at it.

Chris Valean (cvalean) wrote :

I see that the proposed kernel is tagged as 3.19.0-49-generic #55

The kernel is running fine and has passed the core tests.
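
A rough sketch of how one might check that a running Vivid kernel is at or past the fixed package version 3.19.0-49.55. It assumes the usual Ubuntu convention that the release string reads `<upstream>-<abi>-<flavour>` and the package version reads `<upstream>-<abi>.<upload>`; `parse_release` and `has_fix` are hypothetical helpers, not an official tool.

```python
import platform
import re

# Package version in which the fix was released for Vivid, per this bug.
FIXED_PACKAGE = "3.19.0-49.55"


def parse_release(release: str):
    """Split e.g. '3.19.0-49-generic' into ((3, 19, 0), 49)."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release}")
    major, minor, patch, abi = (int(group) for group in match.groups())
    return (major, minor, patch), abi


def has_fix(release: str, fixed: str = FIXED_PACKAGE) -> bool:
    """True if 'release' is the same upstream base with an ABI >= the fixed one."""
    base, abi = parse_release(release)
    fixed_base, fixed_abi = parse_release(fixed)
    # Assumption: only kernels from the same 3.19.0 series are comparable here.
    return base == fixed_base and abi >= fixed_abi


if __name__ == "__main__":
    running = platform.release()
    try:
        print(running, "contains fix:", has_fix(running))
    except ValueError as error:
        print(error)
```

The check only compares the ABI number, so it cannot distinguish between uploads that share an ABI.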

tags: added: verification-done-vivid
removed: verification-needed-vivid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-49.55

---------------
linux (3.19.0-49.55) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1536775

  [ Colin Ian King ]

  * SAUCE: (no-up) ACPI / tables: Add acpi_force_32bit_fadt_addr option to
    force 32 bit FADT addresses
    - LP: #1529381

  [ Tim Gardner ]

  * [Config] Add DRM ast driver to udeb installer image
    - LP: #1514711
  * SAUCE: (no-up) Revert "[SCSI] libiscsi: Reduce locking contention in
    fast path"
    - LP: #1517142

  [ Upstream Kernel Changes ]

  * powerpc/eeh: Fix recursive fenced PHB on Broadcom shiner adapter
    - LP: #1532942
  * Drivers: hv: vmbus: prevent cpu offlining on newer hypervisors
    - LP: #1440103
  * Drivers: hv: vmbus: teardown hv_vmbus_con workqueue and
    vmbus_connection pages on shutdown
    - LP: #1440103
  * drivers: hv: vmbus: Teardown synthetic interrupt controllers on module
    unload
    - LP: #1440103
  * clockevents: export clockevents_unbind_device instead of
    clockevents_unbind
    - LP: #1440103
  * Drivers: hv: vmbus: Teardown clockevent devices on module unload
    - LP: #1440103
  * Drivers: hv: vmbus: Add support for VMBus panic notifier handler
    - LP: #1440103
  * hv: run non-blocking message handlers in the dispatch tasklet
    - LP: #1440103
  * Drivers: hv: vmbus: unregister panic notifier on module unload
    - LP: #1440103
  * Drivers: hv: vmbus: Implement the protocol for tearing down vmbus state
    - LP: #1440103
  * kexec: define kexec_in_progress in !CONFIG_KEXEC case
    - LP: #1440103
  * Drivers: hv: vmbus: add special kexec handler
    - LP: #1440103
  * Drivers: hv: don't do hypercalls when hypercall_page is NULL
    - LP: #1440103
  * Drivers: hv: vmbus: add special crash handler
    - LP: #1440103
  * Drivers: hv: vmbus: prefer 'die' notification chain to 'panic'
    - LP: #1440103
  * hyperv: Implement netvsc_get_channels() ethool op
    - LP: #1494423
  * hv_netvsc: Properly size the vrss queues
    - LP: #1494423
  * hv_netvsc: Allocate the sendbuf in a NUMA aware way
    - LP: #1494423
  * hv_netvsc: Allocate the receive buffer from the correct NUMA node
    - LP: #1494423
  * Drivers: hv: vmbus: Implement NUMA aware CPU affinity for channels
    - LP: #1494423
  * Drivers: hv: vmbus: Allocate ring buffer memory in NUMA aware fashion
    - LP: #1494423
  * Drivers: hv: vmbus: Improve the CPU affiliation for channels
    - LP: #1494423
  * Drivers: hv: vmbus: Further improve CPU affiliation logic
    - LP: #1494423

linux (3.19.0-48.54) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1536124
  * Merged back Ubuntu-3.19.0-46.52

 -- Brad Figg <email address hidden> Thu, 21 Jan 2016 12:29:48 -0800

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Chris Valean (cvalean) wrote :

Joe, can we please get a test kernel for Trusty (3.13 kernel) built?
I have a lead suggesting that, without these NUMA patches, we may be seeing the performance regression issues from bug 1519897 (comment #33). We still have to confirm this on our end, but it's worth a try.
Thank you!

Joseph Salisbury (jsalisbury) wrote :

I'll work on building one for you.

Joseph Salisbury (jsalisbury) wrote :

Hi Chris and Josh,

I just wanted to let you know I'm still working on the backports for this bug and bug 1519897. There are 22 other commits that are also required as prerequisites, and most of them need backporting, which is why it is taking longer than expected to get a test kernel.

I should have a test kernel available shortly.

Joseph Salisbury (jsalisbury) wrote :

The six primary commits currently require 22 prerequisite commits and still need additional patches. Some of those commits change core networking and the sk_buff struct.

The following prereq commit required extensive backporting:
commit b1937227316417aa7568d01e6fa1f272e98fb890
Author: Eric Dumazet <email address hidden>
Date: Sun Sep 28 22:18:47 2014 -0700

    net: reorganize sk_buff for faster __copy_skb_header()

Because this would require changes to core networking, a stable release update containing these backports would not be acked.

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux (Ubuntu Trusty):
status: In Progress → Won't Fix
Changed in linux-lts-trusty (Ubuntu):
status: In Progress → Won't Fix
Changed in linux-lts-trusty (Ubuntu Precise):
status: In Progress → Won't Fix