Drivers: hv: vmbus: Fix duplicate CPU assignments within a device
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-azure (Ubuntu) | Incomplete | Undecided | Unassigned |
Focal | Fix Released | Undecided | Tim Gardner |
Hirsute | Fix Released | Undecided | Tim Gardner |
Bug Description
SRU Justification
[Impact]
Customers have degraded network performance on Hyper-V/Azure.
This is a request to pick up an upstream patch that fixes an issue with Ubuntu running as a Hyper-V and Azure guest. The patch needs to be picked up for 20.04 and 18.04. The link to the upstream patch follows:
Description of issue and solution:
The vmbus module uses a rotational algorithm to assign target CPUs to
a device's channels. Depending on the timing of different devices' channel
offers, different channels of a device may be assigned to the same CPU.
For example on a VM with 2 CPUs, if NIC A and B's channels are offered
in the following order, NIC A will have both channels on CPU0, and
NIC B will have both channels on CPU1 -- see below. This kind of
assignment causes RSS load that is spreading across different channels
to end up on the same CPU.
Timing of channel offers:
NIC A channel 0
NIC B channel 0
NIC A channel 1
NIC B channel 1
VMBUS ID 14: Class_ID = {f8615163-
Device_ID = {cab064cd-
Sysfs path: /sys/bus/
Rel_ID=14, target_cpu=0
Rel_ID=17, target_cpu=0
VMBUS ID 16: Class_ID = {f8615163-
Device_ID = {244225ca-
Sysfs path: /sys/bus/
Rel_ID=16, target_cpu=1
Rel_ID=18, target_cpu=1
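The duplicate assignment above can be reproduced with a small model. This is a minimal sketch, not the kernel code: it assumes a plain round-robin over CPUs and ignores NUMA nodes, and the function name `assign_round_robin` and the `(device, channel)` tuples are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def assign_round_robin(offers, num_cpus):
    """Give each offered channel the next CPU in rotation.

    Simplified model (assumption): the rotation is global and does not
    look at which device owns the channel.
    """
    next_cpu = 0
    placement = defaultdict(list)
    for device, channel in offers:
        placement[device].append((channel, next_cpu))
        next_cpu = (next_cpu + 1) % num_cpus
    return dict(placement)


# Channel offers interleaved as in the timing listed above.
offers = [("NIC A", 0), ("NIC B", 0), ("NIC A", 1), ("NIC B", 1)]
placement = assign_round_robin(offers, num_cpus=2)
for device, channels in placement.items():
    print(device, channels)
```

With two CPUs and interleaved offers, both channels of each NIC land on a single CPU, which matches the target_cpu values reported above.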
Update the vmbus CPU assignment algorithm to avoid duplicate CPU
assignments within a device.
The new algorithm iterates num_online_cpus + 1 times.
The existing rotational algorithm to find "next NUMA & CPU" is still here.
But if the resulting CPU is already used by the same device, it will try
the next CPU.
In the last iteration, it assigns the channel to the next available CPU
like the existing algorithm. This is not normally expected, because
during device probe, we limit the number of channels of a device to
be <= number of online CPUs.
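The updated selection logic described above might be sketched as follows. This is an illustrative model under stated assumptions, not the patch itself: NUMA handling is omitted, the rotation is a plain modulo counter, and `assign_avoiding_duplicates` and its data structures are hypothetical names.

```python
def assign_avoiding_duplicates(offers, num_cpus):
    """Rotate over CPUs, skipping any CPU the same device already uses.

    Each channel gets num_cpus + 1 attempts. The final attempt accepts
    whatever CPU comes next, mirroring the fallback described above.
    """
    next_cpu = 0
    used = {}       # device -> set of CPUs already assigned to it
    placement = {}  # device -> list of (channel, cpu)
    for device, channel in offers:
        used_by_device = used.setdefault(device, set())
        for attempt in range(num_cpus + 1):
            cpu = next_cpu
            next_cpu = (next_cpu + 1) % num_cpus
            is_last_attempt = attempt == num_cpus
            if cpu not in used_by_device or is_last_attempt:
                break
        used_by_device.add(cpu)
        placement.setdefault(device, []).append((channel, cpu))
    return placement


# Same interleaved offer order as in the example timing.
interleaved = [("NIC A", 0), ("NIC B", 0), ("NIC A", 1), ("NIC B", 1)]
fixed = assign_avoiding_duplicates(interleaved, num_cpus=2)

# A device with more channels than CPUs exercises the fallback attempt.
oversubscribed = assign_avoiding_duplicates(
    [("NIC C", 0), ("NIC C", 1), ("NIC C", 2)], num_cpus=2
)
```

In this model each NIC ends up with its two channels on different CPUs. The oversubscribed case only shows that the loop terminates and reuses a CPU; as noted above, device probe normally prevents it.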
[Test Plan]
This could be tough to test as the patch fixes a race condition.
[Where problems could occur]
Network performance issues could persist.
[Other Info]
SF:#00315347
affects: | linux (Ubuntu) → linux-azure (Ubuntu) |
Changed in linux-azure (Ubuntu Hirsute): | |
status: | In Progress → Fix Committed |
Changed in linux-azure (Ubuntu Focal): | |
status: | In Progress → Fix Committed |
Test kernels at https://launchpad.net/~timg-tpi/+archive/ubuntu/hyperv-azure-lp1937078