linux-azure: Focal 5.4 arm64 support

Bug #1965618 reported by Tim Gardner
Affects               Status        Importance  Assigned to  Milestone
linux-azure (Ubuntu)  Fix Released  Undecided   Unassigned
Focal                 Fix Released  Medium      Tim Gardner

Bug Description

SRU Justification

[Impact]

Focal linux-azure does not support arm64

[Fix]

Backport/cherry-pick approximately 100 patches to support arm64 on Hyper-V.

[Where things could go wrong]

A number of the patches reorganize Hyper-V CPU support, so amd64 could be affected.

[Other Info]

SF: #00310705

Tim Gardner (timg-tpi)
Changed in linux-azure (Ubuntu):
status: New → Fix Released
Changed in linux-azure (Ubuntu Focal):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Tim Gardner (timg-tpi)
Tim Gardner (timg-tpi) wrote :
Changed in linux-azure (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/5.4.0-1075.78 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Tim Gardner (timg-tpi) wrote :

Microsoft tested

tags: added: verification-done-focal
removed: verification-needed-focal
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure-5.4/5.4.0-1076.79~18.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Tim Gardner (timg-tpi)
tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.4.0-1077.80

---------------
linux-azure (5.4.0-1077.80) focal; urgency=medium

  * focal/linux-azure: 5.4.0-1077.80 -proposed tracker (LP: #1968796)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2022.03.21)

  * focal/linux-azure: Enable missing config options (LP: #1968749)
    - Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device
    - PCI: hv: Propagate coherence from VMbus device to PCI device
    - [Config] azure: arm64: Ignore module movements
    - [Config] azure: arm64: Enable RAID config options
    - [Config] azure: arm64: Enable SQUASHFS config options

 -- Tim Gardner <email address hidden> Tue, 12 Apr 2022 18:53:53 -0600

Changed in linux-azure (Ubuntu Focal):
status: Fix Committed → Fix Released
Dexuan Cui (decui) wrote :

Kernels 5.4.0-1075-azure and newer are broken: the VM can easily panic when the Mellanox VF NIC is removed and re-added, either by an Azure host servicing event or by the manual "unbind/bind" test below (the GUID can differ between VMs):

for i in `seq 1 1000`;
do
    cd /sys/bus/vmbus/drivers/hv_pci;
    echo abdc2107-402e-4704-8c88-c2b850696c3c > unbind;
    echo abdc2107-402e-4704-8c88-c2b850696c3c > bind;
done
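
The loop above hard-codes one VM's GUID. A minimal sketch of the same test as a reusable function follows. The name `rebind_vf` and its arguments are hypothetical and not part of the original report; the driver directory is passed in so the function can be dry-run against a scratch directory before it is pointed at /sys/bus/vmbus/drivers/hv_pci, which requires root.

```shell
# Hypothetical helper: rebind_vf DRIVER_DIR GUID COUNT
# Repeats the unbind/bind cycle COUNT times and stops at the first failed write.
rebind_vf() {
    drv_dir=$1   # e.g. /sys/bus/vmbus/drivers/hv_pci
    guid=$2      # device instance GUID; differs between VMs
    count=$3     # number of unbind/bind cycles to run
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$guid" > "$drv_dir/unbind" || return 1
        echo "$guid" > "$drv_dir/bind" || return 1
        i=$((i + 1))
    done
    echo "completed $i unbind/bind cycles"
}
```

Assuming the GUID from the report, the original test would then be: rebind_vf /sys/bus/vmbus/drivers/hv_pci abdc2107-402e-4704-8c88-c2b850696c3c 1000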

A sample panic call-trace is:
[ 107.359954] kernel BUG at /build/linux-azure-5.4-4I3kFs/linux-azure-5.4-5.4.0/mm/slub.c:4020!
[ 107.363858] invalid opcode: 0000 [#1] SMP NOPTI
[ 107.365870] CPU: 0 PID: 334 Comm: kworker/0:2 Not tainted 5.4.0-1077-azure #80~18.04.1-Ubuntu
[ 107.369589] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[ 107.373811] Workqueue: events vmbus_onmessage_work
[ 107.375909] RIP: 0010:kfree+0x1d2/0x240

[ 107.413789] Call Trace:
[ 107.414867] kobject_uevent_env+0x1b5/0x7e0
[ 107.416747] kobject_uevent+0xb/0x10
[ 107.418327] device_release_driver_internal+0x191/0x1c0
[ 107.420653] device_release_driver+0x12/0x20
[ 107.422523] bus_remove_device+0xe1/0x150
[ 107.424279] device_del+0x167/0x380
[ 107.425824] device_unregister+0x1a/0x60
[ 107.427536] vmbus_device_unregister+0x27/0x50
[ 107.429528] vmbus_onoffer_rescind+0x1d0/0x1f0
[ 107.431474] vmbus_onmessage+0x2c/0x70
[ 107.433104] vmbus_onmessage_work+0x22/0x30
[ 107.434919] process_one_work+0x209/0x400
[ 107.436661] worker_thread+0x34/0x40

It turns out there is a bug in https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/bionic/commit/?id=16a3c750a78d8, which misses the second hunk of the upstream patch https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=877b911a5ba0.

Please apply the below patch to fix the issue:

--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3653,7 +3653,7 @@ static int hv_pci_remove(struct hv_device *hdev)

        hv_put_dom_num(hbus->bridge->domain_nr);

-       free_page((unsigned long)hbus);
+       kfree(hbus);
        return ret;
 }
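
One way to check whether a given source tree still carries the mismatched free is to search pci-hyperv.c for the old call. This is only a sketch: `check_hbus_free` is a hypothetical helper name, and a plain text search cannot tell which function the call sits in, so a hit still needs a manual look.

```shell
# Hypothetical helper: check_hbus_free PATH_TO_PCI_HYPERV_C
# Reports whether the file still releases hbus with free_page().
check_hbus_free() {
    src=$1
    # -F treats the pattern as a fixed string, so the parentheses are literal.
    if grep -qF 'free_page((unsigned long)hbus)' "$src"; then
        echo "mismatch: hbus still released with free_page()"
        return 1
    fi
    echo "ok: no free_page() call on hbus"
}
```

From the top of a kernel tree this would be run as: check_hbus_free drivers/pci/controller/pci-hyperv.c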

BTW, please apply this patch as well (note: it is not strictly required, since it only affects an error-handling path that is rarely hit):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=42c3d41832ef4fcf60aaa6f748de01ad99572adf

Tim Gardner (timg-tpi) wrote :
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (linux-meta-azure/5.4.0.1081.80)

All autopkgtests for the newly accepted linux-meta-azure (5.4.0.1081.80) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

v4l2loopback/0.12.3-1ubuntu0.4 (amd64)
dahdi-linux/1:2.11.1~dfsg-1ubuntu6.3 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#linux-meta-azure

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!
