linux-azure 4.15 fails to boot on Standard_M416s_v2 in Azure

Bug #1951924 reported by Gauthier Jolly
Affects                    Status         Importance  Assigned to                Milestone
linux-azure (Ubuntu)       Invalid        Undecided   Unassigned
  Bionic                   In Progress    Medium      Bartlomiej Zolnierkiewicz
linux-azure-4.15 (Ubuntu)  Fix Committed  Undecided   Krzysztof Kozlowski
  Bionic                   Fix Released   Undecided   Krzysztof Kozlowski

Bug Description

To reproduce:

 * start a bionic VM in Azure:
az vm create --name bionic --resource-group test-bionic --image "Canonical:UbuntuServer:18_04-lts-gen2:latest" --size Standard_M416s_v2 --admin-username ubuntu --ssh-key-value SSH_KEY_PATH

 * "downgrade" the kernel to 4.15 and delete the 5.4 kernel

 * reboot (the machine should fail to boot)

The serial console logs are available in the Azure portal (boot diagnostics must be enabled for the VM first).

Logs: https://pastebin.ubuntu.com/p/mhKMdMJCtX/

Gauthier Jolly (gjolly)
Changed in linux-azure (Ubuntu):
status: New → Confirmed
Bartlomiej Zolnierkiewicz (bzolnier) wrote :

[SRU Justification]

[Impact]

The current linux-azure-4.15 kernel in bionic fails to boot on Standard_M416s_v2 VMs because x2apic gets disabled during early boot:

[ 0.000000] x2apic: IRQ remapping doesn't support X2APIC mode
[ 0.000000] unchecked MSR access error: WRMSR to 0x1b (tried to write 0x00000000fee00100) at rIP: 0xffffffff9325d1f8 (native_write_msr+0x8/0x30)
[ 0.000000] Call Trace:
[ 0.000000] __x2apic_disable.part.5+0x49/0x80
[ 0.000000] enable_IR_x2apic+0x123/0x18c
[ 0.000000] default_setup_apic_routing+0x16/0x73
[ 0.000000] apic_intr_mode_init+0x84/0x91
[ 0.000000] x86_late_time_init+0x24/0x2b
[ 0.000000] start_kernel+0x444/0x505
[ 0.000000] x86_64_start_reservations+0x24/0x26
[ 0.000000] x86_64_start_kernel+0x74/0x77
[ 0.000000] secondary_startup_64+0xa5/0xb0
[ 0.000000] x2apic disabled
[ 0.000000] Switched APIC routing to physical flat.
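
The failing write in the trace above can be read field by field: MSR 0x1b is IA32_APIC_BASE. The following is a minimal Python sketch for decoding the attempted value, assuming the layout documented in the Intel SDM (bit 8 BSP flag, bit 10 x2APIC enable, bit 11 APIC global enable, base address from bit 12 up). The helper name, the regular expression and the dictionary keys are illustrative only and do not come from the kernel source.

```python
import re

# Bit layout of IA32_APIC_BASE (MSR 0x1b), per the Intel SDM.
APIC_BASE_BSP = 1 << 8      # this CPU is the bootstrap processor
APIC_BASE_X2APIC = 1 << 10  # x2APIC mode enable (EXTD)
APIC_BASE_ENABLE = 1 << 11  # APIC global enable (EN)

# Hypothetical pattern for the "unchecked MSR access error" console line.
WRMSR_RE = re.compile(r"WRMSR to (0x[0-9a-f]+) \(tried to write (0x[0-9a-f]+)\)")


def decode_wrmsr(line):
    """Return the decoded fields of a logged WRMSR, or None if no match."""
    m = WRMSR_RE.search(line)
    if m is None:
        return None
    msr = int(m.group(1), 16)
    value = int(m.group(2), 16)
    return {
        "msr": msr,
        "base": value & ~0xFFF,  # drop the low 12 flag bits
        "bsp": bool(value & APIC_BASE_BSP),
        "x2apic": bool(value & APIC_BASE_X2APIC),
        "enabled": bool(value & APIC_BASE_ENABLE),
    }


line = ("[ 0.000000] unchecked MSR access error: WRMSR to 0x1b "
        "(tried to write 0x00000000fee00100) at rIP: 0xffffffff9325d1f8 "
        "(native_write_msr+0x8/0x30)")
print(decode_wrmsr(line))
```

For the line in this report, both the x2APIC and the global-enable bits are clear in the attempted value, which is consistent with the write coming from __x2apic_disable as shown in the call trace.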

[Test Plan]

Run the updated kernel (with x2apic support backported from the linux-azure-5.4 kernel) on a Standard_M416s_v2 VM.

It should boot fine and display:

[ 0.000000] Setting APIC routing to physical x2apic.
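
This check can be scripted against a saved copy of the serial console log. A minimal sketch in Python: the function name and the returned labels are hypothetical, and the matched strings are only the messages quoted in this report, so other kernels may word them differently.

```python
def classify_boot_log(text):
    """Classify a serial console log by the x2apic messages quoted in this bug."""
    if "Setting APIC routing to physical x2apic" in text:
        return "x2apic-enabled"
    if ("x2apic disabled" in text
            or "IRQ remapping doesn't support X2APIC mode" in text):
        return "x2apic-disabled"
    return "unknown"


good = "[ 0.000000] Setting APIC routing to physical x2apic.\n"
bad = ("[ 0.000000] x2apic: IRQ remapping doesn't support X2APIC mode\n"
       "[ 0.000000] x2apic disabled\n"
       "[ 0.000000] Switched APIC routing to physical flat.\n")
print(classify_boot_log(good))
print(classify_boot_log(bad))
```

A result of "unknown" means neither message was found, for example because the log was cut off before APIC setup.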

[Where problems could occur]

x2apic may now also be used on other VM instance types.

[Other Info]

None.

Stefan Bader (smb)
Changed in linux-azure (Ubuntu Bionic):
assignee: nobody → Bartlomiej Zolnierkiewicz (bzolnier)
importance: Undecided → Medium
status: New → In Progress
Changed in linux-azure (Ubuntu):
status: Confirmed → Invalid
Changed in linux-azure-4.15 (Ubuntu):
status: New → In Progress
Changed in linux-azure-4.15 (Ubuntu Bionic):
status: New → In Progress
Changed in linux-azure-4.15 (Ubuntu):
status: In Progress → Fix Committed
assignee: nobody → Krzysztof Kozlowski (krzk)
Changed in linux-azure-4.15 (Ubuntu Bionic):
status: In Progress → Fix Committed
assignee: nobody → Krzysztof Kozlowski (krzk)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure-4.15/4.15.0-1130.143 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Marcelo Cerri (mhcerri) wrote :

Kernel version 4.15.0-1130.143 is booting fine on M416s_v2 instances.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure-4.15 - 4.15.0-1130.143

---------------
linux-azure-4.15 (4.15.0-1130.143) bionic; urgency=medium

  * bionic/linux-azure-4.15: 4.15.0-1130.143 -proposed tracker (LP: #1955257)

  * linux-azure 4.15 fails to boot on Standard_M416s_v2 in Azure (LP: #1951924)
    - PCI: hv: Replace hv_vp_set with hv_vpset
    - PCI: hv: Refactor hv_irq_unmask() to use cpumask_to_vpset()
    - x86/Hyper-V: Set x2apic destination mode to physical when x2apic is
      available
    - iommu/hyper-v: Add Hyper-V stub IOMMU driver
    - [Config] linux-azure: CONFIG_HYPERV_IOMMU=y

  [ Ubuntu: 4.15.0-167.175 ]

  * bionic/linux: 4.15.0-167.175 -proposed tracker (LP: #1955276)
  * hisi_sas driver may oops in prep_ssp_v3_hw() (LP: #1953386)
    - scsi: hisi_sas: Fix to only call scsi_get_prot_op() for non-NULL scsi_cmnd
  * Bionic update: upstream stable patchset 2021-12-13 (LP: #1954703)
    - xhci: Fix USB 3.1 enumeration issues by increasing roothub power-on-good
      delay
    - binder: use euid from cred instead of using task
    - Input: elantench - fix misreporting trackpoint coordinates
    - Input: i8042 - Add quirk for Fujitsu Lifebook T725
    - libata: fix read log timeout value
    - ocfs2: fix data corruption on truncate
    - mmc: dw_mmc: Dont wait for DRTO on Write RSP error
    - parisc: Fix ptrace check on syscall return
    - tpm: Check for integer overflow in tpm2_map_response_body()
    - media: ite-cir: IR receiver stop working after receive overflow
    - ALSA: ua101: fix division by zero at probe
    - ALSA: 6fire: fix control and bulk message timeouts
    - ALSA: line6: fix control and interrupt message timeouts
    - ALSA: synth: missing check for possible NULL after the call to kstrdup
    - ALSA: timer: Fix use-after-free problem
    - ALSA: timer: Unconditionally unlink slave instances, too
    - x86/irq: Ensure PI wakeup handler is unregistered before module unload
    - cavium: Return negative value when pci_alloc_irq_vectors() fails
    - scsi: qla2xxx: Fix unmap of already freed sgl
    - cavium: Fix return values of the probe function
    - sfc: Don't use netif_info before net_device setup
    - hyperv/vmbus: include linux/bitops.h
    - mmc: winbond: don't build on M68K
    - bpf: Prevent increasing bpf_jit_limit above max
    - xen/netfront: stop tx queues during live migration
    - spi: spl022: fix Microwire full duplex mode
    - watchdog: Fix OMAP watchdog early handling
    - vmxnet3: do not stop tx queues after netif_device_detach()
    - btrfs: fix lost error handling when replaying directory deletes
    - hwmon: (pmbus/lm25066) Add offset coefficients
    - regulator: s5m8767: do not use reset value as DVS voltage if GPIO DVS is
      disabled
    - regulator: dt-bindings: samsung,s5m8767: correct s5m8767,pmic-buck-default-
      dvs-idx property
    - EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell
    - mwifiex: fix division by zero in fw download path
    - ath6kl: fix division by zero in send path
    - ath6kl: fix control-message timeout
    - ath10k: fix control-message timeout
    - ath10k: fix division by zero in send path
    - PCI: Mark Atheros QCA6174 t...

Changed in linux-azure-4.15 (Ubuntu Bionic):
status: Fix Committed → Fix Released