linux-azure: Case VM fails to initialize CX4 VF due to mem fragmentation
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-azure (Ubuntu) | Fix Released | Medium | Tim Gardner |
Jammy | Fix Released | Medium | Tim Gardner |
Bug Description
SRU Justification
[Impact]
Below are the kernel messages showing the VF being removed at 12:21:35 and then re-added starting at 12:22:44. The stack traces show an order 7 allocation and then an order 8 allocation, both of which fail.
[Sun Jan 23 12:21:34 2022] infiniband mlx5_0: wait_for_
[Sun Jan 23 12:21:35 2022] hv_netvsc 000d3a7c-
[Sun Jan 23 12:21:35 2022] hv_netvsc 000d3a7c-
[Sun Jan 23 12:22:44 2022] hv_pci 61367817-
[Sun Jan 23 12:22:44 2022] hv_pci 61367817-
[Sun Jan 23 12:22:44 2022] pci_bus ab6d:00: root bus resource [mem 0xfe0000000-
[Sun Jan 23 12:22:44 2022] pci ab6d:00:02.0: [15b3:1016] type 00 class 0x020000
[Sun Jan 23 12:22:44 2022] pci ab6d:00:02.0: reg 0x10: [mem 0xfe0000000-
[Sun Jan 23 12:22:44 2022] pci ab6d:00:02.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown speed x0 link at ab6d:00:02.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)
[Sun Jan 23 12:22:44 2022] pci ab6d:00:02.0: BAR 0: assigned [mem 0xfe0000000-
[Sun Jan 23 12:22:44 2022] mlx5_core ab6d:00:02.0: firmware version: 14.30.1210
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 24 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 25 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 26 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 27 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 28 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 29 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 30 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 31 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 32 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 33 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 34 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 35 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 36 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 37 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 38 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 39 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 40 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 41 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 42 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 43 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 44 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 45 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 46 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 47 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 48 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 49 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 50 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 51 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 52 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 53 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 54 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] mlx5_core ab6d:00:02.0: irq 55 for MSI/MSI-X
[Sun Jan 23 12:22:45 2022] kworker/18:1: page allocation failure: order:7, mode:0x80d0
[Sun Jan 23 12:22:45 2022] CPU: 18 PID: 11869 Comm: kworker/18:1 Kdump: loaded Tainted: G ------------ T 3.10.0-
[Sun Jan 23 12:22:45 2022] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[Sun Jan 23 12:22:45 2022] Workqueue: hv_pri_chan vmbus_add_
[Sun Jan 23 12:22:45 2022] Call Trace:
[Sun Jan 23 12:22:45 2022] [<ffffffffa6783
[Sun Jan 23 12:22:45 2022] [<ffffffffa61c4
[Sun Jan 23 12:22:45 2022] [<ffffffffa61c9
[Sun Jan 23 12:22:45 2022] [<ffffffffa6033
[Sun Jan 23 12:22:45 2022] [<ffffffffa606e
[Sun Jan 23 12:22:45 2022] [<ffffffffc0763
[Sun Jan 23 12:22:45 2022] [<ffffffffc0763
[Sun Jan 23 12:22:45 2022] [<ffffffffc0763
[Sun Jan 23 12:22:45 2022] [<ffffffffc075d
[Sun Jan 23 12:22:45 2022] [<ffffffffc075d
[Sun Jan 23 12:22:45 2022] [<ffffffffc075e
[Sun Jan 23 12:22:45 2022] [<ffffffffa6076
[Sun Jan 23 12:22:45 2022] [<ffffffffc075f
[Sun Jan 23 12:22:45 2022] [<ffffffffc075d
[Sun Jan 23 12:22:45 2022] [<ffffffffc0756
[Sun Jan 23 12:22:45 2022] [<ffffffffc0756
[Sun Jan 23 12:22:45 2022] [<ffffffffa63d6
[Sun Jan 23 12:22:45 2022] [<ffffffffa63d8
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64b9
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa63cb
[Sun Jan 23 12:22:45 2022] [<ffffffffa63cb
[Sun Jan 23 12:22:45 2022] [<ffffffffc0284
[Sun Jan 23 12:22:45 2022] [<ffffffffc024d
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64b9
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64ba
[Sun Jan 23 12:22:45 2022] [<ffffffffa64b8
[Sun Jan 23 12:22:45 2022] [<ffffffffa64b8
[Sun Jan 23 12:22:45 2022] [<ffffffffc024e
[Sun Jan 23 12:22:45 2022] [<ffffffffc0251
[Sun Jan 23 12:22:45 2022] [<ffffffffa60bd
[Sun Jan 23 12:22:45 2022] [<ffffffffa60be
[Sun Jan 23 12:22:45 2022] [<ffffffffa60be
[Sun Jan 23 12:22:45 2022] [<ffffffffa60c5
[Sun Jan 23 12:22:45 2022] [<ffffffffa60c5
[Sun Jan 23 12:22:45 2022] [<ffffffffa6795
[Sun Jan 23 12:22:45 2022] [<ffffffffa60c5
[Sun Jan 23 12:22:45 2022] Mem-Info:
[Sun Jan 23 12:22:45 2022] active_
active_file:377528 inactive_
unevictable:0 dirty:9 writeback:0 unstable:0
slab_reclaimabl
mapped:9123 shmem:1052916 pagetables:61434 bounce:0
free:173238 free_pcp:171 free_cma:0
[Sun Jan 23 12:22:45 2022] Node 0 DMA free:15892kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimabl
[Sun Jan 23 12:22:45 2022] lowmem_reserve[]: 0 775 128751 128751
[Sun Jan 23 12:22:45 2022] Node 0 DMA32 free:512612kB min:404kB low:504kB high:604kB active_
[Sun Jan 23 12:22:45 2022] lowmem_reserve[]: 0 0 127975 127975
[Sun Jan 23 12:22:45 2022] Node 0 Normal free:164448kB min:67168kB low:83960kB high:100752kB active_
[Sun Jan 23 12:22:45 2022] lowmem_reserve[]: 0 0 0 0
[Sun Jan 23 12:22:45 2022] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
[Sun Jan 23 12:22:45 2022] Node 0 DMA32: 242*4kB (UM) 184*8kB (UEM) 121*16kB (UEM) 69*32kB (UEM) 69*64kB (UEM) 25*128kB (UEM) 11*256kB (UM) 2*512kB (UE) 1*1024kB (U) 1*2048kB (U) 120*4096kB (M) = 512632kB
[Sun Jan 23 12:22:45 2022] Node 0 Normal: 8431*4kB (UEM) 7451*8kB (UEM) 2808*16kB (UEM) 442*32kB (UEM) 104*64kB (UEM) 44*128kB (UEM) 10*256kB (UM) 4*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 170324kB
[Sun Jan 23 12:22:45 2022] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_
[Sun Jan 23 12:22:45 2022] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_
[Sun Jan 23 12:22:45 2022] 1811902 total pagecache pages
[Sun Jan 23 12:22:45 2022] 0 pages in swap cache
[Sun Jan 23 12:22:45 2022] Swap cache stats: add 0, delete 0, find 0/0
[Sun Jan 23 12:22:45 2022] Free swap = 0kB
[Sun Jan 23 12:22:45 2022] Total swap = 0kB
[Sun Jan 23 12:22:45 2022] 33554318 pages RAM
[Sun Jan 23 12:22:45 2022] 0 pages HighMem/MovableOnly
[Sun Jan 23 12:22:45 2022] 589172 pages reserved
[Sun Jan 23 12:22:45 2022] kworker/18:1: page allocation failure: order:8, mode:0xc0d0
[Sun Jan 23 12:22:45 2022] CPU: 18 PID: 11869 Comm: kworker/18:1 Kdump: loaded Tainted: G ------------ T 3.10.0-
[Sun Jan 23 12:22:45 2022] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[Sun Jan 23 12:22:45 2022] Workqueue: hv_pri_chan vmbus_add_
[Sun Jan 23 12:22:45 2022] Call Trace:
[Sun Jan 23 12:22:45 2022] [<ffffffffa6783
[Sun Jan 23 12:22:45 2022] [<ffffffffa61c4
[Sun Jan 23 12:22:45 2022] [<ffffffffa61c9
[Sun Jan 23 12:22:45 2022] [<ffffffffa6218
[Sun Jan 23 12:22:45 2022] [<ffffffffa61e5
[Sun Jan 23 12:22:45 2022] [<ffffffffa6224
[Sun Jan 23 12:22:45 2022] [<ffffffffa614f
[Sun Jan 23 12:22:45 2022] [<ffffffffa6228
[Sun Jan 23 12:22:45 2022] [<ffffffffc0773
[Sun Jan 23 12:22:45 2022] [<ffffffffc0772
[Sun Jan 23 12:22:45 2022] [<ffffffffc0756
[Sun Jan 23 12:22:45 2022] [<ffffffffc0756
[Sun Jan 23 12:22:45 2022] [<ffffffffa63d6
[Sun Jan 23 12:22:45 2022] [<ffffffffa63d8
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64b9
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa63cb
[Sun Jan 23 12:22:45 2022] [<ffffffffa63cb
[Sun Jan 23 12:22:45 2022] [<ffffffffc0284
[Sun Jan 23 12:22:45 2022] [<ffffffffc024d
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64b9
[Sun Jan 23 12:22:45 2022] [<ffffffffa64bb
[Sun Jan 23 12:22:45 2022] [<ffffffffa64ba
[Sun Jan 23 12:22:45 2022] [<ffffffffa64b8
[Sun Jan 23 12:22:45 2022] [<ffffffffa64b8
[Sun Jan 23 12:22:45 2022] [<ffffffffc024e
[Sun Jan 23 12:22:45 2022] [<ffffffffc0251
[Sun Jan 23 12:22:45 2022] [<ffffffffa60bd
[Sun Jan 23 12:22:45 2022] [<ffffffffa60be
[Sun Jan 23 12:22:45 2022] [<ffffffffa60be
[Sun Jan 23 12:22:45 2022] [<ffffffffa60c5
[Sun Jan 23 12:22:45 2022] [<ffffffffa60c5
[Sun Jan 23 12:22:45 2022] [<ffffffffa6795
[Sun Jan 23 12:22:45 2022] [<ffffffffa60c5
[Sun Jan 23 12:22:45 2022] Mem-Info:
[Sun Jan 23 12:22:45 2022] active_
active_file:377553 inactive_
unevictable:0 dirty:11 writeback:0 unstable:0
slab_reclaimabl
mapped:9232 shmem:1052917 pagetables:61460 bounce:0
free:174818 free_pcp:239 free_cma:0
[Sun Jan 23 12:22:45 2022] Node 0 DMA free:15892kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimabl
[Sun Jan 23 12:22:45 2022] lowmem_reserve[]: 0 775 128751 128751
[Sun Jan 23 12:22:45 2022] Node 0 DMA32 free:512632kB min:404kB low:504kB high:604kB active_
[Sun Jan 23 12:22:45 2022] lowmem_reserve[]: 0 0 127975 127975
[Sun Jan 23 12:22:45 2022] Node 0 Normal free:170748kB min:67168kB low:83960kB high:100752kB active_
[Sun Jan 23 12:22:46 2022] lowmem_reserve[]: 0 0 0 0
[Sun Jan 23 12:22:46 2022] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
[Sun Jan 23 12:22:46 2022] Node 0 DMA32: 242*4kB (UM) 184*8kB (UEM) 121*16kB (UEM) 69*32kB (UEM) 69*64kB (UEM) 25*128kB (UEM) 11*256kB (UM) 2*512kB (UE) 1*1024kB (U) 1*2048kB (U) 120*4096kB (M) = 512632kB
[Sun Jan 23 12:22:46 2022] Node 0 Normal: 9561*4kB (UEM) 7451*8kB (UEM) 2806*16kB (UEM) 437*32kB (UEM) 103*64kB (UEM) 15*128kB (UEM) 9*256kB (UM) 6*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 170620kB
[Sun Jan 23 12:22:46 2022] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_
[Sun Jan 23 12:22:46 2022] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_
[Sun Jan 23 12:22:46 2022] 1810845 total pagecache pages
[Sun Jan 23 12:22:46 2022] 0 pages in swap cache
[Sun Jan 23 12:22:46 2022] Swap cache stats: add 0, delete 0, find 0/0
[Sun Jan 23 12:22:46 2022] Free swap = 0kB
[Sun Jan 23 12:22:46 2022] Total swap = 0kB
[Sun Jan 23 12:22:46 2022] 33554318 pages RAM
[Sun Jan 23 12:22:46 2022] 0 pages HighMem/MovableOnly
[Sun Jan 23 12:22:46 2022] 589172 pages reserved
[Sun Jan 23 12:22:46 2022] mlx5_core ab6d:00:02.0: Failed to init flow steering
[Sun Jan 23 12:22:46 2022] mlx5_core ab6d:00:02.0: mlx5_load_one failed with error code -12
[Sun Jan 23 12:22:46 2022] mlx5_core: probe of ab6d:00:02.0 failed with error -12
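For readers decoding the numbers in the log above, here is a minimal Python sketch (not part of the original report). It assumes 4 kB base pages, as the "N*4kB" entries suggest, so an order-n request needs 2^n contiguous pages. The helper names order_to_kib and parse_buddy_line are hypothetical; the input string is the "Node 0 Normal" free-list line from the first failure.

```python
import errno
import re

PAGE_KIB = 4  # assumption: 4 kB base pages, matching the "N*4kB" entries in the log


def order_to_kib(order: int) -> int:
    """Size in kB of one physically contiguous block of the given order."""
    return PAGE_KIB * (1 << order)


def parse_buddy_line(line: str) -> dict:
    """Map block size in kB -> number of free blocks, from a per-zone free-list line."""
    return {int(size): int(count) for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


# "Node 0 Normal" line copied from the first allocation failure in the log
normal_zone = (
    "Node 0 Normal: 8431*4kB (UEM) 7451*8kB (UEM) 2808*16kB (UEM) 442*32kB (UEM) "
    "104*64kB (UEM) 44*128kB (UEM) 10*256kB (UM) 4*512kB (M) 1*1024kB (M) "
    "0*2048kB 0*4096kB = 170324kB"
)

free = parse_buddy_line(normal_zone)
total_kib = sum(size * count for size, count in free.items())
# Free memory held in blocks at least as large as an order 7 request
large_kib = sum(size * count for size, count in free.items() if size >= order_to_kib(7))

print(order_to_kib(7), order_to_kib(8))             # block sizes for order 7 and order 8
print(total_kib, large_kib)                         # total free kB vs. kB in large blocks
print(errno.errorcode[errno.ENOMEM], errno.ENOMEM)  # the -12 in the mlx5_core messages
```

Under these assumptions an order 7 request is 512 kB and an order 8 request is 1024 kB. Of the 170324 kB free in the Normal zone, only 3072 kB sits in blocks of 512 kB or larger, which is consistent with the fragmentation named in the bug title. Error code -12 is ENOMEM.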
[Fix]
26bf30902c10473
48f02eef7f764f3
2fdeb4f4c2aea53
38a54cae6f76c3e
b247f32aecad09e
[Test Case]
Microsoft tested
[Where things might go wrong]
VM allocations could continue to fail
[Other Info]
SF: #00327011
affects: | linux (Ubuntu) → linux-azure (Ubuntu) |
Changed in linux-azure (Ubuntu): | |
assignee: | nobody → Tim Gardner (timg-tpi) |
importance: | Undecided → Medium |
status: | New → In Progress |
Changed in linux-azure (Ubuntu Jammy): | |
status: | In Progress → Fix Committed |
This bug was fixed in the package linux-azure - 5.15.0-1002.3
---------------
linux-azure (5.15.0-1002.3) jammy; urgency=medium
* jammy/linux-azure: 5.15.0-1002.3 -proposed tracker (LP: #1965771)
* Packaging resync (LP: #1786013)
- [Packaging] switch dependency from crda to wireless-regdb
* linux-azure: Update HV support to 5.17 (LP: #1961329)
- x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV
- x86/hyperv: Initialize GHCB page in Isolation VM
- x86/hyperv: Initialize shared memory boundary in the Isolation VM.
- x86/hyperv: Add new hvcall guest address host visibility support
- Drivers: hv: vmbus: Mark vmbus ring buffer visible to host in Isolation VM
- x86/hyperv: Add Write/Read MSR registers via ghcb page
- x86/hyperv: Add ghcb hvcall support for SNP VM
- Drivers: hv: vmbus: Add SNP support for VMbus channel initiate message
- Drivers: hv: vmbus: Initialize VMbus ring buffer for Isolation VM
- swiotlb: Add swiotlb bounce buffer remap function for HV IVM
- x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
- hyper-v: Enable swiotlb bounce buffer for Isolation VM
- scsi: storvsc: Add Isolation VM support for storvsc driver
- net: netvsc: Add Isolation VM support for netvsc driver
- swiotlb: Add CONFIG_HAS_IOMEM check around swiotlb_mem_remap()
- Drivers: hv: vmbus: Initialize request offers message for Isolation VM
- scsi: storvsc: Fix storvsc_queuecommand() memory leak
- Netvsc: Call hv_unmap_memory() in the netvsc_device_remove()
- x86/sev: Replace occurrences of sev_active() with cc_platform_has()
- x86/kvm: Don't waste memory if kvmclock is disabled
- x86/kvmclock: Fix Hyper-V Isolated VM's boot issue when vCPUs > 64
* linux-azure: Case VM fails to initialize CX4 VF due to mem fragmentation
(LP: #1961632)
- net/mlx5: Reduce flow counters bulk query buffer size for SFs
- net/mlx5: Fix flow counters SF bulk query len
- net/mlx5: Dynamically resize flow counters query buffer
* linux-azure: net: mana: Add handling of CQE_RX_TRUNCATED (LP: #1960322)
- net: mana: Add handling of CQE_RX_TRUNCATED
- net: mana: Remove unnecessary check of cqe_type in mana_process_rx_cqe()
* jammy/linux-azure: CIFS 5.15 backport (LP: #1960671)
- cifs: add mount parameter tcpnodelay
- cifs: Create a new shared file holding smb2 pdu definitions
- cifs: move NEGOTIATE_PROTOCOL definitions out into the common area
- cifs: Move more definitions into the shared area
- cifs: Move SMB2_Create definitions to the shared area
- smb3: add dynamic trace points for socket connection
- cifs: send workstation name during ntlmssp session setup
- cifs: fix print of hdr_flags in dfscache_proc_show()
- cifs: introduce new helper for cifs_reconnect()
- cifs: convert list_for_each to entry variant
- cifs: split out dfs code from cifs_reconnect()
- cifs: for compound requests, use open handle if possible
- cifs: support nested dfs links over reconnect
- smb3: remove trivial dfs compile warning
- smb3: add additional null check in SMB2_ioctl
- smb3: add additional null che...