[Azure][CVM] Fix swiotlb_max_mapping_size() for potential bounce buffer allocation failure in storvsc

Bug #1973169 reported by Tim Gardner
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
linux-azure (Ubuntu)  New           Undecided   Unassigned
  Jammy               Fix Released  Medium      Tim Gardner

Bug Description

SRU Justification

[Impact]
Description of problem:

When the v5.15 linux-azure kernel is used for CVM on Azure, it uses swiotlb for bounce buffering.
We recently found an issue in swiotlb_max_mapping_size(), which is used by the SCSI subsystem APIs, which in turn are used by the hv_storvsc driver.

The issue is that swiotlb_max_mapping_size() currently always reports 256KB (i.e. 128 bounce buffer slots), but swiotlb_tbl_map_single() is unable to allocate a bounce buffer for an unaligned 256KB request. Eventually the I/O can get stuck and we see the call trace below. (This call trace was obtained from a SLES VM, but I believe the issue exists in all distro kernels supporting CVM, and Tianyu says he is able to reproduce the issue in an Ubuntu CVM when trying to mount an XFS file system.)

[ 186.458666][ C1] swiotlb_tbl_map_single+0x396/0x920
[ 186.458669][ C1] swiotlb_map+0xaa/0x2d0
[ 186.458674][ C1] dma_direct_map_sg+0xee/0x2c0
[ 186.458677][ C1] __dma_map_sg_attrs+0x30/0x70
[ 186.458680][ C1] dma_map_sg_attrs+0xa/0x20
[ 186.458681][ C1] scsi_dma_map+0x35/0x40
[ 186.458684][ C1] storvsc_queuecommand+0x20b/0x890
[ 186.458696][ C1] scsi_queue_rq+0x606/0xb80
[ 186.458698][ C1] __blk_mq_try_issue_directly+0x149/0x1c0
[ 186.458702][ C1] blk_mq_try_issue_directly+0x15/0x50
[ 186.458704][ C1] blk_mq_submit_bio+0x4b6/0x620
[ 186.458706][ C1] __submit_bio+0xe8/0x160
[ 186.458708][ C1] submit_bio_noacct_nocheck+0xf0/0x2b0
[ 186.458713][ C1] submit_bio+0x42/0xd0
[ 186.458714][ C1] submit_bio_wait+0x54/0xb0
[ 186.458718][ C1] xfs_rw_bdev+0x180/0x1b0 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
[ 186.458769][ C1] xlog_do_io+0x8d/0x140 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
[ 186.458819][ C1] xlog_bread+0x1f/0x40 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
[ 186.458859][ C1] xlog_find_verify_cycle+0xc8/0x180 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
[ 186.458899][ C1] xlog_find_head+0x2ae/0x3a0 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
[ 186.458937][ C1] xlog_find_tail+0x44/0x360 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
[ 186.458978][ C1] xlog_recover+0x2b/0x170 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
[ 186.459056][ C1] xfs_log_mount+0x15b/0x270 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
[ 186.459098][ C1] xfs_mountfs+0x49e/0x830 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
[ 186.459224][ C1] xfs_fs_fill_super+0x5c2/0x7c0 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
[ 186.459303][ C1] get_tree_bdev+0x163/0x260
[ 186.459307][ C1] vfs_get_tree+0x25/0xc0
[ 186.459309][ C1] path_mount+0x704/0x9c0

Details: For example, the original physical address from the SCSI layer can be 0x1_0903_f200 with size=256KB. Because of the min align mask, swiotlb_tbl_map_single() must preserve the low address bits (offset 0x200 here), so it passes "alloc_size + offset" (i.e. 256KB + 0x200) to swiotlb_find_slots(). That function then calculates "nslots = nr_slots(alloc_size) ==> 129" and fails to allocate a bounce buffer, as the maximum allowable number of contiguous slabs to map is IO_TLB_SEGSIZE (128).
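
The slot arithmetic above can be reproduced with a small userspace sketch. It assumes the usual swiotlb constants (2KB slots, IO_TLB_SEGSIZE of 128) and a min align mask of 0xfff, i.e. HV_HYP_PAGE_SIZE - 1. The helpers align_offset() and request_fits() are hypothetical names used only for this illustration; they model the calculation and are not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed swiotlb constants: 2KB slots, 128 contiguous slots at most. */
#define IO_TLB_SHIFT   11
#define IO_TLB_SIZE    (UINT64_C(1) << IO_TLB_SHIFT)
#define IO_TLB_SEGSIZE UINT64_C(128)

/* Number of 2KB slots needed to cover `size` bytes (rounded up). */
static uint64_t nr_slots(uint64_t size)
{
    return (size + IO_TLB_SIZE - 1) >> IO_TLB_SHIFT;
}

/*
 * Hypothetical helper: the part of the original address that must be
 * preserved inside the first slot, given the device's min align mask.
 */
static uint64_t align_offset(uint64_t orig_addr, uint64_t min_align_mask)
{
    return orig_addr & min_align_mask & (IO_TLB_SIZE - 1);
}

/*
 * Hypothetical helper: returns 1 if "size + offset" still fits into
 * IO_TLB_SEGSIZE contiguous slots, 0 otherwise.
 */
static int request_fits(uint64_t orig_addr, uint64_t size,
                        uint64_t min_align_mask)
{
    uint64_t offset = align_offset(orig_addr, min_align_mask);

    return nr_slots(size + offset) <= IO_TLB_SEGSIZE;
}
```

With orig_addr = 0x10903f200 and size = 256KB the offset is 0x200, so 129 slots are needed and the request cannot be satisfied; the same size at a 2KB-aligned address needs exactly 128 slots and fits.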

The issue affects the hv_storvsc driver, as it calls
dma_set_min_align_mask(&device->device, HV_HYP_PAGE_SIZE - 1);

dma_set_min_align_mask() is also called by hv_netvsc, but netvsc is not affected as netvsc never calls swiotlb_tbl_map_single() with a near-to-256KB size.

dma_set_min_align_mask() is also called by the NVMe driver, but since we don't currently support PCI device assignment for CVM, NVMe is not affected.

Tianyu Lan made a fix which is under review:
https://lwn.net/ml/linux-kernel/20220510142109.777738-1-ltykernel%40gmail.com/
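
The idea behind the fix, going by the patch title "swiotlb: Max mapping size takes min align mask into account", can be sketched in userspace as follows. This is a model under stated assumptions, not the patch itself: max_mapping_size() and round_up_to() are hypothetical names, and the constants are the usual swiotlb values (2KB slots, 128 slots per segment). The reported maximum is reduced by the min align mask rounded up to a whole slot, so that "size + offset" can never exceed IO_TLB_SEGSIZE slots.

```c
#include <stddef.h>

/* Assumed swiotlb constants: 2KB slots, 128 contiguous slots at most. */
#define IO_TLB_SHIFT   11
#define IO_TLB_SIZE    ((size_t)1 << IO_TLB_SHIFT)
#define IO_TLB_SEGSIZE ((size_t)128)

/* Round x up to the next multiple of y (y must be non-zero). */
static size_t round_up_to(size_t x, size_t y)
{
    return ((x + y - 1) / y) * y;
}

/*
 * Hypothetical model of a max mapping size that accounts for the
 * device's min align mask: reserve enough room for the largest
 * possible in-slot offset, rounded up to whole slots.
 */
static size_t max_mapping_size(size_t min_align_mask)
{
    size_t min_align = 0;

    if (min_align_mask)
        min_align = round_up_to(min_align_mask, IO_TLB_SIZE);

    return IO_TLB_SIZE * IO_TLB_SEGSIZE - min_align;
}
```

For hv_storvsc, which sets a mask of HV_HYP_PAGE_SIZE - 1 (0xfff), this model reports 252KB instead of 256KB; a device with no min align mask still gets the full 256KB.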

Note: the linux-azure-cvm v5.4 kernel doesn't need the fix, as that kernel uses a vmbus private bounce buffering implementation (drivers/hv/hv_bounce.c) rather than swiotlb.

[Test Case]

Microsoft tested

[Where things could go wrong]

Bounce buffers may fail to allocate.

[Other Info]

SF: #00336634

Tim Gardner (timg-tpi)
affects: linux (Ubuntu) → linux-azure (Ubuntu)
Changed in linux-azure (Ubuntu Jammy):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → Medium
status: New → In Progress
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/5.15.0-1006.7 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-jammy
Tim Gardner (timg-tpi) wrote :

Microsoft tested. Marking verification done.

tags: added: verification-done-jammy
removed: verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.15.0-1008.9

---------------
linux-azure (5.15.0-1008.9) jammy; urgency=medium

  * jammy/linux-azure: 5.15.0-1008.9 -proposed tracker (LP: #1974294)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2022.04.18)

  * [Azure] WARNING: CPU: 0 PID: 499 at include/linux/dma-mapping.h:555
    netvsc_probe+0x3c9/0x3e0 (LP: #1975717)
    - Drivers: hv: vmbus: Rework use of DMA_BIT_MASK(64)
    - Drivers: hv: vmbus: Fix initialization of device object in
      vmbus_device_register()

  * config CONFIG_HISI_PMU for kunpeng920 (LP: #1956086)
    - [Config] azure: CONFIG_HISI_PMU=m

  * linux: CONFIG_SERIAL_8250_MID=y (LP: #1967338)
    - [Config] azure: CONFIG_SERIAL_8250_MID=y

  * Support AMD P-State cpufreq control mechanism (LP: #1956509) // Enable
    speakup kernel modules to allow the speakup screen reader to function
    (LP: #1967702)
    - [Config] azure: Update configs after rebase

  * Azure: swiotlb patch needed for CVM (LP: #1971701) // [Azure][CVM] Fix
    swiotlb_max_mapping_size() for potential bounce buffer allocation failure in
    storvsc (LP: #1973169)
    - SAUCE: swiotlb: Max mapping size takes min align mask into account

  * Azure: swiotlb patch needed for CVM (LP: #1971701)
    - SAUCE: treewide: Replace the use of mem_encrypt_active() with
      cc_platform_has()
    - SAUCE: swiotlb: use bitmap to track free slots
    - SAUCE: swiotlb: allocate memory in a cache-friendly way
    - SAUCE: swiotlb: Split up single swiotlb lock

  * jammy/linux-azure: Update cifs to 5.15 backport (LP: #1970977)
    - improve error message when mount options conflict with posix
    - cifs: call cifs_reconnect when a connection is marked
    - cifs: call helper functions for marking channels for reconnect
    - cifs: mark sessions for reconnection in helper function
    - treewide: Replace zero-length arrays with flexible-array members
    - smb3: fix incorrect session setup check for multiuser mounts
    - cifs: truncate the inode and mapping when we simulate fcollapse
    - cifs: use a different reconnect helper for non-cifsd threads
    - cifs: do not skip link targets when an I/O fails
    - cifs: convert the path to utf16 in smb2_query_info_compound
    - cifs: change smb2_query_info_compound to use a cached fid, if available
    - cifs: fix bad fids sent over wire
    - cifs: fix incorrect use of list iterator after the loop
    - move more common protocol header definitions to smbfs_common
    - smb3: move defines for ioctl protocol header and SMB2 sizes to smbfs_common
    - smb3: move defines for query info and query fsinfo to smbfs_common
    - smb3: cleanup and clarify status of tree connections
    - smb3: fix ksmbd bigendian bug in oplock break, and move its struct to
      smbfs_common
    - fs: Remove ->readpages address space operation
    - cifs: fix potential race with cifsd thread
    - cifs: remove check of list iterator against head past the loop body
    - cifs: force new session setup and tcon for dfs
    - cifs: update internal module number
    - cifs: Check the IOCB_DIRECT flag, not O_DIRECT
    - cifs: Split the smb3_a...

Changed in linux-azure (Ubuntu Jammy):
status: In Progress → Fix Released