[Azure][CVM] hv/bounce buffer: Fix a race that can fail disk detection

Bug #1971164 reported by Tim Gardner
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
linux-azure-cvm (Ubuntu)  Invalid       Undecided   Unassigned
  Focal                   Fix Released  Medium      Tim Gardner

Bug Description

SRU Justification

[Impact]

The linux-azure-cvm kernel (e.g. Ubuntu-azure-cvm-5.4.0-1078.81+cvm1) has a race condition in the Linux vmbus bounce buffer code (drivers/hv/hv_bounce.c). As a result, the kernel sometimes fails to detect some of the SCSI disks, and the dmesg log may show one of the following two messages:

#1: [ 2.995732] sd 3:0:0:3: [sdd] Sector size 0 reported, assuming 512.

#2: [ 3.651293] scsi host3: scsi scan: INQUIRY result too short (5), using 36
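To check whether a given boot hit either symptom, the kernel log can be scanned for the two messages quoted above. The following is a minimal sketch, not part of the original report: the helper name `find_symptoms` and the symptom labels are hypothetical, and the patterns are taken only from the two quoted log lines (the byte count in the INQUIRY message is assumed to vary).

```python
import re

# Patterns derived from the two dmesg messages quoted in this report.
# The labels on the left are illustrative names, not kernel identifiers.
SYMPTOM_PATTERNS = {
    "zero_sector_size": re.compile(r"Sector size 0 reported, assuming 512"),
    "short_inquiry": re.compile(r"INQUIRY result too short \(\d+\), using 36"),
}


def find_symptoms(log_text):
    """Return (label, line) pairs for every log line matching a known symptom."""
    hits = []
    for line in log_text.splitlines():
        for label, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.strip()))
    return hits


# Sample input: the two lines from this report plus one unrelated line.
sample_log = "\n".join([
    "[ 2.995732] sd 3:0:0:3: [sdd] Sector size 0 reported, assuming 512.",
    "[ 3.651293] scsi host3: scsi scan: INQUIRY result too short (5), using 36",
    "[ 3.700000] some unrelated message",
])

for label, line in find_symptoms(sample_log):
    print(f"{label}: {line}")
```

In practice the text passed to `find_symptoms` would be the captured output of `dmesg` from the affected VM. Because the bug is a race, a clean log on one boot does not show that the kernel is unaffected.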

Sometimes I see a strange call trace (the 'order' is 18 if I print it):
2022-04-26T20:10:18,398144+00:00 kmalloc_order_trace+0x1e/0x80
2022-04-26T20:10:18,398147+00:00 __kmalloc+0x3ae/0x4c0
2022-04-26T20:10:18,398150+00:00 __scsi_scan_target+0x283/0x590
2022-04-26T20:10:18,398155+00:00 scsi_scan_channel.part.16+0x62/0x80
2022-04-26T20:10:18,398158+00:00 scsi_scan_host_selected+0xd5/0x150
2022-04-26T20:10:18,398160+00:00 store_scan+0xc8/0xe0
(This is very strange because order 18 means (1 << 18) * 4096 bytes = 1 GiB.)
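The size calculation above can be reproduced with a short sketch. It assumes the 4096-byte page size used in the report; the function name `order_to_bytes` is illustrative and is not a kernel API.

```python
PAGE_SIZE = 4096  # bytes; the page size used in the report's calculation


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of an allocation of 2**order contiguous pages."""
    return (1 << order) * page_size


size = order_to_bytes(18)
print(size)             # 1073741824
print(size == 1 << 30)  # True: exactly 1 GiB
```

A request of this size from the SCSI scan path is far larger than anything it should normally ask for, which is why the trace suggests the length being used was corrupted rather than legitimate.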

After some investigation, we found the root cause and made a fix:
https://github.com/dcui/linux-azure-cvm/commit/ddde4dc33242794000e1d9667a5f9cfa31c15fdf

With the fix, we no longer see the above strange symptoms.
Please include the fix in the next release of the v5.4 linux-azure-cvm kernel. Thanks!

[Test case]

Microsoft tested

[Where things could go wrong]

Some SCSI drives may continue to go undetected.

[Other Info]

SF: #00335631


Tim Gardner (timg-tpi)
affects: linux (Ubuntu) → linux-azure-cvm (Ubuntu)
Changed in linux-azure-cvm (Ubuntu):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → Medium
status: New → In Progress
Tim Gardner (timg-tpi) wrote :
Changed in linux-azure-cvm (Ubuntu):
status: In Progress → Invalid
assignee: Tim Gardner (timg-tpi) → nobody
importance: Medium → Undecided
Changed in linux-azure-cvm (Ubuntu Focal):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → Medium
status: New → In Progress
Tim Gardner (timg-tpi)
Changed in linux-azure-cvm (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure-cvm/5.4.0-1080.83+cvm1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Tim Gardner (timg-tpi) wrote :

MSFT tested. Marking verification-done-focal

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure-cvm - 5.4.0-1080.83+cvm1

---------------
linux-azure-cvm (5.4.0-1080.83+cvm1) focal; urgency=medium

  * focal/linux-azure-cvm: 5.4.0-1080.83+cvm1 -proposed tracker (LP: #1973948)

  * [Azure][CVM] hv/bounce buffer: Fix a race that can fail disk detection
    (LP: #1971164)
    - SAUCE: hv/bounce buffer: Fix a race that can fail disk detection

  [ Ubuntu: 5.4.0-1080.83 ]

  * focal/linux-azure: 5.4.0-1080.83 -proposed tracker (LP: #1973952)
  * focal/linux: 5.4.0-113.127 -proposed tracker (LP: #1973980)
  * CVE-2022-29581
    - net/sched: cls_u32: fix netns refcount changes in u32_change()
  * CVE-2022-1116
    - io_uring: fix fs->users overflow
  * ext4: limit length to bitmap_maxbytes (LP: #1972281)
    - ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
  * Unprivileged users may use PTRACE_SEIZE to set PTRACE_O_SUSPEND_SECCOMP
    option (LP: #1972740)
    - ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE

 -- Marcelo Henrique Cerri <email address hidden> Mon, 23 May 2022 18:06:01 -0300

Changed in linux-azure-cvm (Ubuntu Focal):
status: Fix Committed → Fix Released