b/linux-gcp-5.4: log_check WARNING on n2d-standard-64

Bug #1958416 reported by Francis Ginther
This bug affects 2 people
Affects              Status        Importance  Assigned to                    Milestone
ubuntu-kernel-tests  New           Undecided   Unassigned
linux (Ubuntu)       Incomplete    Undecided   Unassigned
  Focal              Triaged       Medium      Thadeu Lima de Souza Cascardo
linux-gcp (Ubuntu)   Invalid       Undecided   Unassigned
  Focal              Fix Released  High        Thadeu Lima de Souza Cascardo

Bug Description

[Impact]
A small SWIOTLB might cause DMA failures under heavy I/O on GCP instances using SEV.

[Test case]
Boot n2d-standard-2 or n2d-standard-64 instances on GCP, issue a lot of I/O, and look for DMA overflow or full swiotlb messages in dmesg.
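
The log check in the test case can be sketched as a small scanner. This is only an illustration, not the ubuntu_boot log_check implementation: the function name and the patterns are assumptions, modelled on the log excerpt in this report and on the usual wording of the kernel's "swiotlb buffer is full" message.

```python
import re

# Assumed patterns: the first and third mirror lines quoted in this report,
# the second is the usual kernel wording when swiotlb runs out of slots.
PATTERNS = {
    "dma_overflow": re.compile(r"overflow 0x[0-9a-f]+\+\d+ of DMA mask"),
    "swiotlb_full": re.compile(r"swiotlb buffer is full"),
    "report_addr_warning": re.compile(r"WARNING: .*report_addr\+"),
}


def find_swiotlb_symptoms(log_text):
    """Return (kind, line) pairs for every log line matching a known symptom."""
    hits = []
    for line in log_text.splitlines():
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((kind, line.strip()))
    return hits


# Two lines copied from the serial console output in this report.
SAMPLE = (
    "[ 137.928758] nvme 0000:00:04.0: overflow 0x0000801f68b5f000+131072 "
    "of DMA mask ffffffffffffffff bus mask 0\n"
    "[ 137.938691] WARNING: CPU: 34 PID: 8648 at "
    "/build/linux-gcp-5.4-zTWCml/linux-gcp-5.4-5.4.0/kernel/dma/direct.c:35 "
    "report_addr+0x33/0x90\n"
)

for kind, line in find_swiotlb_symptoms(SAMPLE):
    print(f"{kind}: {line}")
```

To check a live instance, the same function could be fed the saved output of dmesg instead of SAMPLE.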

[Potential regression]
Boot failures cannot be ruled out if SWIOTLB cannot be allocated.

------------------

ubuntu_boot log_check fails with: WARNING: CPU: 34 PID: 8648 at /build/linux-gcp-5.4-zTWCml/linux-gcp-5.4-5.4.0/kernel/dma/direct.c:35 report_addr+0x33/0x90

From the serial console:

[ 137.928758] nvme 0000:00:04.0: overflow 0x0000801f68b5f000+131072 of DMA mask ffffffffffffffff bus mask 0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.928758] nvme 0000:00:04.0: overflow 0x0000801f68b5f000+131072 of DMA mask ffffffffffffffff bus mask 0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938685] ------------[ cut here ]------------
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938691] WARNING: CPU: 34 PID: 8648 at /build/linux-gcp-5.4-zTWCml/linux-gcp-5.4-5.4.0/kernel/dma/direct.c:35 report_addr+0x33/0x90
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938691] Modules linked in: ip6table_filter ip6_tables iptable_filter bpfilter nls_iso8859_1 input_leds pvpanic serio_raw mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper psmouse i2c_piix4 gve
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938712] CPU: 34 PID: 8648 Comm: apt-check Not tainted 5.4.0-1060-gcp #64~18.04.1-Ubuntu
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938712] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938714] RIP: 0010:report_addr+0x33/0x90
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938716] Code: 48 83 ec 08 48 8b 87 30 02 00 00 48 89 75 f8 48 85 c0 74 26 4c 8b 00 b8 fe ff ff ff 49 39 c0 76 0d 80 3d 47 67 b2 01 00 74 2e <0f> 0b c9 c3 48 83 bf 40 02 00 00 00 75 e9 eb f0 80 3d 2f 67 b2 01
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938716] RSP: 0018:ffffbd77cf99f4e0 EFLAGS: 00010282
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938718] RAX: 0000000000000000 RBX: ffff9773f8b960b0 RCX: 0000000000000000
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938718] RDX: 0000000000000000 RSI: ffff9753ff897448 RDI: 0000000000000000
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938719] RBP: ffffbd77cf99f4e8 R08: 0000000000000305 R09: 0000000000000022
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938719] R10: 000000000016dfee R11: ffffbd77cf99f218 R12: 0000000000020000
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938720] R13: ffff9753abfe9000 R14: 0000000000000001 R15: 0000000000000100
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938723] FS: 00007f55bb523740(0000) GS:ffff9753ff880000(0000) knlGS:0000000000000000
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938723] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938724] CR2: 00007f55b41d6000 CR3: 0000801f6be28000 CR4: 0000000000340ee0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938726] Call Trace:
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938730] dma_direct_map_page+0xe2/0xf0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938731] dma_direct_map_sg+0x6c/0xc0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938735] nvme_queue_rq+0x6fb/0xbd0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938738] __blk_mq_try_issue_directly+0x139/0x200
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938741] ? mempool_free_slab+0x1/0x20
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938743] blk_mq_request_issue_directly+0x4b/0xe0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938744] blk_mq_try_issue_list_directly+0x46/0xb0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938746] blk_mq_sched_insert_requests+0xb7/0x100
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938747] blk_mq_flush_plug_list+0x1eb/0x2a0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938749] blk_flush_plug_list+0xd1/0x100
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938750] blk_mq_make_request+0x306/0x5a0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938751] generic_make_request+0x121/0x300
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938754] ? ext4_attr_store+0x60/0x270
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938755] submit_bio+0x46/0x1c0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938756] ? submit_bio+0x46/0x1c0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938758] ext4_io_submit+0x4d/0x60
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938760] ext4_writepages+0x624/0xe80
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938762] do_writepages+0x4b/0xe0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938763] ? do_writepages+0x4b/0xe0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938765] ? __ext4_find_entry+0x20e/0x440
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938767] ? __wake_up_common_lock+0x8c/0xc0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938768] __filemap_fdatawrite_range+0xcb/0x100
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938769] ? __filemap_fdatawrite_range+0xcb/0x100
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938770] filemap_flush+0x1c/0x20
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938771] ext4_alloc_da_blocks+0x2c/0x70
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938772] ext4_rename+0x6ee/0x930
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938774] ext4_rename2+0x8d/0xc0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938776] vfs_rename+0x3dc/0xa80
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938778] ? __d_lookup+0x104/0x140
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938779] ? __d_lookup+0x90/0x140
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938780] do_renameat2+0x4ca/0x590
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938781] ? do_renameat2+0x4ca/0x590
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938783] __x64_sys_rename+0x20/0x30
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938785] do_syscall_64+0x57/0x190
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938788] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938789] RIP: 0033:0x7f55baf91e17
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938791] Code: 75 12 48 89 df e8 79 60 09 00 85 c0 0f 95 c0 0f b6 c0 f7 d8 5b c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 09 f3 c3 0f 1f 80 00 00 00 00 48 8b 15 39 f0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938791] RSP: 002b:00007ffed8ceca78 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938792] RAX: ffffffffffffffda RBX: 00007ffed8cecaf0 RCX: 00007f55baf91e17
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938793] RDX: 0000000001583010 RSI: 000000000181f7e0 RDI: 000000000188c7f0
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938793] RBP: 0000000000000001 R08: 0000000000004f9c R09: 00007f55b41d7958
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938794] R10: 0000000000004f6c R11: 0000000000000246 R12: 00000000017c0310
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938794] R13: 0000000001807f90 R14: 00007ffed8cecc60 R15: 00000000017d5e90
Jan 19 12:54:30 b-lgcp-5-4-gcp-5-4-0-n2dstd64-boot kernel: [ 137.938795] ---[ end trace 6a8dfca91c44dc9b ]---

Francis Ginther (fginther) wrote :
tags: added: 5.4 focal gcp sru-20220321 sru-20220418 sru-20220711 ubuntu-boot
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This looks to be caused by a full swiotlb. NVMe uses DMA_ATTR_NO_WARN, which means there won't be any message about a full swiotlb. On 5.15, report_addr (in fact, its equivalent, dev_WARN_ONCE) is not called in the case of a swiotlb failure.
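
The numbers in the overflow line from the description are consistent with that reading. A minimal sketch (the helper name is hypothetical) that parses the line and checks whether the mapping really ends above the DMA mask:

```python
import re

# Matches the "overflow <addr>+<size> of DMA mask <mask>" text seen in the log.
OVERFLOW_RE = re.compile(
    r"overflow 0x(?P<addr>[0-9a-f]+)\+(?P<size>\d+) of DMA mask (?P<mask>[0-9a-f]+)"
)


def exceeds_dma_mask(line):
    """Return True if the last byte of the reported mapping lies above the mask."""
    match = OVERFLOW_RE.search(line)
    if match is None:
        raise ValueError("not a DMA overflow line")
    addr = int(match.group("addr"), 16)
    size = int(match.group("size"))  # the size is printed in decimal
    mask = int(match.group("mask"), 16)
    return addr + size - 1 > mask


# Line copied from the serial console output in this report.
LINE = (
    "nvme 0000:00:04.0: overflow 0x0000801f68b5f000+131072 "
    "of DMA mask ffffffffffffffff bus mask 0"
)
print(exceeds_dma_mask(LINE))  # False
```

The 128 KiB mapping fits within the 64-bit mask, so the warning does not seem to reflect an address that the device cannot reach, which fits the theory that the bounce buffer allocation failed instead.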

Also, comparing 5.4 and 5.15 on that instance, 5.4 allocates 64MiB for swiotlb while 5.15 allocates 1024MiB, which might explain why swiotlb gets full on 5.4.
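
One way to picture the difference is a size that scales with guest memory and is clamped between the two values observed above. The sketch below is an assumption, not the code of the fix: the 6% share and the 256 GiB memory figure for n2d-standard-64 are not stated in this report, and the function name is hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

DEFAULT_SWIOTLB_BYTES = 64 * MIB  # size seen on 5.4 in this report
MAX_SWIOTLB_BYTES = 1 * GIB       # size seen on 5.15 in this report
ASSUMED_PERCENT = 6               # assumption: share of guest memory to reserve


def sev_swiotlb_bytes(guest_mem_bytes):
    """Scale the bounce buffer with guest memory, clamped to [64 MiB, 1 GiB]."""
    wanted = guest_mem_bytes * ASSUMED_PERCENT // 100
    return max(DEFAULT_SWIOTLB_BYTES, min(wanted, MAX_SWIOTLB_BYTES))


# Assumption: an n2d-standard-64 guest has 256 GiB of memory.
print(sev_swiotlb_bytes(256 * GIB) // MIB)  # 1024
```

Under these assumptions a large guest hits the 1 GiB cap, matching the 1024MiB seen on 5.15, while a very small guest keeps the 64MiB default.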

description: updated
Changed in linux-gcp (Ubuntu):
status: New → In Progress
status: In Progress → Invalid
Changed in linux-gcp (Ubuntu Focal):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in linux (Ubuntu Focal):
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1958416

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux-gcp (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gcp/5.4.0-1096.105 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-focal-linux-gcp verification-needed-focal
Po-Hsu Lin (cypressyew) wrote :

This warning + kernel taint has now gone with linux-gcp/5.4.0-1096.105. Thanks!

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-gcp - 5.4.0-1097.106

---------------
linux-gcp (5.4.0-1097.106) focal; urgency=medium

  * focal/linux-gcp: 5.4.0-1097.106 -proposed tracker (LP: #1997813)

  * b/linux-gcp-5.4: log_check WARNING on n2d-standard-64 (LP: #1958416)
    - x86, swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests

  [ Ubuntu: 5.4.0-136.153 ]

  * focal/linux: 5.4.0-136.153 -proposed tracker (LP: #1997835)
  * Expose built-in trusted and revoked certificates (LP: #1996892)
    - [Packaging] Expose built-in trusted and revoked certificates
  * [UBUNTU 20.04] KVM: PV: ext call delivered twice when receiver in PSW wait
    (LP: #1995941)
    - KVM: s390: pv: don't present the ecall interrupt twice
  * [UBUNTU 20.04] boot: Add s390x secure boot trailer (LP: #1996071)
    - s390/boot: add secure boot trailer
  * Fix rfkill causing soft blocked wifi (LP: #1996198)
    - platform/x86: hp_wmi: Fix rfkill causing soft blocked wifi
  * md: Replace snprintf with scnprintf (LP: #1993315)
    - md: Replace snprintf with scnprintf
  * input/keyboard: the keyboard on some Asus laptops can't work (LP: #1992266)
    - ACPI: resource: Skip IRQ override on Asus Vivobook K3402ZA/K3502ZA
    - ACPI: resource: Add ASUS model S5402ZA to quirks
  * Focal update: v5.4.218 upstream stable release (LP: #1995530)
    - mm: pagewalk: Fix race between unmap and page walker
    - perf tools: Fixup get_current_dir_name() compilation
    - firmware: arm_scmi: Add SCMI PM driver remove routine
    - dmaengine: xilinx_dma: cleanup for fetching xlnx,num-fstores property
    - dmaengine: xilinx_dma: Report error in case of dma_set_mask_and_coherent API
      failure
    - ARM: dts: fix Moxa SDIO 'compatible', remove 'sdhci' misnomer
    - scsi: qedf: Fix a UAF bug in __qedf_probe()
    - net/ieee802154: fix uninit value bug in dgram_sendmsg
    - um: Cleanup syscall_handler_t cast in syscalls_32.h
    - um: Cleanup compiler warning in arch/x86/um/tls_32.c
    - arch: um: Mark the stack non-executable to fix a binutils warning
    - usb: mon: make mmapped memory read only
    - USB: serial: ftdi_sio: fix 300 bps rate for SIO
    - mmc: core: Replace with already defined values for readability
    - mmc: core: Terminate infinite loop in SD-UHS voltage switch
    - rpmsg: qcom: glink: replace strncpy() with strscpy_pad()
    - nilfs2: fix leak of nilfs_root in case of writer thread creation failure
    - nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
    - ceph: don't truncate file in atomic_open
    - random: clamp credited irq bits to maximum mixed
    - ALSA: hda: Fix position reporting on Poulsbo
    - efi: Correct Macmini DMI match in uefi cert quirk
    - USB: serial: qcserial: add new usb-id for Dell branded EM7455
    - random: restore O_NONBLOCK support
    - random: avoid reading two cache lines on irq randomness
    - random: use expired timer rather than wq for mixing fast pool
    - Input: xpad - add supported devices as contributed on github
    - Input: xpad - fix wireless 360 controller breaking after suspend
    - Linux 5.4.218
  * Focal update: v5.4.217 upstream stable release (LP: #1995528)
    - xfs: fix misuse...

Changed in linux-gcp (Ubuntu Focal):
status: Fix Committed → Fix Released