io_uring regression - lost write request

Bug #1952222 reported by Stefan Metzmacher
This bug affects 1 person
Affects                    Status        Importance  Assigned to
linux (Ubuntu)             Fix Released  Undecided   Unassigned
  Focal                    Invalid       Undecided   Unassigned
  Impish                   Fix Released  Medium      Unassigned
linux-hwe-5.13 (Ubuntu)    Invalid       Undecided   Unassigned
  Focal                    Fix Released  Undecided   Unassigned
  Impish                   Invalid       Undecided   Unassigned
linux-oem-5.13 (Ubuntu)    Invalid       Undecided   Unassigned
  Focal                    Won't Fix     Undecided   Unassigned
  Impish                   Invalid       Undecided   Unassigned
linux-oem-5.14 (Ubuntu)    Invalid       Undecided   Unassigned
  Focal                    Fix Released  Undecided   Timo Aaltonen
  Impish                   Invalid       Undecided   Unassigned

Bug Description

There is an io_uring regression reported against the 5.11 to 5.14 kernels, see:

https://lore.kernel.<email address hidden>/T/#m8bf4fdb4c91d8ea231517d9f936d1e4354c40a3b

From reading the thread and looking at the history in linux-stable/linux-5.15.y,
I assume that Linux 5.15.3 is the first fully fixed version, and that 5.10
was not affected.

Sadly, all the other affected versions are EOL on kernel.org.

But Ubuntu provides 5.11-, 5.13- and 5.14-based kernels,
which most likely have the bug.

I'm reporting it against linux-hwe-5.13, as that is what I test on most systems,
but others, e.g. linux-oem-5.13 and linux-oem-5.14, have the same problem.

affects: samba (Ubuntu) → linux-hwe-5.13 (Ubuntu)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1952222

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Stefan Metzmacher (metze) wrote :

apport-collect is not useful here; someone needs to read the initial comment and backport
the existing fixes from 5.15.y.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux-hwe-5.13 (Ubuntu):
status: New → Confirmed
Changed in linux-oem-5.13 (Ubuntu):
status: New → Confirmed
Changed in linux-oem-5.14 (Ubuntu):
status: New → Confirmed
Timo Aaltonen (tjaalton) wrote :

The fix is in 5.15-rc1 and later; Jammy has 5.15.x.

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Changed in linux-oem-5.13 (Ubuntu Impish):
status: New → Invalid
Changed in linux (Ubuntu Focal):
status: New → Invalid
Changed in linux-oem-5.13 (Ubuntu):
status: Confirmed → Invalid
Changed in linux-oem-5.14 (Ubuntu):
status: Confirmed → Invalid
Changed in linux-oem-5.14 (Ubuntu Impish):
status: New → Invalid
Changed in linux-hwe-5.13 (Ubuntu Impish):
status: New → Invalid
Timo Aaltonen (tjaalton) wrote :

oem-5.13 will migrate to oem-5.14 in the next cycle.

Changed in linux-oem-5.13 (Ubuntu Focal):
status: New → Won't Fix
Changed in linux-hwe-5.13 (Ubuntu):
status: Confirmed → Invalid
Changed in linux-oem-5.14 (Ubuntu Focal):
assignee: nobody → Timo Aaltonen (tjaalton)
status: New → Confirmed
Stefan Metzmacher (metze) wrote :

Thanks Timo for taking a look!

Changed in linux-hwe-5.13 (Ubuntu Focal):
status: New → Confirmed
Changed in linux (Ubuntu Impish):
status: New → Confirmed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-oem-5.14/5.14.0-1023.25 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Stefan Metzmacher (metze) wrote :

5.14.0-1023.25 seems to work fine.

I used steps similar to those in
https://lore.kernel.<email address hidden>/

cmake -DPLUGIN_{MROONGA,ROCKSDB,CONNECT,SPIDER,SPHINX,S3,COLUMNSTORE}=NO -DWITH_URING=ON ../mariadb-server
make -j8
mysql-test/mtr --mysqld=--innodb_use_native_aio=1 --nowarnings --parallel=4 --force encryption.innochecksum{,,,,,} --mysqld=--innodb_io_capacity=50000 --mysqld=--innodb_io_capacity_max=90000

It failed twice with 5.14.0-1022.24, while it completed successfully twice with 5.14.0-1023.25.

tags: added: verification-done-focal
removed: verification-needed-focal
Timo Aaltonen (tjaalton) wrote :

thanks for testing!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.14 - 5.14.0-1024.26

---------------
linux-oem-5.14 (5.14.0-1024.26) focal; urgency=medium

  * focal/linux-oem-5.14: 5.14.0-1024.26 -proposed tracker (LP: #1961191)

  * Packaging resync (LP: #1786013)
    - [Config] Update config to match upstream stable release
    - [Packaging] update variants

  * CVE-2022-0435
    - tipc: improve size validations for received domain records

  * CVE-2022-0492
    - cgroup-v1: Require capabilities to set release_agent

  * Use EC GPE for s2idle wakeup on AMD platforms (LP: #1960771)
    - ACPI: PM: Revert "Only mark EC GPE for wakeup on Intel systems"

  * USB port lost function after unplugging usb drive (LP: #1958850)
    - block: add disk sequence number
    - mm: hide laptop_mode_wb_timer entirely behind the BDI API
    - block: pass a gendisk to blk_queue_update_readahead
    - block: add a queue_has_disk helper
    - block: move the bdi from the request_queue to the gendisk
    - block: remove the bd_bdi in struct block_device
    - nvme: use blk_mq_alloc_disk
    - st: do not allocate a gendisk
    - sg: do not allocate a gendisk
    - block: cleanup the lockdep handling in *alloc_disk
    - block: remove alloc_disk and alloc_disk_node
    - block: remove the minors argument to __alloc_disk_node
    - block: pass a request_queue to __blk_alloc_disk
    - block: hold a request_queue reference for the lifetime of struct gendisk
    - block: add an explicit ->disk backpointer to the request_queue
    - writeback: make the laptop_mode prototypes available unconditionally
    - sg: pass the device name to blk_trace_setup
    - block: factor out a blk_try_enter_queue helper
    - block: drain file system I/O on del_gendisk
    - block: keep q_usage_counter in atomic mode after del_gendisk
    - block: drain queue after disk is removed from sysfs
    - nvdimm/pmem: stop using q_usage_count as external pgmap refcount
    - nvdimm/pmem: cleanup the disk if pmem_release_disk() is yet assigned

  * Focal update: upstream stable patchset 2022-01-31 (LP: #1959569)
    - iommu/amd: Restore GA log/tail pointer on host resume

 -- Timo Aaltonen <email address hidden> Thu, 17 Feb 2022 13:23:43 +0200

Changed in linux-oem-5.14 (Ubuntu Focal):
status: Confirmed → Fix Released
Stefan Metzmacher (metze) wrote :

@tjaalton what will happen to impish 5.13 and focal hwe-5.13?

Timo Aaltonen (tjaalton) wrote :

Sent the patch to the list for Impish; hwe-5.13 will pick it up from there.

Stefan Bader (smb)
Changed in linux (Ubuntu Impish):
importance: Undecided → Medium
Changed in linux (Ubuntu Impish):
status: Confirmed → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.13.0-41.46 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-impish' to 'verification-done-impish'. If the problem still exists, change the tag 'verification-needed-impish' to 'verification-failed-impish'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-impish
Kleber Sacilotto de Souza (kleber-souza) wrote :

Hello @metze,

The Impish kernel with the patches for this issue is currently in -proposed. Could you please test this kernel to check whether it's fixed? Thank you.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-hwe-5.13 - 5.13.0-41.46~20.04.1

---------------
linux-hwe-5.13 (5.13.0-41.46~20.04.1) focal; urgency=medium

  * focal/linux-hwe-5.13: 5.13.0-41.46~20.04.1 -proposed tracker (LP: #1969013)

  [ Ubuntu: 5.13.0-41.46 ]

  * impish/linux: 5.13.0-41.46 -proposed tracker (LP: #1969014)
  * NVMe devices fail to probe due to ACPI power state change (LP: #1942624)
    - ACPI: power: Rework turning off unused power resources
    - ACPI: PM: Do not turn off power resources in unknown state
  * Recent 5.13 kernel has broken KVM support (LP: #1966499)
    - KVM: Add infrastructure and macro to mark VM as bugged
    - KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the VM
    - KVM: VMX: prepare sync_pir_to_irr for running with APICv disabled
  * LRMv6: add multi-architecture support (LP: #1968774)
    - [Packaging] resync dkms-build{,--nvidia-N}
  * io_uring regression - lost write request (LP: #1952222)
    - io-wq: split bounded and unbounded work into separate lists
  * xfrm interface cannot be changed anymore (LP: #1968591)
    - xfrm: fix the if_id check in changelink
  * Use kernel-testing repo from launchpad for ADT tests (LP: #1968016)
    - [Debian] Use kernel-testing repo from launchpad
  * vmx_ldtr_test in ubuntu_kvm_unit_tests failed (FAIL: Expected 0 for L1 LDTR
    selector (got 50)) (LP: #1956315)
    - KVM: nVMX: Set LDTR to its architecturally defined value on nested VM-Exit
  * audio from external sound card is distorted (LP: #1966066)
    - ALSA: usb-audio: Fix packet size calculation regression
  * Impish update: upstream stable patchset 2022-04-12 (LP: #1968771)
    - cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
    - btrfs: tree-checker: check item_size for inode_item
    - btrfs: tree-checker: check item_size for dev_item
    - clk: jz4725b: fix mmc0 clock gating
    - vhost/vsock: don't check owner in vhost_vsock_stop() while releasing
    - parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
    - parisc/unaligned: Fix ldw() and stw() unalignment handlers
    - KVM: x86/mmu: make apf token non-zero to fix bug
    - drm/amdgpu: disable MMHUB PG for Picasso
    - drm/i915: Correctly populate use_sagv_wm for all pipes
    - sr9700: sanity check for packet length
    - USB: zaurus: support another broken Zaurus
    - CDC-NCM: avoid overflow in sanity checking
    - x86/fpu: Correct pkru/xstate inconsistency
    - tee: export teedev_open() and teedev_close_context()
    - optee: use driver internal tee_context for some rpc
    - ping: remove pr_err from ping_lookup
    - perf data: Fix double free in perf_session__delete()
    - bnx2x: fix driver load from initrd
    - bnxt_en: Fix active FEC reporting to ethtool
    - hwmon: Handle failure to register sensor with thermal zone correctly
    - bpf: Do not try bpf_msg_push_data with len 0
    - selftests: bpf: Check bpf_msg_push_data return value
    - bpf: Add schedule points in batch ops
    - io_uring: add a schedule point in io_add_buffers()
    - net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends
    - tipc: Fix end of loop tests for list_for_each_entry()
    - gso...

Changed in linux-hwe-5.13 (Ubuntu Focal):
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.13.0-41.46

---------------
linux (5.13.0-41.46) impish; urgency=medium

  * impish/linux: 5.13.0-41.46 -proposed tracker (LP: #1969014)

  * NVMe devices fail to probe due to ACPI power state change (LP: #1942624)
    - ACPI: power: Rework turning off unused power resources
    - ACPI: PM: Do not turn off power resources in unknown state

  * Recent 5.13 kernel has broken KVM support (LP: #1966499)
    - KVM: Add infrastructure and macro to mark VM as bugged
    - KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the VM
    - KVM: VMX: prepare sync_pir_to_irr for running with APICv disabled

  * LRMv6: add multi-architecture support (LP: #1968774)
    - [Packaging] resync dkms-build{,--nvidia-N}

  * io_uring regression - lost write request (LP: #1952222)
    - io-wq: split bounded and unbounded work into separate lists

  * xfrm interface cannot be changed anymore (LP: #1968591)
    - xfrm: fix the if_id check in changelink

  * Use kernel-testing repo from launchpad for ADT tests (LP: #1968016)
    - [Debian] Use kernel-testing repo from launchpad

  * vmx_ldtr_test in ubuntu_kvm_unit_tests failed (FAIL: Expected 0 for L1 LDTR
    selector (got 50)) (LP: #1956315)
    - KVM: nVMX: Set LDTR to its architecturally defined value on nested VM-Exit

  * audio from external sound card is distorted (LP: #1966066)
    - ALSA: usb-audio: Fix packet size calculation regression

  * Impish update: upstream stable patchset 2022-04-12 (LP: #1968771)
    - cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
    - btrfs: tree-checker: check item_size for inode_item
    - btrfs: tree-checker: check item_size for dev_item
    - clk: jz4725b: fix mmc0 clock gating
    - vhost/vsock: don't check owner in vhost_vsock_stop() while releasing
    - parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
    - parisc/unaligned: Fix ldw() and stw() unalignment handlers
    - KVM: x86/mmu: make apf token non-zero to fix bug
    - drm/amdgpu: disable MMHUB PG for Picasso
    - drm/i915: Correctly populate use_sagv_wm for all pipes
    - sr9700: sanity check for packet length
    - USB: zaurus: support another broken Zaurus
    - CDC-NCM: avoid overflow in sanity checking
    - x86/fpu: Correct pkru/xstate inconsistency
    - tee: export teedev_open() and teedev_close_context()
    - optee: use driver internal tee_context for some rpc
    - ping: remove pr_err from ping_lookup
    - perf data: Fix double free in perf_session__delete()
    - bnx2x: fix driver load from initrd
    - bnxt_en: Fix active FEC reporting to ethtool
    - hwmon: Handle failure to register sensor with thermal zone correctly
    - bpf: Do not try bpf_msg_push_data with len 0
    - selftests: bpf: Check bpf_msg_push_data return value
    - bpf: Add schedule points in batch ops
    - io_uring: add a schedule point in io_add_buffers()
    - net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends
    - tipc: Fix end of loop tests for list_for_each_entry()
    - gso: do not skip outer ip header in case of ipip and net_failover
    - openvswitch: Fix setting ipv6 fields causing hw csum failure
   ...

Changed in linux (Ubuntu Impish):
status: Fix Committed → Fix Released