Focal 20.04.4 5.13.0-27-generic crashing disabling CPUs

Bug #1966870 reported by Christian Ehrhardt 
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

Hi, I have now hit the following crash twice in a row while running the same
test, so it seems to be somewhat reproducible:

[ 1444.399448] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 1444.431172] #PF: supervisor write access in kernel mode
[ 1444.454715] #PF: error_code(0x0002) - not-present page
[ 1444.478052] PGD 0 P4D 0
[ 1444.489448] Oops: 0002 [#1] SMP PTI
[ 1444.505120] CPU: 6 PID: 26233 Comm: chcpu Tainted: P W O 5.13.0-27-generic #29~20.04.1-Ubuntu
[ 1444.549884] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 01/22/2018
[ 1444.587322] RIP: 0010:blk_mq_hctx_notify_dead+0xc7/0x190
[ 1444.611352] Code: 04 49 8d 54 05 08 4c 01 e8 48 8b 48 08 48 39 ca 74 66 48 8b 48 08 48 39 ca 74 21 48 8b 3a 48 8b 4d c8 48 8b 72 08 48 89 7d c8 <4c> 89 77 08 48 89 0e 48 89 71 08 48 89 12 48 89 50 10 41 0f b7 84
[ 1444.696490] RSP: 0018:ffffbf5d818dbbf0 EFLAGS: 00010282
[ 1444.720510] RAX: ffffdf5d7fb788c0 RBX: 0000000000000000 RCX: ffffbf5d818dbbf0
[ 1444.752719] RDX: ffffdf5d7fb788c8 RSI: 0000000000000000 RDI: 0000000000000000
[ 1444.784978] RBP: ffffbf5d818dbc28 R08: 0000000000000000 R09: ffffbf5d818dbae8
[ 1444.816712] R10: 0000000000000001 R11: 0000000000000001 R12: ffff983d939b0000
[ 1444.848844] R13: ffffdf5d7fb788c0 R14: ffffbf5d818dbbf0 R15: 0000000000000005
[ 1444.881389] FS: 00007f1c8fe88580(0000) GS:ffff9844dfb80000(0000) knlGS:0000000000000000
[ 1444.918201] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1444.944633] CR2: 0000000000000008 CR3: 000000064ddf6006 CR4: 00000000001706e0
[ 1444.977001] Call Trace:
[ 1444.988071] ? blk_mq_exit_hctx+0x160/0x160
[ 1445.007037] cpuhp_invoke_callback+0x179/0x430
[ 1445.027179] cpuhp_invoke_callback_range+0x44/0x80
[ 1445.048737] _cpu_down+0x109/0x310
[ 1445.064062] cpu_down+0x36/0x60
[ 1445.077882] cpu_device_down+0x16/0x20
[ 1445.094741] cpu_subsys_offline+0xe/0x10
[ 1445.112439] device_offline+0x8e/0xc0
[ 1445.129064] online_store+0x4c/0x90
[ 1445.144835] dev_attr_store+0x17/0x30
[ 1445.161307] sysfs_kf_write+0x3e/0x50
[ 1445.177856] kernfs_fop_write_iter+0x138/0x1c0
[ 1445.198036] new_sync_write+0x117/0x1b0
[ 1445.215386] vfs_write+0x185/0x250
[ 1445.230649] ksys_write+0x67/0xe0
[ 1445.245565] __x64_sys_write+0x1a/0x20
[ 1445.262448] do_syscall_64+0x61/0xb0
[ 1445.278585] ? do_syscall_64+0x6e/0xb0
[ 1445.295940] ? asm_exc_page_fault+0x8/0x30
[ 1445.314969] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1445.338356] RIP: 0033:0x7f1c8fda30a7
[ 1445.355062] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 1445.440161] RSP: 002b:00007fffed1c4418 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1445.474829] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007f1c8fda30a7
[ 1445.507219] RDX: 0000000000000001 RSI: 0000559369f25869 RDI: 0000000000000004
[ 1445.539438] RBP: 00007f1c8fe88500 R08: 0000000000000000 R09: 00007fffed1c43c0
[ 1445.572547] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[ 1445.604842] R13: 00007fffed1c4420 R14: 0000559369f25869 R15: 0000000000000001
[ 1445.636897] Modules linked in: vhost_net tap ebtable_filter ebtables veth nbd xt_comment zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock xt_MASQUERADE xt_conntrack xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_tables ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 uio_pci_generic uio nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rpcrdma kvm_intel sunrpc kvm rdma_ucm ib_iser libiscsi scsi_transport_iscsi rapl ib_umad rdma_cm ib_ipoib intel_cstate efi_pstore iw_cm ib_cm hpilo ioatdma acpi_ipmi acpi_tad ipmi_si mac_hid acpi_power_meter sch_fq_codel ipmi_devintf ipmi_msghandler msr
[ 1445.636951] ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core ses enclosure mgag200 i2c_algo_bit drm_kms_helper crct10dif_pclmul syscopyarea crc32_pclmul sysfillrect ghash_clmulni_intel sysimgblt fb_sys_fops aesni_intel mlx5_core cec ixgbe pci_hyperv_intf crypto_simd xfrm_algo psample nvme cryptd rc_core hpsa mlxfw i2c_i801 dca xhci_pci drm i2c_smbus lpc_ich tg3 tls xhci_pci_renesas nvme_core mdio scsi_transport_sas wmi
[ 1446.267521] CR2: 0000000000000008
[ 1446.282506] ---[ end trace 99f81ab62ed1f929 ]---
[ 1446.308857] RIP: 0010:blk_mq_hctx_notify_dead+0xc7/0x190
[ 1446.311595] ixgbe 0000:04:00.1 eno50: NIC Link is Up 10 Gbps, Flow Control: None
[ 1446.332973] Code: 04 49 8d 54 05 08 4c 01 e8 48 8b 48 08 48 39 ca 74 66 48 8b 48 08 48 39 ca 74 21 48 8b 3a 48 8b 4d c8 48 8b 72 08 48 89 7d c8 <4c> 89 77 08 48 89 0e 48 89 71 08 48 89 12 48 89 50 10 41 0f b7 84
[ 1446.332975] RSP: 0018:ffffbf5d818dbbf0 EFLAGS: 00010282
[ 1446.332977] RAX: ffffdf5d7fb788c0 RBX: 0000000000000000 RCX: ffffbf5d818dbbf0
[ 1446.332978] RDX: ffffdf5d7fb788c8 RSI: 0000000000000000 RDI: 0000000000000000
[ 1446.332978] RBP: ffffbf5d818dbc28 R08: 0000000000000000 R09: ffffbf5d818dbae8
[ 1446.332979] R10: 0000000000000001 R11: 0000000000000001 R12: ffff983d939b0000
[ 1446.332980] R13: ffffdf5d7fb788c0 R14: ffffbf5d818dbbf0 R15: 0000000000000005
[ 1446.332981] FS: 00007f1c8fe88580(0000) GS:ffff9844dfb80000(0000) knlGS:0000000000000000
[ 1446.332982] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1446.332983] CR2: 0000000000000008 CR3: 000000064ddf6006 CR4: 00000000001706e0

The system is partially stuck afterwards.
I can no longer use libvirt (neither restarting the service nor spawning a new
guest works) or openvswitch (ovs-vsctl show); all of those calls hang, while
other things still somewhat work. A new ssh login also hangs, so debugging
after the crash is very limited.
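
For reference, the "#PF: error_code(0x0002)" line in the oops above can be decoded by hand. This is a minimal sketch, assuming the standard x86 page-fault error code bit layout; the function name decode_pf_error_code is made up for illustration and is not part of any kernel tooling.

```python
# Bit layout of the x86 page-fault error code (lowest five bits only).
# Each entry: (mask, meaning if set, meaning if clear or None to skip).
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return one human-readable string per relevant error code bit."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


if __name__ == "__main__":
    # Value taken from the oops in this report.
    print(decode_pf_error_code(0x0002))
```

For 0x0002 this matches what the kernel already printed: a supervisor write access to a not-present page, which fits a write through a NULL-based pointer at address 0x8.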

The tests do things in this order:
1. set up a simple openvswitch
2. start the libvirt network for this OVS instance
3. disable cpus 5-11 (as I want the test to only have 0-4)
4. start a KVM guest on that OVS with huge pages

Note: in the meantime I reproduced this without steps #1, #2 and #4, so the VM and OVS that I originally mentioned can be ignored.

--- details ---

I originally had details about the OVS and the VM here, but to be honest all that is left is "boot and run chcpu -d => crash". So there are not many details to share.
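
The call trace goes through online_store and a sysfs write, so the same path can presumably be exercised without chcpu by writing to the per-CPU "online" files. The following is only a sketch under that assumption: the helper names (parse_cpu_list, online_path, set_online) are hypothetical, and the main block is a dry run that just prints what would be written.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(spec):
    """Expand a CPU list such as "5-11" or "0,2,4-6" into a sorted list of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def online_path(cpu):
    """Return the sysfs file that controls whether the given CPU is online."""
    return SYSFS_CPU / f"cpu{cpu}" / "online"


def set_online(cpu, online):
    """Write "1" or "0" to the CPU's online file.

    Needs root. On an affected kernel, writing "0" may trigger the oops
    described in this report, so this is not called by the dry run below.
    """
    online_path(cpu).write_text("1" if online else "0")


if __name__ == "__main__":
    # Dry run only: list the writes that offlining CPUs 5-11 would perform.
    for cpu in parse_cpu_list("5-11"):
        print(f"would write 0 to {online_path(cpu)}")
```

To actually attempt the reproduction, set_online(cpu, False) would have to be called for each CPU as root, ideally on a machine with a serial console attached, since the system hangs afterwards.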

---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Mar 29 07:06 seq
 crw-rw---- 1 root audio 116, 33 Mar 29 07:06 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: HP ProLiant DL360 Gen9
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
PciMultimedia:

ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.13.0-27-generic root=UUID=c941b173-e6b5-485a-a02b-8d966b8d3c73 ro --- console=ttyS1,115200
ProcVersionSignature: Ubuntu 5.13.0-27.29~20.04.1-generic 5.13.19
RelatedPackageVersions:
 linux-restricted-modules-5.13.0-27-generic N/A
 linux-backports-modules-5.13.0-27-generic N/A
 linux-firmware 1.187.29
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: focal uec-images
Uname: Linux 5.13.0-27-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: kvm libvirt
_MarkForUpload: True
dmi.bios.date: 01/22/2018
dmi.bios.release: 2.56
dmi.bios.vendor: HP
dmi.bios.version: P89
dmi.board.name: ProLiant DL360 Gen9
dmi.board.vendor: HP
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.ec.firmware.release: 2.60
dmi.modalias: dmi:bvnHP:bvrP89:bd01/22/2018:br2.56:efr2.60:svnHP:pnProLiantDL360Gen9:pvr:rvnHP:rnProLiantDL360Gen9:rvr:cvnHP:ct23:cvr:sku780018-S01:
dmi.product.family: ProLiant
dmi.product.name: ProLiant DL360 Gen9
dmi.product.sku: 780018-S01
dmi.sys.vendor: HP

Christian Ehrhardt  (paelzer) wrote : CRDA.txt

apport information

tags: added: apport-collected focal uec-images
description: updated

Christian Ehrhardt  (paelzer) wrote : CurrentDmesg.txt

apport information

Christian Ehrhardt  (paelzer) wrote : Lspci.txt

apport information

Christian Ehrhardt  (paelzer) wrote : Lspci-vt.txt

apport information

Christian Ehrhardt  (paelzer) wrote : Lsusb.txt

apport information

Christian Ehrhardt  (paelzer) wrote : Lsusb-t.txt

apport information

Christian Ehrhardt  (paelzer) wrote : Lsusb-v.txt

apport information

Christian Ehrhardt  (paelzer) wrote : ProcCpuinfo.txt

apport information

Christian Ehrhardt  (paelzer) wrote : ProcCpuinfoMinimal.txt

apport information

Christian Ehrhardt  (paelzer) wrote : ProcEnviron.txt

apport information

Christian Ehrhardt  (paelzer) wrote : ProcInterrupts.txt

apport information

Christian Ehrhardt  (paelzer) wrote : ProcModules.txt

apport information

Christian Ehrhardt  (paelzer) wrote : UdevDb.txt

apport information

Christian Ehrhardt  (paelzer) wrote : WifiSyslog.txt

apport information

Christian Ehrhardt  (paelzer) wrote : acpidump.txt

apport information

Christian Ehrhardt  (paelzer) wrote : Re: Focal 20.04.4 crashing when using openvswitch/hugepages

Ok, I have manually retried this - we do not need the KVM guest, it is the chcpu that kills it.
Simplifying description ...

summary: - Focal 20.04.4 crashing when using openvswitch/hugepages
+ Focal 20.04.4 crashing when using openvswitch and disabling CPUs
description: updated

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Christian Ehrhardt  (paelzer) wrote : Re: Focal 20.04.4 crashing when using openvswitch and disabling CPUs

I have tried the same again, this time with openvswitch unconfigured (but still running).
ubuntu@node-horsea:~$ sudo ovs-vsctl del-br ovsbr0
ubuntu@node-horsea:~$ sudo ovs-vsctl show
8dfc2067-7b9b-48d7-a50a-df17bbd3cb6c
    ovs_version: "2.13.5"

chcpu disabling/enabling still crashes.

ubuntu@node-horsea:~$ sudo chcpu -d 5-11
Killed

[ +3.357665] IRQ 56: no longer affine to CPU5
[ +0.000021] IRQ 72: no longer affine to CPU5
[ +0.000009] IRQ 82: no longer affine to CPU5
[ +0.000011] IRQ 96: no longer affine to CPU5
[ +0.000019] IRQ 121: no longer affine to CPU5
[ +0.000012] IRQ 136: no longer affine to CPU5
[ +0.002380] smpboot: CPU 5 is now offline
[ +0.000468] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ +0.031357] #PF: supervisor write access in kernel mode
[ +0.023816] #PF: error_code(0x0002) - not-present page
[ +0.023147] PGD 0 P4D 0
[ +0.011391] Oops: 0002 [#1] SMP PTI
[ +0.015688] CPU: 11 PID: 5967 Comm: chcpu Tainted: P W O 5.13.0-27-generic #29~20.04.1-Ubuntu
[ +0.043614] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 01/22/2018
[ +0.037462] RIP: 0010:blk_mq_hctx_notify_dead+0xc7/0x190
[ +0.023926] Code: 04 49 8d 54 05 08 4c 01 e8 48 8b 48 08 48 39 ca 74 66 48 8b 48 08 48 39 ca 74 21 48 8b 3a 48 8b 4d c8 48 8b 72 08 48 89 7d c8 <4c> 89 77 08 48 89 0e 48 89 71 08 48 89 12 48 89 50 10 41 0f b7 84
[ +0.084916] RSP: 0018:ffffacf3ccddbbb0 EFLAGS: 00010286
[ +0.023534] RAX: ffffccf3bfb7a580 RBX: 0000000000000000 RCX: ffffacf3ccddbbb0
[ +0.033000] RDX: ffffccf3bfb7a588 RSI: 0000000000000000 RDI: 0000000000000000
[ +0.032118] RBP: ffffacf3ccddbbe8 R08: 0000000000000000 R09: ffffacf3ccddbaa8
[ +0.032456] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8da022a50000
[ +0.032331] R13: ffffccf3bfb7a580 R14: ffffacf3ccddbbb0 R15: 0000000000000005
[ +0.032137] FS: 00007f1a4aff9580(0000) GS:ffff8da29fcc0000(0000) knlGS:0000000000000000
[ +0.036480] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.025981] CR2: 0000000000000008 CR3: 00000005b29d6006 CR4: 00000000001706e0
[ +0.032158] Call Trace:
[ +0.010991] ? blk_mq_exit_hctx+0x160/0x160
[ +0.018782] cpuhp_invoke_callback+0x179/0x430
[ +0.020112] cpuhp_invoke_callback_range+0x44/0x80
[ +0.021557] _cpu_down+0x109/0x310
[ +0.015284] cpu_down+0x36/0x60
[ +0.014475] cpu_device_down+0x16/0x20
[ +0.016905] cpu_subsys_offline+0xe/0x10
[ +0.017652] device_offline+0x8e/0xc0
[ +0.016520] online_store+0x4c/0x90
[ +0.015662] dev_attr_store+0x17/0x30
[ +0.016581] sysfs_kf_write+0x3e/0x50
[ +0.016546] kernfs_fop_write_iter+0x138/0x1c0
[ +0.020381] new_sync_write+0x117/0x1b0
[ +0.017389] vfs_write+0x185/0x250
[ +0.015407] ksys_write+0x67/0xe0
[ +0.014961] __x64_sys_write+0x1a/0x20
[ +0.016879] do_syscall_64+0x61/0xb0
[ +0.016095] ? syscall_exit_to_user_mode+0x27/0x50
[ +0.021544] ? __x64_sys_faccessat+0x1c/0x20
[ +0.019347] ? do_syscall_64+0x6e/0xb0
[ +0.016881] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ +0.022785] RIP: 0033:0x7f1a4af140a7
[ +0.016084] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f...

summary: - Focal 20.04.4 crashing when using openvswitch and disabling CPUs
+ Focal 20.04.4 5.13.0-27-generic crashing disabling CPUs
description: updated

Christian Ehrhardt  (paelzer) wrote :

Since the last entry listed in the crash's call trace is
  ? blk_mq_exit_hctx+0x160/0x160
I checked whether anything else was going on with block devices.
I found another issue, a kernel WARNING, right at boot/init time (this one is also in the attached CurrentDmesg.txt).

[ 537.566942] ------------[ cut here ]------------
[ 537.566946] WARNING: CPU: 7 PID: 2421 at block/blk-mq.c:3087 blk_mq_release+0x45/0xe0
[ 537.566958] Modules linked in: nbd(+) xt_comment zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_tables ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 uio_pci_generic uio nls_iso8859_1 rpcrdma sunrpc rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm rapl intel_cstate efi_pstore ioatdma hpilo acpi_ipmi ipmi_si acpi_tad mac_hid acpi_power_meter sch_fq_codel ipmi_devintf ipmi_msghandler msr ip_tables x_tables autofs4 btrfs
[ 537.567073] blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core ses enclosure mgag200 i2c_algo_bit crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm_kms_helper syscopyarea sysfillrect aesni_intel sysimgblt mlx5_core fb_sys_fops pci_hyperv_intf crypto_simd ixgbe cec psample cryptd xfrm_algo nvme i2c_i801 rc_core mlxfw hpsa xhci_pci dca drm lpc_ich i2c_smbus tg3 xhci_pci_renesas nvme_core tls mdio scsi_transport_sas wmi
[ 537.567146] CPU: 7 PID: 2421 Comm: modprobe Tainted: P O 5.13.0-27-generic #29~20.04.1-Ubuntu
[ 537.567151] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 01/22/2018
[ 537.567154] RIP: 0010:blk_mq_release+0x45/0xe0
[ 537.567159] Code: 48 31 d2 eb 07 83 c2 01 39 f2 74 27 48 63 c2 48 8b 04 c7 48 85 c0 74 ed 48 8b 88 30 02 00 00 48 05 30 02 00 00 48 39 c1 75 db <0f> 0b 83 c2 01 39 f2 75 d9 49 8b 84 24 90 05 00 00 49 8d 9c 24 90
[ 537.567163] RSP: 0018:ffffc287c17f3a58 EFLAGS: 00010246
[ 537.567167] RAX: ffff9f2611f50230 RBX: ffff9f2609bfefa0 RCX: ffff9f2611f50230
[ 537.567170] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9f261e60ef38
[ 537.567173] RBP: ffffc287c17f3a70 R08: 0000000000000004 R09: 000000000000002c
[ 537.567175] R10: ffff9f2601c07800 R11: 00000000000001b6 R12: ffff9f2609bfef20
[ 537.567178] R13: ffff9f2609bfef20 R14: ffff9f2609bfefa0 R15: 0000000000000000
[ 537.567181] FS: 00007f1728b36680(0000) GS:ffff9f2d5fbc0000(0000) knlGS:0000000000000000
[ 537.567185] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 537.567188] CR2: 00007f814132242d CR3: 0000000119bf4003 CR4: 00000000001706e0
[ 537.567191] Call Trace:
[ 537.567197] blk_release_queue+0xbc/0x140
[ 537.567203] kob...

Christian Ehrhardt  (paelzer) wrote :

I tried things once more, this time with OVS not active at all (in the previous attempt it was active after boot and then disabled). It still fails.

Next I tried a different kernel (5.4.0-105-generic).

I don't know at which point after 5.4 this started to fail. It certainly worked a few months ago, when I was already using an HWE kernel and running the same tests that now stumble over this.

I usually run with:
linux-image-generic-hwe-20.04-edge 5.13.0.27.29~20.04.13 => the one that initially failed

And now I cross-tested:
linux-generic 5.4.0.105.109 => working fine

Would it help you if I also tested the following (it is somewhat suspicious that edge is behind non-edge)?
linux-image-generic-hwe-20.04 5.13.0.39.44~20.04.24
Is there any other build that I should try to help you pinpoint this?

Christian Ehrhardt  (paelzer) wrote :

FYI, for a different test I upgraded the system to Impish, and on the 5.13.0-39-generic kernel there neither of the two kernel bugs happens.

Christian Ehrhardt  (paelzer) wrote :

This is actually already fixed in 5.13.0.39.44 as shown above, but the root cause of what I'm facing is that the edge kernel is behind the non-edge one:

$ rmadison -a amd64 -u ubuntu linux-image-generic-hwe-20.04-edge | grep focal
 linux-image-generic-hwe-20.04-edge | 5.13.0.27.29~20.04.13 | focal-security | amd64
 linux-image-generic-hwe-20.04-edge | 5.13.0.27.29~20.04.13 | focal-updates | amd64
 linux-image-generic-hwe-20.04-edge | 5.15.0.23.23~20.04.5 | focal-proposed | amd64

$ rmadison -a amd64 -u ubuntu linux-image-generic-hwe-20.04 | grep focal
 linux-image-generic-hwe-20.04 | 5.4.0.26.32 | focal | amd64
 linux-image-generic-hwe-20.04 | 5.13.0.39.44~20.04.24 | focal-security | amd64
 linux-image-generic-hwe-20.04 | 5.13.0.39.44~20.04.24 | focal-updates | amd64

And I see that this is not being updated:
  https://launchpad.net/ubuntu/+source/linux-meta-hwe-5.13
while this is stuck in proposed:
  https://launchpad.net/ubuntu/+source/linux-meta-hwe-5.15
and no version of it ever left proposed:
  https://launchpad.net/ubuntu/+source/linux-meta-hwe-5.15/+publishinghistory

5.15 is stuck in proposed, but as long as that is the case, the 5.13 updates for edge should continue.
I do not have a perfect solution, but somehow edge should not fall behind non-edge.
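
A check for this situation could look roughly like the sketch below. It parses the two rmadison outputs quoted above and compares the focal-updates versions. The helper names are hypothetical, and version_key is a deliberately simplified ordering that only handles dotted numbers around a "~"; it is not a full Debian version comparison (dpkg --compare-versions would be the proper tool for that).

```python
# rmadison output copied from this comment.
EDGE = """\
 linux-image-generic-hwe-20.04-edge | 5.13.0.27.29~20.04.13 | focal-security | amd64
 linux-image-generic-hwe-20.04-edge | 5.13.0.27.29~20.04.13 | focal-updates | amd64
 linux-image-generic-hwe-20.04-edge | 5.15.0.23.23~20.04.5 | focal-proposed | amd64
"""

NON_EDGE = """\
 linux-image-generic-hwe-20.04 | 5.4.0.26.32 | focal | amd64
 linux-image-generic-hwe-20.04 | 5.13.0.39.44~20.04.24 | focal-security | amd64
 linux-image-generic-hwe-20.04 | 5.13.0.39.44~20.04.24 | focal-updates | amd64
"""


def parse_rmadison(output):
    """Map suite -> version for lines of the form 'pkg | version | suite | arch'."""
    versions = {}
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 4:
            _pkg, version, suite, _arch = fields
            versions[suite] = version
    return versions


def version_key(version):
    """Simplified sort key: numeric tuples of the parts before and after '~'."""
    return tuple(
        tuple(int(number) for number in part.split("."))
        for part in version.split("~")
    )


def edge_is_behind(edge_output, non_edge_output, suite="focal-updates"):
    """Return True if the edge meta package is older than non-edge in the suite."""
    edge = parse_rmadison(edge_output)[suite]
    non_edge = parse_rmadison(non_edge_output)[suite]
    return version_key(edge) < version_key(non_edge)


if __name__ == "__main__":
    print(edge_is_behind(EDGE, NON_EDGE))
```

With the data above, edge (5.13.0.27.29~20.04.13) compares lower than non-edge (5.13.0.39.44~20.04.24) in focal-updates, which is exactly the situation that should not occur.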
