sas driver call trace and insmod/rmmod SAS ko

Bug #1914976 reported by Fred Kimmy
This bug affects 1 person
Affects           Status   Importance  Assigned to  Milestone
kunpeng920        Invalid  Undecided   Unassigned
Ubuntu-18.04-hwe  Invalid  Undecided   Unassigned

Bug Description

Hardware: X6000 (saenger in 18T lab)
Firmware: sudo dmidecode shows "Vendor: Huawei Corp. Version: 0.95 Release Date: 08/15/2019"
Kernel: Linux saenger 5.0.0-23-generic #24~18.04.1-Ubuntu SMP Mon Jul 29 16:10:24 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@saenger:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.0.0-23-generic root=UUID=835de629-96b1-4857-9181-8274f692e799 ro sysrq_always_enabled

[Steps to Reproduce Issue]
1. Deploy the X6000 with bionic-hwe using MAAS. (Installing the system from an ISO should work as well, but I did not try this.)
2. Install the 5.0.0-23 kernel by
KVERSION=5.0.0-23; sudo apt-get update; sudo apt-get install linux-headers-$KVERSION linux-headers-$KVERSION-generic linux-image-$KVERSION-generic linux-modules-$KVERSION-generic linux-modules-extra-$KVERSION-generic -y
3. Boot to the 5.0.0-23 kernel by
sudo grub-reboot '1>2'; sudo reboot
4. sudo rmmod hisi_sas_v3_hw
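
Since grub-reboot selects the entry for the next boot only, it can help to confirm the running kernel before step 4. The following is a minimal sketch, not part of the original report; is_target_kernel and repro_step4 are hypothetical helper names, and the helper only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helpers (not from the report): guard step 4 on the kernel release.

# Return 0 when release string $1 is version $2 or "$2-<flavour>".
is_target_kernel() {
    case "$1" in
        "$2"|"$2"-*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Print the step 4 command if the release matches 5.0.0-23, else a skip notice.
repro_step4() {
    if is_target_kernel "$1" 5.0.0-23; then
        echo "sudo rmmod hisi_sas_v3_hw"
    else
        echo "skip: booted $1, expected 5.0.0-23-*"
    fi
}
```

Typical use would be repro_step4 "$(uname -r)" right after the reboot in step 3.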

[Expected Result]
No kernel trace

[Actual Result]
Kernel trace shows up immediately (tail -f /var/log/syslog will help you see it), and the kernel module is not unloaded successfully.

ubuntu@saenger:~$ lsmod | grep sas
hisi_sas_v3_hw 49152 0
hisi_sas_main 57344 1 hisi_sas_v3_hw
megaraid_sas 143360 2
libsas 90112 2 hisi_sas_v3_hw,hisi_sas_main
scsi_transport_sas 40960 3 hisi_sas_v3_hw,hisi_sas_main,libsas
ubuntu@saenger:~$ rmmod hisi_sas_v3_hw
rmmod: ERROR: ../libkmod/libkmod-module.c:793 kmod_module_remove_module() could not remove 'hisi_sas_v3_hw': Operation not permitted
rmmod: ERROR: could not remove module hisi_sas_v3_hw: Operation not permitted
ubuntu@saenger:~$ sudo rmmod hisi_sas_v3_hw
Segmentation fault (core dumped)
ubuntu@saenger:~$ lsmod | grep sas
hisi_sas_v3_hw 49152 -1
hisi_sas_main 57344 1 hisi_sas_v3_hw
megaraid_sas 143360 2
libsas 90112 2 hisi_sas_v3_hw,hisi_sas_main
scsi_transport_sas 40960 3 hisi_sas_v3_hw,hisi_sas_main,libsas
ubuntu@saenger:~$
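
The use count of -1 in the second lsmod listing is the visible sign that the unload did not complete. As a rough sketch (the helper names module_use_count and module_is_stuck are made up for illustration), the third column of lsmod-style text can be checked like this:

```shell
#!/bin/sh
# Hypothetical helpers: read lsmod-style text ("Module Size Used-by ...") on stdin.

# Print the use count (third column) for module $1; fail if it is not listed.
module_use_count() {
    awk -v mod="$1" '$1 == mod { print $3; found = 1 } END { exit !found }'
}

# Return 0 when module $1 shows a negative use count, 1 when it does not,
# and 2 when the module is not listed at all.
module_is_stuck() {
    count=$(module_use_count "$1") || return 2
    [ "$count" -lt 0 ]
}
```

For example, lsmod | module_is_stuck hisi_sas_v3_hw && echo "module left in a bad state" could be run after the failed rmmod.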

[Additional Information]
1. According to my tests on saenger, the following steps from the original description do not seem to be necessary:
1-1. Appending "cpumax=15 nr_cpus=15" is not needed to cause the kernel trace.
1-2. It is not necessary to rmmod/insmod over and over again.
1-3. It does not matter whether the disk is mounted when trying to rmmod. It also does not matter whether the hisi_sas_v3_hw kernel module is successfully unloaded. Simply invoking "rmmod" raises the kernel trace.
2. The bionic-hwe 5.4 kernel (5.4.0-67-generic) does not reproduce this issue with the above steps, and the module unloads successfully.

ubuntu@saenger:~$ uname -a
Linux saenger 5.4.0-67-generic #75~18.04.1-Ubuntu SMP Tue Feb 23 19:15:33 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@saenger:~$ sudo rmmod hisi_sas_v3_hw
ubuntu@saenger:~$
ubuntu@saenger:~$ lsmod | grep sas
hisi_sas_main 73728 0
libsas 102400 1 hisi_sas_main
megaraid_sas 163840 2
scsi_transport_sas 45056 2 hisi_sas_main,libsas
ubuntu@saenger:~$

====== Original Bug Description ======

[Bug Description]
If you rmmod/insmod the SAS ko using the "intr_conv"/"auto_affine_msi_experimental" parameters, the sas driver produces a call trace.

[Steps to Reproduce]
1) Set the grub.cfg cmdline to "cpumax=15 nr_cpus=15"
2) Boot Ubuntu 18.04.3 (Ubuntu 5.0.0-23-generic)
3) rmmod hisi_sas_v3_hw
4) insmod hisi_sas_v3_hw.ko auto_affine_msi_experimental=0
5) rmmod hisi_sas_v3_hw
6) insmod hisi_sas_v3_hw.ko auto_affine_msi_experimental=1
7) rmmod hisi_sas_v3_hw
8) insmod hisi_sas_v3_hw.ko intr_conv=0
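
The rmmod/insmod sequence in steps 3 to 8 can be scripted. The sketch below is only an illustration: sas_cycle and the RUN switch are hypothetical, the module parameters are the ones listed in the steps, and by default the function prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical helper: replay steps 3-8 (rmmod, then insmod with one parameter).
# $1 is the path to the .ko file (no spaces); it defaults to the name in the steps.
# Set RUN=1 to execute as root; otherwise the commands are only printed.
sas_cycle() {
    ko=${1:-hisi_sas_v3_hw.ko}
    for param in auto_affine_msi_experimental=0 \
                 auto_affine_msi_experimental=1 \
                 intr_conv=0; do
        for cmd in "rmmod hisi_sas_v3_hw" "insmod $ko $param"; do
            if [ "${RUN:-0}" = 1 ]; then
                $cmd || return 1   # stop at the first failing command
            else
                echo "$cmd"
            fi
        done
    done
}
```

Running sas_cycle without RUN=1 first makes it easy to review the exact command order before touching the driver.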

root@Ubuntu:/lib/modules/5.0.21+/kernel/drivers/scsi/hisi_sas# lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 15
On-line CPU(s) list: 0-14
Thread(s) per core: 1
Core(s) per socket: 15
Socket(s): 1
NUMA node(s): 4
Vendor ID: 0x48
Model: 0
Stepping: 0x1
CPU max MHz: 2600.0000
CPU min MHz: 200.0000
BogoMIPS: 200.00
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 32768K
NUMA node0 CPU(s): 0-14
NUMA node1 CPU(s):
NUMA node2 CPU(s):
NUMA node3 CPU(s):
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asi

[Actual Results]
[ 335.426740] CPU: 4 PID: 2825 Comm: insmod Not tainted 5.0.0-23-generic #24~18.04.1-Ubuntu
[ 335.434878] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B110.01 01/07/2021
[ 335.443708] pstate: 40400009 (nZcv daif +PAN -UAO)
[ 335.448476] pc : free_contig_range+0xbc/0xd8
[ 335.452727] lr : free_contig_range+0xbc/0xd8
[ 335.456975] sp : ffff000011a2b7b0
[ 335.460274] x29: ffff000011a2b7b0 x28: ffff0000110ce000
[ 335.465560] x27: ffff0000116fbfd0 x26: ffff0000092945d0
[ 335.470847] x25: 000000000000102a x24: ffff0000110ce000
[ 335.476133] x23: 0000000000000400 x22: ffff7e0000000000
[ 335.481419] x21: 000000000003cb15 x20: 000000000003cb15
[ 335.486705] x19: 000000000003cb15 x18: ffffffffffffffff
[ 335.491991] x17: 0000000000000000 x16: 0000000000000000
[ 335.497277] x15: ffff0000116cc708 x14: ffff000091a2b497
[ 335.502563] x13: ffff000011a2b4a5 x12: ffff0000116f2000
[ 335.507849] x11: 0000000005f5e0ff x10: ffff0000116cd168
[ 335.513135] x9 : ffff0000112e6018 x8 : ffff000010796668
[ 335.518421] x7 : 6120736567617020 x6 : ffff8027ffee8210
[ 335.523707] x5 : ffff8027ffee8210 x4 : 0000000000000000
[ 335.528993] x3 : ffff8027ffef0948 x2 : ffff8027ffee8210
[ 335.534279] x1 : ed551064fa983e00 x0 : 0000000000000000
[ 335.539566] Call trace:
[ 335.542003] free_contig_range+0xbc/0xd8
[ 335.545908] cma_release+0xc4/0x160
[ 335.549380] dma_free_contiguous+0x68/0xa8
[ 335.553457] dma_direct_free+0x54/0x98
[ 335.557189] dma_free_attrs+0x90/0xd8
[ 335.560835] dmam_release+0x2c/0x38
[ 335.564307] release_nodes+0x150/0x240
[ 335.568039] devres_release_all+0x58/0x90
[ 335.572032] really_probe+0xfc/0x3c8
[ 335.575590] driver_probe_device+0x12c/0x148
[ 335.579841] __driver_attach+0x118/0x140
[ 335.583745] bus_for_each_dev+0x84/0xd8
[ 335.587563] driver_attach+0x30/0x40
[ 335.591121] bus_add_driver+0x174/0x2a8
[ 335.594940] driver_register+0x64/0x110
[ 335.598759] __pci_register_driver+0x58/0x68
[ 335.603012] sas_v3_pci_driver_init+0x30/0x1000 [hisi_sas_v3_hw]
[ 335.608991] do_one_initcall+0x54/0x1f0
[ 335.612810] do_init_module+0x64/0x1d8
[ 335.616542] load_module+0x1850/0x18d8
[ 335.620274] __se_sys_finit_module+0xf0/0x100
[ 335.624611] __arm64_sys_finit_module+0x24/0x30
[ 335.629122] el0_svc_common+0x78/0x120
[ 335.632854] el0_svc_handler+0x38/0x78
[ 335.636586] el0_svc+0x8/0xc
[ 335.639452] ---[ end trace 2d56e2c20fb40c7c ]---
[ 335.644080] BUG: Bad page state in process insmod pfn:37356
[ 335.649722] page:ffff7e0000dcd580 count:1 mapcount:0 mapping:0000000000000000 index:0x0
[ 335.657698] flags: 0xffff0000001000(reserved)
[ 335.662042] raw: 00ffff0000001000 ffff7e0000dcd588 ffff7e0000dcd588 0000000000000000
[ 335.669755] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 335.677466] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 335.683881] bad because of flags: 0x1000(reserved)
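
To check a saved log for the same failure, the "BUG: Bad page state" marker from the trace above can be searched for directly. This is a minimal sketch with made-up helper names (count_bad_page_state, bad_page_pfns) that read kernel log text on stdin.

```shell
#!/bin/sh
# Hypothetical helpers: scan kernel log text on stdin for the marker seen above.

# Print how many lines carry the "Bad page state" marker (prints 0 if none).
count_bad_page_state() {
    grep -c 'BUG: Bad page state' || true
}

# Print the pfn value from each "Bad page state" line.
bad_page_pfns() {
    sed -n 's/.*BUG: Bad page state in process .* pfn:\([0-9a-f]*\).*/\1/p'
}
```

For example, dmesg | count_bad_page_state, or the same with /var/log/syslog redirected to stdin.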

[Expected Results]
no error

[Reproducibility]
100%

[Additional information]
(Firmware version, kernel version, affected hardware, etc. if required):
kernel version:Linux Ubuntu 5.0.0-23-generic #24~18.04.1-Ubuntu SMP Mon Jul 29 16:10:24 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
Building a deb package from the bionic source and installing it does not reproduce this bug.

[Resolution]
NA

Taihsiang Ho (tai271828)
Changed in kunpeng920:
assignee: nobody → Taihsiang Ho (taihsiangho)
Taihsiang Ho (tai271828) wrote :

Hi @Fred, could you describe in more detail how you boot the system, if possible? For example, did you boot the system from a RAM disk instead of a hard disk? It is not possible to unload hisi_sas_v3_hw when the system boots from hard disks, so I suppose you booted the system without mounting any hard disk.

dann frazier (dannf) wrote :

One way to test this on a system with hisi_sas_v3_hw root is to pause the boot in the initramfs before it mounts the root device. Since the device is not yet in-use, the module should be unloadable. You can cause the initramfs to pause by adding 'break' to the kernel command line. I tested this on one of our systems. With the 5.0.0-23 kernel, I found that the kernel crashed immediately after removing the module, even without specifying any module parameters[*]. The 5.0.0 series is no longer supported, so I retried with the latest 5.4.0 kernel. I was unable to reproduce with 5.4.0[**]. So it seems the issue has since been fixed.
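
Before relying on the initramfs pause, it may be worth confirming that 'break' really ended up on the kernel command line as its own token. A small sketch follows; cmdline_has is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if command line $1 contains $2 as an exact,
# whitespace-separated token (so "breakpoint" does not count as "break").
cmdline_has() (
    set -f                      # no glob expansion while splitting $1
    for tok in $1; do
        [ "$tok" = "$2" ] && exit 0
    done
    exit 1
)
```

At the (initramfs) prompt or on a booted system this could be used as cmdline_has "$(cat /proc/cmdline)" break && echo "break is set".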

[*] Full console log attached, but here's the portion where the BUG is triggered:
(initramfs) cat /proc/version
Linux version 5.0.0-23-generic (buildd@bos02-arm64-055) (gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)) #24~18.04.1-Ubuntu SMP Mon Jul 29 16:10:24 UTC 2019
(initramfs) cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.0.0-23-generic root=UUID=835de629-96b1-4857-9181-8274f692e799 ro sysrq_always_enabled cpumax=15 nr_cpus=15 break
(initramfs) modprobe -r hisi_sas_v3_hw
[ 32.102724] BUG: Bad page state in process modprobe pfn:2027d5f00

[**]
(initramfs) cat /proc/version
Linux version 5.4.0-65-generic (buildd@bos02-arm64-056) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #73-Ubuntu SMP Mon Jan 18 17:27:25 UTC 2021
(initramfs) cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.4.0-65-generic root=UUID=835de629-96b1-4857-9181-8274f692e799 ro sysrq_always_enabled cpumax=15 nr_cpus=15 break
(initramfs) modprobe -r hisi_sas_v3_hw
(initramfs) modprobe hisi_sas_v3_hw auto_affine_msi_experimental=0
[ 280.585403] scsi host1: hisi_sas_v3_hw
[ 281.827848] scsi host6: hisi_sas_v3_hw
[ 283.068511] cma_alloc: 27 callbacks suppressed
[ 283.068512] cma: cma_alloc: alloc failed, req-size: 128 pages, ret: -12
[ 283.079609] cma: cma_alloc: alloc failed, req-size: 1024 pages, ret: -12
[ 283.087128] scsi host7: hisi_sas_v3_hw
[ 284.323413] cma: cma_alloc: alloc failed, req-size: 64 pages, ret: -12
[ 284.330080] cma: cma_alloc: alloc failed, req-size: 64 pages, ret: -12
[ 284.336624] cma: cma_alloc: alloc failed, req-size: 16 pages, ret: -12
[ 284.343147] cma: cma_alloc: alloc failed, req-size: 64 pages, ret: -12
[ 284.349689] cma: cma_alloc: alloc failed, req-size: 16 pages, ret: -12
[ 284.356211] cma: cma_alloc: alloc failed, req-size: 64 pages, ret: -12
[ 284.362754] cma: cma_alloc: alloc failed, req-size: 16 pages, ret: -12
[ 284.369276] cma: cma_alloc: alloc failed, req-size: 64 pages, ret: -12
[ 284.382496] scsi host8: hisi_sas_v3_hw
(initramfs) modprobe -r hisi_sas_v3_hw
(initramfs) modprobe hisi_sas_v3_hw auto_affine_msi_experimental=1
[ 292.204063] scsi host1: hisi_sas_v3_hw
[ 293.434483] hisi_sas_v3_hw 0000:74:02.0: Enable MSI auto-affinity
[ 293.449862] scsi host6: hisi_sas_v3_hw
[ 294.682483] hisi_sas_v3_hw 0000:74:04.0: Enable MSI auto-affinity
[ 294.698363] cma_alloc: 27 callbacks suppressed
[ 294.698365] cma: cma_alloc: alloc failed, req-size: 128 pages, ret: -12
[ 294.709464] cma: cma_alloc: alloc failed, req-size: 1024 pages, ret: -12
[ 294.716983] scsi host7: hisi_sas_v3_hw
[ 295.946...


dann frazier (dannf) wrote :

Based on the description, this upstream fix is possibly related:

commit 7f054da7738a66fc70239ee899e74d899bad3834
Author: Luo Jiaxing <email address hidden>
Date: Fri Oct 2 22:30:32 2020 +0800

    scsi: hisi_sas: Use hisi_hba->cq_nvecs for calling calling synchronize_irq()

    A call trace is observed when running function level reset with online CPUs
    less than 16 and MSI auto-affinity enabled.

This landed upstream in v5.10 and is *not* in Ubuntu's 5.4 kernel. So while I could not reproduce using the steps in the description, perhaps the underlying issue is still there and can be reproduced another way. If there is no known reproducer, my suggestion would be for Huawei to submit this fix to the relevant upstream stable trees, and Ubuntu can pick it up from there without requiring a test case.

Taihsiang Ho (tai271828) wrote :

Regarding comment#1, I could only reproduce this issue on saenger[1] with 5.0.0-23 kernel. For segers[2] and scobee[3], I did not manage to reproduce.

[1] saenger (DMI: Huawei XA320 V2 /BC82HPNB, BIOS 0.95 08/15/2019)
    (initramfs) cat /proc/version /proc/cmdline
    Linux version 5.0.0-23-generic (buildd@bos02-arm64-055) (gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)) #24~18.04.1-Ubuntu SMP Mon Jul 29 16:10:24 UTC 2019
    BOOT_IMAGE=/boot/vmlinuz-5.0.0-23-generic root=UUID=835de629-96b1-4857-9181-8274f692e799 ro sysrq_always_enabled cpumax=15 nr_cpus=15 break

[2] segers (DMI: Huawei XA320 V2 /BC82HPNB, BIOS 0.95 08/15/2019)
    (initramfs) cat /proc/version /proc/cmdline
    Linux version 5.0.0-23-generic (buildd@bos02-arm64-055) (gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)) #24~18.04.1-Ubuntu SMP Mon Jul 29 16:10:24 UTC 2019
    BOOT_IMAGE=/boot/vmlinuz-5.0.0-23-generic root=UUID=e70c736e-0fc9-4459-9e23-d18d2fafb1bb ro sysrq_always_enabled cpumax=15 nr_cpus=15 break
    (initramfs) rmmod hisi_sas_v3_hw
    [ 245.645314] sd 7:0:0:0: [sda] Synchronizing SCSI cache
    [ 245.650527] sd 7:0:0:0: [sda] Stopping disk
    [ 245.820682] hisi_sas_v3_hw 0000:b4:04.0: dev[1:5] is gone
    [ 245.827193] sd 7:0:1:0: [sdb] Synchronizing SCSI cache
    [ 245.832395] sd 7:0:1:0: [sdb] Stopping disk
    [ 245.988757] hisi_sas_v3_hw 0000:b4:04.0: dev[2:5] is gone

[3] scobee
    (initramfs) cat /proc/cmdline
    BOOT_IMAGE=/boot/vmlinuz-5.0.0-23-generic root=UUID=e6b10f77-e7aa-431c-9711-aa68f3db5621 ro sysrq_always_enabled cpumax=15 nr_cpus=15 break
    (initramfs) cat /proc/version /proc/cmdline
    Linux version 5.0.0-23-generic (buildd@bos02-arm64-055) (gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)) #24~18.04.1-Ubuntu SMP Mon Jul 29 16:10:24 UTC 2019
    BOOT_IMAGE=/boot/vmlinuz-5.0.0-23-generic root=UUID=e6b10f77-e7aa-431c-9711-aa68f3db5621 ro sysrq_always_enabled cpumax=15 nr_cpus=15 break
    (initramfs) modprobe -r hisi_sas_v3_hw
    [ 51.431030] sas: Expander phys DID NOT change
    [ 51.437067] sas: Expander phys DID NOT change
    [ 51.443175] sas: Expander phys DID NOT change
    [ 51.449181] sas: Expander phys DID NOT change
    [ 51.455268] sas: Expander phys DID NOT change
    [ 51.461362] sas: Expander phys DID NOT change
    [ 51.467443] sas: Expander phys DID NOT change
    [ 51.472268] hisi_sas_v3_hw 0000:74:02.0: dev[4:1] is gone
    [ 51.478122] sd 2:0:1:0: [sdb] Synchronizing SCSI cache
    [ 51.488760] sd 2:0:1:0: [sdb] Stopping disk
    [ 52.025933] hisi_sas_v3_hw 0000:74:02.0: dev[3:5] is gone
    [ 52.031877] sd 2:0:0:0: [sda] Synchronizing SCSI cache
    [ 52.045136] sd 2:0:0:0: [sda] Stopping disk
    [ 52.626005] hisi_sas_v3_hw 0000:74:02.0: dev[2:5] is gone
    [ 52.632599] hisi_sas_v3_hw 0000:74:02.0: dev[1:2] is gone

Taihsiang Ho (tai271828) wrote :

I did not reproduce this issue on saenger[1], segers[2] and scobee[3] with 5.4.0-65 ubuntu kernel.

[1] saenger

(initramfs) cat /proc/version /proc/cmdline; rmmod hisi_sas_v3_hw
Linux version 5.4.0-65-generic (buildd@bos02-arm64-048) (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)) #73~18.04.1-Ubuntu SMP Tue Jan 19 09:05:37 UTC 2021
BOOT_IMAGE=/boot/vmlinuz-5.4.0-65-generic root=UUID=835de629-96b1-4857-9181-8274f692e799 ro sysrq_always_enabled cpumax=15 nr_cpus=15 break

[2] segers

(initramfs) cat /proc/version /proc/cmdline; rmmod hisi_sas_v3_hw
Linux version 5.4.0-65-generic (buildd@bos02-arm64-048) (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)) #73~18.04.1-Ubuntu SMP Tue Jan 19 09:05:37 UTC 2021
[ 213.693246] sd 7:0:1:0: [sdb] Synchronizing SCSI cache
[ 213.700279] sd 7:0:1:0: [sdb] Stopping disk
BOOT_IMAGE=/boot/vmlinuz-5.4.0-65-generic root=UUID=e70c736e-0fc9-4459-9e23-d18d2fafb1bb ro sysrq_always_enabled cpumax=15 nr_cpus=15 break
[ 213.832514] hisi_sas_v3_hw 0000:b4:04.0: dev[2:5] is gone
[ 213.838536] sd 7:0:0:0: [sda] Synchronizing SCSI cache
[ 213.843735] sd 7:0:0:0: [sda] Stopping disk
[ 213.948525] hisi_sas_v3_hw 0000:b4:04.0: dev[1:5] is gone

[3] scobee
(initramfs) cat /proc/version /proc/cmdline; rmmod hisi_sas_v3_hw
Linux version 5.4.0-65-generic (buildd@bos02-arm64-048) (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)) #73~18.04.1-Ubuntu SMP Tue Jan 19 09:05:37 UTC 2021
BOOT_IMAGE=/boot/vmlinuz-5.4.0-65-generic root=UUID=e6b10f77-e7aa-431c-9711-aa68f3db5621 ro sysrq_always_enabled cpumax=15 nr_cpus=15 break
[ 28.694842] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 28.701910] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 28.708984] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 28.715951] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 28.723037] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 28.730053] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 28.737057] sas: ex 500e004aaaaaaa1f phys DID NOT change
[ 28.742851] hisi_sas_v3_hw 0000:74:02.0: dev[4:1] is gone
[ 28.748692] sd 1:0:1:0: [sdb] Synchronizing SCSI cache
[ 28.759216] sd 1:0:1:0: [sdb] Stopping disk
[ 29.273642] hisi_sas_v3_hw 0000:74:02.0: dev[3:5] is gone
[ 29.279548] sd 1:0:0:0: [sda] Synchronizing SCSI cache
[ 29.293217] sd 1:0:0:0: [sda] Stopping disk
[ 29.865673] hisi_sas_v3_hw 0000:74:02.0: dev[2:5] is gone
[ 29.872431] hisi_sas_v3_hw 0000:74:02.0: dev[1:2] is gone

dann frazier (dannf) wrote :

I'll go ahead and mark Incomplete then, as I assume we either need better steps to reproduce, or for Huawei engineers to submit a fix to stable per:
  https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

Changed in kunpeng920:
status: New → Incomplete
Fred Kimmy (kongzizaixian) wrote :

We boot the Ubuntu system from a PCIe NVMe disk and then rmmod/insmod the SAS modules.
The main issue now is that we can reproduce the bug with the kernel shipped in the Ubuntu ISO (Ubuntu 5.0.0-23-generic), but when we build a deb package from the bionic source code to debug it, the bug does not reproduce.
Can you provide the kernel deb, or the build method used for the Ubuntu ISO kernel?

Taihsiang Ho (tai271828)
tags: added: tairadar
Andrew Cloke (andrew-cloke) wrote :

As the 5.0 kernel is no longer supported, I'm afraid being able to reproduce the issue with that kernel will not allow a potential fix to be SRU'ed into the 5.4 kernel.

Could you confirm with your engineers that the patch Dann referred to in comment #3 would address the issue?

If it does address the issue, then in order to SRU this patch into the 5.4 kernel we would either need a reproducer for the 5.4 kernel, or for a Huawei engineer to submit it to the upstream stable kernel. The Ubuntu kernel regularly integrates patches from the upstream stable kernel. So, this approach would probably be simpler.

The following describes the upstream process to apply fixes to the upstream stable tree: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.htm

Taihsiang Ho (tai271828) wrote :

The process link should be https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
 (there was a typo and missing "l")

Taihsiang Ho (tai271828) wrote :

Hi, Xinwei,

Additionally, regarding comment#7, are you looking for debug symbols? If you need debug symbols to debug the kernel/ko, you can get the corresponding kernel deb (linux-image-5.0.0-23-generic) and its debug symbols (linux-image-5.0.0-23-generic-dbgsym) here: https://launchpad.net/ubuntu/+source/linux-hwe/5.0.0-23.24~18.04.1 . You can also download other debs, such as the header files, directly from there.

Alternatively, you can follow the instructions at https://wiki.ubuntu.com/Debug%20Symbol%20Packages to add the ddebs.list repositories and get any other debug symbol packages you need.

Taihsiang Ho (tai271828) wrote :

For example, you can get the binary kernel deb with debug symbols by invoking the following commands:

$ mkdir lp1914976
$ cd lp1914976
$ wget http://launchpadlibrarian.net/435193746/linux-image-5.0.0-23-generic-dbgsym_5.0.0-23.24~18.04.1_arm64.ddeb http://launchpadlibrarian.net/435113781/linux-image-unsigned-5.0.0-23-generic-dbgsym_5.0.0-23.24~18.04.1_arm64.ddeb http://launchpadlibrarian.net/435193747/linux-image-5.0.0-23-generic_5.0.0-23.24~18.04.1_arm64.deb http://launchpadlibrarian.net/435113778/linux-headers-5.0.0-23-generic_5.0.0-23.24~18.04.1_arm64.deb http://launchpadlibrarian.net/435129221/linux-headers-5.0.0-23_5.0.0-23.24~18.04.1_all.deb http://launchpadlibrarian.net/435113782/linux-modules-5.0.0-23-generic_5.0.0-23.24~18.04.1_arm64.deb http://launchpadlibrarian.net/435113783/linux-modules-extra-5.0.0-23-generic_5.0.0-23.24~18.04.1_arm64.deb
$ sudo dpkg -i *

You can also unpack the ddeb to get the kernel image with debug symbols directly:
$ mkdir ./unpack-dbgsym
$ dpkg -x linux-image-unsigned-5.0.0-23-generic-dbgsym_5.0.0-23.24~18.04.1_arm64.ddeb ./unpack-dbgsym
$ file unpack-dbgsym/usr/lib/debug/boot/vmlinux-5.0.0-23-generic
unpack-dbgsym/usr/lib/debug/boot/vmlinux-5.0.0-23-generic: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), statically linked, BuildID[sha1]=0812c01a649760ac1b593e7c5a5d4903b0af2aea, with debug_info, not stripped
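
With the unpacked vmlinux, a frame from the call trace (for example free_contig_range+0xbc/0xd8) can be mapped to a source line with gdb's list *(symbol+offset) form. The sketch below only builds the gdb command line; frame_to_expr and gdb_cmd_for_frame are hypothetical helper names, and frames that belong to a module (such as [hisi_sas_v3_hw]) would need that module's debug file rather than vmlinux.

```shell
#!/bin/sh
# Hypothetical helpers: turn a call-trace line into a gdb lookup command.

# Print "symbol+0xoffset" from a line containing "symbol+0xoffset/0xsize".
frame_to_expr() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+\/0x[0-9a-f]+$/) {
                sub(/\/.*/, "", $i)   # drop the "/0xsize" part
                print $i
                exit
            }
    }'
}

# Print the gdb command for debug image $1 and trace line $2.
gdb_cmd_for_frame() {
    loc=$(frame_to_expr "$2")
    [ -n "$loc" ] || return 1
    printf "gdb -batch -ex 'list *(%s)' %s\n" "$loc" "$1"
}
```

For example, gdb_cmd_for_frame unpack-dbgsym/usr/lib/debug/boot/vmlinux-5.0.0-23-generic 'pc : free_contig_range+0xbc/0xd8' prints a command that can then be run by hand.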

I hope these debs are what you need. Please feel free to let me know if you have more questions.

Taihsiang Ho (tai271828) wrote :

Hi, Fred,

I have updated the steps to reproduce the kernel trace. A few questions and reminders for you:

1. Did the kernel deb in comment #10 and its debug symbol deb (also in comment #10) meet your expectations? Is a kernel build instruction still needed for you? Do you have trouble when building the kernel by following this Ubuntu Wiki? https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel

2. You may be interested in the patch information in comment#3 . If the patch fixes your issue, we suggest upstreaming the patch to the corresponding stable kernel tree. Does the patch work for you?

3. If the patch is what you want, the target upstream stable kernel tree is v5.4. Ubuntu focal pulls and syncs with the upstream v5.4 stable releases. Once the patch is upstreamed and focal pulls it, bionic-hwe will get the patch automatically in an upcoming kernel SRU. For an example of focal pulling the v5.4 upstream stable tree, see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1918170 . Please also note that the Ubuntu 5.0.0 kernel is EOL, so backporting a patch is not possible and we cannot do anything further for 5.0.0.

Are you going to upstream the patch (to v5.4 stable release) in comment#3 ?

4. I could not reproduce this issue with kernel bionic-hwe, 5.4.0-67-generic. Could you reproduce the issue with the bionic-hwe 5.4 kernel?

description: updated
Fred Kimmy (kongzizaixian) wrote :

=> 1. Did the kernel deb in comment #10 and its debug symbol deb (also in comment #10) meet your expectations? Is a kernel build instruction still needed for you? Do you have trouble when building the kernel by following this Ubuntu Wiki? https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel

Can you build the kernel deb from source and try to reproduce it again? Our engineers cannot reproduce this bug when using a self-built kernel for debugging.

=> 2. You may be interested in the patch information in comment#3 . If the patch fixes your issue, we suggest upstreaming the patch to the corresponding stable kernel tree. Does the patch work for you?

The patch in comment #3 may not solve this bug. Can you add this patch to the Ubuntu bionic (5.0.0-23) kernel and test it?

Ike Panhc (ikepanhc) wrote :

@Xinwei,

The Ubuntu 5.0 kernel is EOL in Bionic. If you upgrade 18.04.2, you will get the 5.4 HWE kernel. There will be no more 5.0 kernel updates.

Unless we can reproduce the issue with the 5.4 kernel, there is no point in debugging the 5.0 kernel.

Taihsiang Ho (tai271828)
tags: removed: tairadar
Taihsiang Ho (tai271828)
Changed in kunpeng920:
assignee: Taihsiang Ho (taihsiangho) → nobody
Ike Panhc (ikepanhc) wrote :

Since the 5.0 kernel is EOL, this bug is invalid. Please re-open it if anyone can reproduce this issue with the 5.4 kernel.

Changed in kunpeng920:
status: Incomplete → Invalid