[Regression] crashkernel fails on HiSilicon D05

Bug #1806766 reported by dann frazier on 2018-12-04
This bug affects 1 person
Affects              Status                    Importance   Assigned to    Milestone
linux (Ubuntu)       Status tracked in Disco
  Cosmic             Fix Released              Undecided    dann frazier
  Disco              Fix Released              Undecided    dann frazier

Bug Description

[Impact]
kdump does not work on HiSilicon D05 systems with the cosmic and disco kernels. It previously worked in bionic.

[Test Case]
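sudo apt install linux-crashdump
# Reserve 512M for the crash kernel. The single quotes keep
# $GRUB_CMDLINE_LINUX_DEFAULT unexpanded, so grub appends to the existing value.
echo 'GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=512M"' | \
  sudo tee /etc/default/grub.d/kdump-tools.cfg
sudo update-grub
sudo reboot
# After the reboot: enable sysrq, then force a panic to trigger kdump
echo 1 | sudo tee /proc/sys/kernel/sysrq
echo c | sudo tee /proc/sysrq-trigger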
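
Before the final sysrq step, it may help to confirm that the reboot picked up the crashkernel= reservation and that a crash kernel is actually loaded. The following is a minimal sketch assuming a POSIX shell. /proc/cmdline and /sys/kernel/kexec_crash_loaded are standard kernel interfaces, and kdump-config is part of kdump-tools. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: check the crash kernel setup before triggering a panic with sysrq-c.
# Run after the reboot. The helper names are hypothetical.

# Returns 0 if the given kernel command line contains a crashkernel= option.
has_crashkernel_arg() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Returns 0 if the value read from /sys/kernel/kexec_crash_loaded is "1".
crash_kernel_loaded() {
    [ "$1" = "1" ]
}

main() {
    if ! has_crashkernel_arg "$(cat /proc/cmdline)"; then
        echo "crashkernel= missing from /proc/cmdline; check the update-grub step" >&2
        return 1
    fi
    if ! crash_kernel_loaded "$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null)"; then
        echo "no crash kernel loaded; see 'kdump-config show'" >&2
        return 1
    fi
    echo "crash kernel loaded; sysrq-c should now produce a dump"
}

main
```

A non-zero exit status means that triggering sysrq-c would most likely panic the machine without capturing a dump.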

On failure:
[ 2.362261] ------------[ cut here ]------------
[ 2.362263] [CRTC:29:crtc-0] vblank wait timed out
[ 2.362294] WARNING: CPU: 0 PID: 143 at drivers/gpu/drm/drm_atomic_helper.c:1386 drm_atomic_helper_wait_for_vblanks.part.9+0x280/0x290 [drm_kms_helper]
[ 2.362295] Modules linked in: hibmc_drm(+) ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm hisi_sas_v2_hw(+) hisi_sas_main ehci_platform libsas scsi_transport_sas
[ 2.362309] CPU: 0 PID: 143 Comm: systemd-udevd Tainted: G C 4.19.0-7-generic #8-Ubuntu
[ 2.362310] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018
[ 2.362312] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 2.362324] pc : drm_atomic_helper_wait_for_vblanks.part.9+0x280/0x290 [drm_kms_helper]
[ 2.362335] lr : drm_atomic_helper_wait_for_vblanks.part.9+0x280/0x290 [drm_kms_helper]
[ 2.362336] sp : ffff00000a2fb1f0
[ 2.362337] x29: ffff00000a2fb1f0 x28: 0000000000000001
[ 2.362339] x27: 0000000000000000 x26: 0000000000000001
[ 2.362342] x25: 0000000000000038 x24: ffff8000208c5800
[ 2.362344] x23: 0000000000000000 x22: 0000000000000001
[ 2.362346] x21: ffff80001eebb818 x20: ffff800025b18600
[ 2.362349] x19: 0000000000000000 x18: 0000000000000001
[ 2.362351] x17: 0000000000000000 x16: 0000000000000000
[ 2.362353] x15: ffffffffffffffff x14: ffff000009848708
[ 2.362355] x13: 0000000000000074 x12: ffff000009a12000
[ 2.362357] x11: ffff00000986d000 x10: ffff000009a122f8
[ 2.362359] x9 : 0000000000000001 x8 : ffff000009a15104
[ 2.362361] x7 : 0000000000000000 x6 : 0000004ce5700bb7
[ 2.362363] x5 : 00ffffffffffffff x4 : 0000000000000000
[ 2.362365] x3 : 0000000000000000 x2 : ffffffffffffffff
[ 2.362367] x1 : 0b15ae454042e100 x0 : 0000000000000000
[ 2.362370] Call trace:
[ 2.362381] drm_atomic_helper_wait_for_vblanks.part.9+0x280/0x290 [drm_kms_helper]
[ 2.362392] drm_atomic_helper_commit_tail+0x68/0x80 [drm_kms_helper]
[ 2.362402] commit_tail+0x7c/0x80 [drm_kms_helper]
[ 2.362413] drm_atomic_helper_commit+0xd8/0x150 [drm_kms_helper]
[ 2.362440] drm_atomic_commit+0x54/0x60 [drm]
[ 2.362451] restore_fbdev_mode_atomic+0x184/0x1f8 [drm_kms_helper]
[ 2.362461] restore_fbdev_mode+0x48/0x190 [drm_kms_helper]
[ 2.362472] drm_fb_helper_restore_fbdev_mode_unlocked+0x78/0xd8 [drm_kms_helper]
[ 2.362482] drm_fb_helper_set_par+0x34/0x60 [drm_kms_helper]
[ 2.362488] fbcon_init+0x3ac/0x4f0
[ 2.362491] visual_init+0xb8/0x110
[ 2.362492] do_bind_con_driver+0x1ec/0x3a8
[ 2.362494] do_take_over_console+0x148/0x208
[ 2.362495] do_fbcon_takeover+0x70/0xd8
[ 2.362497] fbcon_event_notify+0x838/0x8a8
[ 2.362501] notifier_call_chain+0x5c/0x98
[ 2.362502] blocking_notifier_call_chain+0x64/0x88
[ 2.362504] fb_notifier_call_chain+0x30/0x40
[ 2.362506] register_framebuffer+0x22c/0x328
[ 2.362516] __drm_fb_helper_initial_config_and_unlock+0x210/0x408 [drm_kms_helper]
[ 2.362526] drm_fb_helper_initial_config+0x4c/0x58 [drm_kms_helper]
[ 2.362530] hibmc_fbdev_init+0x88/0x190 [hibmc_drm]
[ 2.362534] hibmc_pci_probe+0x228/0x3c8 [hibmc_drm]
[ 2.362537] local_pci_probe+0x44/0xa8
[ 2.362539] pci_device_probe+0x194/0x1a8
[ 2.362541] really_probe+0x21c/0x3b8
[ 2.362543] driver_probe_device+0xe4/0x138
[ 2.362544] __driver_attach+0xe4/0x150
[ 2.362545] bus_for_each_dev+0x84/0xd8
[ 2.362547] driver_attach+0x30/0x40
[ 2.362548] bus_add_driver+0x1a8/0x288
[ 2.362550] driver_register+0x64/0x110
[ 2.362551] __pci_register_driver+0x58/0x68
[ 2.362555] hibmc_init+0x30/0x1000 [hibmc_drm]
[ 2.362557] do_one_initcall+0x54/0x1d8
[ 2.362560] do_init_module+0x60/0x1f0
[ 2.362561] load_module+0x15d0/0x18b8
[ 2.362563] __se_sys_finit_module+0xa0/0xf8
[ 2.362565] __arm64_sys_finit_module+0x24/0x30
[ 2.362567] el0_svc_common+0x94/0xe8
[ 2.362568] el0_svc_handler+0x38/0x78
[ 2.362570] el0_svc+0x8/0xc
[ 2.362571] ---[ end trace 8031150f999972d9 ]---

[Fix]
Two upstream patch series are required to fix this. The first provides an EFI facility (a config table for memory reservations that persist across kexec):
 https://<email address hidden>/msg10328.html
This facility is consumed by the second series:
 https://lkml.org/lkml/2018/9/21/1066
There were also some follow-on fixes for ARM-specific problems arising from this usage:
 https://www.spinics.net/lists/arm-kernel/msg685751.html

[Regression Risk]
The EFI changes are in architecture-independent code, where they add a new table and an API for adding regions to it. However, the only user of this API is the gic-v3-its driver, which is ARM-specific, so on other architectures the table will be empty. A bug in the table creation code could still cause regressions on other architectures. Such a regression would most likely appear as a boot-time error message ("Failed to install memreserve config table").

The risk is mitigated by testing on both x86 and ARM EFI systems.
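
When testing for the regression described above, one option is to scan a saved boot console log for the quoted message. The following is a minimal sketch assuming a POSIX shell. The log file, such as a serial-console capture or saved dmesg output, is passed as the only argument, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: scan a saved boot console log for the memreserve error message
# quoted in [Regression Risk]. The log source is an assumption; pass whatever
# file your console capture writes to.

MSG='Failed to install memreserve config table'

# Exit status 0 if the log read on stdin contains the error message.
log_has_memreserve_error() {
    grep -qF "$MSG"
}

# Prints a verdict for the given log file; returns 1 on regression, 2 if unreadable.
check_log() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 2
    fi
    if log_has_memreserve_error < "$1"; then
        echo "REGRESSION: '$MSG' found in $1"
        return 1
    fi
    echo "OK: no memreserve config table error in $1"
}

if [ $# -eq 1 ]; then
    check_log "$1"
fi
```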

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1806766

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Cosmic):
status: New → Incomplete
dann frazier (dannf) on 2018-12-05
description: updated
Changed in linux (Ubuntu Disco):
status: Incomplete → In Progress
Changed in linux (Ubuntu Cosmic):
status: Incomplete → In Progress
assignee: nobody → dann frazier (dannf)
Changed in linux (Ubuntu Disco):
assignee: nobody → dann frazier (dannf)
Seth Forshee (sforshee) on 2018-12-10
Changed in linux (Ubuntu Disco):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-cosmic
dann frazier (dannf) wrote :

Verified a successful kdump on D05. I had to blacklist the mlx drivers in the crash kernel to avoid an OOM, but that is expected.
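
One way to blacklist drivers in the crash kernel is shown below. This is an illustration, not necessarily the exact change made during verification. It makes three assumptions: kdump-tools sources /etc/default/kdump-tools as shell, it passes KDUMP_CMDLINE_APPEND to the crash kernel, and the Mellanox modules in use are mlx4_core and mlx5_core. Check `lsmod | grep mlx` on the affected machine to confirm the module names.

```shell
# /etc/default/kdump-tools (appended at the end of the file; sketch only)
# Keep the existing crash kernel options and add a modprobe blacklist.
# The module names are assumptions; adjust them to match lsmod output.
KDUMP_CMDLINE_APPEND="$KDUMP_CMDLINE_APPEND modprobe.blacklist=mlx4_core,mlx5_core"
```

After editing the file, restart the service (for example `sudo systemctl restart kdump-tools`) or reboot. Then run `kdump-config show` to check the command line the crash kernel will use.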

tags: added: verification-done-cosmic
removed: verification-needed-cosmic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.18.0-14.15

---------------
linux (4.18.0-14.15) cosmic; urgency=medium

  * linux: 4.18.0-14.15 -proposed tracker (LP: #1811406)

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card
      detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * [Regression] crashkernel fails on HiSilicon D05 (LP: #1806766)
    - efi: honour memory reservations passed via a linux specific config table
    - efi/arm: libstub: add a root memreserve config table
    - efi: add API to reserve memory persistently across kexec reboot
    - irqchip/gic-v3-its: Change initialization ordering for LPIs
    - irqchip/gic-v3-its: Simplify LPI_PENDBASE_SZ usage
    - irqchip/gic-v3-its: Split property table clearing from allocation
    - irqchip/gic-v3-its: Move pending table allocation to init time
    - irqchip/gic-v3-its: Keep track of property table's PA and VA
    - irqchip/gic-v3-its: Allow use of pre-programmed LPI tables
    - irqchip/gic-v3-its: Use pre-programmed redistributor tables with kdump
      kernels
    - irqchip/gic-v3-its: Check that all RDs have the same property table
    - irqchip/gic-v3-its: Register LPI tables with EFI config table
    - irqchip/gic-v3-its: Allow use of LPI tables in reserved memory
    - arm64: memblock: don't permit memblock resizing until linear mapping is up
    - efi/arm: Defer persistent reservations until after paging_init()
    - efi: Permit calling efi_mem_reserve_persistent() from atomic context
    - efi: Prevent GICv3 WARN() by mapping the memreserve table before first use

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: c...

Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.19.0-12.13

---------------
linux (4.19.0-12.13) disco; urgency=medium

  * linux: 4.19.0-12.13 -proposed tracker (LP: #1813664)

  * kernel oops in bcache module (LP: #1793901)
    - SAUCE: bcache: never writeback a discard operation

  * Disco update: 4.19.18 upstream stable release (LP: #1813611)
    - ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address
    - mlxsw: spectrum: Disable lag port TX before removing it
    - mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion
    - net: dsa: mv88x6xxx: mv88e6390 errata
    - net, skbuff: do not prefer skb allocation fails early
    - qmi_wwan: add MTU default to qmap network interface
    - ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses
    - net: clear skb->tstamp in bridge forwarding path
    - netfilter: ipset: Allow matching on destination MAC address for mac and
      ipmac sets
    - gpio: pl061: Move irq_chip definition inside struct pl061
    - drm/amd/display: Guard against null stream_state in set_crc_source
    - drm/amdkfd: fix interrupt spin lock
    - ixgbe: allow IPsec Tx offload in VEPA mode
    - platform/x86: asus-wmi: Tell the EC the OS will handle the display off
      hotkey
    - e1000e: allow non-monotonic SYSTIM readings
    - usb: typec: tcpm: Do not disconnect link for self powered devices
    - selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
    - of: overlay: add missing of_node_put() after add new node to changeset
    - writeback: don't decrement wb->refcnt if !wb->bdi
    - serial: set suppress_bind_attrs flag only if builtin
    - bpf: Allow narrow loads with offset > 0
    - ALSA: oxfw: add support for APOGEE duet FireWire
    - x86/mce: Fix -Wmissing-prototypes warnings
    - MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur
    - crypto: ecc - regularize scalar for scalar multiplication
    - arm64: perf: set suppress_bind_attrs flag to true
    - drm/atomic-helper: Complete fake_commit->flip_done potentially earlier
    - clk: meson: meson8b: fix incorrect divider mapping in cpu_scale_table
    - samples: bpf: fix: error handling regarding kprobe_events
    - usb: gadget: udc: renesas_usb3: add a safety connection way for
      forced_b_device
    - fpga: altera-cvp: fix probing for multiple FPGAs on the bus
    - selinux: always allow mounting submounts
    - ASoC: pcm3168a: Don't disable pcm3168a when CONFIG_PM defined
    - scsi: qedi: Check for session online before getting iSCSI TLV data.
    - drm/amdgpu: Reorder uvd ring init before uvd resume
    - rxe: IB_WR_REG_MR does not capture MR's iova field
    - efi/libstub: Disable some warnings for x86{,_64}
    - jffs2: Fix use of uninitialized delayed_work, lockdep breakage
    - clk: imx: make mux parent strings const
    - pstore/ram: Do not treat empty buffers as valid
    - media: uvcvideo: Refactor teardown of uvc on USB disconnect
    - powerpc/xmon: Fix invocation inside lock region
    - powerpc/pseries/cpuidle: Fix preempt warning
    - media: firewire: Fix app_info parameter type in avc_ca{,_app}_info
    - ASoC: use dma_ops of parent device for acp_audio_dma
    - media: ve...

Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Andy Whitcroft (apw) on 2019-02-14
tags: added: kernel-fixup-verification-needed-bionic
removed: verification-needed-bionic
Andy Whitcroft (apw) wrote :

This bug was erroneously marked for verification in bionic; verification is not required and verification-needed-bionic is being removed.

Andy Whitcroft (apw) on 2019-02-14
tags: added: verification-done-bionic