Panic on suspend/resume: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: sata_pmp_eh_recover+0xa2b/0xa40

Bug #1821434 reported by HG
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
Bionic           Fix Released   Undecided    Unassigned
Focal            Fix Released   Medium       Unassigned

Bug Description

With efi-pstore activated I get the following panic in Linux 4.15.0-46 after resume:

<6>[12007.593358] ata10.01: SATA link down (SStatus 0 SControl 330)
<6>[12007.593469] ata10.02: hard resetting link
<6>[12007.908353] ata10.02: SATA link down (SStatus 0 SControl 330)
<6>[12007.911149] ata10.00: configured for UDMA/133
<0>[12007.972508] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: sata_pmp_eh_recover+0xa2b/0xa40
<0>[12007.972508]
<4>[12007.972515] CPU: 2 PID: 230 Comm: scsi_eh_9 Tainted: P OE 4.15.0-46-generic #49-Ubuntu
<4>[12007.972517] Hardware name: System manufacturer System Product Name/A320M-C, BIOS 1001 12/10/2017
<4>[12007.972518] Call Trace:
<4>[12007.972525] dump_stack+0x63/0x8b
<4>[12007.972530] panic+0xe4/0x244
<4>[12007.972533] ? sata_pmp_eh_recover+0xa2b/0xa40
<4>[12007.972536] __stack_chk_fail+0x19/0x20
<4>[12007.972538] sata_pmp_eh_recover+0xa2b/0xa40
<4>[12007.972543] ? ahci_do_softreset+0x260/0x260 [libahci]
<4>[12007.972545] ? ahci_do_hardreset+0x140/0x140 [libahci]
<4>[12007.972547] ? ata_phys_link_offline+0x60/0x60
<4>[12007.972549] ? ahci_stop_engine+0xc0/0xc0 [libahci]
<4>[12007.972552] sata_pmp_error_handler+0x22/0x30
<4>[12007.972554] ahci_error_handler+0x45/0x80 [libahci]
<4>[12007.972556] ata_scsi_port_error_handler+0x29b/0x770
<4>[12007.972558] ? ata_scsi_cmd_error_handler+0x101/0x140
<4>[12007.972559] ata_scsi_error+0x95/0xd0
<4>[12007.972562] ? scsi_try_target_reset+0x90/0x90
<4>[12007.972563] scsi_error_handler+0xd0/0x5b0
<4>[12007.972566] kthread+0x121/0x140
<4>[12007.972567] ? scsi_eh_get_sense+0x200/0x200
<4>[12007.972569] ? kthread_create_worker_on_cpu+0x70/0x70
<4>[12007.972572] ret_from_fork+0x22/0x40
<0>[12007.972591] Kernel Offset: 0xcc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
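For readers triaging a trace like the one above, here is a minimal Python sketch that splits the `<level>[timestamp] message` lines into call-trace frames. The function and field names (`parse_frames`, `reliable`, and so on) are hypothetical and not part of any kernel tooling. Frames prefixed with `?` are addresses found on the stack that the unwinder could not confirm as part of the call chain, so the sketch marks them as not reliable. Each frame is printed as `symbol+offset/size`, so `size - offset` is the distance from the reported address to the end of the function.

```python
import re

# Matches lines such as "<4>[12007.972538] sata_pmp_eh_recover+0xa2b/0xa40"
LINE_RE = re.compile(r"^<(?P<level>\d)>\[\s*(?P<ts>\d+\.\d+)\]\s?(?P<msg>.*)$")

# Matches one frame: optional "? ", symbol+offset/size, optional "[module]"
FRAME_RE = re.compile(
    r"^(?P<unreliable>\? )?(?P<sym>[\w.]+)"
    r"\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?$"
)


def parse_frames(log_text):
    """Return a list of dicts, one per call-trace frame found in log_text."""
    frames = []
    for raw in log_text.splitlines():
        line = LINE_RE.match(raw.strip())
        if not line:
            continue
        frame = FRAME_RE.match(line.group("msg").strip())
        if not frame:
            continue  # not a frame line (e.g. "Call Trace:" or the panic text)
        frames.append({
            "symbol": frame.group("sym"),
            "offset": int(frame.group("off"), 16),
            "size": int(frame.group("size"), 16),
            "module": frame.group("module"),  # None for built-in code
            "reliable": frame.group("unreliable") is None,
        })
    return frames


# A few lines copied from the trace in this report.
SAMPLE = """\
<4>[12007.972533] ? sata_pmp_eh_recover+0xa2b/0xa40
<4>[12007.972536] __stack_chk_fail+0x19/0x20
<4>[12007.972538] sata_pmp_eh_recover+0xa2b/0xa40
<4>[12007.972543] ? ahci_do_softreset+0x260/0x260 [libahci]
<4>[12007.972554] ahci_error_handler+0x45/0x80 [libahci]
"""

if __name__ == "__main__":
    for fr in parse_frames(SAMPLE):
        if fr["reliable"]:
            remaining = fr["size"] - fr["offset"]
            print(f'{fr["symbol"]}: {remaining:#x} bytes before end of function')
```

In this trace the `sata_pmp_eh_recover` frame sits 0x15 bytes before the end of the function. That is consistent with a stack-protector check failing on the function's exit path, although the trace alone does not show which write corrupted the stack.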

I have also tried 4.18 and 4.20 from https://kernel.ubuntu.com/~kernel-ppa/mainline/ but I get the same problem. The problem seems to be related to an add-on PCI-E SATA card:
22:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)

With disks connected to the card's ports the kernel panics; with no disks connected it does not.


HG (hggg-0)
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1821434

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
HG (hggg-0) wrote :

I can't run apport when the crash happens; I get the logs from efi-pstore.
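As a rough illustration of collecting such logs after the next boot, the Python sketch below gathers saved records from a pstore directory. It assumes the pstore filesystem is mounted at `/sys/fs/pstore` and that efi-pstore records appear as files whose names start with `dmesg-efi-`; both are assumptions about the running system, and the helper name `read_pstore_records` is hypothetical. Reading that directory normally requires root, and a single crash may be split across several record files.

```python
from pathlib import Path


def read_pstore_records(pstore_dir="/sys/fs/pstore", prefix="dmesg-efi-"):
    """Return {file name: text} for every pstore record matching prefix.

    An empty dict is returned when the directory does not exist, for
    example when pstore is not mounted on this system.
    """
    records = {}
    root = Path(pstore_dir)
    if not root.is_dir():
        return records
    for path in sorted(root.iterdir()):
        if path.is_file() and path.name.startswith(prefix):
            # Replace undecodable bytes instead of failing on a damaged record.
            records[path.name] = path.read_text(errors="replace")
    return records


if __name__ == "__main__":
    for name, text in read_pstore_records().items():
        print(f"== {name} ({len(text)} characters)")
        print(text)
```

The collected text can then be attached to the bug report by hand when `apport-collect` cannot capture the crash itself.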

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
You-Sheng Yang (vicamo) wrote :

I tried kernel 4.15.0-46 in Bionic with the efi-pstore module loaded as the pstore backend, but I cannot reproduce that kernel panic. More information is needed before anyone can investigate this issue further, so an apport run, even one taken before the panic occurs, is really mandatory. Please give apport a chance to collect some more information by executing `apport-collect 1821434`.

By the way, have you tried enabling efi-pstore _without_ that SATA card installed? Maybe it's not related to efi-pstore at all?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Kai-Heng Feng (kaihengfeng) wrote :
HG (hggg-0) wrote :

This kernel fixes it for me. I can now suspend and resume without problems with disks attached to the PCI adapter.

Kai-Heng Feng (kaihengfeng) wrote :
Changed in linux (Ubuntu):
assignee: nobody → Kai-Heng Feng (kaihengfeng)
HG (hggg-0) wrote :

Unfortunately, after many successful suspend/resume cycles I got the same crash again. The only difference from the previous times seems to be that the system was under some I/O load, doing a filesystem scrub.
See the attached kernel logs.

HG (hggg-0) wrote :

More logs, showing a failure to resume the device.

Kai-Heng Feng (kaihengfeng) wrote :

Has the system been updated recently? My test kernel was -47, and it may have been replaced by the official kernel, which also has the -47 suffix.

HG (hggg-0) wrote :

I think it's your kernel. I never shut the system down after the upgrade, and it has this build date: 4.15.0-47-generic #50 SMP Mon Mar 25 15:56:04 CST 2019 x86_64 x86_64 x86_64 GNU/Linux

Kai-Heng Feng (kaihengfeng) wrote :

So the patched kernel reduces the failure rate, but does not eliminate it completely?

HG (hggg-0) wrote :

Yes, it looks like that. I've done some more testing, and it seems that if a filesystem is mounted on a device attached to the ASM1062 SATA ports, the kernel will oops even with your patch.

Changed in linux (Ubuntu):
assignee: Kai-Heng Feng (kaihengfeng) → nobody
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Changed in linux (Ubuntu Bionic):
status: New → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Stefan Bader (smb)
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
status: New → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Focal Packages (valentin-manea+launchpad-net) wrote :

As far as I am concerned, this is verified for bionic; the system has been rock solid with this patch.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-101.102

---------------
linux (4.15.0-101.102) bionic; urgency=medium

  * bionic/linux: 4.15.0-101.102 -proposed tracker (LP: #1877262)

  * 4.15.0-100.101 breaks userspace builds due to a bug in the headers
    /usr/include/linux/swab.h of linux-libc-dev (LP: #1877123)
    - include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for
      swap

  * bionic snapdragon 4.15 snap failed Certification testing (LP: #1877657)
    - Revert "drm/msm: Use the correct dma_sync calls in msm_gem"
    - Revert "drm/msm: stop abusing dma_map/unmap for cache"

linux (4.15.0-100.101) bionic; urgency=medium

  * bionic/linux: 4.15.0-100.101 -proposed tracker (LP: #1875878)

  * built-using constraints preventing uploads (LP: #1875601)
    - temporarily drop Built-Using data

  * Add debian/rules targets to compile/run kernel selftests (LP: #1874286)
    - [Packaging] add support to compile/run selftests

  * getitimer returns it_value=0 erroneously (LP: #1349028)
    - [Config] CONTEXT_TRACKING_FORCE policy should be unset

  * QEMU/KVM display is garbled when booting from kernel EFI stub due to missing
    bochs-drm module (LP: #1872863)
    - [Config] Enable CONFIG_DRM_BOCHS as module for all archs

  * Backport MPLS patches from 5.3 to 4.15 (LP: #1851446)
    - net/mlx5e: Report netdevice MPLS features
    - net: vlan: Inherit MPLS features from parent device
    - net: bonding: Inherit MPLS features from slave devices
    - net/mlx5e: Move to HW checksumming advertising

  * LIO hanging in iscsit_free_session and iscsit_stop_session (LP: #1871688)
    - scsi: target: remove boilerplate code
    - scsi: target: fix hang when multiple threads try to destroy the same iscsi
      session
    - scsi: target: iscsi: calling iscsit_stop_session() inside
      iscsit_close_session() has no effect

  * Add hw timestamps to received skbs in peak_canfd (LP: #1874124)
    - can: peak_canfd: provide hw timestamps in rx skbs

  * Bionic update: upstream stable patchset 2020-04-23 (LP: #1874502)
    - ARM: dts: sun8i-a83t-tbs-a711: HM5065 doesn't like such a high voltage
    - bus: sunxi-rsb: Return correct data when mixing 16-bit and 8-bit reads
    - net: vxge: fix wrong __VA_ARGS__ usage
    - hinic: fix a bug of waitting for IO stopped
    - hinic: fix wrong para of wait_for_completion_timeout
    - cxgb4/ptp: pass the sign of offset delta in FW CMD
    - qlcnic: Fix bad kzalloc null test
    - i2c: st: fix missing struct parameter description
    - firmware: arm_sdei: fix double-lock on hibernate with shared events
    - null_blk: Fix the null_add_dev() error path
    - null_blk: Handle null_add_dev() failures properly
    - null_blk: fix spurious IO errors after failed past-wp access
    - xhci: bail out early if driver can't accress host in resume
    - x86: Don't let pgprot_modify() change the page encryption bit
    - block: keep bdi->io_pages in sync with max_sectors_kb for stacked devices
    - irqchip/versatile-fpga: Handle chained IRQs properly
    - sched: Avoid scale real weight down to zero
    - selftests/x86/ptrace_syscall_32: Fix no-vDSO segfault
    - PCI/switchtec: Fix init_completio...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-31.35

---------------
linux (5.4.0-31.35) focal; urgency=medium

  * focal/linux: 5.4.0-31.35 -proposed tracker (LP: #1877253)

  * Intermittent display blackouts on event (LP: #1875254)
    - drm/i915: Limit audio CDCLK>=2*BCLK constraint back to GLK only

  * Unable to handle kernel pointer dereference in virtual kernel address space
    on Eoan (LP: #1876645)
    - SAUCE: overlayfs: fix shitfs special-casing

linux (5.4.0-30.34) focal; urgency=medium

  * focal/linux: 5.4.0-30.34 -proposed tracker (LP: #1875385)

  * ubuntu/focal64 fails to mount Vagrant shared folders (LP: #1873506)
    - [Packaging] Move virtualbox modules to linux-modules
    - [Packaging] Remove vbox and zfs modules from generic.inclusion-list

  * linux-image-5.0.0-35-generic breaks checkpointing of container
    (LP: #1857257)
    - SAUCE: overlayfs: use shiftfs hacks only with shiftfs as underlay

  * shiftfs: broken shiftfs nesting (LP: #1872094)
    - SAUCE: shiftfs: record correct creator credentials

  * Add debian/rules targets to compile/run kernel selftests (LP: #1874286)
    - [Packaging] add support to compile/run selftests

  * shiftfs: O_TMPFILE reports ESTALE (LP: #1872757)
    - SAUCE: shiftfs: fix dentry revalidation

  * LIO hanging in iscsit_free_session and iscsit_stop_session (LP: #1871688)
    - scsi: target: iscsi: calling iscsit_stop_session() inside
      iscsit_close_session() has no effect

  * [ICL] TC port in legacy/static mode can't be detected due TCCOLD
    (LP: #1868936)
    - SAUCE: drm/i915: Align power domain names with port names
    - SAUCE: drm/i915/display: Move out code to return the digital_port of the aux
      ch
    - SAUCE: drm/i915/display: Add intel_legacy_aux_to_power_domain()
    - SAUCE: drm/i915/display: Split hsw_power_well_enable() into two
    - SAUCE: drm/i915/tc/icl: Implement TC cold sequences
    - SAUCE: drm/i915/tc: Skip ref held check for TC legacy aux power wells
    - SAUCE: drm/i915/tc/tgl: Implement TC cold sequences
    - SAUCE: drm/i915/tc: Catch TC users accessing FIA registers without enable
      aux
    - SAUCE: drm/i915/tc: Do not warn when aux power well of static TC ports
      timeout

  * alsa/sof: external mic can't be deteced on Lenovo and HP laptops
    (LP: #1872569)
    - SAUCE: ASoC: intel/skl/hda - set autosuspend timeout for hda codecs

  * amdgpu kernel errors in Linux 5.4 (LP: #1871248)
    - drm/amd/display: Stop if retimer is not available

  * Focal update: v5.4.34 upstream stable release (LP: #1874111)
    - amd-xgbe: Use __napi_schedule() in BH context
    - hsr: check protocol version in hsr_newlink()
    - l2tp: Allow management of tunnels and session in user namespace
    - net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode
    - net: ipv4: devinet: Fix crash when add/del multicast IP with autojoin
    - net: ipv6: do not consider routes via gateways for anycast address check
    - net: phy: micrel: use genphy_read_status for KSZ9131
    - net: qrtr: send msgs from local of same id as broadcast
    - net: revert default NAPI poll timeout to 2 jiffies
    - net: tun: record RX queue in skb before do_xdp_gener...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.3.0-53.47

---------------
linux (5.3.0-53.47) eoan; urgency=medium

  * eoan/linux: 5.3.0-53.47 -proposed tracker (LP: #1877257)

  * Intermittent display blackouts on event (LP: #1875254)
    - drm/i915: Limit audio CDCLK>=2*BCLK constraint back to GLK only

  * Unable to handle kernel pointer dereference in virtual kernel address space
    on Eoan (LP: #1876645)
    - SAUCE: overlayfs: fix shitfs special-casing

linux (5.3.0-52.46) eoan; urgency=medium

  * eoan/linux: 5.3.0-52.46 -proposed tracker (LP: #1874752)

  * alsa: make the dmic detection align to the mainline kernel-5.6
    (LP: #1871284)
    - ALSA: hda: add Intel DSP configuration / probe code
    - ALSA: hda: fix intel DSP config
    - ALSA: hda: Allow non-Intel device probe gracefully
    - ALSA: hda: More constifications
    - ALSA: hda: Rename back to dmic_detect option
    - [Config] SND_INTEL_DSP_CONFIG=m
    - [packaging] Remove snd-intel-nhlt from modules

  * built-using constraints preventing uploads (LP: #1875601)
    - temporarily drop Built-Using data

  * ubuntu/focal64 fails to mount Vagrant shared folders (LP: #1873506)
    - [Packaging] Move virtualbox modules to linux-modules
    - [Packaging] Remove vbox and zfs modules from generic.inclusion-list

  * linux-image-5.0.0-35-generic breaks checkpointing of container
    (LP: #1857257)
    - SAUCE: overlayfs: use shiftfs hacks only with shiftfs as underlay

  * shiftfs: broken shiftfs nesting (LP: #1872094)
    - SAUCE: shiftfs: record correct creator credentials

  * Add debian/rules targets to compile/run kernel selftests (LP: #1874286)
    - [Packaging] add support to compile/run selftests

  * shiftfs: O_TMPFILE reports ESTALE (LP: #1872757)
    - SAUCE: shiftfs: fix dentry revalidation

  * getitimer returns it_value=0 erroneously (LP: #1349028)
    - [Config] CONTEXT_TRACKING_FORCE policy should be unset

  * 5.3.0-46-generic - i915 - frequent GPU hangs / resets rcs0 (LP: #1872001)
    - drm/i915/execlists: Preempt-to-busy
    - drm/i915/gt: Detect if we miss WaIdleLiteRestore
    - drm/i915/execlists: Always force a context reload when rewinding RING_TAIL

  * alsa/sof: external mic can't be deteced on Lenovo and HP laptops
    (LP: #1872569)
    - SAUCE: ASoC: intel/skl/hda - set autosuspend timeout for hda codecs

  * Eoan update: upstream stable patchset 2020-04-22 (LP: #1874325)
    - ARM: dts: sun8i-a83t-tbs-a711: HM5065 doesn't like such a high voltage
    - bus: sunxi-rsb: Return correct data when mixing 16-bit and 8-bit reads
    - net: vxge: fix wrong __VA_ARGS__ usage
    - hinic: fix a bug of waitting for IO stopped
    - hinic: fix wrong para of wait_for_completion_timeout
    - cxgb4/ptp: pass the sign of offset delta in FW CMD
    - qlcnic: Fix bad kzalloc null test
    - i2c: st: fix missing struct parameter description
    - cpufreq: imx6q: Fixes unwanted cpu overclocking on i.MX6ULL
    - media: venus: hfi_parser: Ignore HEVC encoding for V1
    - firmware: arm_sdei: fix double-lock on hibernate with shared events
    - null_blk: Fix the null_add_dev() error path
    - null_blk: Handle null_add_dev() failures properly
    - null_blk: fix spuri...

Changed in linux (Ubuntu):
status: Expired → Fix Released
Khaled El Mously (kmously) wrote :

Thanks for the verification @vali !!

tags: added: verification-done-eoan verification-done-focal verification-done-xenial
removed: verification-needed-eoan verification-needed-focal verification-needed-xenial