Comment 0 for bug 1917471

Revision history for this message
prabhakar pujeri (prabhakarpujeri) wrote : Bus Fatal Error observed when reboot on BCM5720

following error messages are observed

[ 146.429212] shutdown[1]: Rebooting.
[ 146.435151] kvm: exiting hardware virtualization
[ 146.575319] megaraid_sas 0000:67:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[ 148.088133] [qede_unload:2236(eno12409)]Link is down
[ 148.183618] qede 0000:31:00.1: Ending qede_remove successfully
[ 148.518541] [qede_unload:2236(eno12399)]Link is down
[ 148.625066] qede 0000:31:00.0: Ending qede_remove successfully
[ 148.762067] ACPI: Preparing to enter system sleep state S5
[ 148.794638] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
[ 148.803731] {1}[Hardware Error]: event severity: recoverable
[ 148.810191] {1}[Hardware Error]: Error 0, type: fatal
[ 148.816088] {1}[Hardware Error]: section_type: PCIe error
[ 148.822391] {1}[Hardware Error]: port_type: 0, PCIe end point
[ 148.829026] {1}[Hardware Error]: version: 3.0
[ 148.834266] {1}[Hardware Error]: command: 0x0006, status: 0x0010
[ 148.841140] {1}[Hardware Error]: device_id: 0000:04:00.0
[ 148.847309] {1}[Hardware Error]: slot: 0
[ 148.852077] {1}[Hardware Error]: secondary_bus: 0x00
[ 148.857876] {1}[Hardware Error]: vendor_id: 0x14e4, device_id: 0x165f
[ 148.865145] {1}[Hardware Error]: class_code: 020000
[ 148.870845] {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
[ 148.879842] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
[ 148.886575] {1}[Hardware Error]: TLP Header: 40000001 0000030f 90028090 00000000
[ 148.894823] tg3 0000:04:00.0: AER: aer_status: 0x00100000, aer_mask: 0x00010000
[ 148.902795] tg3 0000:04:00.0: AER: [20] UnsupReq (First)
[ 148.910234] tg3 0000:04:00.0: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
[ 148.918806] tg3 0000:04:00.0: AER: aer_uncor_severity: 0x000ef030
[ 148.925558] tg3 0000:04:00.0: AER: TLP Header: 40000001 0000030f 90028090 00000000
[ 148.933984] reboot: Restarting system
[ 148.938319] reboot: machine restart

I have observed the following. when I test older kernel

Kernel version Fatal Error
5.4.0-42.46 No
5.4.0-45.49 No
5.4.0-47.51 No
5.4.0-48.52 No
5.4.0-51.56 No
5.4.0-52.57 No
5.4.0-53.59 No
5.4.0-54.60 No
5.4.0-58.64 No
5.4.0-59.65 yes
5.4.0-60.67 yes

later I have bisect kernel between 5.4.0-58.64 and 5.4.0-59.65.

looks like due to the following patch we are observing this issue. The driver is not handling D3 state properly

PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI

https://kernel.ubuntu.com/git/ubuntu/ubuntu-focal.git/commit/?id=b9319dd02269593911403dd5d684368bcef3261d