Comment 19 for bug 1681909

Thadeu Lima de Souza Cascardo (cascardo) wrote : Re: dump is not captured in remote host when kdump over ssh is configured on firestone.

Looking at the log, I noticed that EEH reports a frozen PE right after the Broadcom card is found. Is that card the one driven by tg3?

[ OK ] Found device NetXtreme BCM5719 Gigabit Ethernet PCIe.
[ 8.191135] EEH: Frozen PE#7 on PHB#21 detected
[ 8.191280] EEH: PE location: S00210f, PHB location: N/A

Also, the recovery problem seems to be caused by ast.

[ 18.267005] EEH: 2100000 reads ignored for recovering device at location=S00210f driver=ast pci addr=0021:10:00.0
[ 18.267334] EEH: Might be infinite loop in ast driver

Looking at the upstream logs, one commit came up. Can you open a new bug for it?

commit 298360af3dab45659810fdc51aba0c9f4097e4f6
Author: Russell Currey <email address hidden>
Date: Thu Dec 15 16:12:41 2016 +1100

    drivers/gpu/drm/ast: Fix infinite loop if read fails

    ast_get_dram_info() configures a window in order to access BMC memory.
    A BMC register can be configured to disallow this, and if so, causes
    an infinite loop in the ast driver which renders the system unusable.

    Fix this by erroring out if an error is detected. On powerpc systems with
    EEH, this leads to the device being fenced and the system continuing to
    operate.

    Cc: <email address hidden> # 3.10+
    Signed-off-by: Russell Currey <email address hidden>
    Reviewed-by: Joel Stanley <email address hidden>
    Signed-off-by: Daniel Vetter <email address hidden>
    Link: http://patchwork<email address hidden>

diff --git a/drivers/gpu/drm/ast/ast_main.c b/drivers/gpu/drm/ast/ast_main.c
index 904beaa932d03..f75c6421db623 100644
--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -223,7 +223,8 @@ static int ast_get_dram_info(struct drm_device *dev)
        ast_write32(ast, 0x10000, 0xfc600309);

        do {
-               ;
+               if (pci_channel_offline(dev->pdev))
+                       return -EIO;
        } while (ast_read32(ast, 0x10000) != 0x01);
        data = ast_read32(ast, 0x10004);

@@ -428,7 +429,9 @@ int ast_driver_load(struct drm_device *dev, unsigned long flags)
        ast_detect_chip(dev, &need_post);

        if (ast->chip != AST1180) {
-               ast_get_dram_info(dev);
+               ret = ast_get_dram_info(dev);
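
To make the effect of the first hunk easier to see outside the kernel, here is a minimal userspace sketch of the same loop shape. It is not the driver code: struct fake_dev, fake_read32(), wait_ready() and check_examples() are hypothetical names made up for illustration, and the sketch assumes that a read from a fenced device returns all ones, so the register never reads 0x01 and the unpatched loop would spin forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; not a kernel structure. */
struct fake_dev {
	bool offline;          /* plays the role of pci_channel_offline() */
	int reads_until_ready; /* polls before the register reads 0x01 */
};

/* Hypothetical register read. Assumes a fenced device returns all ones. */
static uint32_t fake_read32(struct fake_dev *d)
{
	if (d->offline)
		return 0xffffffffu;
	if (d->reads_until_ready > 0) {
		d->reads_until_ready--;
		return 0x00;
	}
	return 0x01;
}

/* Same loop shape as the patched ast_get_dram_info(): return -EIO
 * instead of spinning when the device can no longer answer. */
static int wait_ready(struct fake_dev *d)
{
	do {
		if (d->offline)
			return -EIO;
	} while (fake_read32(d) != 0x01);
	return 0;
}

/* Usage: a healthy device becomes ready, a fenced one fails fast. */
static void check_examples(void)
{
	struct fake_dev healthy = { .offline = false, .reads_until_ready = 3 };
	struct fake_dev fenced = { .offline = true, .reads_until_ready = 0 };

	assert(wait_ready(&healthy) == 0);
	assert(wait_ready(&fenced) == -EIO);
}
```

Without the offline check inside the loop body, the fenced case never terminates, which matches the "Might be infinite loop in ast driver" message in the log above.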