Ubuntu
linux package

Bug #1447664
Comment #37

Comment 37 for bug 1447664

Paulo Abadie Guedes (paulo.guedes) wrote on 2018-01-28:

Hello, I am still seeing this bug. I'm working with several HP machines of the same model as Yngvi's. Here it is (from dmesg):
Hardware name: HP HP EliteDesk 705 G3 Brazil Desktop Mini/8266, BIOS P26 Ver. 02.03 12/22/2016

Interestingly, it always happens with a 10/100 switch, but never with a gigabit one.

I've compiled and tested the 4.15.0-rc8 release candidate, which includes commit 4419bb1cedcda0272e1dc410345c5a1d1da0e367, but it does not solve the issue. I added a few printk calls and can see that the module is correctly compiled and loaded, but my machine is not a Dell. Hence, the "if" condition fails and the body is not executed.

I also tried to force the patch by keeping the "if" body and removing the condition, just to see what happens (with another printk to prove that it runs). The code runs (limiting MRRS to 2048, I think), but it does not solve the bug.
It complains that the TSC is unstable, right after tg3 breaks. Here is a dmesg snippet, in case it helps.

<...>
[ 155.816404] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 155.816447] clocksource: 'refined-jiffies' wd_now: fffdcbf3 wd_last: fffdc110 mask: ffffffff
[ 155.816490] clocksource: 'tsc' cs_now: 7d3f16e620 cs_last: 7b2987b172 mask: ffffffffffffffff
[ 155.816533] tsc: Marking TSC unstable due to clocksource watchdog
[ 155.939181] tg3 0000:01:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
[ 156.103998] tg3 0000:01:00.0 eth0: Link is down
[ 156.322988] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[ 156.323040] sched_clock: Marking unstable (156322980975, 5436)<-(156582881282, -259894745)
[ 156.323144] clocksource: Switched to clocksource refined-jiffies
<...>
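
For anyone comparing these watchdog lines across machines, the raw counter deltas can be pulled out of the two "clocksource:" lines with a small script. This is only a sketch: the function name counter_deltas and the regular expression are my own, not something from the kernel, and it reports raw ticks only. Converting them to seconds would need the kernel's HZ and the TSC frequency, which are not in this snippet.

```python
import re

# The two counter lines copied from the dmesg snippet above.
DMESG = """\
[ 155.816447] clocksource: 'refined-jiffies' wd_now: fffdcbf3 wd_last: fffdc110 mask: ffffffff
[ 155.816490] clocksource: 'tsc' cs_now: 7d3f16e620 cs_last: 7b2987b172 mask: ffffffffffffffff
"""

# Hypothetical pattern written for this log format only (assumption, not a kernel API).
PATTERN = re.compile(
    r"'(?P<name>[^']+)'\s+(?:wd|cs)_now:\s*(?P<now>[0-9a-f]+)\s+"
    r"(?:wd|cs)_last:\s*(?P<last>[0-9a-f]+)\s+mask:\s*(?P<mask>[0-9a-f]+)"
)


def counter_deltas(text):
    """Return {clocksource name: (now - last) & mask} in raw counter ticks."""
    deltas = {}
    for m in PATTERN.finditer(text):
        now, last, mask = (int(m.group(k), 16) for k in ("now", "last", "mask"))
        # Masking keeps the subtraction correct if the counter wrapped around.
        deltas[m.group("name")] = (now - last) & mask
    return deltas


if __name__ == "__main__":
    for name, delta in counter_deltas(DMESG).items():
        print(f"{name}: {delta} ticks")
```

Running the same function over the pastebin logs would show whether the two deltas differ in the same way with each boot parameter.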

If you want to take a deeper look, there are a few logs below. I also tried "tsc=unstable" and other boot parameters, mostly to see if any would help (feeling lucky, perhaps?). Nothing changed; the bug is still there. To me, the logs show mostly the same messages.

log_01_acpi_off.txt
https://pastebin.com/FGQNiLqk

log_02_maxcpus_1.txt
https://pastebin.com/2eEJnA3Z

log_03_nmi_watchdog_off.txt
https://pastebin.com/Su44AqiX

log_04_nmi_watchdog_off.txt
https://pastebin.com/4ja0UZ0c

log_05_noapic_nolapic.txt
https://pastebin.com/fZNJbME5

Well, any ideas? I can reproduce the problem 100% of the time. Would you like me to test any other patch?

Kai-Heng Feng, you mention "it's better to ask HP and Broadcom to fix the issue". I agree, but how can we do that?

Thank you,
Paulo

