Comment 5 for bug 1931106

aft2d (aft2d) wrote:

Update:
I've contacted the email addresses from Kai-Heng's post.
Michael Chan (from Broadcom) replied that they had seen similar issues on other AMD systems and were working with AMD to resolve them.
The plan was to put me in contact with AMD, but unfortunately this never happened. My attempt to reach AMD through the official channel (tech support) failed because I could not answer AMD's questions without feedback from Broadcom, who then stopped replying as well.

Workaround:
Luckily, the information that came out of the conversation with Broadcom gave me at least a rough idea of where to look, so I was able to troubleshoot a bit myself.
It appears that the problem no longer occurs after setting Advanced -> NB Configuration -> IOMMU to "disabled" (default is "Auto") in the Supermicro BIOS.
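
For anyone who wants to double-check from within Linux that the BIOS change actually took effect, here is a minimal sketch. It assumes the kernel exposes IOMMU groups under /sys/kernel/iommu_groups and that an empty or missing directory means no IOMMU is active; the function names are my own and purely illustrative, and I have not verified this on every kernel version.

```python
import os

# Assumption: on Linux, active IOMMU groups show up as numbered
# subdirectories of this sysfs path. Empty or missing => no IOMMU in use.
DEFAULT_ROOT = "/sys/kernel/iommu_groups"


def list_iommu_groups(root=DEFAULT_ROOT):
    """Return the names of the IOMMU group directories under root, or []."""
    try:
        entries = os.listdir(root)
    except OSError:
        # Directory is missing or unreadable: treat as "no groups".
        return []
    return sorted(e for e in entries if os.path.isdir(os.path.join(root, e)))


def iommu_appears_active(root=DEFAULT_ROOT):
    """True if at least one IOMMU group directory exists under root."""
    return len(list_iommu_groups(root)) > 0


if __name__ == "__main__":
    groups = list_iommu_groups()
    if groups:
        print("IOMMU appears active, %d group(s) found" % len(groups))
    else:
        print("No IOMMU groups found, IOMMU appears disabled")
```

After rebooting with the BIOS option set to "disabled", I would expect the script to report that no groups were found.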

Since then, the whole topic has been "stuck".

It's just a workaround and not really a fix, but at least my servers are running stably now. Since I don't know where the actual problem lies (AMD hardware, BIOS, kernel, or something else), I can't say whether this bug report can be marked as closed.