Comment 271 for bug 1690085

Revision history for this message
In , johnpaul7 (johnpaul7-linux-kernel-bugs) wrote :

There really has to be something else at play here. With my system everything has been stable since finding this thread and setting rcu-nocbs. But my co-worker who built an almost identical system (only differences being 1800X and non QVL memory) is experiencing the problem and close to giving up. My system specs:

Fedora 26
kernel 4.14.18
CONFIG_RCU_NOCB_CPU=y
rcu-nocbs=0-15

Ryzen 7 1700x (week 37)
ASrock Fatal1ty AB350 Gaming-ITX/ac (4.40 bios)
32gb QVL memory running at rated 2933 via profile
Other stuff: AMD GPU, Corsair PSU, Samsung SSD

Bios settings are pretty much stock apart from memory profile, enabling virtualization stuff, and tweaking fan profiles. Nothing tweaked related to C6 (bios or zenstates script).

To help aggregate all our experiences, and try to find some patterns what is everyone's thoughts on starting a Google form spreadsheet with columns for:
- CPU
- CPU week: yes, not related really but good to know in general
- Mobo: helpful for finding others in "your" situation
- Mobo Chipset: curious if 350 vs 370 shows anything significant. Mobo name would probably contain this, but having a dedicated column to sort is helpful
- Mobo BIOS version
- (boolean) QVL memory: we know Ryzen is sensitive to memory, so would be good data point to collect
- (boolean) CONFIG_RCU_NOCB_CPU
- (boolean) rcu-nocbs: bool since value is based on processor and not something I think we misalign with number of threads
- C6 tweaks: "false" or what you did (e.g. bios, zenstates.py, etc)
- Distro
- Kernel
- (boolean) Passes stress test: to help eliminate hardware instability from OC'ing or faulty memory sticks, we can define what qualifies as "true" here
- (linear scale) Stable: from my life sucks and I can't use system, to my life on Ryzen is awesome and I forgot last time I rebooted (but still conscious bugs and crashes exist outside of Ryzen related issues).

I want to try and keep it as simple/straightforward as possible so it compliments this thread, and helps us see the data in a more structured format. Hopefully we can find a pattern! I also can't help but wonder how more people are not reporting issues. Would be curious if any people on the levelonetechs community can get Wendell & Co. to draw more attention to this. AMD is making great strides for open source and linux friendly hardware, but this problem really hurts that effort.

**EDIT** v1 of form is live: https://goo.gl/forms/oGCTPnNK0vJtNntj2

If you want to be a collaborator on the form, message me directly. Also share feedback on how to make the form better suited for our goal!