Nonstop system freezes/crashes since switching to AMD
brokeassben Jun 6, 2021
Hey all,
I'm absolutely no expert, but I've been using Linux since 2003, do IT for a living, and have generally been able to troubleshoot on my own. That said, I am at my damn wits' end trying to solve this issue. Since building my PC last year and moving from Intel + Nvidia, I have had very frequent system freezes that require hard shutdowns. I'd generally figure this was defective hardware if I hadn't seen a number of people with similar or roughly the same components going through the same thing and posting about it on Reddit and various other places. If anyone has experienced this same thing and figured out a solution, I'd really appreciate any help.

What triggers the freeze:
Happens randomly--sometimes as early as the login screen, sometimes with only a browser open, sometimes not til after many many hours of use.
Can usually trigger a freeze faster by running a game through Proton, but that can also be very inconsistent.

So things I've tried:
Updating BIOS (I've updated the BIOS more times than I can recall at this point)
Disabling C-states in BIOS (this was suggested to be the solution by several people on Reddit)
Updating to a newer kernel (tried up to 5.12 with no success)
Updating to a newer mesa
SSH-ing into the computer to attempt to capture logs with no helpful info in them
Installing a different Linux OS (Fedora, Arch, newest Ubuntu)
Installing Windows 10 (runs consistently without issue, but I obviously don't want to run Win)

System info:
Ryzen 9 3900x
Radeon RX 5700 XT
ASUS TUF Gaming B550M-Plus (Wi-Fi)

Thanks!
Ben
Liam Dawe Jun 6, 2021
I would seriously suggest checking the RAM. Most of the time this is a RAM problem.
Koopacabras Jun 6, 2021
Yeah, it could be the RAM. Some Ryzen CPUs have problems with high-frequency RAM, so try lowering the frequency or disabling XMP.
Or one module could be damaged. In that case, run a RAM-testing tool on one module at a time to see which one is bad.
Most RAM manufacturers offer a lifetime warranty, so if a module is damaged you could get a replacement through RMA.
I think there is a CLI utility on Debian/Ubuntu to check RAM, but I can't remember its name.
Also, if your motherboard has an "Idle Mode" option, change it to "Typical Current Idle". AFAIK that was an issue with Zen 1 processors, though, and I've never heard of it happening on Zen 2. You could also try increasing your Ryzen CPU's voltage (this was a Zen 1 issue as well, but trying won't hurt).
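One CLI tool that fits the RAM-check suggestion above is memtester, which may or may not be the utility meant here. It tests memory from inside the running system, so it can only cover RAM that is currently free. A minimal sketch for choosing a size argument, assuming bash and a Linux /proc/meminfo; `memtest_size` is a made-up helper name, not part of memtester:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a memtester size argument (e.g. "4096M")
# covering a percentage of the currently available RAM.
# Arguments: percentage (default 75), meminfo file (default /proc/meminfo).
memtest_size() {
    local percent="${1:-75}" meminfo="${2:-/proc/meminfo}"
    # MemAvailable is reported in kB; convert the chosen share to MiB.
    awk -v p="$percent" '/^MemAvailable:/ { printf "%dM\n", $2 * p / 100 / 1024 }' "$meminfo"
}

# Usage (as root, one pass over 75% of available RAM):
#   memtester "$(memtest_size 75)" 1
```

A boot-time tester such as Memtest86+ can cover more of the RAM, since the OS is not occupying any of it.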

Another thing you could try is adding these options to grub:

processor.max_cstate=1 rcu_nocbs=0-23 idle=nomwait
or
processor.max_cstate=5 rcu_nocbs=0-23 idle=nomwait
(but AFAIK this issue was fixed in newer kernels)
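For anyone unsure how to add these, a minimal sketch follows, assuming bash, GNU sed, and a distro that keeps its GRUB defaults in /etc/default/grub with a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line. `add_kernel_params` is a made-up helper name, not a GRUB tool. The `rcu_nocbs=0-23` range covers CPUs 0 to 23, which matches the 24 threads of a 3900X; other CPUs would need a different range.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append kernel parameters to the
# GRUB_CMDLINE_LINUX_DEFAULT line of a GRUB defaults file.
# Arguments: path to the file, then the parameters to append.
add_kernel_params() {
    local file="$1"; shift
    local params="$*"
    # Insert the parameters just before the closing quote of the existing value.
    sed -i -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${params}\"|" "$file"
}

# Usage (as root, after backing up the file):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_params /etc/default/grub processor.max_cstate=1 rcu_nocbs=0-23 idle=nomwait
#   update-grub    # Debian/Ubuntu; on Arch: grub-mkconfig -o /boot/grub/grub.cfg
# After rebooting, check that the options took effect:
#   cat /proc/cmdline
```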

Also, pasting a full dmesg log could help:
dmesg > dmesg.txt

Then copy and paste the output to pastebin.
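One caveat: after a hard reset, dmesg only shows the current boot, so the messages from just before the freeze are gone. On systemd distros with a persistent journal, `journalctl -k -b -1` prints the kernel log of the previous boot instead. A rough sketch for narrowing such a log down before pasting it, assuming bash; `filter_suspect_lines` is a made-up helper, and the pattern list is only a guess at what is worth looking for, not an exhaustive set:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines from a saved kernel log that mention
# the GPU driver, machine-check/hardware errors, lockups, or RCU stalls.
# The pattern list is an assumption; adjust it to your hardware.
filter_suspect_lines() {
    grep -iE 'amdgpu|mce|hardware error|soft lockup|rcu.*stall' "$1"
}

# Usage:
#   journalctl -k -b -1 > prev-boot.txt    # kernel log of the previous boot
#   filter_suspect_lines prev-boot.txt
```

If nothing matches, the full log is still worth posting, since a hard freeze often leaves no final message at all.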

Last edited by Koopacabras on 6 June 2021 at 9:10 pm UTC
HyperRealisticRock Jun 7, 2021
In my experience it's almost always power: either a loose connector, a pin not seated, or the PSU itself.
Xpander Jun 7, 2021
Yeah, I think it's most likely either a PSU or a RAM issue.
Though you say Windows runs without issues, which is a bit weird.
What Aquabat suggested is worth a try, though.

Last edited by Xpander on 7 June 2021 at 7:01 am UTC
damarrin Jun 7, 2021
What others have said, though do try a different gfx card (nvidia) if you can. I've had so many problems with AMD gfx over the years... /o\
Whitewolfe80 Jun 7, 2021
What temp is your CPU hitting? Have you overclocked? I experienced something similar about a year ago with a 2600, and it turned out the CPU was hitting the high 90s and my motherboard was spiking to over 100. A really stupid mistake on my part regarding voltage and core clock settings.

Last edited by Whitewolfe80 on 7 June 2021 at 1:10 pm UTC
brokeassben Jun 7, 2021
Quoting: Whitewolfe80What temp is your cpu getting hitting ? have you overclocked because i did experience something similar about a year ago with 2600 and it turned out cpu was hitting high 90s and my motherboard was spiking to over 100. A really stupid mistake on my part regarding voltage and core clock settings.
I haven't overclocked, since it hasn't run stably at all except on Windows, and I REALLY don't want to dual-boot. It generally maxes out around 65 when gaming for long stretches--turns out water cooling is overhyped and good airflow is just as good in many cases.

Quoting: Liam DaweI would seriously suggest checking the RAM. Most of the time this is a RAM problem.
The RAM (as well as the PSU) is repurposed from my previous build, has been re-seated, is listed as compatible with the motherboard and passes both memory tests I've tried...which I know doesn't always mean it's good RAM. I should probably just invest in some faster RAM and splurge on 32GB.

Quoting: The_AquabatAlso another thing you could try is adding this options to grub:

processor.max_cstate=1 rcu_nocbs=0-23 idle=nomwait
I've tried this out and haven't had any crashes after a couple hours of gaming, which makes me cautiously hopeful! I'll try a few more games after work today to see how it goes. My hopes have been dashed a few times with other attempts at fixes.

A weird added detail--my system is 100% stable while running CS:GO 🤷‍♂️

I really appreciate all of you taking the time to respond!

Ben
dvd Jun 7, 2021
Quoting: brokeassben
processor.max_cstate=1 rcu_nocbs=0-23 idle=nomwait
I've tried this out and haven't had any crashes after a couple hours of gaming, which makes me cautiously hopeful! I'll try a few more games after work today to see how it goes. My hopes have been dashed a few times with other attempts at fixes.

For my Ryzen, disabling C6 fixed the problem too (I have an old 1300X). Weirdly, I also read somewhere that the C-state issues were fixed in newer hardware, but I have C6 disabled in both the BIOS and the kernel options.
(Seems like people still had problems with C6 last year: https://bugzilla.kernel.org/show_bug.cgi?id=206487)
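To check which idle states the kernel actually exposes after a BIOS or kernel-option change like this, the cpuidle entries in sysfs can be listed. A minimal sketch, assuming bash and the usual /sys/devices/system/cpu/cpu0/cpuidle layout; `list_idle_states` is a made-up helper name. Note that the state names the kernel reports (C1, C2, ...) do not necessarily match the hardware names used in the BIOS, such as C6.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the idle states the kernel exposes for one CPU,
# with each state's name and whether it is disabled (1) or enabled (0).
# Argument: cpuidle directory (defaults to cpu0's).
list_idle_states() {
    local base="${1:-/sys/devices/system/cpu/cpu0/cpuidle}" dir
    for dir in "$base"/state*; do
        # Skip anything without a readable name file (including an unmatched glob).
        [ -r "$dir/name" ] || continue
        printf '%s %s disabled=%s\n' "${dir##*/}" "$(cat "$dir/name")" "$(cat "$dir/disable" 2>/dev/null)"
    done
}

# Usage:
#   list_idle_states
```

With processor.max_cstate=1 in effect, one would expect the deeper states to be missing from this list, though the exact output depends on the idle driver in use.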
tuubi Jun 8, 2021
Quoting: TheReaperUKremember Linux power consumption/usage are not as good as Windows
Just to clarify, some Linux drivers and software aren't as power-efficient as some drivers and software on Windows. If Linux wasted more power than Windows in general, you can bet your butt it wouldn't be so popular in server rooms and on embedded devices.

Sorry for going off topic.
Koopacabras Jun 8, 2021
Quoting: brokeassbenI've tried this out and haven't had any crashes after a couple hours of gaming, which makes me cautiously hopeful! I'll try a few more games after work today to see how it goes. My hopes have been dashed a few times with other attempts at fixes.
The fix I mentioned works around a hardware bug that Ryzens have. On Windows it's patched at the microcode/firmware level. On Linux it _should_ be patched as well, but it seems the Linux kernel devs haven't completely nailed it, and it can sometimes happen even with the patches. That might be why you don't experience the bug on Windows: the microcode patch works better there.