Patreon Logo Support us on Patreon to keep GamingOnLinux alive. This ensures all of our main content remains free for everyone. Just good, fresh content! Alternatively, you can donate through PayPal Logo PayPal. You can also buy games using our partner links for GOG and Humble Store.
Title: Working around Ryzen CPU freezes
Shmerl 1 Mar 2018
For a long time, many Ryzen CPUs have been affected by random freezes and reboots, which some people managed to narrow down to the C6 power state. Even an RMA often didn't help with these.

Recently I found an actual kernel bug report about it: https://bugzilla.kernel.org/show_bug.cgi?id=196683

Apparently, AMD said they are going to release a microcode update (or motherboard manufacturers are going to release firmware updates?) that handles this. Until that happens, you can work around it without disabling C6 states by building the kernel with

CONFIG_RCU_NOCB_CPU=y

And then use the rcu_nocbs=0-... kernel boot parameter to enable it.
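
Before rebuilding anything, it may be worth checking whether the running kernel already has the option enabled. This is a minimal sketch, assuming the distribution installs the kernel config as /boot/config-$(uname -r) (Debian does; other distributions may not). The function name has_rcu_nocb is made up for illustration.

```shell
#!/bin/sh
# has_rcu_nocb is a hypothetical helper name, not part of any tool.
# It succeeds (exit status 0) if the given kernel config file
# contains the line CONFIG_RCU_NOCB_CPU=y.
has_rcu_nocb() {
    grep -q '^CONFIG_RCU_NOCB_CPU=y$' "$1"
}

# Assumption: the running kernel's config is installed under /boot.
config_file="/boot/config-$(uname -r)"
if [ -r "$config_file" ] && has_rcu_nocb "$config_file"; then
    echo "CONFIG_RCU_NOCB_CPU is enabled"
else
    echo "CONFIG_RCU_NOCB_CPU is not enabled (or the config file was not found)"
fi
```

If it reports the option as enabled, only the boot parameter should be needed and the rebuild can be skipped.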

I was bugged by this issue for a long time, and recently decided to build a kernel like above. It happens to be quite easy with Debian. It indeed works around the problem.

Here is an example HOWTO on doing it:
_______________________________________________________________________
If this is useful to anyone, here is what I did on Debian testing:

Install current Linux source package and some tools and libraries:

sudo apt-get install linux-source build-essential kernel-package libncurses5-dev libelf-dev libssl-dev

Note: in my case I pulled linux-source from Debian sid, since the default kernel in testing still didn't get bumped to Linux 4.15.x, even though 4.15 is available in general.

Unpack the source, for example to $HOME/build.

I like using explicit tar parameters even though they are long, because they make the command much more readable and easier to understand:

linux_ver="4.15"
config_ver="4.15.0-1-amd64"

linux_src="/usr/src/linux-source-${linux_ver}.tar.xz"
build_dir="${HOME}/build"
source_dir="${build_dir}/linux-source-${linux_ver}"

mkdir -p "${build_dir}"
tar --extract --verbose --use-compress-program "xz --decompress" --file "${linux_src}" --directory "${build_dir}"
cd "${source_dir}"


Now you are in the unpacked source directory. You need to create a proper .config file. My goal was to make only minimal tweaks to the stock Debian kernel, so I copied the default config from /boot first:

cp -v "/boot/config-${config_ver}" "${source_dir}/.config"

Now, you need to enable the actual workaround. There is a useful tool for configuring kernel parameters - menuconfig (that's why libncurses5-dev was needed above).

make menuconfig

This builds the tool in place and runs it. Find the RCU options under General Setup > RCU Subsystem.

There, enable "Make expert-level adjustments to RCU configuration" and "Offload RCU callback processing from boot-selected CPUs".

Then select Save (into .config), and a few times Exit to exit the tool.
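
The same two options can probably also be switched on without the menus, using the scripts/config helper that ships in the kernel tree. This is only a sketch: it assumes the two menu entries correspond to the symbols RCU_EXPERT and RCU_NOCB_CPU, and enable_rcu_offload is a made-up function name. Running make olddefconfig afterwards lets Kconfig fill in any dependent options with their defaults.

```shell
#!/bin/sh
# enable_rcu_offload is a hypothetical helper, not part of the kernel tree.
# Argument: path to the unpacked kernel source that contains .config.
enable_rcu_offload() {
    src="$1"
    # scripts/config edits the config file in place;
    # --enable NAME sets CONFIG_NAME=y.
    "$src/scripts/config" --file "$src/.config" \
        --enable RCU_EXPERT \
        --enable RCU_NOCB_CPU
}

# Only act when run from inside a kernel source tree with a .config.
if [ -x ./scripts/config ] && [ -f ./.config ]; then
    enable_rcu_offload .
    make olddefconfig   # resolve dependent options using default answers
fi
```

Either way, it is worth checking that CONFIG_RCU_NOCB_CPU=y appears in .config before starting the build.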

One more thing is needed: comment out CONFIG_SYSTEM_TRUSTED_KEYS in the resulting ${source_dir}/.config, otherwise [the build will fail](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=823107).

perl -p -i -e 's/^CONFIG_SYSTEM_TRUSTED_KEYS=/#CONFIG_SYSTEM_TRUSTED_KEYS=/' .config

Now you are ready to build it (I chose suffixes -rcu, and -1 for versions):

make -j$(nproc) deb-pkg LOCALVERSION=-rcu KDEB_PKGVERSION=$(make kernelversion)-1

Press Enter a few times to accept the defaults for the configuration questions triggered by the modifications, and the build should proceed. After it completes, the finished packages should be in $build_dir:

cd ..
ls -1 linux-image*
linux-image-4.15.4-rcu_4.15.4-1_amd64.deb
linux-image-4.15.4-rcu-dbg_4.15.4-1_amd64.deb


Install the result:

sudo dpkg -i linux-image-4.15.4-rcu_4.15.4-1_amd64.deb

Now open /etc/default/grub with your editor (under sudo), and add this to your kernel boot parameters (assuming you have an 8-core / 16-thread processor):

GRUB_CMDLINE_LINUX_DEFAULT="... rcu_nocbs=0-15"

The "..." here stands for whatever was there before the change. Don't remove it, just add the new parameter after a space!
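
For a processor with a different thread count, the range runs from 0 to the number of logical CPUs minus one. A small sketch of that arithmetic follows; rcu_nocbs_param is a made-up helper name, and it assumes nproc reports all logical CPUs (which it normally does unless the process is restricted to a subset).

```shell
#!/bin/sh
# rcu_nocbs_param is a hypothetical helper: given a logical CPU count,
# it prints the boot parameter covering CPUs 0..count-1.
rcu_nocbs_param() {
    count="$1"
    echo "rcu_nocbs=0-$((count - 1))"
}

# Print the parameter for the machine this runs on.
rcu_nocbs_param "$(nproc)"
```

With 16 logical CPUs this prints rcu_nocbs=0-15, matching the example above.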

Save and update system grub:

sudo update-grub

That's it. Reboot the system and you are ready to use the workaround kernel.
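
After the reboot, a quick sanity check is to confirm that the new kernel is the one running and that the parameter actually reached the kernel command line. The sketch below assumes the -rcu suffix chosen above; extract_rcu_nocbs is a made-up helper name.

```shell
#!/bin/sh
# extract_rcu_nocbs is a hypothetical helper: it prints the rcu_nocbs=...
# token from a kernel command line string, or nothing if it is absent.
extract_rcu_nocbs() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep '^rcu_nocbs=' || true
}

# Kernel release; should end in -rcu if the kernel was built as above.
uname -r

# The parameter as the running kernel received it (empty if missing).
extract_rcu_nocbs "$(cat /proc/cmdline)"
```

If the second command prints nothing, the GRUB change did not take effect (for example, update-grub was not run or a different boot entry was selected).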
Xpander 1 Mar 2018
I haven't had any issues with it since I disabled the C6 power state in the BIOS. I also have the AGESA 1.0.0.0a BIOS.
Weird that some need to enable kernel configs and boot parameters. My CPU is a launch one, so in theory it should also have that segfault bug when lots of parallel compiles are running. Though I have run that ryzen-kill test for 2 hours twice and for 4 hours once with no issues.

xpander@arch ~ $ uptime
 19:29:19 up 7 days, 21:01,  1 user,  load average: 1.52, 1.43, 0.85

xpander@arch ~ $ inxi -F
System:    Host: arch Kernel: 4.15.4-1-ARCH x86_64 bits: 64 Desktop: MATE 1.20.0
           Distro: Arch Linux
Machine:   Device: desktop Mobo: ASUSTeK model: PRIME X370-PRO v: Rev X.0x serial: N/A
           UEFI [Legacy]: American Megatrends v: 3803 date: 01/22/2018
CPU:       8 core AMD Ryzen 7 1700X Eight-Core (-MT-MCP-) cache: 4096 KB
           clock speeds: max: 3925 MHz 1: 1881 MHz 2: 1813 MHz 3: 2691 MHz 4: 2382 MHz
           5: 1842 MHz 6: 1855 MHz 7: 1862 MHz 8: 1873 MHz 9: 2958 MHz 10: 2413 MHz
           11: 1709 MHz 12: 1710 MHz 13: 1727 MHz 14: 1716 MHz 15: 1710 MHz
           16: 1711 MHz
Graphics:  Card: NVIDIA GP104 [GeForce GTX 1070]
           Display Server: x11 (X.Org 1.19.6 ) driver: nvidia
           Resolution: [email protected][email protected]
           OpenGL: renderer: GeForce GTX 1070/PCIe/SSE2 version: 4.6.0 NVIDIA 390.25
Audio:     Card-1 NVIDIA GP104 High Def. Audio Controller driver: snd_hda_intel
           Card-2 M-Audio driver: USB Audio
           Card-3 Focusrite-Novation Focusrite Scarlett 2i2 driver: USB Audio
           Sound: Advanced Linux Sound Architecture v: k4.15.4-1-ARCH
Network:   Card: Intel I211 Gigabit Network Connection driver: igb
           IF: enp8s0 state: up speed: 1000 Mbps duplex: full mac: 60:45:cb:9a:09:31
Drives:    HDD Total Size: 7751.6GB (64.3% used)
           ID-1: /dev/nvme0n1 model: Samsung_SSD_960_EVO_250GB size: 250.1GB
           ID-2: /dev/sdc model: ST3000DM001 size: 3000.6GB
           ID-3: /dev/sda model: Samsung_SSD_850 size: 500.1GB
           ID-4: /dev/sdb model: ST2000DM001 size: 2000.4GB
           ID-5: /dev/sdd model: ST2000DM001 size: 2000.4GB
Partition: ID-1: / size: 230G used: 129G (59%) fs: ext4 dev: /dev/nvme0n1p1
Sensors:   System Temperatures: cpu: 38.0C mobo: 29.0C gpu: 31C
           Fan Speeds (in rpm): cpu: 0 sys-1: 1163 sys-2: 919
Info:      Processes: 345 Uptime: 7 days Memory: 11726.2/32166.7MB
           Client: Shell (bash) inxi: 2.3.56

Shmerl 1 Mar 2018
Quoting: Xpander
I haven't had any issues with it since I disabled the C6 power state in the BIOS. I also have the AGESA 1.0.0.0a BIOS.
Weird that some need to enable kernel configs and boot parameters.
I updated the first post with some details. Disabling C6 is really a very crude fix (which, in the case of my motherboard firmware, doesn't even work, since C6 doesn't actually get disabled). It causes higher CPU temperature and power usage. The RCU workaround helps without disabling C6, so the temperature is practically unaffected.
Xpander 1 Mar 2018
Quoting: Shmerl
Quoting: Xpander
I haven't had any issues with it since I disabled the C6 power state in the BIOS. I also have the AGESA 1.0.0.0a BIOS.
Weird that some need to enable kernel configs and boot parameters.
I updated the first post with some details. Disabling C6 is really a very crude fix (which, in the case of my motherboard firmware, doesn't even work, since C6 doesn't actually get disabled). It causes higher CPU temperature and power usage. The RCU workaround helps without disabling C6, so the temperature is practically unaffected.
Judging by the clock numbers, mine doesn't disable C6 either. I can see the cores go down as far as 1.7 GHz, though they should only go down to 2.2 GHz with C6 disabled. I didn't notice any temperature increase either. So this is a bit of a weird one. Maybe Arch adds this to the kernel by default, though I don't see any kernel parameters.

Edit: also, I had lots of those computer freezes or black screens when I was on 4.14 kernels, and the C6 setting didn't help. But I have no idea whether 4.15 made it go away or some BIOS/microcode updates did.
Shmerl 2 Mar 2018
Quoting: Xpander
Edit: also, I had lots of those computer freezes or black screens when I was on 4.14 kernels, and the C6 setting didn't help. But I have no idea whether 4.15 made it go away or some BIOS/microcode updates did.
What is your current AGESA version and microcode? You should be able to see the first in the firmware somewhere, and the second like this:

grep microcode /proc/cpuinfo
microcode       : 0x8001129
...


So I currently have 0x8001129.
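
Since that grep prints one line per logical CPU, a shorter view is to list only the distinct revisions. A small sketch, with microcode_revisions as a made-up helper name; it assumes the "microcode : value" line format shown above.

```shell
#!/bin/sh
# microcode_revisions is a hypothetical helper: it prints each distinct
# microcode revision found in a cpuinfo-style file, one per line.
microcode_revisions() {
    awk -F': ' '/^microcode/ { print $2 }' "$1" | sort -u
}

microcode_revisions /proc/cpuinfo
```

A single line of output means all cores report the same revision.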
nox 2 Mar 2018
What kind of freezes are we talking about here?
I have the same Ryzen as Xpander, and I haven't had any issues to speak of at all. So, this intrigues me!
Shmerl 2 Mar 2018
Complete system freezes: you can't even access the system remotely over ssh when they happen. This is a hardware problem, and not every chip has it. So you might have a good one.
Xpander 3 Mar 2018
Quoting: Shmerl
Quoting: Xpander
Edit: also, I had lots of those computer freezes or black screens when I was on 4.14 kernels, and the C6 setting didn't help. But I have no idea whether 4.15 made it go away or some BIOS/microcode updates did.
What is your current AGESA version and microcode? You should be able to see the first in the firmware somewhere, and the second like this:

grep microcode /proc/cpuinfo
microcode       : 0x8001129
...


So I currently have 0x8001129.
Same microcode version as yours, and AGESA 1.0.0.0a.
When I had freezes, I think they might have been caused by my RAM OC or by a BIOS that was just bad.
But yeah, no issues since the 4.15 kernels. I updated the BIOS around the same time though, so I don't know which of those gave me the stability.
SirBubbles 3 Mar 2018
Would this have anything to do with weird momentary freezes when doing just about anything under gnome-shell? I mean, I'm on Ubuntu 17.10 with gnome-shell, NVIDIA drivers, kernel 4.13.0-36, and I'll often get freezes of around 5-10 seconds at random. No idea of the cause, but I do have a Ryzen 1700 at 3.7 GHz. Any idea if this is the same issue?
(*EDIT* Note that I don't get mystery lock-ups such as you're describing here, so it might be a separate issue.)
GustyGhost 3 Mar 2018
I am affected by this bug. From what I understand, it was only the first few months' production of Ryzen (Summit Ridge) chips. The flaw was supposedly fixed for chips manufactured in the following quarter, but don't quote me on that.

All in all, the freezes only occur maybe once a week with moderate use. Not ideal but I'm not about to go recompiling my kernel over it either.
Shmerl 4 Mar 2018
Browsing around my ASRock X370 Taichi firmware settings, I found this one:

Advanced > AMD CBS > Zen Common Options > Power Supply Idle Control.

I changed it from auto to low; let's see if it helps with the stock kernel.
Shmerl 4 Mar 2018
Still freezing with "low current idle". Testing now with "common current idle".
lucinos 4 Mar 2018
unrelated (?)

I had freezes on the laptop in my signature, which is Intel.
I never had the problem with kernel 4.9, which thankfully was LTS, so I mostly kept using that instead of the latest (because of no freezes). I also never had it with any earlier kernel. The problem started with kernel 4.10 and continued until 4.14. In some cases I would go a few days without a freeze, which tricked me into believing it was fixed, but it was not. Now at last it seems to have been fixed in some iteration of 4.14. It has been more than two months without a freeze on the latest kernel. I have used 4.14 and now 4.15 a lot without a single freeze. That's months now, not days or even weeks, so I am starting to feel quite confident it is over.
Shmerl 4 Mar 2018
Yep, that's completely unrelated, since Ryzen freezes are AMD specific.
Shmerl 4 Mar 2018
I'm now also seeing a lot of these in dmesg:

[11225.078807] x86: Booting SMP configuration:
[11225.078808] smpboot: Booting Node 0 Processor 1 APIC 0x1
[11225.081035] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[11225.081063]  cache: parent cpu1 should not be sleeping
[11225.081127] microcode: CPU1: patch_level=0x08001129
[11225.081213] CPU1 is up
[11225.081233] smpboot: Booting Node 0 Processor 2 APIC 0x2


And so on for all 16 virtual cores.
Avehicle7887 4 Mar 2018
Yesterday morning I updated my BIOS to the latest version, which included AGESA 1.0.0.0a and "Improve system performance", whatever that may mean. The freezes rarely happen to me, so I'm not sure if it helped or not. I'm using an Asus Prime B350M-A motherboard.
Shmerl 7 Mar 2018
That "C-state 0x0 not supported by HW" message now always appears, so it's not related to my test above.

With Advanced > AMD CBS > Zen Common Options > Power Supply Idle Control set to "Common current idle" (instead of auto), I didn't get any freezes in a while, so I assume it's a valid workaround.

I noticed what changes after it's set in the firmware, using [zenstates.py](https://github.com/r4m0n/ZenStates-Linux/blob/master/zenstates.py):

When set to auto:

C6 State - Package - Enabled
C6 State - Core - Enabled


when set to Common current idle:

C6 State - Package - Disabled
C6 State - Core - Enabled


So apparently it disables package C6 state (while keeping core C6 state enabled)! Hopefully it can shed some light on what the problem is. I wonder if Ryzen 2 will be free of this issue.

What exactly is "package" in this context? Is it still part of the CPU, or is it something on the motherboard?
Xpander 7 Mar 2018
Quoting: Shmerl
What exactly is "package" in this context? Is it still part of the CPU, or is it something on the motherboard?
I think it's the whole die, not an individual core.

Seems I have this enabled:

C6 State - Package - Enabled
C6 State - Core - Enabled


Still haven't had that freeze bug, go figure then.
Shmerl 7 Mar 2018
Quoting: Xpander
Still haven't had that freeze bug, go figure then.
You had it before and then it just stopped happening?
Xpander 7 Mar 2018
Quoting: Shmerl
Quoting: Xpander
Still haven't had that freeze bug, go figure then.
You had it before and then it just stopped happening?
Yeah, I had it with some later 4.14 kernels, like 4.14.10 or something (can't remember exactly). But it might have been an issue with my RAM clocks or the BIOS rather than the kernel, because I updated the BIOS around the same time I built kernel 4.15 and the problems disappeared. I also loosened some of the tighter settings in my RAM OC a bit.
Avehicle7887 8 Mar 2018
3 days since my last report here, and so far no freezes at all with the system running well over 10 hours daily. The only update I did was to the BIOS. At most it happened maybe once a week, even before I updated the BIOS. Time will tell, but I'm keeping an eye on it.