Working around Ryzen CPU freezes
Shmerl commented on 1 March 2018 at 4:40 pm UTC

For a long time, many Ryzen CPUs have been affected by random freezes and reboots, which some people managed to narrow down to the C6 power state. Even an RMA often didn't help with these.

Recently I found an actual kernel bug report about it: https://bugzilla.kernel.org/show_bug.cgi?id=196683

Apparently, AMD said that they are going to release a microcode update (or motherboard manufacturers are going to release firmware updates?) that handles this. Until that happens, you can work around it without disabling C6 states if you build the kernel with

CONFIG_RCU_NOCB_CPU=y

And then use the rcu_nocbs=0-... kernel boot parameter to enable it, where the range covers all logical CPUs.
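
The upper end of that range is the highest logical CPU number, so it depends on the processor. A minimal sketch for working it out, assuming nproc from coreutils reports every logical CPU (the function name rcu_nocbs_param is made up for illustration):

```shell
# Print an rcu_nocbs parameter covering logical CPUs 0..N-1.
# rcu_nocbs_param is a hypothetical helper, not part of any package.
rcu_nocbs_param() {
    # Use the given CPU count, or ask nproc for all installed CPUs.
    cpus="${1:-$(nproc --all)}"
    echo "rcu_nocbs=0-$((cpus - 1))"
}

rcu_nocbs_param       # based on the CPU count of the current machine
rcu_nocbs_param 16    # an 8 core / 16 thread processor
```

The second call prints rcu_nocbs=0-15, which is the value used in the GRUB step further down.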

I was bugged by this issue for a long time, and recently decided to build a kernel as described above. It turned out to be quite easy on Debian, and it does indeed work around the problem.

Here is an example HOWTO on doing it:
_______________________________________________________________________
If this is useful to anyone, here is what I did on Debian testing:

Install current Linux source package and some tools and libraries:

sudo apt-get install linux-source build-essential kernel-package libncurses5-dev libelf-dev libssl-dev

Note: in my case I pulled linux-source from Debian sid, since the default kernel in testing hadn't been bumped to Linux 4.15.x yet, even though 4.15 is available in general.

Unpack the source, for example to $HOME/build.

I like using explicit tar parameters even though they are long; it makes the command much more readable and easier to understand:

linux_ver="4.15"
config_ver="4.15.0-1-amd64"
linux_src="/usr/src/linux-source-${linux_ver}.tar.xz"
build_dir="${HOME}/build"
source_dir="${build_dir}/linux-source-${linux_ver}"
mkdir -p "${build_dir}"
tar --extract --verbose --use-compress-program "xz --decompress" --file "${linux_src}" --directory "${build_dir}"
cd "${source_dir}"

Now you are in the unpacked source directory. You need to create a proper .config file. My goal was to make only minimal tweaks to the stock Debian kernel, so I first copied the default config from /boot:

cp -v /boot/config-${config_ver} ${source_dir}/.config

Now, you need to enable the actual workaround. There is a useful tool for configuring kernel parameters - menuconfig (that's why libncurses5-dev was needed above).

make menuconfig

It will build the tool in place and run it. Find the RCU options under General Setup > RCU Subsystem.

There, enable "Make expert-level adjustments to RCU configuration" and "Offload RCU callback processing from boot-selected CPUs".

Then select Save (into .config), and Exit a few times to leave the tool.

One more thing is needed: comment out CONFIG_SYSTEM_TRUSTED_KEYS in the resulting ${source_dir}/.config, otherwise the build will fail.

perl -p -i -e 's/^CONFIG_SYSTEM_TRUSTED_KEYS=/#CONFIG_SYSTEM_TRUSTED_KEYS=/' .config
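
Before starting the long build, it can be worth checking that both changes actually landed in .config. A rough sketch, assuming the option names shown above; check_workaround_config is a made-up helper name, not a kernel or Debian tool:

```shell
# Check that a kernel .config has the RCU workaround enabled and
# CONFIG_SYSTEM_TRUSTED_KEYS commented out.
# check_workaround_config is a hypothetical helper name.
check_workaround_config() {
    config_file="${1:-.config}"
    if ! grep -q '^CONFIG_RCU_NOCB_CPU=y$' "$config_file"; then
        echo "CONFIG_RCU_NOCB_CPU is not enabled"
        return 1
    fi
    # An uncommented assignment means the perl step did not apply.
    if grep -q '^CONFIG_SYSTEM_TRUSTED_KEYS=' "$config_file"; then
        echo "CONFIG_SYSTEM_TRUSTED_KEYS is still set"
        return 1
    fi
    echo "config looks ready"
}
```

Run it from the source directory as check_workaround_config .config; it returns non-zero and names the problem if either change is missing.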

Now you are ready to build it (I chose suffixes -rcu, and -1 for versions):

make -j$(nproc) deb-pkg LOCALVERSION=-rcu KDEB_PKGVERSION=$(make kernelversion)-1

Press Enter a few times to get through the configuration prompts caused by the modifications, and it should proceed with the build. After it completes, you should have the package ready in $build_dir:

cd ..
ls -1 linux-image*
linux-image-4.15.4-rcu_4.15.4-1_amd64.deb
linux-image-4.15.4-rcu-dbg_4.15.4-1_amd64.deb

Install the result:

sudo dpkg -i linux-image-4.15.4-rcu_4.15.4-1_amd64.deb

Now open /etc/default/grub with your editor (under sudo), and add this to your kernel boot parameters (assuming you have an 8-core / 16-thread processor):

GRUB_CMDLINE_LINUX_DEFAULT="... rcu_nocbs=0-15"

Here ... means whatever was there before the change. Don't remove it; just add the new parameter after a space!
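
If you would rather script that edit than do it by hand, something like the following could work. It is only a sketch: it assumes GNU sed, a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line, and a parameter without / or & characters. append_grub_param is a made-up helper name.

```shell
# Append one kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a file.
# append_grub_param is a hypothetical helper name.
# Usage: append_grub_param PARAM FILE
append_grub_param() {
    param="$1"
    grub_file="$2"
    # Do nothing if the parameter is already on that line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$grub_file"; then
        return 0
    fi
    # Keep the existing value and add the parameter before the closing quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$grub_file"
}
```

It is safer to try it on a copy first, for example cp /etc/default/grub /tmp/grub.test followed by append_grub_param "rcu_nocbs=0-15" /tmp/grub.test, and to look at the result before touching the real file.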

Save and update system grub:

sudo update-grub

That's it. Reboot the system and you are ready to use the workaround kernel.
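
After the reboot, a quick way to confirm that the parameter reached the kernel is to look at /proc/cmdline. A small sketch (workaround_active is a made-up helper name; the optional file argument exists only to make it easy to try out):

```shell
# Report whether rcu_nocbs is present on the kernel command line.
# workaround_active is a hypothetical helper name.
workaround_active() {
    cmdline_file="${1:-/proc/cmdline}"
    if grep -q 'rcu_nocbs=' "$cmdline_file"; then
        echo "rcu_nocbs is set"
    else
        echo "rcu_nocbs is missing"
        return 1
    fi
}
```

Calling workaround_active without arguments checks the running system. With the LOCALVERSION used above, uname -r should also show a version ending in -rcu if the new kernel was the one that booted.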

Xpander commented on 1 March 2018 at 5:31 pm UTC

I haven't had any issues with it after I disabled the C6 power state in the BIOS. I also have the AGESA 1.0.0.0a BIOS.
Weird that some need to enable kernel configs and boot parameters. My CPU is a launch one, so in theory it should also have that segfault bug when lots of parallel compiles are running. Though I have run that ryzen-kill test for 2 hours twice and for 4 hours once with no issues.

xpander@arch ~ $ uptime
 19:29:19 up 7 days, 21:01, 1 user, load average: 1.52, 1.43, 0.85
xpander@arch ~ $ inxi -F
System:    Host: arch Kernel: 4.15.4-1-ARCH x86_64 bits: 64 Desktop: MATE 1.20.0 Distro: Arch Linux
Machine:   Device: desktop Mobo: ASUSTeK model: PRIME X370-PRO v: Rev X.0x serial: N/A
           UEFI [Legacy]: American Megatrends v: 3803 date: 01/22/2018
CPU:       8 core AMD Ryzen 7 1700X Eight-Core (-MT-MCP-) cache: 4096 KB
           clock speeds: max: 3925 MHz 1: 1881 MHz 2: 1813 MHz 3: 2691 MHz 4: 2382 MHz 5: 1842 MHz 6: 1855 MHz 7: 1862 MHz 8: 1873 MHz
           9: 2958 MHz 10: 2413 MHz 11: 1709 MHz 12: 1710 MHz 13: 1727 MHz 14: 1716 MHz 15: 1710 MHz 16: 1711 MHz
Graphics:  Card: NVIDIA GP104 [GeForce GTX 1070]
           Display Server: x11 (X.Org 1.19.6 ) driver: nvidia Resolution: 1920x1080@60.00hz, 2560x1440@143.91hz
           OpenGL: renderer: GeForce GTX 1070/PCIe/SSE2 version: 4.6.0 NVIDIA 390.25
Audio:     Card-1 NVIDIA GP104 High Def. Audio Controller driver: snd_hda_intel
           Card-2 M-Audio driver: USB Audio
           Card-3 Focusrite-Novation Focusrite Scarlett 2i2 driver: USB Audio
           Sound: Advanced Linux Sound Architecture v: k4.15.4-1-ARCH
Network:   Card: Intel I211 Gigabit Network Connection driver: igb
           IF: enp8s0 state: up speed: 1000 Mbps duplex: full mac: 60:45:cb:9a:09:31
Drives:    HDD Total Size: 7751.6GB (64.3% used)
           ID-1: /dev/nvme0n1 model: Samsung_SSD_960_EVO_250GB size: 250.1GB
           ID-2: /dev/sdc model: ST3000DM001 size: 3000.6GB
           ID-3: /dev/sda model: Samsung_SSD_850 size: 500.1GB
           ID-4: /dev/sdb model: ST2000DM001 size: 2000.4GB
           ID-5: /dev/sdd model: ST2000DM001 size: 2000.4GB
Partition: ID-1: / size: 230G used: 129G (59%) fs: ext4 dev: /dev/nvme0n1p1
Sensors:   System Temperatures: cpu: 38.0C mobo: 29.0C gpu: 31C
           Fan Speeds (in rpm): cpu: 0 sys-1: 1163 sys-2: 919
Info:      Processes: 345 Uptime: 7 days Memory: 11726.2/32166.7MB Client: Shell (bash) inxi: 2.3.56

Shmerl commented on 1 March 2018 at 6:24 pm UTC

Xpander
I haven't had any issues with it after I disabled the C6 power state in the BIOS. I also have the AGESA 1.0.0.0a BIOS.
Weird that some need to enable kernel configs and boot parameters.

I updated the first post with some details. Disabling C6 is really a very crude fix (which in case of my MB firmware doesn't even work, since C6 doesn't get disabled). It causes higher CPU temperature and power usage. RCU workaround helps without disabling C6 so the temperature practically isn't affected.

Xpander commented on 1 March 2018 at 8:08 pm UTC

Judging by the clock numbers, mine doesn't enable C6 either. I can see the cores go down as far as 1.7 GHz, though they should only go down to 2.2 GHz with C6 disabled. I didn't notice any temperature increase either. So this is a bit of a weird one. Maybe Arch adds this to the kernel by default, though I don't see any kernel parameters.

edit: also I had lots of those computer freezes or black screens when I was on 4.14 kernels, and the C6 setting didn't help. But I have no idea if 4.15 made it go away or some BIOS/microcode updates.

Shmerl commented on 2 March 2018 at 1:59 pm UTC

Xpander
edit: also I had lots of those computer freezes or black screens when I was on 4.14 kernels, and the C6 setting didn't help. But I have no idea if 4.15 made it go away or some BIOS/microcode updates.

What is your current AGESA version and microcode? You should be able to see the first in the firmware somewhere, and the second like this:

grep microcode /proc/cpuinfo
microcode : 0x8001129
...

So I currently have 0x8001129.

nox commented on 2 March 2018 at 7:25 pm UTC

What kind of freezes are we talking about here?
I have the same Ryzen as Xpander, and I haven't had any issues to speak of at all. So, this intrigues me!

Shmerl commented on 2 March 2018 at 7:41 pm UTC

Complete system freezes, you can't even access the system remotely over ssh when they happen. This is a hardware problem, and not every chip has it. So you might have a good one.

Xpander commented on 3 March 2018 at 12:21 am UTC

Same microcode version as yours, and AGESA 1.0.0.0a.
When I had freezes, I think I might have had them because of my RAM OC or a BIOS that was just bad.
But yeah, no issues since the 4.15 kernels. I updated the BIOS around the same time though, so I dunno which one of those gave me the stability.

SirBubbles commented on 3 March 2018 at 1:24 pm UTC

Would this have anything to do with weird momentary freezes when doing just about anything under Gnome-shell? I mean, I'm on Ubuntu 17.10 with gnome-shell, nvidia drivers, kernel 4.13.0-36, and I'll often get freezes of around 5-10 seconds at random. No idea of the cause, but I do have a Ryzen 1700 at 3.7 GHz. Any idea if this is the same issue?
(*EDIT* Note that I don't get mystery lock-ups such as you're describing here, so it might be a separate issue.)

GustyGhost commented on 3 March 2018 at 11:51 pm UTC

I am affected by this bug. From what I understand, it only affected the first few months' production of Ryzen (Summit Ridge) chips. The flaw was supposedly fixed for chips manufactured in the following quarter, but don't quote me on that.

All in all, the freezes only occur maybe once a week with moderate use. Not ideal but I'm not about to go recompiling my kernel over it either.

Shmerl commented on 4 March 2018 at 12:28 am UTC

Browsing around my ASRock X370 Taichi firmware settings, I found this one:

Advanced > AMD CBS > Zen Common Options > Power Supply Idle Control.

I changed it from auto to low. Let's see if it helps with the stock kernel.
