PC freeze with Vega and should I buy a RTX 2060 super?
Dax Tailor 6 March 2020 at 7:46 pm UTC

Hi everyone.
A while ago I built a new Ryzen 3000 PC with a Vega 56 GPU. For months now my PC has been freezing randomly. By freezing I mean nothing works anymore: no virtual console, no ping, no kernel SysRq. I have to do a hardware reset, and there is never an error in the kernel log.
I tried everything I could find on the internet (BIOS changes, kernel boot options, etc.) but nothing helps. Even the RAM is currently at the low default clock.
The main problem is that I can't reproduce it. But it always happens while a game is running. It never happens when I watch videos, for example, so I don't think this is the C6 state problem. It happens even in games with low CPU and GPU usage, so it is not related to high power draw.
About 2 weeks ago I watched a video from AdoredTV and he said that this happens on Windows too. Because I thought it was a Linux problem, I never looked into Windows-related search results.
However, the Windows problem is related to the Vega and Navi GPUs, not the Ryzen CPU.
I really hate to say this, but I'm very close to buying an Nvidia (2060 Super) card. If this problem does not go away very soon I might even give up gaming on the PC. (Which is a bit of a pity with at least 30 unplayed games on Steam, but I already found a new hobby, building Lego-kind sets, and I'm 53.)
Anyway, I think my question at the moment is not so much about how to fix the problem, although I might try if someone has an idea, but: is there any known issue with an RTX 2060 Super on Linux at the moment? I don't want to spend over €450 when there are other big problems.
Currently running Manjaro unstable? with the 5.5.7 kernel.
Thanks for reading this,
Bye

sr_ls_boy 6 March 2020 at 8:18 pm UTC

You could try to ssh into your computer to gather some more information.

Dax Tailor 6 March 2020 at 8:45 pm UTC

You mean when it's frozen?
I tried this; no connection is possible, and even ping stops working.
For some testing I had a little script read out the current GPU state from the /sys directory and show it on the other computer every second. When the gaming PC stops working, the updates stop too.
Btw, the other computer is a Dell laptop with a Pentium 3M, 512 MB RAM and a 64 GB PATA SSD, and it runs Mint with the i3 WM. (Thought that was a little bit funny to mention.)
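
A minimal sketch of what such a polling script could look like. The poster's actual script is not shown, so everything here is an assumption: the function names are made up, and the directory and file names in the usage comment (for example gpu_busy_percent under /sys/class/drm/card0/device) are only typical amdgpu sysfs entries that may differ on a given system.

```shell
#!/usr/bin/env bash
# Print one timestamped line with the contents of the given sysfs files.
# Usage: gpu_snapshot DIR FILE...
gpu_snapshot() {
    local dir=$1; shift
    local line f value
    line=$(date +%H:%M:%S)
    for f in "$@"; do
        # Files that are missing or unreadable are reported as "n/a".
        if [ -r "$dir/$f" ]; then
            # Join multi-line files into one line and drop the trailing space.
            value=$(tr '\n' ' ' < "$dir/$f")
            value=${value% }
        else
            value="n/a"
        fi
        line="$line $f=$value"
    done
    printf '%s\n' "$line"
}

# Print a snapshot once per second until interrupted.
# Usage: gpu_watch DIR FILE...
gpu_watch() {
    while true; do
        gpu_snapshot "$@"
        sleep 1
    done
}

# Example call on the gaming PC (card path and file names are assumptions):
#   gpu_watch /sys/class/drm/card0/device gpu_busy_percent pp_dpm_sclk
```

Run inside an ssh session from the second machine, the last line printed before the output stops is the last state the gaming PC managed to report.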

sr_ls_boy 6 March 2020 at 9:06 pm UTC

Maybe ssh in the other direction and dump your kernel logs on your Dell laptop.
I would ask on the mesa issues page for help.

EDIT
Give us something to read. Start a game and then post the contents of dmesg.
Tell us about your graphics stack. I don't have Manjaro. What version of Mesa? What version of libdrm? How about the contents of /var/log/Xorg.0.log? Do you use ACO as your shader compiler?

Last edited by sr_ls_boy on 6 March 2020 at 9:17 pm UTC

Dax Tailor 6 March 2020 at 9:19 pm UTC

Hmm, I had not thought about that. I have to find out how to write the logs over the LAN.
I'm not sure how much of the driver is in Mesa (at least the 3D part). Can Mesa even crash the kernel? But it is worth a look. I can't recall anything Mesa-related coming up in my Google searches.
Thanks
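
One possible way to get the kernel log onto the second machine, sketched under assumptions: the helper name stamp_lines, the user and host names, and the log file name are all placeholders, and there is no guarantee that the final messages leave the machine before it locks up.

```shell
#!/usr/bin/env bash
# Read log lines on stdin and append them to FILE, each prefixed with the
# local time at which it arrived.
# Usage: some_command | stamp_lines FILE
stamp_lines() {
    local out=$1 line
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$line" >> "$out"
    done
}

# Example, run on the laptop (user and host names are placeholders):
#   ssh user@gamingpc 'dmesg --follow' | stamp_lines freeze.log
# Depending on permissions on the gaming PC, 'journalctl -k -f' may be
# needed in place of 'dmesg --follow'.
```

Because every line is written to the laptop's disk as it arrives, whatever made it over the network before the freeze survives the hardware reset.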

sr_ls_boy 6 March 2020 at 9:20 pm UTC

Give us a dmesg and a /var/log/Xorg.0.log if able.

Dax Tailor 6 March 2020 at 9:45 pm UTC

Ok, you asked for it ;)
(But keep in mind the problem has been there for months now, and the first entries I found in my Google search are over 2 years old.)

Mesa 20.0.1
libdrm 2.4.100
ACO: I don't think so. The only package I found (mesa-aco) is not installed. I guess it's LLVM then?
The kernel command line: oops=panic udev.log_priority=3 audit=0 amdgpu.ppfeaturemask=0xffffffff amdgpu.vm_debug=1 amdgpu.gpu_recovery=1 processor.max_cstate=3 rcu_nocbs=all
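
When boot options like these are being tried one after another, it can help to confirm that the loaded module actually picked them up. The following is a rough sketch, not something from the thread: the function names are hypothetical, and it assumes the module exposes its parameters under /sys/module/<module>/parameters, which not every parameter does.

```shell
#!/usr/bin/env bash
# Print every "<module>.<name>=<value>" option for one module from a kernel
# command line string, one "<name>=<value>" per line.
# Usage: module_options MODULE "CMDLINE"
module_options() {
    local module=$1 cmdline=$2 word
    local -a words
    # read -a splits on whitespace without expanding glob characters.
    read -r -a words <<< "$cmdline"
    for word in "${words[@]}"; do
        case "$word" in
            "$module".*=*) printf '%s\n' "${word#"$module".}" ;;
        esac
    done
}

# Show each requested option next to the value the loaded module reports.
# Usage: check_module_options MODULE "CMDLINE" [SYSFS_ROOT]
check_module_options() {
    local module=$1 cmdline=$2 root=${3:-/sys/module} opt name file
    module_options "$module" "$cmdline" | while IFS= read -r opt; do
        name=${opt%%=*}
        file="$root/$module/parameters/$name"
        if [ -r "$file" ]; then
            printf '%s requested=%s active=%s\n' "$name" "${opt#*=}" "$(cat "$file")"
        else
            printf '%s requested=%s active=unknown\n' "$name" "${opt#*=}"
        fi
    done
}

# Example: check_module_options amdgpu "$(cat /proc/cmdline)"
```

The two values are printed side by side instead of being compared automatically, since sysfs may show a number in a different base than the one typed on the command line.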

The Xorg log is too long to post here, but it is mostly modelines from AMDGPU. Nothing unusual, I would say.
This log has to go to the laptop too.

This looks a little bit odd. At the end of the Xorg log is this:
[ 11880.078] (II) AMDGPU(0): EDID vendor "GSM", prod id 30436
[ 11880.078] (II) AMDGPU(0): Using EDID range info for horizontal sync
[ 11880.078] (II) AMDGPU(0): Using EDID range info for vertical refresh
[ 11880.078] (II) AMDGPU(0): Printing DDC gathered Modelines:
[ 11880.078] (II) AMDGPU(0): Modeline "3440x1440"x0.0  319.75  3440 3488 3520 3600  1440 1443 1453 1481 +hsync -vsync (88.8 kHz eP)
[ 11880.078] (II) AMDGPU(0): Modeline "3440x1440"x0.0  429.80  3440 3584 3680 3880  1440 1448 1452 1476 +hsync -vsync (110.8 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "3440x1440"x0.0  157.75  3440 3488 3520 3600  1440 1443 1453 1461 +hsync -vsync (43.8 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "2560x1080"x0.0  185.58  2560 2624 2688 2784  1080 1083 1093 1111 -hsync -vsync (66.7 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1280x720"x0.0   74.25  1280 1390 1430 1650  720 725 730 750 +hsync +vsync (45.0 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "720x480"x0.0   27.00  720 736 798 858  480 489 495 525 -hsync -vsync (31.5 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1920x1080"x0.0  148.50  1920 2008 2052 2200  1080 1084 1089 1125 +hsync +vsync (67.5 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "640x480"x0.0   25.18  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1920x1080"x0.0  148.50  1920 2448 2492 2640  1080 1084 1089 1125 +hsync +vsync (56.2 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1280x720"x0.0   74.25  1280 1720 1760 1980  720 725 730 750 +hsync +vsync (37.5 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "720x576"x0.0   27.00  720 732 796 864  576 581 586 625 -hsync -vsync (31.2 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "800x600"x0.0   40.00  800 840 968 1056  600 601 605 628 +hsync +vsync (37.9 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "640x480"x0.0   31.50  640 656 720 840  480 481 484 500 -hsync -vsync (37.5 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1280x1024"x0.0  135.00  1280 1296 1440 1688  1024 1025 1028 1066 +hsync +vsync (80.0 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1024x768"x0.0   78.75  1024 1040 1136 1312  768 769 772 800 +hsync +vsync (60.0 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1024x768"x0.0   65.00  1024 1048 1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "832x624"x0.0   57.28  832 864 928 1152  624 625 628 667 -hsync -vsync (49.7 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "800x600"x0.0   49.50  800 816 896 1056  600 601 604 625 +hsync +vsync (46.9 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1152x864"x0.0  108.00  1152 1216 1344 1600  864 865 868 900 +hsync +vsync (67.5 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1152x864"x60.0   81.75  1152 1216 1336 1520  864 867 871 897 -hsync +vsync (53.8 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1280x1024"x0.0  108.00  1280 1328 1440 1688  1024 1025 1028 1066 +hsync +vsync (64.0 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1600x900"x59.9  118.25  1600 1696 1856 2112  900 903 908 934 -hsync +vsync (56.0 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1680x1050"x0.0  146.25  1680 1784 1960 2240  1050 1053 1059 1089 -hsync +vsync (65.3 kHz e)
[ 11880.078] (II) AMDGPU(0): Modeline "1280x800"x0.0   83.50  1280 1352 1480 1680  800 803 809 831 -hsync +vsync (49.7 kHz e)


This is repeated 3 times with different timestamps.

Searching the Mesa issues site I found this: "Random crash on amdgpu due to temperature missrepoorting"
Sounds interesting. I will try what he/she wrote to log this.

Thanks.

sr_ls_boy 6 March 2020 at 10:01 pm UTC

Try comment 23 and set GALLIUM_DDEBUG.
Also consider posting dmesg and the Xorg log and use the spoiler tags. I get those modelines as well.

damarrin 7 March 2020 at 11:29 am UTC

Can you borrow an Nvidia card off someone? If it works ok you'll know what's what.

Dax Tailor 8 March 2020 at 8:24 am UTC

@damarrin
I have an old GTX 970 that I tried. Because the freezing is not reproducible, and the RTX 2060 certainly has a different driver, I can't tell from the old card whether the RTX will work or not. I used the GTX 970 for a couple of years in my old Intel-based PC with Linux Mint and Arch Linux and never ran into this kind of problem. And I don't know anybody with an RTX card.

@sr_ls_boy
I tried what was suggested in comment 23 and set GALLIUM_DDEBUG. I played some games yesterday and let some games run in demo mode, but the PC never froze. Not sure whether the problem solved itself or not. There is one difference: I'm running the 5.5.8 kernel now (it came with a Manjaro update), and according to the kernel changelog some things were fixed in the AMDGPU driver.
Comment 23 suggested running these commands if the error occurs:
sudo umr -lb > umr_dump
sudo umr -O verbose,use_colour -R gfx[.] >> umr_dump
sudo umr -O halt_waves,use_colour -wa >> umr_dump

I tried this and the 2nd one instantly reboots my PC. (These commands do not work with zsh, by the way.)
However, I don't know how to run this when the PC is frozen.
I'm still not so sure this is a driver problem. I mean, the Windows driver is based on different source code. I'm not sure, but I think the driver developers for Windows and Linux are two different teams.
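
A likely reason the commands fail under zsh, offered as an assumption since the thread does not show the error message: zsh treats the unquoted gfx[.] as a filename pattern and by default aborts with "no matches found" when no file matches, whereas bash passes the text through unchanged. Quoting the argument should behave the same in both shells. The snippet below only prints the resulting command line and does not run umr.

```shell
# Quoting the register pattern keeps the brackets literal in bash and zsh.
pattern='gfx[.]'
printf 'umr -O verbose,use_colour -R %s\n' "$pattern"

# The second command from the post with the argument quoted. It is left
# commented out because the poster reports that it rebooted the machine:
#   sudo umr -O verbose,use_colour -R 'gfx[.]' >> umr_dump
```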

I will post more if I have more info on this. Not sure how many people with an AMD Vega card are reading this post and have the same problems. According to a poll from Hardware Unboxed ("Can We Still Recommend Radeon GPUs?"), about 19% of AMD GPU users have problems.
I forgot to post the link to "Still Something Wrong At Radeon" from AdoredTV.

Thanks for your help.
Dax

damarrin 8 March 2020 at 8:44 am UTC

I’m pretty sure a 20xx card uses the same driver as a 9xx card.
