TL;DR - Anyone with Threadripper or Ryzen hardware still seeing stability problems? Specifically with PCIe errors.
Long version - Between motherboards, power supplies, SSDs and video cards I've now spent several thousand upgrading to a 1950X, yet I'm still getting hardware problems.
Example errors:
#
pcieport 0000:00:01.1: AER: Corrected error received: id=0000
pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Transmitter ID)
pcieport 0000:00:01.1: device [1022:1453] error status/mask=00001000/00006000
pcieport 0000:00:01.1: [12] Replay Timer Timeout
pcieport 0000:00:01.1: AER: Corrected error received: id=0000
pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
pcieport 0000:00:01.1: device [1022:1453] error status/mask=00000080/00006000
pcieport 0000:00:01.1: [ 7] Bad DLLP
#
# dmesg |grep pciep |grep -- '\['|sort | uniq -c
316 pcieport 0000:00:01.1: [12] Replay Timer Timeout
1689 pcieport 0000:00:01.1: [ 6] Bad TLP
17 pcieport 0000:00:01.1: [ 7] Bad DLLP
1652 pcieport 0000:00:01.1: device [1022:1453] error status/mask=00000040/00006000
17 pcieport 0000:00:01.1: device [1022:1453] error status/mask=00000080/00006000
279 pcieport 0000:00:01.1: device [1022:1453] error status/mask=00001000/00006000
37 pcieport 0000:00:01.1: device [1022:1453] error status/mask=00001040/00006000
46 pcieport 0000:01:00.2: [12] Replay Timer Timeout
46 pcieport 0000:01:00.2: device [1022:43b1] error status/mask=00001000/00002000
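For anyone reading the status/mask values above: the first hex number is the correctable error status, and each set bit corresponds to one of the bracketed names in the log (bit 6 is Bad TLP, bit 7 is Bad DLLP, bit 12 is Replay Timer Timeout). A minimal Python sketch of that decoding follows. The names for bits other than 6, 7 and 12 do not appear in the output above; they are filled in from the PCIe AER correctable error status layout as an assumption, and `decode_status` is a hypothetical helper name.

```python
# Bit positions in the PCIe AER Correctable Error Status register.
# Bits 6, 7 and 12 match the names printed in the dmesg output above;
# the remaining names are an assumption taken from the PCIe AER layout.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_status(hex_value):
    """Return the names of the bits set in a status or mask value (hex string)."""
    value = int(hex_value, 16)
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(CORRECTABLE_BITS.get(bit, f"unknown bit {bit}"))
    return names


# Decode every status value that shows up in the counts above.
for status in ("00000040", "00000080", "00001000", "00001040"):
    print(status, "->", ", ".join(decode_status(status)))
```

Decoding `00001040` gives both Bad TLP and Replay Timer Timeout in a single report, which is consistent with the counts above: 1652 + 37 = 1689 Bad TLP lines and 279 + 37 = 316 Replay Timer Timeout lines.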
# lspci -vv | egrep '(1453|43b1)'
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453 (prog-if 00 [Normal decode])
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453 (prog-if 00 [Normal decode])
01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b1 (rev 02) (prog-if 00 [Normal decode])
pcilib: sysfs_read_vpd: read failed: Input/output error
40:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453 (prog-if 00 [Normal decode])
40:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453 (prog-if 00 [Normal decode])
My concerns are twofold - Getting to the root cause and warranty should this turn out to be hardware related. Trying to find answers from vendors is an exercise in futility. I don't do Windows, at all. I don't even own a pirated copy. Disabling the AER is not an option either since the hardware is throwing errors for a reason. Neither is increasing
Something else odd is they appear to change in frequency based on where the hardware physically is. Maybe RFI / shielding problems?
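To compare error rates between physical placements, it may help to tally the errors per device and type from a saved `dmesg` capture for each placement. The sketch below is roughly the Python equivalent of the `grep | sort | uniq -c` pipeline above, keyed by device so two captures can be compared. `tally_aer_errors` is a hypothetical helper name, and the regular expression assumes the log line shape shown in this post.

```python
import re
from collections import Counter

# Matches error-name lines such as:
#   pcieport 0000:00:01.1: [12] Replay Timer Timeout
#   pcieport 0000:00:01.1: [ 7] Bad DLLP
# It deliberately skips the "device [1022:1453] error status/mask=..." lines.
ERROR_LINE = re.compile(
    r"pcieport (?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"\[\s*(?P<bit>\d+)\]\s+(?P<name>.+?)\s*$"
)


def tally_aer_errors(log_text):
    """Count AER error names per PCI device in a block of dmesg text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            counts[(match.group("device"), match.group("name"))] += 1
    return counts


# Sample lines copied from the errors quoted earlier in this post.
sample = """\
pcieport 0000:00:01.1: AER: Corrected error received: id=0000
pcieport 0000:00:01.1: device [1022:1453] error status/mask=00001000/00006000
pcieport 0000:00:01.1: [12] Replay Timer Timeout
pcieport 0000:00:01.1: [ 7] Bad DLLP
"""
for (device, name), count in sorted(tally_aer_errors(sample).items()):
    print(f"{count:6d}  {device}  {name}")
```

Saving `dmesg` output to one file per placement and running each file's text through `tally_aer_errors` would give numbers that can be compared directly, provided the captures cover similar uptimes and workloads.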
Hardware:
Ryzen Threadripper 1950x (16 core)
Asus ROG Zenith
2x Samsung SM961 NVMe
2x Samsung Pro 960 SSDs
4x 3TB WD Reds
1x EVGA 1080TI/FTW3
32G DDR4
1x EVGA 1kW Supernova P2
4.13.9 Kernel.
Also tried with (and still own) the AORUS board, different RAM, Pro 950's, an EVGA 850W PSU, and an EVGA 1080TI Kingpin (returned due to coil whine. I should have kept it, as the FTW3 had whine too, though there was nothing else wrong. Replacing the power supply fixed it. EVGA wouldn't comment on why).
Basically a beast.
xpander@arch ~ $ dmesg |grep pciep |grep -- '\['|sort | uniq -c
1 [ 1.297189] pcieport 0000:00:01.3: AER enabled with IRQ 28
1 [ 1.297204] pcieport 0000:00:03.1: AER enabled with IRQ 29
And what stability issue?
I haven't had any issues since April of this year, after a few BIOS versions that fixed all this.
My longest uptime has been only 15 days, but I reboot to update the kernel, so I really haven't kept the system going for longer periods.
I'm on 4.13 as well.
Ryzen 1800x
Asus Rog Crosshair VI Hero
PCIe NVME drive
Nvidia GTX 1070
$ dmesg |grep pciep |grep -- '\['|sort | uniq -c
1 [ 1.182122] pcieport 0000:00:01.1: AER enabled with IRQ 28
1 [ 1.182150] pcieport 0000:00:01.3: AER enabled with IRQ 29
1 [ 1.182164] pcieport 0000:00:03.1: AER enabled with IRQ 30
Take a look here, maybe it's something with Nvidia 1080 cards?
[GTX 1080 Throwing Bad TLP PCIe Bus Errors](https://forums.geforce.com/default/topic/957456/geforce-drivers/gtx-1080-throwing-bad-tlp-pcie-bus-errors/1/)
Good luck
Regarding the GeForce URL, it's broken. I spent 20 minutes filling out CAPTCHAs to no avail. NVIDIA uses Incapsula on a lot of their sites, which breaks them all... and although they use Google anyway, it's only THEM.
I'm guessing the link is like everywhere else, telling people to disable PCI memory mapping (pci=nommconf) or drop back to gen3 PCI. Neither is _really_ an option. FWIW gen2 does make the errors go away, which furthers my suspicion that it's a hardware problem. Given the thousands people have spent, I'm guessing hardware vendors will not be opening that can of worms.
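When testing whether a lower link speed makes the errors go away, it is useful to confirm what speed a port actually negotiated. The following is a sketch only: it assumes the running kernel exposes the `current_link_speed` and `max_link_speed` attributes under `/sys/bus/pci/devices/<address>/` and that they contain text such as `8 GT/s` or `5.0 GT/s PCIe`. The function names are hypothetical.

```python
import re
from pathlib import Path

# Per-lane transfer rate in GT/s for each PCIe generation.
GENERATION_BY_RATE = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4}


def parse_generation(speed_text):
    """Map text such as '8 GT/s' or '5.0 GT/s PCIe' to a PCIe generation, or None."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*GT/s", speed_text)
    if match is None:
        return None
    return GENERATION_BY_RATE.get(float(match.group(1)))


def link_generation(device, sysfs_root="/sys/bus/pci/devices"):
    """Return (current, maximum) PCIe generation for a device address.

    Assumes the sysfs attributes named below exist on the running kernel.
    """
    base = Path(sysfs_root) / device
    current = parse_generation((base / "current_link_speed").read_text())
    maximum = parse_generation((base / "max_link_speed").read_text())
    return current, maximum
```

On a live system, `link_generation("0000:00:01.1")` would report the port that logs most of the errors in this thread. A result where the current generation is lower than the maximum means the link is running below what the port supports.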
dmesg |grep pciep |grep -- '\['|sort | uniq -c
1 [ 1.216868] pcieport 0000:00:01.1: AER enabled with IRQ 28
1 [ 1.216889] pcieport 0000:00:01.3: AER enabled with IRQ 29
1 [ 1.216904] pcieport 0000:00:03.1: AER enabled with IRQ 30
1 [ 1.216910] pcieport 0000:00:01.1: Signaling PME with IRQ 28
1 [ 1.216917] pcieport 0000:00:01.3: Signaling PME with IRQ 29
1 [ 1.216926] pcieport 0000:00:03.1: Signaling PME with IRQ 30
1 [ 1.216940] pcieport 0000:00:07.1: Signaling PME with IRQ 31
1 [ 1.216956] pcieport 0000:00:08.1: Signaling PME with IRQ 33
For your case, I think if you add this kernel parameter in GRUB you should be fine :)
pcie_aspm=off
I am using Grub Customizer just for easy editing.
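After editing the GRUB configuration and rebooting, it is worth checking that the parameter actually reached the kernel. A small sketch, assuming the running kernel's command line is readable from `/proc/cmdline`; the helper names and the sample command line are hypothetical, and quoted values containing spaces are not handled.

```python
def kernel_parameters(cmdline_text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline_text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def aspm_disabled(cmdline_text):
    """True when pcie_aspm=off is present on the given command line."""
    return kernel_parameters(cmdline_text).get("pcie_aspm") == "off"


# Hypothetical command line, for illustration only.
example = "BOOT_IMAGE=/vmlinuz-linux root=/dev/sda1 rw quiet pcie_aspm=off"
print(aspm_disabled(example))
```

On the machine itself, the check would be `aspm_disabled(open("/proc/cmdline").read())`.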