
Random Freezes

Hi,

I've only been using Artix for a month or so, but I experienced the same issue before when I was using Arch.
This PC is rather new and I built it myself (for the first time), and ever since I started using it, it randomly freezes and I have to hard reboot it. After I installed Artix the issue actually went away at first, but once I installed the lib32 nvidia package it started happening again.

I've gone through the Arch wiki and searched other sites, but I couldn't find a solution. Back when I was on Arch I read the system log after one such "crash" (I think I used journalctl) and found an error code, but I don't have it saved anywhere anymore. I did find that error code on nvidia's site, but nothing there really helped me either.
I'll try to find it again, but I'll probably have to wait for the next freeze to do that.

There seem to be different types of this error. Sometimes it occurs when I start a new application. Sometimes single glitched pixel boxes appear without anything else happening (usually only on the background, which I set with nitrogen), and later they cover the whole screen. And sometimes it happens directly after logging in to qtile through sddm.





Those two images are from a crash that happened randomly during runtime. While the screen looks like this I can't move the mouse cursor or do anything except restart my PC. At first the sound cut off as well, but then it resumed playing indefinitely.





And those are the two states it switched between when the error occurred directly after starting my wm.
Notably in this case I could still kinda use the tty to login and restart my PC. I haven't tried restarting X or anything else besides using reboot there though.
The glitches here look different but before I was sent back into the tty there were glitches that looked like those above.

I'm running Artix OpenRC with the zen kernel (though it also happened before I was using zen and Artix). I have an AMD Ryzen 7 5800X processor and an Nvidia GeForce GTX 1660 Ti GPU. I'm unsure whether the error is caused by software or hardware (e.g. me not handling something correctly when I built this PC). It could also be caused by qtile, but I'm not sure about that.

Any help is appreciated. Thank you for reading through this rather long post.

I expect that the information given is not enough to help me, but I'm still a bit unfamiliar with Artix, OpenRC and Linux in general. So please tell me what additional information you need.

Re: Random Freezes

Reply #1
Quote
I'm unsure whether the error is caused by software or hardware (e.g. me not handling something correctly when I built this PC).
This looks like a hardware error. It could be related to a number of hardware issues, not only related to video card and monitor (and VGA and power cables), but also to CPU, motherboard, etc.

Re: Random Freezes

Reply #2
Quote
This looks like a hardware error. It could be related to a number of hardware issues, not only related to video card and monitor (and VGA and power cables), but also to CPU, motherboard, etc.
Thanks for the quick answer. I have used the monitor and cables with another PC without any problem. Could it be that I haven't applied enough thermal paste on my CPU? I was unsure of that at the time. The parts I bought were all new so it is either a manufacturing error, damage due to transportation or my own mistake I suppose. I will check the connections as well though.

Re: Random Freezes

Reply #3
To me this feels like a monitor issue. Do you have a spare monitor you could swap in?

Insufficient thermal paste on the CPU won't do this; it'll just cause your CPU's internal throttle to kick in and reduce performance.

Another thing to check: did you put your GPU in the right slot? This is usually the big PCIe slot closest to your CPU (it has the highest bandwidth), but check your motherboard manual to be sure.

Also, what PSU are you using? It could be delivering insufficient power.

Re: Random Freezes

Reply #4
Quote
To me this feels like a monitor issue. Do you have a spare monitor you could swap in?

Insufficient thermal paste on the CPU won't do this; it'll just cause your CPU's internal throttle to kick in and reduce performance.

Another thing to check: did you put your GPU in the right slot? This is usually the big PCIe slot closest to your CPU (it has the highest bandwidth), but check your motherboard manual to be sure.

Also, what PSU are you using? It could be delivering insufficient power.
I'm not too sure if it's a monitor issue since I'm using two monitors and I have used both of them with my old PC, even after I got the new one. The connections to the monitor could be a problem, but they shouldn't be, since they're DP and HDMI. I would use DP for both, but I'm missing another cable. Another weird thing is that when those errors appear without freezing the PC immediately they seem to be on the layer of my wallpaper. So if another window is above them they're gone. They can also be captured by screen recording.

The GPU already is in the slot closest to the CPU. I can't really use the other slot though, since it's blocked by the SSDs (I had to mount them like that because one cable didn't fit otherwise).

My PSU should have enough power. I used an online calculator to find out how much power my components need and even went slightly above that. The PSU has 550W. However, since it's a modular PSU, the cables might not be connected properly.
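
As a rough cross-check of the online calculator, the power budget can be added up by hand. This is only a sketch: the CPU and GPU figures are the vendors' rated TDP / board power as far as I know them (105 W and 120 W), the 75 W for everything else is an assumption, and `psu_headroom` is a made-up helper name. Short load spikes can exceed rated values, so treat the result as an estimate.

```python
def psu_headroom(psu_watts, component_watts):
    """Return (total draw, spare watts, load fraction) for a simple power budget."""
    total = sum(component_watts.values())
    return total, psu_watts - total, total / psu_watts


# Rated figures, not measurements; the last entry is an assumption.
ESTIMATE = {
    "Ryzen 7 5800X (rated TDP)": 105,
    "GTX 1660 Ti (rated board power)": 120,
    "motherboard, RAM, SSDs, fans (assumed)": 75,
}

if __name__ == "__main__":
    total, spare, load = psu_headroom(550, ESTIMATE)
    print(f"estimated draw: {total} W, spare: {spare} W, load: {load:.0%}")
```

Under these assumptions a 550 W unit has plenty of headroom, so a loose modular cable is a more plausible suspect than the wattage itself.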

Re: Random Freezes

Reply #5





Those are some screenshots of the errors that appear right now.

Edit: They disappear after restarting nitrogen. And for context: in the last image I tried putting my terminal above one such error.

Re: Random Freezes

Reply #6
The only other thing I can find is a suggestion on the Arch wiki to remove the config file. That was for a slightly different error, but removing the config file (make sure to back it up first!) couldn't hurt.

Re: Random Freezes

Reply #7
Quote
The only other thing I can find is a suggestion on the Arch wiki to remove the config file. That was for a slightly different error, but removing the config file (make sure to back it up first!) couldn't hurt.
I'll try that for now. The error might still be related to hardware or maybe my window manager having issues with something (like the nvidia drivers). When I had the same error on arch I was using qtile as well and about the same config too.

Re: Random Freezes

Reply #8
Quote
after I installed Artix, I wasn't experiencing this issue anymore. But I installed the lib32 nvidia package it started happening again.

This seems related to the lib32 nvidia package(s) you installed. I would check which package(s) got installed and why. Try removing them to see if the problem persists. Also, remove any xorg.conf file in /etc/X11 and /usr/share/X11.
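
One way to check which lib32 nvidia packages were installed, and when, is to scan pacman's log (by default /var/log/pacman.log). The sketch below assumes the usual `[timestamp] [ALPM] installed name (version)` line format; the sample entries, their timestamps and versions are made up for illustration, and `nvidia_lib32_installs` is a hypothetical helper name.

```python
import re

# Matches pacman.log entries of the form:
#   [<timestamp>] [ALPM] installed <package> (<version>)
INSTALLED = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[ALPM\] installed (?P<pkg>\S+) \((?P<ver>[^)]+)\)"
)


def nvidia_lib32_installs(log_text):
    """Return (timestamp, package, version) for every installed lib32 nvidia package."""
    hits = []
    for line in log_text.splitlines():
        match = INSTALLED.match(line)
        if not match:
            continue
        pkg = match.group("pkg")
        if pkg.startswith("lib32-") and "nvidia" in pkg:
            hits.append((match.group("when"), pkg, match.group("ver")))
    return hits


# Made-up sample entries; real ones come from /var/log/pacman.log.
SAMPLE_LOG = """\
[2021-01-01T10:00:00+0100] [ALPM] installed nvidia-utils (1.0-1)
[2021-01-02T11:30:00+0100] [ALPM] installed lib32-nvidia-utils (1.0-1)
[2021-01-02T11:31:00+0100] [ALPM] upgraded linux-zen (1.0-1 -> 1.0-2)
"""

if __name__ == "__main__":
    for when, pkg, ver in nvidia_lib32_installs(SAMPLE_LOG):
        print(when, pkg, ver)
```

For the "why", `pacman -Qi <package>` lists an "Install Reason" and a "Required By" field for an installed package.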

You have another video card in your system by any chance?

 

Re: Random Freezes

Reply #10
Quote
This seems related to the lib32 nvidia package(s) you installed. I would check which package(s) got installed and why. Try removing them to see if the problem persists. Also, remove any xorg.conf file in /etc/X11 and /usr/share/X11.
It seems related, however I'm not sure if it actually is. On my Arch install it happened even without those packages (if I remember it correctly at least). I will try uninstalling that if I get another freeze though.

Quote
You have another video card in your system by any chance?
And no, I only have the video card I listed in that system.

Re: Random Freezes

Reply #11
Quote
Looks like that card is supported in Nouveau to some degree but NV160 still has several TODO items:
https://nouveau.freedesktop.org/FeatureMatrix.html
https://www.phoronix.com/scan.php?page=news_item&px=Nouveau-GTX-16-Support
According to that Phoronix article the required firmware microcode could be different for nouveau and nvidia drivers?
I see, that could be a problem as well. Could my GPU be not that well supported in general? I'm now running the nouveau drivers but I have used the nvidia drivers before and the same error occurred.

Re: Random Freezes

Reply #12
It's possible for nouveau and nvidia to have the same bugs. Often this sort of situation comes down to thinking up tests to narrow down the cause, working through the cheap, easy and most likely options first. For example, as you have 2 monitors, try one at a time. If it happens on both, it's probably not them, unless you are unlucky and both are faulty or not compatible with that card. You can try a new testing kernel for the latest code possible, or an old one in case it is a recent bug, or boot a live USB from some other distro, and see if you can make the problem better or worse.
With thermal paste, personally I just put a thin covering over the top of the chip(s) with a plastic spreader; any excess squishes out. Too much excess with electrically conductive pastes can be bad news; with non-conductive types it's just messy.  ;D  If you do a test run and then take off the heatsink you can see how it went, then clean up and redo it, but I also agree that it is very unlikely to be the cause of those symptoms.
Checking the logs for errors would be a good idea too; possibly you need to install some syslog package first.

Re: Random Freezes

Reply #13
I didn't get another freeze yesterday when I tried using dwm instead of qtile. However I just got one immediately after sddm started again. This time I saved the output of dmesg and found some lines that might be related:

Code:
[    9.698774] NVRM: GPU at PCI:0000:2b:00: GPU-5ad82855-a427-6810-21ce-41d8dacfae75
[    9.698778] NVRM: Xid (PCI:0000:2b:00): 62, pid=2268, 21b5(3200) 00000000 00000000
[    9.934829] NVRM: Xid (PCI:0000:2b:00): 13, pid=2268, Graphics Exception: Shader Program Header 1 Error
[    9.934843] NVRM: Xid (PCI:0000:2b:00): 13, pid=2268, Graphics Exception: Shader Program Header 2 Error
[    9.934854] NVRM: Xid (PCI:0000:2b:00): 13, pid=2268, Graphics Exception: Shader Program Header 9 Error
[    9.934864] NVRM: Xid (PCI:0000:2b:00): 13, pid=2268, Graphics Exception: Shader Program Header 11 Error
[    9.934874] NVRM: Xid (PCI:0000:2b:00): 13, pid=2268, Graphics Exception: ESR 0x405840=0xa0000a06
[    9.934888] NVRM: Xid (PCI:0000:2b:00): 13, pid=2268, Graphics Exception: ESR 0x405848=0x80000000
[    9.935221] NVRM: Xid (PCI:0000:2b:00): 13, pid=2279, Graphics Exception: ChID 0013, Class 0000c597, Offset 00000100, Data 00000000
[   10.462104] NVRM: Xid (PCI:0000:2b:00): 31, pid=2189, Ch 00000009, intr 00000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST faulted @ 0x1_0032f000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[   11.474770] NVRM: Xid (PCI:0000:2b:00): 31, pid=2189, Ch 00000009, intr 00000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST faulted @ 0x1_0032f000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[   12.485803] NVRM: Xid (PCI:0000:2b:00): 31, pid=2189, Ch 00000009, intr 00000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST faulted @ 0x1_0032f000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[   13.496695] NVRM: Xid (PCI:0000:2b:00): 31, pid=2189, Ch 00000009, intr 00000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST faulted @ 0x1_0032f000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[   16.539884] NVRM: Xid (PCI:0000:2b:00): 45, pid=2189, Ch 00000000
[   16.540399] NVRM: Xid (PCI:0000:2b:00): 45, pid=2189, Ch 00000001
[   16.540819] NVRM: Xid (PCI:0000:2b:00): 45, pid=2189, Ch 00000008
[   16.541228] NVRM: Xid (PCI:0000:2b:00): 45, pid=2189, Ch 00000009
[   16.541626] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 00000010
[   16.542028] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 00000011
[   16.542426] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 00000012
[   16.542825] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 00000016
[   16.543228] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 00000017
[   16.543626] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 00000018
[   16.544028] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 00000019
[   16.544426] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 0000001a
[   16.544825] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 0000001b
[   16.922957] random: crng init done
[   16.922958] random: 3 urandom warning(s) missed due to ratelimiting
[   17.234353] elogind[1417]: Removed session c1.
[   17.280196] NVRM: Xid (PCI:0000:2b:00): 45, pid=2189, Ch 00000000
[   17.280639] NVRM: Xid (PCI:0000:2b:00): 45, pid=2189, Ch 00000001
[   17.281045] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 00000010
[   17.281447] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 00000011
[   17.281852] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 00000012
[   17.282254] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 00000017
[   17.282666] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 00000018
[   17.283067] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 0000001a
[   17.283472] NVRM: Xid (PCI:0000:2b:00): 45, pid=2279, Ch 0000001b
[   17.293604] NVRM: Xid (PCI:0000:2b:00): 45, pid=2189, Ch 00000000
[   17.294023] NVRM: Xid (PCI:0000:2b:00): 45, pid=2189, Ch 00000001
[   26.386298] NVRM: GPU 0000:2b:00.0: RmInitAdapter failed! (0x23:0x65:1204)
[   26.386330] NVRM: GPU 0000:2b:00.0: rm_init_adapter failed, device minor number 0
[   30.391018] NVRM: GPU 0000:2b:00.0: RmInitAdapter failed! (0x23:0x65:1204)
[   30.391049] NVRM: GPU 0000:2b:00.0: rm_init_adapter failed, device minor number 0
[   36.514882] NVRM: GPU 0000:2b:00.0: RmInitAdapter failed! (0x23:0x65:1204)
[   36.514909] NVRM: GPU 0000:2b:00.0: rm_init_adapter failed, device minor number 0
[   40.519306] NVRM: GPU 0000:2b:00.0: RmInitAdapter failed! (0x23:0x65:1204)
[   40.519337] NVRM: GPU 0000:2b:00.0: rm_init_adapter failed, device minor number 0
[   43.841829] elogind[1417]: New session 3 of user myu.
[  166.984991] NVRM: GPU 0000:2b:00.0: RmInitAdapter failed! (0x23:0x65:1204)
[  166.985023] NVRM: GPU 0000:2b:00.0: rm_init_adapter failed, device minor number 0
[  170.988104] NVRM: GPU 0000:2b:00.0: RmInitAdapter failed! (0x23:0x65:1204)
[  170.988133] NVRM: GPU 0000:2b:00.0: rm_init_adapter failed, device minor number 0
[  177.111425] NVRM: GPU 0000:2b:00.0: RmInitAdapter failed! (0x23:0x65:1204)
[  177.111457] NVRM: GPU 0000:2b:00.0: rm_init_adapter failed, device minor number 0
[  181.114456] NVRM: GPU 0000:2b:00.0: RmInitAdapter failed! (0x23:0x65:1204)
[  181.114490] NVRM: GPU 0000:2b:00.0: rm_init_adapter failed, device minor number 0
[  187.152510] NVRM: GPU 0000:2b:00.0: RmInitAdapter failed! (0x23:0x65:1204)
[  187.152544] NVRM: GPU 0000:2b:00.0: rm_init_adapter failed, device minor number 0
[  191.155570] NVRM: GPU 0000:2b:00.0: RmInitAdapter failed! (0x23:0x65:1204)
[  191.155597] NVRM: GPU 0000:2b:00.0: rm_init_adapter failed, device minor number 0

"[   43.841829] elogind[1417]: New session 3" was caused by me restarting sddm, which made the screens show nothing (besides visual glitches). Notably it seemed like the glitches were mirrored, but I'd have to look into that again the next time it happens. I also looked up "RmInitAdapter failed" but in hindsight it's probably just an error that occurred due to other errors. If I got it right that error is caused by a timeout from the nvidia driver waiting for the GPU.

I also found this thread, and the pattern there looks awfully close to the one I usually had (at least for errors during runtime): https://forum.level1techs.com/t/is-my-graphics-card-dead-or-am-i-missing-something/161787/5 I found it by looking into the "Graphics Exception" lines, like the one at "[    9.934829]".
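
To see which Xid code appeared first and how often each one occurred, a saved dmesg dump can be summarised with a few lines of Python. This is a minimal sketch that only assumes the `NVRM: Xid (...)` line layout shown in the output above; `summarize_xids` is a hypothetical helper, and the sample is a handful of lines copied from that output.

```python
import re

# Matches lines such as:
# [    9.698778] NVRM: Xid (PCI:0000:2b:00): 62, pid=2268, 21b5(3200) 00000000 00000000
XID_LINE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+NVRM: Xid \((?P<dev>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+),"
)


def summarize_xids(dmesg_text):
    """Return {xid: (first_timestamp, count)}, ordered by first appearance."""
    summary = {}
    for line in dmesg_text.splitlines():
        match = XID_LINE.search(line)
        if not match:
            continue  # not an Xid report (e.g. RmInitAdapter lines)
        xid = int(match.group("xid"))
        first_seen, count = summary.get(xid, (float(match.group("time")), 0))
        summary[xid] = (first_seen, count + 1)
    return summary


SAMPLE = """\
[    9.698778] NVRM: Xid (PCI:0000:2b:00): 62, pid=2268, 21b5(3200) 00000000 00000000
[    9.934829] NVRM: Xid (PCI:0000:2b:00): 13, pid=2268, Graphics Exception: Shader Program Header 1 Error
[    9.934843] NVRM: Xid (PCI:0000:2b:00): 13, pid=2268, Graphics Exception: Shader Program Header 2 Error
[   10.462104] NVRM: Xid (PCI:0000:2b:00): 31, pid=2189, Ch 00000009, intr 00000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST faulted @ 0x1_0032f000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[   16.539884] NVRM: Xid (PCI:0000:2b:00): 45, pid=2189, Ch 00000000
[   26.386298] NVRM: GPU 0000:2b:00.0: RmInitAdapter failed! (0x23:0x65:1204)
"""

if __name__ == "__main__":
    for xid, (first_seen, count) in summarize_xids(SAMPLE).items():
        print(f"Xid {xid}: first at {first_seen}s, {count} line(s)")
```

In the output above the order of first appearance is 62, 13, 31, 45, so Xid 62 is the code worth looking up first; the later ones may just be follow-up errors.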

Quote
It's possible for nouveau and nvidia to have the same bugs. Often this sort of situation comes down to thinking up tests to narrow down the cause, working through the cheap, easy and most likely options first. For example, as you have 2 monitors, try one at a time. If it happens on both, it's probably not them, unless you are unlucky and both are faulty or not compatible with that card. You can try a new testing kernel for the latest code possible, or an old one in case it is a recent bug, or boot a live USB from some other distro, and see if you can make the problem better or worse.
I feel like using a live USB wouldn't help much, since those freezes are very random and there can be quite some time between them. Furthermore, I used the same hardware with pretty much the same software on Arch before and the same error occurred there as well. Testing an installation without any nvidia drivers could be interesting, though. It wouldn't really solve the problem, because I want to use them (initially I wanted to buy an AMD card, but I didn't find a good one for a decent price). The issue could also be caused by some other component not playing nice with the nvidia card, but I don't know how likely that is.
Quote
With thermal paste, personally I just put a thin covering over the top of the chip(s) with a plastic spreader; any excess squishes out. Too much excess with electrically conductive pastes can be bad news; with non-conductive types it's just messy.  ;D  If you do a test run and then take off the heatsink you can see how it went, then clean up and redo it, but I also agree that it is very unlikely to be the cause of those symptoms.
My only concern was that I put on too little thermal paste. I'm not good at estimating sizes and distances, and I wasn't sure whether I applied it as described in the manual. I'll probably put that near the end of the list of things I want to check.

Re: Random Freezes

Reply #14
The point of suggesting another distro, perhaps from a live USB (or you could install it if you prefer), is to use totally different software and versions. Arch and Artix with the same DE are going to share ~90% of their bugs. Devuan stable with another desktop will have ~90% different bugs. So if the problem goes away, it is more likely to be software than hardware, and vice versa if it's still there.