Topic: Frequent freezes related to amdgpu and kworker

Frequent freezes related to amdgpu and kworker

I upgraded on the 20th, and ever since I have had frequent freezes of plasmashell, input, etc.
The system is not totally gone: if I get the IP address, I can still log into it via SSH after the fact.
But doing the wrong thing in there locks me out as well.

Unfortunately it's the kind of issue where the system is so far gone that the log files aren't written properly, but based on what I've seen in dmesg via SSH, I have narrowed it down to this issue or something similar to it.

If I do get to see dmesg output before the reset, it does talk about amdgpu freezes, lockups, there are stack traces and everything.
Unfortunately, it's so f*cked up not even REISUB will reboot the system properly. It does bring the system down to text level, but instead of rebooting, it just shows dmesg logging 120 seconds of lockup/wait and everything being stuck.

I did try the solution from other threads, turning PSR off for the module, and that increased the amount of time I get with the system...but it will still crash eventually.

There does not seem to be any particular cause.

The curious thing is: the amdgpu module wasn't actually part of the update. According to pacman.log, the graphics-adjacent things I got were multiple mesa 1:24.3.1-3 related ones, multiple vulkan 1:1.4.303-1 related ones (including, but not limited to, vulkan-radeon-related updates), and a minor kernel update from 6.12.1 to 6.12.4.
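For anyone who wants to pull the same list out of their own pacman.log, here is a minimal Python sketch. It assumes the usual `[timestamp] [ALPM] upgraded name (old -> new)` line shape; the keyword list and the function name are made up for illustration and will also catch unrelated packages such as linux-firmware.

```python
import re

# Assumed pacman.log line shape (check against your own log):
# [2024-12-20T18:03:11+0100] [ALPM] upgraded mesa (1:24.2.7-1 -> 1:24.3.1-3)
UPGRADE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[ALPM\] upgraded "
    r"(?P<pkg>\S+) \((?P<old>\S+) -> (?P<new>\S+)\)$"
)

# Hypothetical keyword list; adjust it to whatever counts as
# "graphics-adjacent" on your machine.
KEYWORDS = ("mesa", "vulkan", "linux", "amdgpu")


def graphics_upgrades(lines, day):
    """Yield (package, old_version, new_version) for upgrades logged on `day`.

    `day` is a date prefix such as "2024-12-20"; only packages whose name
    contains one of KEYWORDS are reported.
    """
    for line in lines:
        match = UPGRADE_RE.match(line.strip())
        if match is None or not match["when"].startswith(day):
            continue
        if any(keyword in match["pkg"] for keyword in KEYWORDS):
            yield match["pkg"], match["old"], match["new"]
```

Usage would be something like `graphics_upgrades(open("/var/log/pacman.log"), "2024-12-20")`, with the date swapped for the day of your own upgrade.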

The lack of logging makes this really stupid to debug.
The only thing I have is from bootup, informing me that the VESA driver failed to load:
Code:
(EE) Failed to load /usr/lib/xorg/modules/drivers/vesa_drv.so: /usr/lib/xorg/modules/drivers/vesa_drv.so: undefined symbol: VBESetModeParameters
...but AMDGPU runs with radeonsi drivers following that, so it doesn't seem to be related/fatal.

Did anybody experience anything similar recently?
Any ideas how I can force the logs to be written out even in case of a CPU lockup?
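One thing that might be worth trying for the logging question is a small helper that follows the kernel log and forces every line to disk as it arrives. This is only a sketch: the function names and the log path are made up, it relies on `dmesg --follow` from util-linux, and it obviously cannot help if the lockup takes down the writer process or the storage stack itself.

```python
import os
import subprocess


def append_durably(path, lines):
    """Append each line to `path` and fsync after every single line.

    fsync asks the kernel to push the data to the storage device right away
    instead of leaving it in the page cache, where a hard reset would lose it.
    Returns the number of lines written.
    """
    count = 0
    with open(path, "a", encoding="utf-8") as out:
        for line in lines:
            out.write(line if line.endswith("\n") else line + "\n")
            out.flush()             # Python's buffer -> kernel
            os.fsync(out.fileno())  # kernel page cache -> storage device
            count += 1
    return count


def follow_dmesg():
    """Yield kernel messages as they appear.

    Needs permission to read the kernel log (root on many setups).
    """
    proc = subprocess.Popen(
        ["dmesg", "--follow"], stdout=subprocess.PIPE, text=True
    )
    try:
        yield from proc.stdout
    finally:
        proc.terminate()
```

Started as `append_durably("/var/log/dmesg-follow.log", follow_dmesg())` (hypothetical path), this keeps whatever reached the kernel log before the freeze. Since SSH still works here, running the same follow command from a second machine and saving the output there may survive a lockup even better.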

Re: Frequent freezes related to amdgpu and kworker

Reply #1
First thing to try would be to revert the kernel to 6.12.1 or install linux-lts and boot from that.

artist
Linux is simple; use Artix, or Submit Your System To Evil Malicious D(a)emons

Re: Frequent freezes related to amdgpu and kworker

Reply #2
Downgrading the kernel unfortunately did not help.
In fact, after the freeze confirming that it didn't work, I got a second freeze while trying to post this very post.

I did, however, manage to catch dmesg output this time: https://paste.debian.net/hidden/5c37d424/
When my posting was interrupted, the system was apparently immediately so far gone that despite a declaration to the contrary, amdgpu's IP state wasn't even dumped:
Code:
[   25.174450] elogind[1677]: Watching system buttons on /dev/input/event14 (VEIKK Keyboard)
[ 1050.266527] amdgpu 0000:06:00.0: amdgpu: Dumping IP State
[ 1077.336532] elogind[1677]: New session 2 of user REDACTED.
[ 1096.644708] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 2100716444 wd_nsec: 2100715663
[ 1115.066345] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 10822445632 wd_nsec: 10822440979

With the additional information, however, I have found a new thread on Arch's forums that sounds similar: https://bbs.archlinux.org/viewtopic.php?id=302000
The poster there points out that apparently there's a known issue with Mesa 24.3.x and amdgpu, to the point where the second post in that thread describes my situation pretty accurately:

Quote
I also have an AMD APU here with integrated AMD graphics.
Since mesa update to 24.3.1 the system keeps freezing.
Downgrade the packages 'mesa' to 24.2.7-x and 'vulkan-radeon' to 24.2.7-x fixes that problem for now.
Of course this is not a permanent solution.

But the community seems to think the bug has been fixed. KDE users with Wayland were affected. [...]
So I'm gonna revert Mesa to 24.2.x and hope that that "fixes" it.

Re: Frequent freezes related to amdgpu and kworker

Reply #3
Thx for the feedback and additional info; other users might benefit from this as well.


Re: Frequent freezes related to amdgpu and kworker

Reply #4
I'm probably jinxing it with this post, but I haven't had any freezes since downgrading mesa and lib32-mesa to 24.2.7-1 two days ago.

So if you're having weird freezes with amdgpu and mesa 24.3.x, try that first.
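To check quickly whether a machine is on the 24.3.x series, a small Python sketch like the following could help. It assumes the version string has pacman's usual `epoch:version-pkgrel` layout (as in the `1:24.3.1-3` above); both function names are made up, and "24.3.x is affected" is only what this thread and the Arch thread suggest, not a confirmed range.

```python
import re


def parse_pkgver(full):
    """Split a pacman version such as '1:24.3.1-3' into its parts.

    Returns (epoch, version_numbers, pkgrel), e.g. (1, (24, 3, 1), '3').
    A missing epoch is treated as 0.
    """
    match = re.fullmatch(r"(?:(\d+):)?(\d+(?:\.\d+)*)-(\S+)", full)
    if match is None:
        raise ValueError(f"unrecognised version string: {full!r}")
    epoch = int(match.group(1) or 0)
    numbers = tuple(int(part) for part in match.group(2).split("."))
    return epoch, numbers, match.group(3)


def is_mesa_24_3(full):
    """True when the upstream version belongs to the 24.3.x series."""
    _, numbers, _ = parse_pkgver(full)
    return numbers[:2] == (24, 3)
```

The input is the second field printed by `pacman -Q mesa`; `is_mesa_24_3("1:24.3.1-3")` is true, while the downgraded `1:24.2.7-1` is not.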