Frequent freezes related to amdgpu and kworker
I upgraded on the 20th, and ever since I have had frequent freezes of plasmashell, input, etc.
The system is not totally gone: if I know its IP address, I can still log into it via SSH after a freeze.
But doing the wrong thing in there shuts me out as well.
Unfortunately it's the kind of issue where the system is so far gone that the log files aren't written properly, but based on what I've seen in dmesg via SSH, I have narrowed it down to this issue or something similar to it.
If I do get to see dmesg output before the reset, it reports amdgpu freezes and lockups, complete with stack traces.
Unfortunately, it's so f*cked up not even REISUB will reboot the system properly. It does bring the system down to text level, but instead of rebooting, it just shows dmesg logging 120 seconds of lockup/wait and everything being stuck.
I did try the solution from other threads, turning PSR off for the module, and that increased the amount of time I get with the system...but it will still crash eventually.
There does not seem to be any particular cause.
The curious thing is that the amdgpu module wasn't actually part of the update. According to pacman.log, the graphics-adjacent things I got were several mesa-related packages (1:24.3.1-3), several vulkan-related packages (1:1.4.303-1), including but not limited to vulkan-radeon, and a minor kernel update from 6.12.1 to 6.12.4.
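For anyone who wants to check their own pacman.log the same way, here is a minimal sketch of how the graphics-adjacent upgrades could be pulled out. It assumes the usual "[timestamp] [ALPM] upgraded NAME (OLD -> NEW)" line format. The keyword list and the function name are my own illustrative choices, not anything pacman provides.

```python
import re

# Assumed pacman.log line shape:
#   [TIMESTAMP] [ALPM] upgraded NAME (OLD -> NEW)
UPGRADE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[ALPM\] upgraded (?P<pkg>\S+) "
    r"\((?P<old>\S+) -> (?P<new>\S+)\)$"
)

# Hypothetical keyword list for "graphics-adjacent" packages; adjust as needed.
# Note that "linux" also matches things like linux-firmware or util-linux.
KEYWORDS = ("mesa", "vulkan", "linux", "amdgpu", "xf86-video")


def graphics_upgrades(lines, day=""):
    """Return (package, old, new) tuples for matching upgrades.

    `day` is a timestamp prefix such as "YYYY-MM-DD"; an empty string
    matches every day in the log.
    """
    found = []
    for line in lines:
        m = UPGRADE_RE.match(line.strip())
        if m is None or not m.group("ts").startswith(day):
            continue
        if any(k in m.group("pkg") for k in KEYWORDS):
            found.append((m.group("pkg"), m.group("old"), m.group("new")))
    return found
```

To use it, open /var/log/pacman.log, pass the file object as `lines`, and pass the date of the upgrade as `day`.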
The lack of logging makes this really stupid to debug.
The only thing I have is from bootup, informing me that the VESA driver failed to load:
(EE) Failed to load /usr/lib/xorg/modules/drivers/vesa_drv.so: /usr/lib/xorg/modules/drivers/vesa_drv.so: undefined symbol: VBESetModeParameters
...but AMDGPU runs with radeonsi drivers following that, so it doesn't seem to be related/fatal.
Did anybody experience anything similar recently?
Any ideas how I can force the logs to be written out even in case of a CPU lockup?
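One idea I have not verified yet, as a rough sketch: follow the kernel log and force every line to disk as soon as it arrives, so that whatever was printed before the lockup survives. The function names and the output path are hypothetical, and it assumes `dmesg --follow` from util-linux is available and readable by the user running it. If the kernel is stuck badly enough, the fsync itself may never complete, so this is a best-effort approach at most.

```python
import os
import subprocess


def persist_lines(source, path):
    """Append each line from `source` to `path`, syncing after every line."""
    with open(path, "a", encoding="utf-8") as out:
        for line in source:
            out.write(line if line.endswith("\n") else line + "\n")
            out.flush()             # push Python's buffer to the kernel
            os.fsync(out.fileno())  # ask the kernel to commit it to disk


def follow_dmesg(path="/var/tmp/dmesg-follow.log"):
    """Stream `dmesg --follow` into `path` (hypothetical location)."""
    proc = subprocess.Popen(
        ["dmesg", "--follow"], stdout=subprocess.PIPE, text=True
    )
    try:
        persist_lines(proc.stdout, path)
    finally:
        proc.terminate()
```

Calling `follow_dmesg()` right after boot and leaving it running would be the intended use. Running the same `dmesg --follow` over SSH from a second machine and saving the output there is probably more robust, since it does not depend on the local disk still accepting writes.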