Topic: [SOLVED] Freeze - unfreeze - freeze, reboot to fix

Re: [SOLVED] Freeze - unfreeze - freeze, reboot to fix

Reply #15
The 5.8.14 kernel worked with no problems for an hour or two, and 5.9.14 worked with no problems for a few hours. While I was using the LTS, some upgrades came in, including gtk2 and mesa, so perhaps it was fixed somewhere there. No: it was just waiting until I thought it had gone, then it did it again.

Reply #16
Still no problems with 5.8.14-artix1-1, and the earliest kernel that has frozen so far is 5.9.2.artix1-1. Just downloaded linux-5.9.1.arch1-1-x86_64.pkg.tar.zst and linux-5.9.arch1-1-x86_64.pkg.tar.zst from the Arch archive (they're not in my cache or the Artix archive; 5.9.a was skipped too).

Reply #17
Does the newest LTS, 5.4.85, work? If not, it could be a regression.

Reply #18
It's not that easy to tell what works and what doesn't: you can go all day with nothing, then boot up and it freezes after 5 minutes, or perhaps later in that boot. linux-lts (5.4.84-1) ran for 4 days without trouble. linux (5.8.14.artix1-1) ran for 2 days without trouble.
All of these froze sooner or later within about a day of use, some on the first boot, others only after several boots and around 24 hours of use:
5.9.2.artix1-1 / 5.9.10.artix1-1 / 5.9.12.artix1-1 / 5.9.14.artix1-1 / 5.10.1.arch1-1 / 5.10.1.artix1-1
So now I'm on 5.9.1-arch1-1, because I realised it's quicker to work backwards: a faulty kernel shows itself sooner than a good one can be confirmed by running it for days. My current theory is that something was introduced in the kernel in 5.8.14, as in the Arch forum discussion, but perhaps it wasn't added for my hardware until a bit later (there are different modules and they don't always get done all at once). If I can find when it happened, it might give some clue what it was. 5.8.14 was also the last release before the switch to 5.9.
So I think the LTS should work fine, and there's no problem with the Artix builds vs the Arch ones, but it would take the rest of the week to try it. Also the LTS I did try was a recent build, so presumably this isn't a compiler bug.

5.9.1 and 5.9.0 both had the problem. Tried 5.8.14 more and it's fine.
So this appeared with the 5.9 kernel, and it's still there in linux-git: 5.11.0-rc1-1-git-00073-g3516bd729358
Now to try to bisect. Soooo slooooowww... 7 hours to clone and build linux-git, with a 22 GB build folder. Trying modprobed-db next; perhaps that will help a little.
Only an hour or two now, using -j1 so it runs in the background. v5.9-rc1 is good, v5.9-rc6 is bad. There are 1638 commits in between, only 4 of them in drivers/gpu/drm/nouveau/ and 3 for the nv50 family (my card):
ca386aa7155a drm/nouveau/kms/nv50-gp1xx: add WAR for EVO push buffer HW bug
a9cfcfcad50c drm/nouveau/kms/nv50-gp1xx: disable notifies again after core update
35dde8d40636 drm/nouveau/kms/nv50-: add some whitespace before debug message
v5.9-rc4: bad. Next: fc8c70526bd30733ea8667adb8b8ffebea30a8ed, just before the 4 nouveau commits, which are between rc3 and rc4. If that's good, it will narrow it down a lot.

Reply #19
Closer now...
ca386aa7155a drm/nouveau/kms/nv50-gp1xx: add WAR for EVO push buffer HW bug        < bug was here
a9cfcfcad50c drm/nouveau/kms/nv50-gp1xx: disable notifies again after core update  < testing here now!
35dde8d40636 drm/nouveau/kms/nv50-: add some whitespace before debug message       < this only added a space in a comment
a255e9c8694d drm/nouveau/kms/gv100-: Include correct push header in crcc37d.c      < this isn't my "nv50" GPU
fc8c70526bd3 drm/radeon: Prefer lower feedback dividers                            < bug wasn't here

I think I see a possibility for the cause too:
Excerpt from drivers/gpu/drm/nouveau/dispnv50/core507d.c:

    if (ntfy) {
        PUSH_MTHD(push, NV507D, SET_NOTIFIER_CONTROL,
                  NVDEF(NV507D, SET_NOTIFIER_CONTROL, MODE, WRITE) |
                  NVVAL(NV507D, SET_NOTIFIER_CONTROL, OFFSET, NV50_DISP_CORE_NTFY >> 2) |
                  NVDEF(NV507D, SET_NOTIFIER_CONTROL, NOTIFY, ENABLE));
    }

    PUSH_MTHD(push, NV507D, UPDATE, interlock[NV50_DISP_INTERLOCK_BASE] |
                                    interlock[NV50_DISP_INTERLOCK_OVLY] |
              NVDEF(NV507D, UPDATE, NOT_DRIVER_FRIENDLY, FALSE) |
              NVDEF(NV507D, UPDATE, NOT_DRIVER_UNFRIENDLY, FALSE) |
              NVDEF(NV507D, UPDATE, INHIBIT_INTERRUPTS, FALSE),

                                    SET_NOTIFIER_CONTROL,   /* <<<<<< why is that floating about there, could it be a typo?? */
              NVDEF(NV507D, SET_NOTIFIER_CONTROL, NOTIFY, DISABLE));
Those seem to be lists of numbers (instructions) being "pushed" to the hardware.
That was added in "disable notifies again after core update", so that's my guess, pending more testing ;D
Reading the commit messages, some of the things in this area were borrowed from NVIDIA, so perhaps it was in the NVIDIA driver first and was copied to Nouveau a bit later.
Still testing, but no bug so far; it's looking more like it's the "add WAR for EVO push buffer HW bug" commit.
An interesting thing: do you know why this has gone unfixed for 3 months? Ben Skeggs, the nouveau kernel maintainer, has vanished. His last commit on GitHub was on Nov 14th, and there's nothing recent from him on the LKML either.

Reply #21
The patch that blew up my mobo is now in the mainline kernel development tree. They made some additional changes in nouveau, so hopefully it won't have that effect any more and might fix the freezes instead; it should appear in the 5.11 kernel. I think I'll stick with the LTS kernel until that's been out a while, just in case...