Random black screen freeze switching to tty on recent kernels

Switching to a tty console results in an empty black screen roughly 25% of the time, although the outcomes tend to come in runs: it can work several times in a row, or fail several times in a row. The freeze can usually be exited by pressing Fn + F10 (KB backlight) to cycle the keyboard backlight through off, mid and full, then Fn + Insert (sleep); once the machine has gone to sleep, pressing the power button wakes it up and it returns to the expected state, showing the tty. This happens with the current Zen and LTS kernels, but it doesn't seem to happen with the much older linux-5.8.14.artix1-1-x86_64.pkg.tar.zst kernel. I tried updating the BIOS to the latest version, disabling all the performance options and virtualization support in the BIOS, and forcing the CPU to 100% all the time with one core and thread, with no change.
This is on a Dell E7470 Ultrabook; hardware details:
Code: [Select]
System:
  Kernel: 5.8.14-artix1-1 arch: x86_64 bits: 64 compiler: gcc v: 10.2.0
    Desktop: MATE v: 1.27.0 Distro: Artix Linux base: Arch Linux
Machine:
  Type: Laptop System: Dell product: Latitude E7470 v: N/A serial: <filter>
  Mobo: Dell model: 0T6HHJ v: A00 serial: <filter> UEFI: Dell v: 1.36.3
    date: 09/18/2022
Battery:
  ID-1: BAT0 charge: 30.4 Wh (100.0%) condition: 30.4/55.0 Wh (55.2%)
    volts: 8.6 min: 7.6 model: LGC-LGC3.65 DELL 242WD6C status: full
CPU:
  Info: dual core model: Intel Core i7-6600U bits: 64 type: MT MCP
    arch: Skylake rev: 3 cache: L1: 128 KiB L2: 512 KiB L3: 4 MiB
  Speed (MHz): avg: 2645 high: 2882 min/max: 400/3400 cores: 1: 2808 2: 2882
    3: 2550 4: 2340 bogomips: 22408
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3
Graphics:
  Device-1: Intel Skylake GT2 [HD Graphics 520] vendor: Dell Latitude E7470
    driver: i915 v: kernel arch: Gen-9 bus-ID: 00:02.0
  Display: server: X.Org v: 21.1.8 with: Xwayland v: 23.1.2 driver: X:
    loaded: intel unloaded: fbdev,modesetting dri: i965 gpu: i915
    resolution: 1920x1080~60Hz
  API: OpenGL Message: Unable to show GL data. Required tool glxinfo
    missing.
Network:
  Device-1: Intel Wireless 8260 driver: iwlwifi v: kernel bus-ID: 01:00.0
  IF: wlan0 state: down mac: <filter>
Drives:
  Local Storage: total: 238.47 GiB used: 71.3 GiB (29.9%)
  ID-1: /dev/nvme0n1 vendor: Western Digital model: PC SN720
    SDAPNTW-256G-1016 size: 238.47 GiB temp: 40.9 C
Partition:
  ID-1: / size: 120 GiB used: 71.07 GiB (59.2%) fs: btrfs dev: /dev/nvme0n1p6
  ID-2: /boot size: 500 MiB used: 228.7 MiB (45.7%) fs: btrfs
    dev: /dev/nvme0n1p2
  ID-3: /boot/efi size: 299.4 MiB used: 292 KiB (0.1%) fs: vfat
    dev: /dev/nvme0n1p1
  ID-4: swap-1 size: 16 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/nvme0n1p3
Info:
  Processes: 199 Uptime: 11m Memory: available: 7.67 GiB
  used: 680.6 MiB (8.7%) Init: OpenRC runlevel: default Compilers: gcc: 13.1.1
  Packages: 707 Shell: Bash v: 5.1.16 inxi: 3.3.27

Re: Random black screen freeze switching to tty on recent kernels

Reply #1
Unfortunately, it sounds like you're probably just going to have to bisect the kernel and find the commit. I actually roll my own patch because a recent-ish commit completely broke the sound on one of my machines (I don't think there's anything wrong with the commit itself; it's probably the sound driver for my device that's bugged).

Re: Random black screen freeze switching to tty on recent kernels

Reply #2
So far, trying the kernels available in the Artix archive, the problem started between these two versions:
Bad: linux-5.16.1.artix1-1-x86_64.pkg.tar.zst           16-Jan-2022 13:55   
Good: linux-5.15.12.artix1-1-x86_64.pkg.tar.zst          30-Dec-2021 12:03

Re: Random black screen freeze switching to tty on recent kernels

Reply #3
That's not too bad of a range to find the problematic commit if you're up for it.
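One thing to keep in mind while bisecting: since the freeze reportedly only shows up about 25% of the time, a single clean tty switch says little about whether a given kernel is good. Here is a rough sketch of the arithmetic, assuming each switch fails independently with probability 0.25. That is an assumption; the first post suggests the failures come in runs, so the real numbers may be worse. The function name is made up for illustration.

```shell
# Probability that a genuinely bad kernel survives n tty switches without
# freezing, given a per-switch freeze probability p.
# p = 0.25 is the rough figure from the first post, not a measured value.
false_good_prob() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", (1 - p) ^ n }'
}

false_good_prob 0.25 1     # prints 0.7500: one clean switch proves very little
false_good_prob 0.25 10    # prints 0.0563: ten clean switches are more convincing
```

So under these assumptions it would take something like ten or more clean switches per bisect step before marking a kernel good with reasonable confidence. One wrong "good" sends the bisect to an unrelated commit.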

Re: Random black screen freeze switching to tty on recent kernels

Reply #4
Yes, I shall try to do that, but that part will take a while longer. The Artix kernels often skip some versions, so using the Arch archive narrowed it down further, to between these two:
linux-5.16.arch1-1-x86_64.pkg.tar.zst
linux-5.15.13.arch1-1-x86_64.pkg.tar.zst
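For anyone following along, the bisect over that range could be set up roughly like this. It is only a sketch: it assumes a clone of the mainline tree in ./linux, and it uses the mainline tags v5.15 and v5.16 as the endpoints, since 5.15.13 is a stable-branch release and is not in the mainline history. The function names and the commit count of 15000 are placeholders for illustration; the real count comes from git rev-list.

```shell
# Start a bisect between the assumed good (v5.15) and bad (v5.16) mainline tags.
# $1 = path to the kernel git checkout (defaults to ./linux, an assumption).
bisect_start() {
    git -C "${1:-linux}" bisect start &&
    git -C "${1:-linux}" bisect bad v5.16 &&
    git -C "${1:-linux}" bisect good v5.15
}

# Rough upper bound on the number of build-and-test rounds for n commits,
# i.e. the smallest s with 2^s >= n.
bisect_steps() {
    awk -v n="$1" 'BEGIN { s = 0; while (2 ^ s < n) s++; print s }'
}

# Real commit count:  git -C linux rev-list --count v5.15..v5.16
bisect_steps 15000    # 15000 is a placeholder, not the measured count

# After building, booting and testing each candidate:
#   git -C linux bisect good     # or: git -C linux bisect bad
#   git -C linux bisect skip     # if the candidate does not build or boot
#   git -C linux bisect log      # record of the decisions so far
#   git -C linux bisect reset    # when finished
```

With a placeholder of 15000 commits that comes to about 14 rounds, which is tedious but doable.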




Re: Random black screen freeze switching to tty on recent kernels

Reply #5
Found it I think:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/+/8e07757bece6e81b0b0910358ebceca3032bc6c7%5E%21/#F0
It seems this commit was where the problem started:

commit 8e07757bece6e81b0b0910358ebceca3032bc6c7 (HEAD)
Author: Shyam Prasad N <[email protected]>
Date:   Mon Jul 19 10:03:38 2021 +0000

    cifs: do not negotiate session if session already exists

I can try reverting that in a recent kernel to see if it fixes the problem there, for more confirmation. I don't know what cifs has to do with switching to a tty, or why it affects this machine and not others; I was expecting something more hardware specific, like a driver.
It took a while to figure out how to get the kernel to build at that version. I had to install the gcc11 AUR package (simply downgrading gcc and gcc-libs broke my desktop), downgrade pahole to 1:1.23-1 to fix a further build failure, and avoid using the latest linux-firmware package.
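In case it helps anyone else building old kernel versions, this is roughly how a side-by-side compiler can be selected without downgrading the system gcc. It is a sketch: it assumes the gcc11 package installs a binary named gcc-11 (check with pacman -Ql gcc11). The function name and the CC_OLD variable are made up for illustration.

```shell
# Build a kernel tree with an older compiler installed next to the system gcc.
# $1 = kernel source directory (defaults to the current directory).
# CC_OLD = compiler binary to use; "gcc-11" is an assumed name.
build_with_old_gcc() {
    cc="${CC_OLD:-gcc-11}"
    if ! command -v "$cc" >/dev/null 2>&1; then
        echo "compiler $cc not found" >&2
        return 1
    fi
    # CC is used for the kernel itself, HOSTCC for the build-time helper tools.
    make -C "${1:-.}" CC="$cc" HOSTCC="$cc" -j"$(nproc)"
}

# Usage (from a configured kernel tree):
#   build_with_old_gcc ~/src/linux
```

This leaves the system gcc and gcc-libs alone, which avoids the broken desktop I got from downgrading them.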

Re: Random black screen freeze switching to tty on recent kernels

Reply #6
Reverting the patch in linux-git did not fix the issue, but it did change it for the better. There have been a lot of changes in this section of code, including to the function in question; the file has even moved to fs/smb/client/connect.c from its previous location. Unmodified linux-git would never recover by pressing sleep; it required a forced power off. With the patch reverted, sleep then resume always worked, but switching to a tty still often went to a black screen. On recovery, this message can sometimes be seen on the tty and in syslog:
__common_interrupt: 0.55 No irq handler for vector
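To see how often that message turns up, the kernel log can be searched for it. This is a small sketch; the function name is made up, and the location of the syslog file depends on which logger is installed, so it is passed in as an argument rather than assumed.

```shell
# Count lines containing the interrupt message, either from files given as
# arguments or from standard input when no file is given.
# Note: grep -c prints 0 and returns a non-zero status when nothing matches.
count_irq_msgs() {
    grep -c 'No irq handler for vector' "$@"
}

# Usage:
#   dmesg | count_irq_msgs
#   count_irq_msgs /path/to/your/syslog/file
```

Comparing the count before and after a freeze and recovery shows whether the message is tied to that event.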

I also found a similar problem on a Dell M4400, but there, when switching from a tty back to the desktop, the desktop crashes back to the login tty. As I have startx and autologin set up, it then restarts automatically, although any previously open apps are forcibly terminated in the crash. This happens only about one time in 50, so it is not so obvious.
There was a similar bug (fixed now) in December 2021 relating to xorg and xorg-server:
https://bbs.archlinux.org/viewtopic.php?id=272327
This does not seem to happen when I use either the 5.15.12 or the 5.16.1 Artix kernel on the M4400, so it was not triggered by the exact same commit. If this has something to do with memory corruption, then the exact nature of the code change can be almost coincidental, as the important thing is how many bytes get trampled or, with race conditions, how long an instruction takes.

It might be a kernel bug, but it might also be that changes in the kernel relating to cifs / samba require changes elsewhere that have not yet been made, so the bug could be in something else instead. If the bug really was over a year old, then surely others would have already reported it.

Re: Random black screen freeze switching to tty on recent kernels

Reply #7
So I added a printk to the commit revert:
Code: [Select]
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 9280e253bf09..6f831bcb87a8 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -2202,6 +2202,8 @@ cifs_get_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
  struct sockaddr_in *addr = (struct sockaddr_in *)&server->dstaddr;
  struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)&server->dstaddr;
 
+ printk(KERN_ERR "cifs_get_smb_ses being called 123abc penguin\n");
+
  xid = get_xid();
 
  ses = cifs_find_smb_ses(server, ctx);
@@ -2210,20 +2212,21 @@ cifs_get_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
  ses->ses_status);
 
  spin_lock(&ses->chan_lock);
+
+ mutex_lock(&ses->session_mutex);
+ rc = cifs_negotiate_protocol(xid, ses, server);
+ if (rc) {
+ mutex_unlock(&ses->session_mutex);
+ /* problem -- put our ses reference */
+ cifs_put_smb_ses(ses);
+ free_xid(xid);
+ return ERR_PTR(rc);
+ }
+
  if (cifs_chan_needs_reconnect(ses, server)) {
  spin_unlock(&ses->chan_lock);
  cifs_dbg(FYI, "Session needs reconnect\n");
 
- mutex_lock(&ses->session_mutex);
- rc = cifs_negotiate_protocol(xid, ses, server);
- if (rc) {
- mutex_unlock(&ses->session_mutex);
- /* problem -- put our ses reference */
- cifs_put_smb_ses(ses);
- free_xid(xid);
- return ERR_PTR(rc);
- }
-
  rc = cifs_setup_session(xid, ses, server,
  ctx->local_nls);
  if (rc) {
But it never appears in dmesg or syslog, whether or not switching to a tty freezes. So it looks like this section of code is not used. I don't use anything samba-related, and I also ran pacman -Rs samba caja-share to remove the samba package and related items, which has not changed the situation either. Kernel internals sometimes defy ordinary logic.
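A quicker cross-check than the printk is to see whether the cifs module is loaded at all. This is a sketch that assumes cifs is built as a loadable module, which I believe is the case for the Arch and Artix kernels; a built-in driver would not be listed. The function name is made up for illustration.

```shell
# Succeed if the named module is listed as loaded.
# $1 = module name, $2 = module list to read (defaults to /proc/modules,
# where the first field of each line is the module name).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Usage:
#   if module_loaded cifs; then echo "cifs is loaded"; else echo "cifs is not loaded"; fi
#   dmesg | grep penguin    # look for the marker string from the printk
```

If cifs is not loaded when the freeze happens, then code in fs/smb/client/connect.c cannot be running, which would fit with the printk never showing up.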

Update:
I installed the Gnome desktop (which uses Wayland) and gdm. Starting the Mate desktop from gdm still has the problem, so it isn't xinitrc. Gnome on Wayland is not affected, while Gnome on Xorg is, so it seems to be an Xorg bug.