xorg-server desperately needs an update

Code: [Select]
GPU HANG: ecode 9:1:86dffffd, in Xorg [2210]
Kernel: 5.13.0-rc7-1-mainline-00073-g55fcd4493da5 x86_64
Driver: 20201103
Time: 1624705893 s 286164 us
Boottime: 55886 s 975106 us
Uptime: 759 s 263037 us
Capture: 4307623744 jiffies; 35447 ms ago
Active process (on ring rcs0): Xorg [2210]
Reset count: 0
Suspend count: 1
Platform: SKYLAKE
Subplatform: 0x0
PCI ID: 0x1912
PCI Revision: 0x06
PCI Subsystem: 1043:8694
IOMMU enabled?: 0
DMC loaded: yes
DMC fw version: 1.27
RPM wakelock: yes
PM suspended: no
GT awake: yes
EIR: 0x00000000
IER: 0x08080000
GTIER[0]: 0x09090909
GTIER[1]: 0x09090909
GTIER[2]: 0x00000000
GTIER[3]: 0x00000909
PGTBL_ER: 0x00000000
FORCEWAKE: 0xffff0001
DERRMR: 0x2077efef
  fence[0] = 134603b00b40001
  fence[1] = 00000000
  fence[2] = 00000000
  fence[3] = 00000000
  fence[4] = 00000000
  fence[5] = 00000000
  fence[6] = 00000000
  fence[7] = 00000000
  fence[8] = 00000000
  fence[9] = 00000000
  fence[10] = 00000000
  fence[11] = 00000000
  fence[12] = 00000000
  fence[13] = 00000000
  fence[14] = 00000000
  fence[15] = 00000000
  fence[16] = 00000000
  fence[17] = 00000000
  fence[18] = 00000000
  fence[19] = 00000000
  fence[20] = 00000000
  fence[21] = 00000000
  fence[22] = 00000000
  fence[23] = 00000000
  fence[24] = 00000000
  fence[25] = 00000000
  fence[26] = 00000000
  fence[27] = 00000000
  fence[28] = 00000000
  fence[29] = 00000000
  fence[30] = 00000000
  fence[31] = 00000000
ERROR: 0x00000000
DONE_REG: 0x07ffffff
FAULT_TLB_DATA: 0x0000001c 0x9704e00c
GTT_CACHE_EN: 0xf0007fff
rcs0 command stream:
  CCID:  0x00000000
  START: 0x00001000
  HEAD:  0x00002da0 [0x00002d48]
  TAIL:  0x00002eb8 [0x00002da8, 0x00002df8]
  CTL:   0x00003001
  MODE:  0x00000000
  HWS:   0xffffe000
  ACTHD: 0x0000fffe ec00bb90
  IPEIR: 0x00000000
  IPEHR: 0x79000002
  ESR:   0x00000000
  INSTDONE: 0xffdfffff
  SC_INSTDONE: 0xfffffffe
  SAMPLER_INSTDONE[0][0]: 0xffffffff
  SAMPLER_INSTDONE[0][1]: 0xffffffff
  SAMPLER_INSTDONE[0][2]: 0xffffffff
  ROW_INSTDONE[0][0]: 0xffffffff
  ROW_INSTDONE[0][1]: 0xffffffff
  ROW_INSTDONE[0][2]: 0xffffffff
  batch: [0x0000fffe_ec00a000, 0x0000fffe_ec014000]
  BBADDR: 0x0000fffe_ec00bb91
  BB_STATE: 0x00000020
  INSTPS: 0x00009010
  INSTPM: 0x00000000
  FADDR: 0x0000fffe ec00bd80
  RC PSMI: 0x00000010
  FAULT_REG: 0x00000000
  GFX_MODE: 0x00008000
  PDP0: 0x0000000103508000
  PDP1: 0x0000000000000000
  PDP2: 0x0000000000000000
  PDP3: 0x0000000000000000
  hung: 1
  engine reset count: 0
  ELSP[0]:  pid 2210, seqno        b:000157fe, prio 0, head 00002e00, tail 00002eb8
  ELSP[1]:  pid 0, seqno        3:0000507d, prio 0, head 00000cc0, tail 00000d48
  Active context: Xorg[2210] prio 0, guilty 1 active 0, runtime total 20516762616ns, avg 1227324ns
...
rcs0 --- user = 0x0000fffe ff001000
...
rcs0 --- user = 0x00000000 f8000000
...
rcs0 --- user = 0x00000000 e0000000
...
rcs0 --- ring = 0x00000000 00001000
...
rcs0 --- HW context = 0x00000000 fffc7000
...
available engines: 0
slice total: 0, mask=0000
subslice total: 0
EU total: 0
EU per subslice: 0
has slice power gating: no
has subslice power gating: no
has EU power gating: no
Unavailable
gen: 9
gt: 2
iommu: disabled
memory-regions: 5
page-sizes: 11000
platform: SKYLAKE
ppgtt-size: 48
ppgtt-type: 2
dma_mask_size: 39
is_mobile: no
is_lp: no
require_force_probe: no
is_dgfx: no
has_64bit_reloc: yes
gpu_reset_clobbers_display: no
has_reset_engine: yes
has_global_mocs: no
has_gt_uc: yes
has_l3_dpf: no
has_llc: yes
has_logical_ring_contexts: yes
has_logical_ring_elsq: no
has_master_unit_irq: no
has_pooled_eu: no
has_rc6: yes
has_rc6p: no
has_rps: yes
has_runtime_pm: yes
has_snoop: no
has_coherent_ggtt: yes
unfenced_needs_alignment: no
hws_needs_physical: no
cursor_needs_physical: no
has_csr: yes
has_ddi: yes
has_dp_mst: yes
has_dsb: no
has_dsc: no
has_fbc: yes
has_fpga_dbg: yes
has_gmch: no
has_hdcp: yes
has_hotplug: yes
has_hti: no
has_ipc: yes
has_modular_fia: no
has_overlay: no
has_psr: yes
has_psr_hw_tracking: yes
overlay_needs_physical: no
supports_tv: no
rawclk rate: 24000 kHz
Has logical contexts? yes
scheduler: 1f
i915.vbt_firmware=(null)
i915.modeset=-1
i915.lvds_channel_mode=0
i915.panel_use_ssc=-1
i915.vbt_sdvo_panel_type=-1
i915.enable_dc=-1
i915.enable_fbc=1
i915.enable_psr=-1
i915.psr_safest_params=no
i915.enable_psr2_sel_fetch=no
i915.disable_power_well=1
i915.enable_ips=1
i915.invert_brightness=0
i915.enable_guc=0
i915.guc_log_level=-1
i915.guc_firmware_path=(null)
i915.huc_firmware_path=(null)
i915.dmc_firmware_path=(null)
i915.mmio_debug=0
i915.edp_vswing=0
i915.reset=3
i915.inject_probe_failure=0
i915.fastboot=-1
i915.enable_dpcd_backlight=-1
i915.force_probe=
i915.fake_lmem_start=0
i915.request_timeout_ms=20000
i915.enable_hangcheck=yes
i915.load_detect_test=no
i915.force_reset_modeset_test=no
i915.error_capture=yes
i915.disable_display=no
i915.verbose_state_checks=yes
i915.nuclear_pageflip=no
i915.enable_dp_mst=yes
i915.enable_gvt=no
That sort of thing keeps happening. I thought mesa was to blame, but maybe it isn't. I tried to build xorg-server-git from the AUR, but it failed to start (there was a build error about xorgproto, even though I installed xorgproto-git). Is there a *working* PKGBUILD somewhere?

I am currently using the modesetting driver again - xf86-video-intel did not work either.

Enabling IOMMU didn't help - dmesg says scalable mode is supported, but it didn't work. At least the xorg freezes, unlike DRM freezes, are recoverable by killing the xserver.

And what does "i915.reset=3" mean? I did not set it on the commandline, and the default is 2 according to modinfo -p i915.
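One way to check what the loaded module itself reports is to read the parameter from sysfs. This is a minimal sketch, assuming the i915 module is loaded and exposes its parameters under /sys/module/i915/parameters (some of these files may only be readable by root). The helper name show_param is made up for illustration.

```shell
#!/bin/sh
# show_param DIR NAME: print "NAME=value" for one module parameter file.
# DIR is normally /sys/module/i915/parameters; it is an argument so the
# helper can be pointed at any directory.
show_param() {
    dir=$1
    name=$2
    if [ -r "$dir/$name" ]; then
        printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
    else
        printf '%s: not readable\n' "$name" >&2
        return 1
    fi
}
```

Usage would be something like: show_param /sys/module/i915/parameters reset. This only shows the current value; it does not explain why it differs from what modinfo lists.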

I tried downgrading intel-ucode (didn't seem to help either) because of the inordinate number of freezes I had lately - except with the kernel 5.4.y series, but as I've stated elsewhere, using that kernel is not a long-term solution. I've been building my own kernels using custom configs for years, including the 5.4 kernel; that can't be the problem, the 5.4 config is as close as possible to the 5.12 and mainline configs. I've been up for up to a week without freezes with 5.12, but of late it's just horrible. Totally broken.

I restored my system partitions from a known good backup and redid the recent updates, that didn't help either.

Can it really be that the kernel is to blame? I can't imagine Linus Torvalds putting up with simply *abandoning* i915 users. After all, Intel has a stake in the Linux kernel. The kernel doc says changes that don't work will be revoked, but that isn't happening.

Tried the distro kernel (not even up-to-date, and mirrors.dotsrc.org seems to be broken again, and I also get more alsa-lib errors). Currently trying to update my config using modprobed-db (aware of the possibility of kernel modules changing their names, don't need a lecture).

Bingo! The distribution kernel (5.12.12.artix1-1) froze too. I expected no less, but had to try again, in case something changed. No weird programs were running (vlc2, musescore, golly), just normal stuff that can be expected to work (mate-terminal, pluma, MATE desktop, all latest versions built from source, and gcc11). Many, but not all, freezes happen when gcc is running and I edit files with pluma (gedit and gvim (binary package) have been known to freeze too) or grep git logs. gtk3 is a possible freeze candidate, but not involved in all freezes, unless the entire MATE desktop is broken. But all other desktops are either toys (lxde, lxqt), severely dated (xfce), or bloated and full of spyware (gnome including flashback, kde, and presumably all the rest).

Guys, it's beginning to look like Artix doesn't work for me. I've been thinking for a while of switching to Gentoo (or FreeBSD if the Linux kernel deserts me), but I'll miss pacman.

Are you going to shut me down again because I mentioned software other than xorg-server in this post?

As I said, everything still works with the 5.4 kernel series, so I don't think I have hardware damage (or a rootkit, unless there is a rootkit that specifically targets kernels newer than 5.4).


Re: xorg-server desperately needs an update

Reply #2
If the entire system freezes and you can't do anything (change tty, etc.) it's a kernel issue sadly.

Re: xorg-server desperately needs an update

Reply #3
First error for xorg-server-git: the configure step of the source package (upstream's build, not the Arch/Artix PKGBUILD) errors out because it's not looking in the right pkgconfig dir. Hack, hack:
$ sudo cp /usr/share/pkgconfig/inputproto.pc /usr/lib/pkgconfig/
(don't forget to rm that later!)
Now I get a version error, but I guess with xorgproto-git it would be OK for you.
Dependency inputproto found: NO found 2.3.2 but need: '>= 2.3.99.1'
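
For the lookup problem in the first step, a possible alternative to copying the .pc file is to add /usr/share/pkgconfig to PKG_CONFIG_PATH for the build shell only. A minimal sketch; the helper name prepend_pc_dir is hypothetical, and whether this alone satisfies the xorg-server build is an assumption that has not been tried here:

```shell
#!/bin/sh
# prepend_pc_dir DIR: put DIR at the front of PKG_CONFIG_PATH for this shell
# only, so nothing has to be copied into /usr/lib/pkgconfig.
prepend_pc_dir() {
    case ":${PKG_CONFIG_PATH:-}:" in
        *":$1:"*) ;;  # already listed, leave the variable unchanged
        *) PKG_CONFIG_PATH="$1${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}" ;;
    esac
    export PKG_CONFIG_PATH
}
```

Run prepend_pc_dir /usr/share/pkgconfig before starting the build. Nothing is left behind to rm afterwards, since the change disappears with the shell.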

Re: xorg-server desperately needs an update

Reply #4
Thanks #######, that worked! New xserver up and running.
I put the corresponding cp and rm commands in my PKGBUILD.
Now I'll try the mainline and 5.12 kernels again. If they still fail, I've done everything I could to fix this issue, and I'll have 3 years max to switch to FreeBSD. :-(

Edit:
There is an alternative solution: Add the following to the xorgproto-git PKGBUILD package():
Code: [Select]
  # fix xorg server build
  mkdir -p "${pkgdir}"/usr/lib
  mv "${pkgdir}"/usr/share/pkgconfig "${pkgdir}"/usr/lib

Re: [SOLVED] xorg-server desperately needs an update

Reply #5
PLEASE DON'T SOLVE MY TOPICS WITHOUT MY CONSENT!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Sorry for yelling, but you are not helping. The topic is UNSOLVED, my box is still broken with kernels >5.4.

Thanks Artist, for the LinuxReviews link! As for the ArchWiki, been there, done that. But the "intel_idle.max_cstate=1 i915.enable_dc=0 ahci.mobile_lpm_policy=1" kernel parameters don't work either. I added them all, and I still get:
Code: [Select]
i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 1356.505652] i915 0000:00:02.0: [drm] Xorg[2224] context reset due to GPU hang
[ 1356.513564] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in Xorg [2224]
This isn't the latest message, but the latest I could recover. (Of course I get zilch with a hard freeze.) I don't think xorg is the problem; it really is a kernel bug.
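
To pull just these messages out of a longer kernel log, a small filter can help. This is a sketch that matches only the three message shapes quoted in this thread; other i915 reset messages may be worded differently. The function name gpu_hang_lines is made up.

```shell
#!/bin/sh
# gpu_hang_lines: read kernel log text on stdin and keep only the i915
# reset/hang messages of the kind quoted in this thread.
gpu_hang_lines() {
    grep -E 'i915 .*(GPU HANG|context reset due to GPU hang|Resetting [a-z0-9]+ for)'
}
```

For example: dmesg | gpu_hang_lines, or feed it a saved log file from before the reboot. After a recoverable hang, the full error state (like the dump in the first post) is usually readable from /sys/class/drm/card0/error, though the card index may differ.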

The LinuxReviews site (which dates from last year) warns against the 5.4 kernel, oddly, but it is the only one that works reliably for me (without bullshit options added. I had a long-standing bullshit option "acpi_osi=Linux", but removing it wasn't the solution, so I added it back).

WTF is going on? I'm not prone to conspiracy theories, but this level of incompetence coming from kernel developers is not normal. The linux kernel is completely broken.

Is there a way of disabling DRM? Short of "nomodeset", I mean? The vesa driver is probably ancient. AFAIK glamor acceleration is mandatory with modesetting.

Is it possible to change timeout parameters in the kernel config? I haven't touched them yet, I don't know enough.
Code: [Select]
CONFIG_DRM_I915_REQUEST_TIMEOUT=20000
CONFIG_DRM_I915_FENCE_TIMEOUT=10000
CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND=250
CONFIG_DRM_I915_HEARTBEAT_INTERVAL=2500
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT=8000
CONFIG_DRM_I915_STOP_TIMEOUT=100
CONFIG_DRM_I915_TIMESLICE_DURATION=1
I can't exclude the possibility that the many hard freezes damaged my harddisk (fsck finds nothing), but I still don't see how that would affect only newer kernels.
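
On the timeout question: the CONFIG_DRM_I915_* values listed above are build-time defaults, and the request timeout also shows up as i915.request_timeout_ms=20000 in the dump in the first post. Recent kernels additionally expose several of the per-engine values in sysfs, so they can be inspected without rebuilding. A minimal sketch, assuming the engine directory is /sys/class/drm/card0/engine/rcs0 (the card index can differ); the helper name engine_timeouts is hypothetical:

```shell
#!/bin/sh
# engine_timeouts DIR: print every *_ms / *_ns tunable found in one engine
# directory, e.g. /sys/class/drm/card0/engine/rcs0 (assumed path).
engine_timeouts() {
    for f in "$1"/*_ms "$1"/*_ns; do
        [ -r "$f" ] || continue   # skip unmatched globs and unreadable files
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

Running engine_timeouts /sys/class/drm/card0/engine/rcs0 should list entries such as preempt_timeout_ms and heartbeat_interval_ms if the kernel provides them. Whether changing any of them helps with these hangs is an open question.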

Re: xorg-server desperately needs an update

Reply #6
What is your box? Perhaps I missed something, so far I guess you are using the i915 driver so must have Intel graphics and a Skylake series CPU, which were produced between 2015 and 2019. As it looks hw specific then you'd have a better chance if you revealed this info, unless it's classified  ;D . This isn't something I have experienced (using Nvidia) but possibly others would have some ideas.

Re: xorg-server desperately needs an update

Reply #7
My box is called Wortmann TERRA 2230WPV, with an Asus H110M-A/M.2 mobo. I flashed the EFI 2 days ago, it is up-to-date.
Processor: Intel® Core™ i3-6100 CPU @ 3.70GHz × 4
Graphics:  Mesa DRI Intel® HD Graphics 530 (SKL GT2)

Edit:
Can MATE be responsible for interface freezes?

I was beginning to think that my interface freeze problems have to do with MATE desktop, specifically openbox, which hasn't been updated in ages (my preferred wm, because it's scriptable, and doesn't lay terminal windows on top of each other so I have to pull them apart with the mouse, like marco does). But I (like others) just found out that MATE panel won't start with marco (the default MATE wm, a fork of metacity), even with the packaged mate-session-manager. I reverted to openbox and installed picom for a compositor, but it causes video tearing with the modesetting driver, like all compositors, so I reverted to the packaged xorg-server and xf86-video-intel (though it seems xf86-video-intel can be built against xorg-server-git, which brings us back to the package base being out of date. Why are there no decent video drivers at all? And why is wayland development so slow? It seems to be urgently needed).

Cannot be a mate-panel or xorg-server-git problem, because mate-panel works perfectly with xorg-server-git and openbox.

What can I do? There is no *sensible* alternative to MATE (if I install some toy, geek, or bloated desktop environment, I'll be seriously hampered in my ability to use my computer, so I probably won't even get sensible troubleshooting info). I'm not a geek, I have no use for enlightenment or other geek stuff. I actually want to do *actual work* occasionally on my box instead of being forced to hack all the time.

I will provide more info if needed. I'd like to try wayland, actually, but the mate session manager doesn't support it, and I'm afraid with mandatory compositing I'll never get rid of video tearing. Is there a script for wayland comparable to xinit?

The compositor doesn't help anyway:
Code: [Select]
[ 3156.520373] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 3156.520399] i915 0000:00:02.0: [drm] Xorg[9340] context reset due to GPU hang
[ 3156.527122] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in Xorg [9340]
The kernel parameters given in the LinuxReviews article
Code: [Select]
intel_idle.max_cstate=1 i915.enable_dc=0 ahci.mobile_lpm_policy=1
did not work (I used them all together). I am currently using "i915.enable_fbc=0" instead, which is mentioned in the ArchWiki, but not in connection with interface freezes (but I'm desperate enough to clutch at any straw by now). It seems to help a little: I was able to stay up almost 12 hrs (sad to say that's an improvement), though I don't see why it should work. Do kernels after 5.7 require more video RAM, by any chance? I'm already at the maximum my EFI allows (1024M).
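
When testing kernel parameters like these, it is worth confirming that they actually reached the running kernel, since a typo in the bootloader config is silently ignored. A small sketch; the helper name has_param is made up for illustration:

```shell
#!/bin/sh
# has_param CMDLINE PARAM: succeed if PARAM appears as a whole word in the
# given kernel command line string.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: has_param "$(cat /proc/cmdline)" i915.enable_fbc=0 && echo active. This only checks the command line; it says nothing about whether the driver honoured the value.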

Please don't mark my posts as solved without my consent, I seem to have many problems that are interconnected, so I don't really know what titles to choose for my posts, so I'll just continue this thread.

I'm a hair's breadth away from bisecting the kernel (that would be gruesome and tedious, since the bug does not immediately manifest). Currently building 5.13+1956+gc54b245d0118-mainline, there have been almost 2000 commits in a day.

Re: xorg-server desperately needs an update

Reply #8
My gfx card differs: Iris Plus 655

These are my related kernel params:
i915.disable_power_well=0 i915.enable_guc=2

I do have xf86-video-intel installed; without this I had some problems.

Normally you don't want this, but did you see/try: https://wiki.archlinux.org/index.php/intel_graphics#Xorg_configuration
Some issues with X crashing, GPU hanging, or problems with X freezing, can be fixed by disabling the GPU usage with the NoAccel option - add the following lines to your configuration file:
  Option "NoAccel" "True"
Alternatively, try to disable the 3D acceleration only with the DRI option:
  Option "DRI" "False"
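
For reference, the Option lines go inside a Device section. The following sketch writes such a section for the xf86-video-intel driver with NoAccel set. The Identifier string is arbitrary, and both the function name write_intel_conf and the target file name used below are assumptions, not something from the wiki excerpt above.

```shell
#!/bin/sh
# write_intel_conf FILE: write a Device section for the xf86-video-intel
# driver with acceleration turned off. The Identifier string is arbitrary.
write_intel_conf() {
    cat > "$1" <<'EOF'
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "NoAccel" "True"
EndSection
EOF
}
```

As root, something like write_intel_conf /etc/X11/xorg.conf.d/20-intel.conf followed by a restart of X would apply it. To try the DRI variant instead, swap the Option line accordingly.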

You can also try DRI2 and DRI3 maybe.

Hope this helps you somehow.

Re: xorg-server desperately needs an update

Reply #9
You can also try to experiment with latest mesa and Crocus Gallium3D driver for older intel GPU (Gen4-Gen7 or so).

Or intel Iris Gallium3D driver, I am truly lost in Intel naming sense.

 

Re: xorg-server desperately needs an update

Reply #10
Thanks SGOrava, for the crocus tip! The first light at the end of the tunnel (let's hope it isn't an oncoming train). The crocus driver (https://gitlab.freedesktop.org/airlied/mesa/-/tree/crocus) seems to be new, it only just appeared in meson_options.txt.

I'm ostensibly on Gen9 (Skylake), but neither iris nor i965 worked for me. But crocus fails to load:
Code: [Select]
 $ LIBGL_DEBUG=verbose glxinfo 1>/dev/null
libGL: MESA-LOADER: failed to open /usr/lib/dri/swrast_dri.so: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory
libGL error: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/dri)
libGL error: failed to load driver: swrast
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  151 (GLX)
  Minor opcode of failed request:  24 (X_GLXCreateNewContext)
  Value in failed request:  0x0
  Serial number of failed request:  60
  Current serial number in output stream:  61
This is with xf86-video-intel. With the modesetting driver I get a segfault.
Isn't crocus a 3D driver? Why would it rely on swrast? (swrast is deprecated). Is it just too early? Will this get fixed? Or do I need more updates (libva, xorg-server)?
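
Since the loader output above only mentions swrast, a first check is which driver files the mesa build actually installed into the search path. A minimal sketch, assuming the drivers are installed as <name>_dri.so files in /usr/lib/dri as the error message suggests; the helper name list_dri_drivers is hypothetical:

```shell
#!/bin/sh
# list_dri_drivers DIR: print the driver names for which DIR holds a
# <name>_dri.so file (DIR is /usr/lib/dri in the output above).
list_dri_drivers() {
    for f in "$1"/*_dri.so; do
        [ -e "$f" ] || continue   # glob matched nothing
        n=${f##*/}
        printf '%s\n' "${n%_dri.so}"
    done
}
```

If list_dri_drivers /usr/lib/dri does not print crocus, the build did not install it. If it does, the loader can be asked for it explicitly with MESA_LOADER_DRIVER_OVERRIDE=crocus LIBGL_DEBUG=verbose glxinfo, which at least shows whether crocus itself is being tried. None of this means crocus is expected to support Skylake.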

As for the Arch Wiki, been there, as I said. I experimented with kernel commandlines till I was blue in the face, to no avail. "NoAccel" is not an option, glamor acceleration is *mandatory* with modesetting, more's the pity. I might try *late* kms start, taking i915 out of my initramfs, but this sounds like more trouble.

Edit:
The gallium i915 driver didn't load either (why not? It should, right?), and trying iris again resulted in an immediate freeze with the mainline kernel. (both with the modesetting driver).

So, no change so far. I still get the freezes (except with the 5.4 kernel, as I said; the 4.19 also still works, but will reach EOL before 5.4) because gallium-i915 and crocus won't load, and iris and i965 both don't work.

I reinstalled xorg-server-git and the xf86-video-intel built against it, but crocus still won't load. Maybe I should try xlib or gallium-xlib for glx (I REALLY shouldn't have to).

Edit:
mesa seems to be optional. I just rebuilt xorg-server-git without mesa, uninstalled mesa and rebooted, and I was able to log into the graphics. The Mate desktop (including gtk3), web browser (palemoon, librewolf), gimp, inkscape, sylpheed (=email), musescore (including qt5) all work, but I have yet to figure out how to make golly and vlc work with glamor (=2D) accel alone.
Please don't mark this as solved yet. (It isn't - if I get everything I need to work without mesa, it's still a workaround, not a fix).

Edit:
I still need libepoxy, a hard dependency of xorg-server and gtk3 (not in meson_options.txt), and probably libdrm, but they don't depend on mesa.

I had to hack /lib/pkgconfig/epoxy.pc to rebuild gtk3, or meson won't find it (but not for xorg-server):
Code: [Select]
prefix=/usr
libdir=${prefix}/lib
includedir=${prefix}/include

epoxy_has_glx=0
epoxy_has_egl=0
epoxy_has_wgl=0

Name: epoxy
Description: GL dispatch library
Version: 1.5.8
#Requires.private: gl egl
Libs: -L${libdir} -lepoxy
Libs.private: -ldl
Cflags: -I${includedir}
There may be a better solution (maybe a dummy .pc file that links against nothing). qt5-base built without problems with the "-no-opengl" option and without gst-plugins-base-libs. Need to rebuild qt5-quickcontrols2 (hopefully not more) for musescore.
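
The Requires.private part of the epoxy.pc hack above can be reproduced with sed instead of editing by hand. This is only a sketch: it assumes the packaged file carries an uncommented "Requires.private:" line, it leaves the epoxy_has_* variables alone, and the function name strip_gl_requires is made up.

```shell
#!/bin/sh
# strip_gl_requires FILE: print FILE with its Requires.private line commented
# out, so pkg-config no longer pulls in the gl/egl modules through it.
strip_gl_requires() {
    sed 's/^Requires\.private:/#&/' "$1"
}
```

The result goes to stdout, so it can be inspected first and then written to a copy of the .pc file rather than over the packaged one.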

Turns out wxgtk, a dependency of golly (a Game of Life emulator, and a constant crasher under mesa and linux > 5.4) has an option to be built without opengl, but the build fails. Maybe I'll have to report a bug.

If I succeed with vlc and golly, that means I'll finally have to switch to Gentoo, so I'll never ever have to worry about mesa again:
Code: [Select]
USE="-mesa"
Or something like that. I'll figure it out. Software problems I can solve.

Still, don't mark this as solved. If I have to leave Artix, that can't be a solution for you guys, can it? Why is it so hard to build a system without this unreliable and optional mesa shit?

At least, one thing is clear, the culprit is mesa. I'm currently running linux-mainline, and with mesa it would have frozen solid by now. Maybe rebuilding gtk3 and qt5-base will help, even if I have to reinstall mesa for vlc.

Edit:
Dummy epoxy.pc failed. Maybe rebuild libepoxy and use it as a makedep.

Success with vlc2 and vlc3 after rebuilding qt4, qt5-base, ffmpeg-2.8, ffmpeg-git, vlc2 and vlc3, using --avcodec-hw=none commandline option. Some video tearing with the modesetting driver, fixable with xf86-video-intel. Not a whole lot of diff with the video quality, actually.

Success with musescore (had to rebuild qt5-declarative, not qt5-quickcontrols2).

Success with avidemux.

Yet to succeed: golly, librewolf (need to build it from source, which sucks (damn profiling never worked)). Palemoon works.

I'll continue pulling mesa, and if I see anything promising in the git log, I'll try it again.
I kept all previous binaries, so restoring mesa will be fast.

Please don't mark this as solved. This is a bullshit workaround, nothing else.

Edit:
I partly reenabled opengl, to get golly working, so far without success (Andrew Trevorrow, the developer, was kind enough to answer my question about the latest golly version without opengl (the git log doesn't have tags), and I decided it wasn't worth my while to downgrade). gtk3, qt4, qt5, ffmpeg{2.8,4.4} and vlc{2,3} are still unaccelerated (qt has a -no-opengl option, gtk3 requires an ugly hack, see above. Turns out golly will work with the hacked gtk3, but still crash). Since golly is the only program that still freezes my box, the culprit must be the combination of gtk3, mesa and MATE. (I hope so, anyway, I'd like to regain my trust in the linux kernel).

The crocus driver still doesn't work (and it will now block xorg from starting, which means it loads but isn't meant for Skylake).

<flame asbestos="required">Let's face it, guys, opengl is as bad as systemd. It boxes everybody in, developers, packagers and users, and like systemd it does exactly ***nothing*** for most of us.</flame>

The 5.10-lts kernel has an unfixed (but fixable, as it is fixed in later versions) bug
Code: [Select]
smpboot: Scheduler frequency invariance went wobbly, disabling!
after resuming from suspend. Not an option either.

Edit:
The iris driver of the latest mesa-devel now works with musescore (with and without opengl). The "palettes" window in musescore does not work without opengl, but it can be replaced by the master palette. Which proves that musescore's use of opengl is just sloppy programming, it wouldn't be needed at all (see the above flame).

I am currently using the latest xorg-server snapshot, and the xf86-video-intel built against it, and I'll keep track of mesa and linux-mainline. If the freezes stop occurring, I'll report it.