AMD eGPU fails to load after upgrading 6.14.1 -> 6.14.9

Hello, after upgrading the kernel from 6.14.1 to 6.14.9, Xorg no longer loads my external AMD GPU. Looking through dmesg, I noticed the following errors:
Code:
[   10.318261] amdgpu: Virtual CRAT table created for CPU
[   10.318275] amdgpu: Topology: Add CPU node
[   10.318431] amdgpu 0000:06:00.0: enabling device (0000 -> 0002)
[   10.318592] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1682:0xC580 0xE7).
[   10.318649] [drm] register mmio base: 0x8C000000
[   10.318651] [drm] register mmio size: 262144
[   10.318726] amdgpu 0000:06:00.0: amdgpu: detected ip block number 0 <vi_common>
[   10.318729] amdgpu 0000:06:00.0: amdgpu: detected ip block number 1 <gmc_v8_0>
[   10.318731] amdgpu 0000:06:00.0: amdgpu: detected ip block number 2 <tonga_ih>
[   10.318732] amdgpu 0000:06:00.0: amdgpu: detected ip block number 3 <gfx_v8_0>
[   10.318733] amdgpu 0000:06:00.0: amdgpu: detected ip block number 4 <sdma_v3_0>
[   10.318735] amdgpu 0000:06:00.0: amdgpu: detected ip block number 5 <powerplay>
[   10.318736] amdgpu 0000:06:00.0: amdgpu: detected ip block number 6 <dm>
[   10.318737] amdgpu 0000:06:00.0: amdgpu: detected ip block number 7 <uvd_v6_0>
[   10.318738] amdgpu 0000:06:00.0: amdgpu: detected ip block number 8 <vce_v3_0>
[   10.612669] amdgpu 0000:06:00.0: amdgpu: Fetched VBIOS from ROM BAR
[   10.612687] amdgpu: ATOM BIOS: 113-58085SHD1-W90
[   10.618973] [drm] UVD is enabled in VM mode
[   10.618974] [drm] UVD ENC is enabled in VM mode
[   10.618978] [drm] VCE enabled in VM mode
[   10.618984] amdgpu 0000:06:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[   10.618989] amdgpu 0000:06:00.0: amdgpu: PCIE atomic ops is not supported
[   10.619003] [drm] GPU posting now...
[   10.738262] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[   10.743390] amdgpu 0000:06:00.0: BAR 2 [mem 0x6000000000-0x60001fffff 64bit pref]: releasing
[   10.743393] amdgpu 0000:06:00.0: BAR 0 [??? 0x00000000 flags 0x0]: releasing
[   10.743394] [drm:amdgpu_device_resize_fb_bar [amdgpu]] *ERROR* Problem resizing BAR0 (-16).
[   10.743871] amdgpu 0000:06:00.0: BAR 2 [mem 0x6000000000-0x60001fffff 64bit pref]: assigned
[   10.743937] amdgpu 0000:06:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[   10.743940] amdgpu 0000:06:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[   10.743948] resource: resource sanity check: requesting [mem 0x0000000000000000-0xffffffffffffffff], which spans more than PCI Bus 0000:00 [mem 0x000a0000-0x000bffff window]
[   10.743966] ------------[ cut here ]------------
[   10.743967] WARNING: CPU: 3 PID: 664 at arch/x86/mm/pat/memtype.c:719 memtype_reserve_io+0xfe/0x110
[   10.743973] Modules linked in: ccm amdgpu(+) amdxcp gpu_sched drm_panel_backlight_quirks drm_exec drm_suballoc_helper drm_ttm_helper cmac algif_hash algif_skcipher af_alg btusb btrtl btintel btbcm btmtk uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda intel_rapl_msr snd_sof_intel_hda_mlink snd_sof_intel_hda intel_rapl_common snd_sof_pci intel_uncore_frequency snd_sof_xtensa_dsp intel_uncore_frequency_common snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_hda_codec_hdmi snd_soc_acpi_intel_sdca_quirks intel_tcc_cooling soundwire_generic_allocation x86_pkg_temp_thermal snd_soc_acpi intel_powerclamp iwlmvm soundwire_bus coretemp snd_soc_sdca snd_soc_avs kvm_intel snd_hda_codec_realtek snd_soc_hda_codec snd_hda_codec_generic snd_hda_ext_core kvm snd_hda_scodec_component snd_soc_core snd_compress irqbypass ac97_bus polyval_clmulni snd_pcm_dmaengine
[   10.744073]  polyval_generic snd_hda_intel ghash_clmulni_intel joydev sha512_ssse3 snd_intel_dspcfg sha256_ssse3 snd_intel_sdw_acpi mousedev mac80211 sha1_ssse3 snd_hda_codec aesni_intel 8021q snd_hda_core crypto_simd iTCO_wdt cryptd snd_hwdep garp intel_pmc_bxt hid_multitouch rapl libarc4 mrp r8169 snd_pcm iTCO_vendor_support ptp stp intel_cstate mei_hdcp mei_pxp llc pps_core snd_timer i2c_i801 intel_uncore realtek spi_nor mdio_devres psmouse intel_wmi_thunderbolt mei_me intel_lpss_pci mtd i2c_smbus pcspkr iwlwifi libphy snd thunderbolt i2c_hid_acpi i2c_mux intel_lpss soundcore i2c_hid mei idma64 intel_pmc_core pmt_telemetry pmt_class mac_hid intel_hid intel_pch_thermal intel_vsec sparse_keymap acpi_pad bnep cfg80211 bluetooth hid_generic usbhid i915 intel_gtt rtsx_pci_sdmmc i2c_algo_bit ttm mmc_core nvme drm_buddy drm_display_helper nvme_core serio_raw spi_intel_pci clevo_xsm_wmi(OE) rfkill rtsx_pci spi_intel video nvme_auth cec wmi i8042 atkbd libps2 serio vivaldi_fmap
[   10.744119] CPU: 3 UID: 0 PID: 664 Comm: (udev-worker) Tainted: G           OE      6.14.9-artix1-1 #1 1c901cbe3c4e5d31d525655957a353b77915fc71
[   10.744122] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[   10.744123] Hardware name: COPELION INTERNATIONAL INC. ZX Series/ZX Series, BIOS 1.07.08TCOP3 03/27/2020
[   10.744124] RIP: 0010:memtype_reserve_io+0xfe/0x110
[   10.744127] Code: 08 fc ff ff b8 f0 ff ff ff eb 87 8b 54 24 04 4c 89 ee 48 89 df e8 02 fe ff ff 85 c0 75 db 8b 54 24 04 41 89 16 e9 68 ff ff ff <0f> 0b e9 4a ff ff ff e8 46 49 f9 00 66 0f 1f 44 00 00 90 90 90 90
[   10.744128] RSP: 0018:ffffd33dc131f708 EFLAGS: 00010286
[   10.744130] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000027
[   10.744131] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffba738790
[   10.744132] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffefff
[   10.744132] R10: ffffffffb9a5c460 R11: ffffd33dc131f580 R12: 0000000000000001
[   10.744133] R13: 0000000000000000 R14: ffffd33dc131f754 R15: ffff8c529a6c6f00
[   10.744134] FS:  00007f39e2d4f840(0000) GS:ffff8c55e4380000(0000) knlGS:0000000000000000
[   10.744136] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.744137] CR2: 00007efc63cbd0f0 CR3: 000000010caee004 CR4: 00000000003726f0
[   10.744138] Call Trace:
[   10.744139]  <TASK>
[   10.744140]  arch_io_reserve_memtype_wc+0x32/0x50
[   10.744143]  amdgpu_bo_init+0x3e/0x90 [amdgpu 8edcc4ad9e4bea8e76f927d618e34bca526e11a7]
[   10.744387]  ? amdgpu_gmc_get_vbios_allocations+0xb4/0x130 [amdgpu 8edcc4ad9e4bea8e76f927d618e34bca526e11a7]
[   10.744641]  gmc_v8_0_sw_init+0x2da/0x6b0 [amdgpu 8edcc4ad9e4bea8e76f927d618e34bca526e11a7]
[   10.744951]  amdgpu_device_init.cold+0x13ae/0x22b5 [amdgpu 8edcc4ad9e4bea8e76f927d618e34bca526e11a7]
[   10.745356]  amdgpu_driver_load_kms+0x15/0x70 [amdgpu 8edcc4ad9e4bea8e76f927d618e34bca526e11a7]
[   10.745591]  amdgpu_pci_probe+0x1ce/0x510 [amdgpu 8edcc4ad9e4bea8e76f927d618e34bca526e11a7]
[   10.745844]  ? __pm_runtime_resume+0x5f/0x90
[   10.745847]  local_pci_probe+0x3f/0x90
[   10.745851]  pci_device_probe+0xdb/0x290
[   10.745853]  ? sysfs_do_create_link_sd+0x6d/0xd0
[   10.745869]  really_probe+0xdb/0x340
[   10.745872]  ? pm_runtime_barrier+0x55/0x90
[   10.745873]  __driver_probe_device+0x78/0x140
[   10.745875]  driver_probe_device+0x1f/0xa0
[   10.745877]  ? __pfx___driver_attach+0x10/0x10
[   10.745897]  __driver_attach+0xcb/0x1e0
[   10.745899]  bus_for_each_dev+0x8a/0xe0
[   10.745901]  bus_add_driver+0x10b/0x1f0
[   10.745903]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu 8edcc4ad9e4bea8e76f927d618e34bca526e11a7]
[   10.746203]  driver_register+0x75/0xe0
[   10.746206]  ? amdgpu_init+0x4b/0xff0 [amdgpu 8edcc4ad9e4bea8e76f927d618e34bca526e11a7]
[   10.746481]  do_one_initcall+0x59/0x310
[   10.746484]  do_init_module+0x62/0x240
[   10.746487]  init_module_from_file+0x8b/0xe0
[   10.746489]  idempotent_init_module+0x115/0x310
[   10.746491]  __x64_sys_finit_module+0x67/0xc0
[   10.746493]  do_syscall_64+0x7b/0x190
[   10.746497]  ? vfs_read+0x162/0x390
[   10.746498]  ? vfs_read+0x162/0x390
[   10.746499]  ? __rseq_handle_notify_resume+0x9c/0x4b0
[   10.746502]  ? arch_exit_to_user_mode_prepare.isra.0+0x7c/0x90
[   10.746504]  ? syscall_exit_to_user_mode+0x37/0x1c0
[   10.746507]  ? do_syscall_64+0x87/0x190
[   10.746509]  ? complete+0x1c/0x90
[   10.746511]  ? __rseq_handle_notify_resume+0x9c/0x4b0
[   10.746513]  ? switch_fpu_return+0x4e/0xd0
[   10.746515]  ? arch_exit_to_user_mode_prepare.isra.0+0x7c/0x90
[   10.746517]  ? syscall_exit_to_user_mode+0x37/0x1c0
[   10.746519]  ? clear_bhb_loop+0x40/0x90
[   10.746522]  ? clear_bhb_loop+0x40/0x90
[   10.746524]  ? clear_bhb_loop+0x40/0x90
[   10.746526]  ? clear_bhb_loop+0x40/0x90
[   10.746528]  ? clear_bhb_loop+0x40/0x90
[   10.746530]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   10.746532] RIP: 0033:0x7f39e2e6420d
[   10.746533] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 3a 0d 00 f7 d8 64 89 01 48
[   10.746534] RSP: 002b:00007ffc904ffb78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   10.746536] RAX: ffffffffffffffda RBX: 0000555fda844bc0 RCX: 00007f39e2e6420d
[   10.746537] RDX: 0000000000000004 RSI: 00007f39e2fe92f2 RDI: 0000000000000035
[   10.746538] RBP: 00007ffc904ffc10 R08: 0000000000000000 R09: 00007ffc904ffca8
[   10.746539] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000020000
[   10.746540] R13: 0000555fda848de0 R14: 0000555fda844bc0 R15: 0000000000000000
[   10.746542]  </TASK>
[   10.746542] ---[ end trace 0000000000000000 ]---
[   10.746544] [drm:amdgpu_bo_init [amdgpu]] *ERROR* Unable to set WC memtype for the aperture base
[   10.746806] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init of IP block <gmc_v8_0> failed -22
[   10.747247] amdgpu 0000:06:00.0: amdgpu: amdgpu_device_ip_init failed
[   10.747248] amdgpu 0000:06:00.0: amdgpu: Fatal error during GPU init
[   10.747250] amdgpu 0000:06:00.0: amdgpu: amdgpu: finishing device.
[   10.747375] amdgpu 0000:06:00.0: probe with driver amdgpu failed with error -22
`lspci` still shows the VGA card, but Xorg does not seem to see it. Downgrading `linux` and `linux-headers` to 6.14.1 lets the driver load normally, without errors:
Code:
[   10.101306] amdgpu: Virtual CRAT table created for CPU
[   10.101316] amdgpu: Topology: Add CPU node
[   10.101429] amdgpu 0000:06:00.0: enabling device (0000 -> 0002)
[   10.101549] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1682:0xC580 0xE7).
[   10.101587] [drm] register mmio base: 0x8C200000
[   10.101588] [drm] register mmio size: 262144
[   10.101684] amdgpu 0000:06:00.0: amdgpu: detected ip block number 0 <vi_common>
[   10.101686] amdgpu 0000:06:00.0: amdgpu: detected ip block number 1 <gmc_v8_0>
[   10.101687] amdgpu 0000:06:00.0: amdgpu: detected ip block number 2 <tonga_ih>
[   10.101689] amdgpu 0000:06:00.0: amdgpu: detected ip block number 3 <gfx_v8_0>
[   10.101690] amdgpu 0000:06:00.0: amdgpu: detected ip block number 4 <sdma_v3_0>
[   10.101691] amdgpu 0000:06:00.0: amdgpu: detected ip block number 5 <powerplay>
[   10.101692] amdgpu 0000:06:00.0: amdgpu: detected ip block number 6 <dm>
[   10.101693] amdgpu 0000:06:00.0: amdgpu: detected ip block number 7 <uvd_v6_0>
[   10.101694] amdgpu 0000:06:00.0: amdgpu: detected ip block number 8 <vce_v3_0>
[   10.112307] wlan0: Limiting TX power to 30 (30 - 0) dBm as advertised by 72:13:01:80:79:81
[   10.401676] amdgpu 0000:06:00.0: amdgpu: Fetched VBIOS from ROM BAR
[   10.401680] amdgpu: ATOM BIOS: 113-58085SHD1-W90
[   10.408284] [drm] UVD is enabled in VM mode
[   10.408286] [drm] UVD ENC is enabled in VM mode
[   10.408290] [drm] VCE enabled in VM mode
[   10.408296] amdgpu 0000:06:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[   10.408301] amdgpu 0000:06:00.0: amdgpu: PCIE atomic ops is not supported
[   10.408314] [drm] GPU posting now...
[   10.526622] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[   10.531819] amdgpu 0000:06:00.0: BAR 2 [mem 0x8c000000-0x8c1fffff 64bit pref]: releasing
[   10.531822] amdgpu 0000:06:00.0: BAR 0 [mem 0x6000000000-0x600fffffff 64bit pref]: releasing
[   10.531879] pcieport 0000:05:01.0: bridge window [mem 0x6000000000-0x60100fffff 64bit pref]: releasing
[   10.531881] pcieport 0000:05:01.0: bridge window [io  0x1000-0x0fff] to [bus 06-38] add_size 1000
[   10.531884] pcieport 0000:04:00.0: Assigned bridge window [mem 0x6000000000-0x601fffffff 64bit pref] to [bus 05-39] cannot fit 0x300000000 required for 0000:05:01.0 bridging to [bus 06-38]
[   10.531886] pcieport 0000:05:01.0: bridge window [mem 0x00000000 64bit pref] to [bus 06-38] requires relaxed alignment rules
[   10.531888] pcieport 0000:04:00.0: bridge window [io  0x1000-0x0fff] to [bus 05-39] add_size 1000
[   10.531891] pcieport 0000:04:00.0: bridge window [io  size 0x1000]: can't assign; no space
[   10.531892] pcieport 0000:04:00.0: bridge window [io  size 0x1000]: failed to assign
[   10.531893] pcieport 0000:04:00.0: bridge window [io  size 0x1000]: can't assign; no space
[   10.531894] pcieport 0000:04:00.0: bridge window [io  size 0x1000]: failed to assign
[   10.531896] pcieport 0000:05:01.0: bridge window [mem size 0x200200000 64bit pref]: can't assign; no space
[   10.531897] pcieport 0000:05:01.0: bridge window [mem size 0x200200000 64bit pref]: failed to assign
[   10.531898] pcieport 0000:05:01.0: bridge window [io  size 0x1000]: can't assign; no space
[   10.531899] pcieport 0000:05:01.0: bridge window [io  size 0x1000]: failed to assign
[   10.531900] pcieport 0000:05:01.0: bridge window [mem size 0x200200000 64bit pref]: can't assign; no space
[   10.531901] pcieport 0000:05:01.0: bridge window [mem size 0x200200000 64bit pref]: failed to assign
[   10.531902] pcieport 0000:05:01.0: bridge window [io  size 0x1000]: can't assign; no space
[   10.531902] pcieport 0000:05:01.0: bridge window [io  size 0x1000]: failed to assign
[   10.531904] amdgpu 0000:06:00.0: BAR 0 [mem size 0x200000000 64bit pref]: can't assign; no space
[   10.531905] amdgpu 0000:06:00.0: BAR 0 [mem size 0x200000000 64bit pref]: failed to assign
[   10.531906] amdgpu 0000:06:00.0: BAR 2 [mem 0x8c000000-0x8c1fffff 64bit pref]: assigned
[   10.531928] pcieport 0000:04:00.0: PCI bridge to [bus 05-39]
[   10.531937] pcieport 0000:04:00.0:   bridge window [mem 0x8c000000-0xa1efffff]
[   10.531942] pcieport 0000:04:00.0:   bridge window [mem 0x6000000000-0x601fffffff 64bit pref]
[   10.531953] pcieport 0000:05:01.0: PCI bridge to [bus 06-38]
[   10.531960] pcieport 0000:05:01.0:   bridge window [mem 0x8c000000-0x96efffff]
[   10.531966] pcieport 0000:05:01.0:   bridge window [mem 0x6000000000-0x60100fffff 64bit pref]
[   10.531995] [drm] Not enough PCI address space for a large BAR.
[   10.531996] amdgpu 0000:06:00.0: BAR 0 [mem 0x6000000000-0x600fffffff 64bit pref]: assigned
[   10.532023] amdgpu 0000:06:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[   10.532025] amdgpu 0000:06:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[   10.532037] [drm] Detected VRAM RAM=8192M, BAR=256M
[   10.532038] [drm] RAM width 256bits GDDR5
[   10.532119] [drm] amdgpu: 8192M of VRAM memory ready
[   10.532120] [drm] amdgpu: 7902M of GTT memory ready.
[   10.532142] [drm] GART: num cpu pages 65536, num gpu pages 65536
[   10.533529] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[   10.536324] [drm] Chained IB support enabled!
[   10.540007] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[   10.542664] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[   10.551382] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[   10.874416] [drm] Display Core v3.2.316 initialized on DCE 11.2
[   10.876114] snd_hda_intel 0000:06:00.1: bound 0000:06:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[   10.954208] [drm] UVD and UVD ENC initialized successfully.
[   11.065166] [drm] VCE initialized successfully.
[   11.065781] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
[   11.065815] amdgpu 0000:06:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[   11.070808] amdgpu 0000:06:00.0: amdgpu: Using BOCO for runtime pm
[   11.071641] amdgpu 0000:06:00.0: [drm] Registered 6 planes with drm panic
[   11.071642] [drm] Initialized amdgpu 3.61.0 for 0000:06:00.0 on minor 0
[   11.102176] amdgpu 0000:06:00.0: [drm] fb1: amdgpudrmfb frame buffer device
Does anyone know what recent changes could have caused this?
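
In case it helps with comparing the two boots, here is a minimal sketch of a filter that keeps only the BAR-related and `*ERROR*` lines from a saved dmesg dump and strips the timestamps, so the two kernels can be diffed side by side. The function name `bar_and_error_lines` and the choice of what to keep are my own assumptions, not part of any tool.

```python
import re

# Matches the "[   10.743394] " timestamp prefix that dmesg puts on each line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

# Matches "BAR 0", "BAR 2", ... as printed by the PCI core.
BAR_LINE = re.compile(r"\bBAR \d+\b")


def bar_and_error_lines(dmesg_text):
    """Return BAR-related and *ERROR* lines with timestamps stripped.

    Hypothetical helper: the filter criteria are a guess at what is
    useful for this comparison, not an established convention.
    """
    kept = []
    for line in dmesg_text.splitlines():
        body = TIMESTAMP.sub("", line)
        if "*ERROR*" in body or BAR_LINE.search(body):
            kept.append(body)
    return kept
```

Applied to the two logs above, the first difference it surfaces is the BAR 0 release line: on 6.14.9 it reads `BAR 0 [??? 0x00000000 flags 0x0]: releasing`, while on 6.14.1 it reads `BAR 0 [mem 0x6000000000-0x600fffffff 64bit pref]: releasing`.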