
losing cpus

I run Artix with OpenRC on two PCs: an old Dell Dimension E521 (AMD Athlon(tm) 64 X2 Dual Core Processor 5400+) and a Lenovo Ideapad 320-15ABR (AMD A12-9720P RADEON R7, 12 COMPUTE CORES 4C+8G). On the laptop, I still have 4 cores/threads. However, on the desktop, some time ago, it started showing and using only one core/thread.

I don't remember exactly when this started, but I found a thread on a Fedora forum from someone who had the same problem, starting with the upgrade from a 6.8 kernel to 6.9. I have now confirmed that the problem shows up with any 6.10 kernel (linux or linux-zen), but I see both cores after downgrading to linux-6.8.9. As time allows, I'll try to find the first kernel that shows the problem, and try to find a difference between their respective .config files, but I'm really curious why this only seems to affect one of my two PCs.
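For the .config comparison, a minimal sketch of a diff between two kernel config files, assuming the usual format (`CONFIG_FOO=y` lines and `# CONFIG_FOO is not set` comments). The function names are made up for illustration; the kernel source tree also ships its own `scripts/diffconfig` for the same job.

```python
import re
import sys
from pathlib import Path

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ symbol to its value; explicitly unset symbols map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(old, new):
    """Return {symbol: (old_value, new_value)} for every symbol that differs.

    A value of None means the symbol does not appear in that file at all.
    """
    changed = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changed[key] = (before, after)
    return changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python diffcfg.py config-good config-bad
    good = parse_config(Path(sys.argv[1]).read_text())
    bad = parse_config(Path(sys.argv[2]).read_text())
    for symbol, (before, after) in diff_configs(good, bad).items():
        print(f"{symbol}: {before} -> {after}")
```

Symbols that only exist in the newer kernel show up with `None` on the old side, which is expected between releases and not necessarily a sign of a relevant change.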

'CPU topo: Limiting to 1 possible CPUs' does show up in dmesg.
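To see what the running kernel believes about the CPUs, beyond the dmesg line, the sysfs lists under /sys/devices/system/cpu can be compared. The sketch below assumes the standard `possible`, `present` and `online` files; the helper names are only illustrative.

```python
import os
from pathlib import Path

CPU_SYSFS = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-1,4' into [0, 1, 4]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_cpu_masks(base=CPU_SYSFS):
    """Read the possible/present/online CPU lists, skipping files that are missing."""
    masks = {}
    for name in ("possible", "present", "online"):
        path = base / name
        if path.exists():
            masks[name] = parse_cpu_list(path.read_text())
    return masks


if __name__ == "__main__":
    print("os.cpu_count():", os.cpu_count())
    for name, cpus in read_cpu_masks().items():
        print(f"{name}: {cpus}")
```

Running this under both the 6.8.9 kernel and a 6.10 kernel gives a quick side-by-side of how many CPUs each one considers possible.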

Has anyone else seen this, hopefully with some solution?  Otherwise, I'm open to any suggestions on how to troubleshoot any faster than trying each kernel in order.


Re: losing cpus

Reply #2
Well, feel free to correct me if you like, perhaps you have a good reason for that suggestion, but I thought LTS kernels don't lag that much, and if no one spots a problem until it hits LTS then that's unlikely to make it easier to find and fix. The 6.6 (current LTS) kernel was released 30th October 2023, compared to July 14th 2024 for 6.10. So LTS might be handy to run until a problem is resolved, and it might be better tested, which could equally help newer computers, but I don't see how it's going to be especially suitable for old machines as opposed to new ones, except by delaying things for a few months.
To find the problem faster, use the bisect method: go halfway between the known good and known bad versions, test, and repeat according to the result. Also, not all Arch kernels ever make it into Artix, so you might be able to narrow it down a little more using the Arch package archives at the end.
It's quite likely a kernel bug from a commit, not a config change.
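The halving procedure can be sketched in a few lines. This assumes an ordered list of kernel versions (oldest first) where the first entry is known good, the last is known bad, and the bug stays once it appears. The version labels and the `is_bad` check are placeholders; in practice `is_bad` means booting that kernel and counting CPUs.

```python
def bisect_first_bad(versions, is_bad):
    """Return the first bad entry in `versions` (ordered oldest to newest).

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    every version after the first bad one is also bad.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid   # the bug is at mid or earlier
        else:
            good = mid  # the bug was introduced after mid
    return versions[bad]


if __name__ == "__main__":
    # Placeholder labels, not real test results.
    candidates = ["known-good", "cand-a", "cand-b", "cand-c", "cand-d", "known-bad"]
    simulated_bad = {"cand-c", "cand-d", "known-bad"}
    print(bisect_first_bad(candidates, lambda v: v in simulated_bad))
```

Each test halves the remaining range, so even a long list of packaged kernels needs only a handful of boots.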

Re: losing cpus

Reply #3
Check the BIOS for a Core Control option. I'd also try the mitigations=off kernel parameter, just as a test, in case any shenanigans are being applied to it.
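It may also be worth confirming what is actually on the kernel command line, since parameters such as `nosmp`, `maxcpus=`, `nr_cpus=` or `possible_cpus=` can limit the CPU count by themselves. A minimal sketch that reads /proc/cmdline follows; the function names are illustrative, and the simple whitespace split does not handle quoted values.

```python
from pathlib import Path

# Boot parameters that can cap or disable additional CPUs, plus mitigations.
CPU_RELATED = ("nosmp", "maxcpus", "nr_cpus", "possible_cpus", "mitigations")


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def cpu_related(params):
    """Keep only the parameters relevant to CPU count or mitigations."""
    return {k: v for k, v in params.items() if k in CPU_RELATED}


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        print(cpu_related(parse_cmdline(path.read_text())))
```

An empty result means none of these parameters were passed at boot.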

Re: losing cpus

Reply #4
Once I confirm the earliest bad and latest working Artix kernels, I suspect I'll then need to confirm with self-compiled vanilla versions from kernel.org, and then do a bisect. That will also involve finding a minimal .config to use, since compiling a kernel with the full distro .config takes over 10 hours on this box. I'd consider trying to do the compiles on a faster desktop I have, but then I'm spending time on setting things up instead of actually working on the problem. All to be done in my copious free time. :-)

Re: losing cpus

Reply #5
It's not the config that takes the time but the modules; use modprobed-db from the AUR. It could be worth spending time initially to set up the faster machine to build, using the modprobed-db modules file taken from the target machine. It could take something like 50 kernels to bisect between release versions, and if you find a faulty commit it might even be a merge from another tree containing thousands of commits, which itself needs bisecting. Also check whether the problem exists in alternative kernels, for example linux-zen or linux-rt, which is an easier thing to check.
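As a rough check on the number of builds: a bisect halves the candidate range at every step, so the count grows with the base-2 logarithm of the number of commits. The sketch below uses made-up commit counts purely for illustration; skipped or unbuildable commits and nested merges can add steps.

```python
def bisect_steps(num_commits):
    """Approximate worst-case builds to isolate one commit out of `num_commits`.

    Equivalent to ceil(log2(num_commits)), computed with integers.
    """
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (num_commits - 1).bit_length()


if __name__ == "__main__":
    # Hypothetical range sizes, not measured from any real kernel tree.
    for count in (100, 1_000, 16_000):
        print(f"{count} commits -> about {bisect_steps(count)} builds")
```

Under these assumptions even a range of several thousand commits comes out well below 50 builds, so the estimate above is on the pessimistic side.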

Re: losing cpus

Reply #6
I don't see much point in building the kernel one place and modules in another.  I just need to pare down .config to build only those bits (whether modules or built in) that I need.  I am not familiar with modprobed-db, but not all the trimming of .config is for modules.  For example, I have an AMD CPU, so I don't need any Intel specific stuff.  I'll post again if/when I make any progress.

 

Re: losing cpus

Reply #7
modprobed-db records which modules are loaded on your system. So you make the list on the AMD machine, but you can take that list to a faster machine and build a complete kernel there with only the required modules. It's about 5 times faster than building a whole kernel, and you can use the standard Artix config; the Intel modules etc. won't be built, which is the point of it.
https://wiki.archlinux.org/title/Modprobed-db
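To illustrate the idea only (this is not how modprobed-db itself is implemented): the kernel lists currently loaded modules in /proc/modules, and merging several snapshots over time gives a list of everything the machine has actually used. The function names here are hypothetical.

```python
from pathlib import Path


def parse_proc_modules(text):
    """Return the sorted module names from /proc/modules-style text."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first column is the module name
    return sorted(names)


def merge_module_lists(*lists):
    """Union several snapshots, since modules load and unload over time."""
    merged = set()
    for names in lists:
        merged.update(names)
    return sorted(merged)


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        for name in parse_proc_modules(path.read_text()):
            print(name)
```

A single snapshot misses modules for hardware that is not plugged in at the time (USB devices, for example), which is why an accumulated list is more reliable than a one-off.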
The other thing is that if you are working in a git tree and checking out different commits, you can rebuild in the same directory as you progress for a faster incremental build; you don't need to do a clean build every time.
If you weren't already aware of it, the version numbering scheme is confusing. I think it goes something like this: you have 6.7-rc1, then 6.7-rc2 and so on, and finally, when the release is made at the end, it's called 6.7 (which builds as 6.7.0), with the 6.7.1, 6.7.2 point releases only coming after that. So I often build a version at or near both ends of the commit range I think is relevant, to check I'm in the right place!
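A small sketch of that naming order, assuming tags of the form `v6.7-rc1`, `v6.7` and `v6.7.1`. The sort key is illustrative and reflects naming only: stable point releases live on their own branch, so this is not the same as git ancestry.

```python
import re

_TAG = re.compile(r"^v(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?$")


def tag_key(tag):
    """Sort key that places release candidates before their final release."""
    match = _TAG.match(tag)
    if not match:
        raise ValueError(f"unrecognised tag: {tag!r}")
    major, minor, patch, rc = match.groups()
    # A tag without -rcN is a release and sorts after all of its candidates.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch or 0), rc_rank)


if __name__ == "__main__":
    tags = ["v6.8-rc1", "v6.7", "v6.7.1", "v6.7-rc2", "v6.7-rc1"]
    print(sorted(tags, key=tag_key))
```

Sorting this way makes it easier to pick the tags at either end of the suspect range before starting a bisect.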