for whatever reason my 50 gig root partition ran out of space. i cleaned the pacman cache and still had very little space left. for the longest time i had like 25-30 gigs used but then it jumped up to the point that i couldn't upgrade my system. so the very bad thing i did was go hack and slash at my /var/log directory which seemed to contain some 37 gigs of log files. now i can't upgrade my system at all, maybe because some of those logs were necessary to tell the system where its files are?
sudo pacman -S linux-lqx linux-lqx-headers
Packages (75) acl-2.3.1-3 attr-2.5.1-3 audit-3.1.2-1 bash-5.1.016-4 binutils-2.41-3 brotli-1.0.9-12 bzip2-1.0.8-5
ca-certificates-20220905-1 ca-certificates-mozilla-3.92-1 ca-certificates-utils-20220905-1 coreutils-9.3-1
curl-8.2.1-1 diffutils-3.10-1 e2fsprogs-1.47.0-1 expat-2.5.0-1 file-5.45-1 filesystem-2023.01.31-1
findutils-4.9.0-3 gawk-5.2.2-1 gcc-libs-13.2.1-3 gdbm-1.23-2 glibc-2.38-3 gmp-6.3.0-1 grep-3.11-1
hwdata-0.373-1 iana-etc-20230803-1 jansson-2.14-2 kbd-2.6.2-1 keyutils-1.6.3-2 kmod-30-3 krb5-1.20.1-1
libarchive-3.7.1-1 libcap-2.69-1 libcap-ng-0.8.3-2 libelf-0.189-3 libevent-2.1.12-4 libffi-3.4.4-1
libidn2-2.3.4-3 libldap-2.6.6-1 libnghttp2-1.55.1-1 libp11-kit-0.25.0-1 libpsl-0.21.2-1 libsasl-2.1.28-4
libseccomp-2.5.4-2 libssh2-1.11.0-1 libtasn1-4.19.0-1 libtirpc-1.3.3-2 libudev-254.1-1 libunistring-1.1-2
libutempter-1.2.1-3 libverto-0.3.2-4 libxcrypt-4.4.36-1 linux-api-headers-6.4-1 lz4-1:1.9.4-1 mkinitcpio-36-1
mkinitcpio-busybox-1.36.1-1 mpfr-4.2.1-1 ncurses-6.4_20230520-1 openssl-3.1.2-1 p11-kit-0.25.0-1 pahole-1:1.25-4
pam-1.5.3-3 pambase-20221020-1 pcre2-10.42-2 readline-8.2.001-2 shadow-4.13-2 tzdata-2023c-2 udev-254.1-1
util-linux-2.39.2-1 util-linux-libs-2.39.2-1 xz-5.4.4-1 zlib-1:1.3-1 zstd-1.5.5-1 linux-lqx-6.4.12.lqx1-2
linux-lqx-headers-6.4.12.lqx1-2
and so when i try to install all that mess i get a crap ton of errors telling me that the file exists on the system. so many in fact that it would take a few hours maybe to scroll down the page to copy all the files. here's a small example of just the linux api header files
https://pastebin.com/xh6iWyCX
is there a way to repair this damage i have caused that doesn't involve reinstalling my system?
sudo pacman -S linux-lqx linux-lqx-headers --overwrite '*'
i ran that and it installed without errors
does that mean i averted disaster?
in any case the last nvidia upgrade seemed to make liquorix not run... thats fine. i can use artix kernel and be pleased with it. no interest in making it work for sure. i guess we'll see if i ever get another upgrade from pacman or if it always says up to date... that's the test i guess if i averted disaster?
I don't see how anything bad could have happened from just wiping /var/log or the cache; for pacman, its secret sauce is in /var/lib :-)
It's still strange how your logs are spammed that badly.
testdisk / photorec is my favorite for recovery of deleted files. But for best results you should avoid doing anything with the partition in question (or dd an image to work on later) so the freed space with the lost files isn't reused. You usually need some kind of logrotate setup as a cron job to keep logs from growing too much; there are various possibilities. Occasionally it stops working and then logs grow, but it might also be something writing to the logs excessively.
if only there was a log file for that, that i hadn't deleted!
This is not to do with deleting /var/log/* but most likely to do with pacman database corruption
Is that pastebin all of the errors or have you snipped it ?
If that is all of them, the first thing I would try is
sudo pacman -Rdd linux-api-headers
sudo pacman -S linux-api-headers
If that fails it's possible to use pacman's --overwrite option
You'd need to read the manpage (Under "upgrade options") but I THINK
sudo pacman -S --overwrite='*' linux-api-headers
might do it ?
I say think because I never do it that way; the reasons escape me, but I seem to remember it causing me problems in the past.
What I do do is
- Copy the output of conflicting files into the Kate text editor
- Replace all occurrences of "linux-api-headers: " with nothing
- Replace all occurrences of "exists in filesystem" (leave the space before exists) with nothing
- Switching Kate's replace tool to "Escape sequences" mode replace all occurrences of "\n" (newline) with nothing
- Copy the one line of filenames that remains
- Paste after sudo rm -v
- sudo pacman -S linux-api-headers
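The same clean-up can be done in the shell instead of an editor. This is only a minimal sketch: it assumes the conflict lines look like "linux-api-headers: /some/path exists in filesystem" as described above, and that they were saved to a file first (conflicts.txt is a made-up name). conflict_paths is a hypothetical helper, not something pacman provides.

```shell
# Read pacman conflict lines on stdin and print only the file paths.
# Lines that do not look like "<package>: /absolute/path exists in filesystem"
# (for example the final "error: ..." summary) are skipped.
conflict_paths() {
    sed -n 's/^[^:]*: \(\/.*\) exists in filesystem.*$/\1/p'
}

# Possible usage (review the list before deleting anything):
#   conflict_paths < conflicts.txt | less
#   conflict_paths < conflicts.txt | xargs -d '\n' sudo rm -v
#   sudo pacman -S linux-api-headers
```

Printing the list first and only then handing it to rm keeps the destructive step separate, which matters when the list runs to thousands of files.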
After getting to the point you can do an update again I'd be tempted to reinstall everything
https://wiki.archlinux.org/title/Pacman/Tips_and_tricks#Reinstalling_all_packages
thats just api errors. the errors list is too big to be copied... anyway pacman seems to run, i just have no updates available?
:: Starting full system upgrade...
there is nothing to do
OK you've lost me now. They are not "api" errors they are file conflicts which suggests pacman has lost the details of which files belong to that and maybe other packages.
No list is too big to be copied ?
Does pacman -Q still produce a list of your installed packages ? If it does you can probably fix your system as I've outlined.
I don't know the damage though. Running out of drive space on the root drive can cause all sorts of problems depending on what's being processed when it happens.
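One way to get a rough idea of whether pacman has lost track of files is to look at its local database directly. The sketch below assumes the database lives in /var/lib/pacman/local with one directory per installed package, each holding a "desc" and a "files" record; db_entries_missing_records is a hypothetical helper name, and an empty result does not prove the database is healthy.

```shell
# List package entries in a pacman local database directory that lack a
# "desc" or "files" record. Assumption: one directory per package
# (name-version) under the database directory.
db_entries_missing_records() {
    dbdir=${1:-/var/lib/pacman/local}
    for entry in "$dbdir"/*/; do
        [ -d "$entry" ] || continue
        if [ ! -f "${entry}desc" ] || [ ! -f "${entry}files" ]; then
            basename "$entry"
        fi
    done
}

# Possible usage on a real system:
#   db_entries_missing_records
#   pacman -Qo /usr/include/linux/aio_abi.h   # which package owns this file?
#   pacman -Qk linux-api-headers              # are the package's files present?
```

The path passed to pacman -Qo is just an example taken from the kind of header files mentioned in this thread.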
I would highly suggest to move your pacman cache directory onto another partition that has more space. The process is simple, just move the pacman cache directory and then edit /etc/pacman.conf to point to the new cache directory.
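Roughly what that edit could look like, as a sketch only. CacheDir is the pacman.conf option that sets the cache location; the target directory /mnt/data/pacman/pkg/ and the helper name set_cachedir are made up for illustration, and the helper assumes the file already has a CacheDir line (commented out or not).

```shell
# Point the CacheDir option in a pacman.conf-style file at a new directory.
# Handles both the commented default ("#CacheDir = ...") and an active line.
# If the file has no CacheDir line at all, nothing is changed.
set_cachedir() {
    conf=$1
    newdir=$2
    sed -i "s|^#\{0,1\}[[:space:]]*CacheDir[[:space:]]*=.*|CacheDir = ${newdir}|" "$conf"
}

# Possible usage (paths are examples, try it on a copy of the file first):
#   sudo mkdir -p /mnt/data/pacman
#   sudo mv /var/cache/pacman/pkg /mnt/data/pacman/
#   set_cachedir /etc/pacman.conf /mnt/data/pacman/pkg/   # run as root
```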
there were so many errors i just pulled those out as 1 example. there were probably 100 pages maybe more of errors
yah pacman -Q outputs this:
https://pastebin.com/8SiHJR0P
sudo pacman -S linux-lqx linux-lqx-headers --overwrite '*'
is what i ran to make install work again without errors. i can install files now i am just not sure that i am getting updates properly or not.
sudo pacman -Syu
reports everything is up to date
not sure if that's true though
when i deleted pacman cache it only freed up about 3 gb. When i deleted the log files it freed up about 35 gb.. so maybe i should move /var to another drive?
first I recommend checking what is using that much space with a tool like "ncdu"
my "/var/log" only contains 51.1 MiB. 35 GB ain't normal
at least do that the next time you face the problem
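If ncdu isn't installed, something similar can be had with find and sort. A minimal sketch, assuming GNU find (for -printf); biggest_files is a made-up helper name.

```shell
# Print the largest regular files under a directory, biggest first,
# as "<size in bytes><TAB><path>". Defaults: /var/log, top 10.
biggest_files() {
    dir=${1:-/var/log}
    count=${2:-10}
    find "$dir" -type f -printf '%s\t%p\n' 2>/dev/null | sort -rn | head -n "$count"
}

# Possible usage:
#   sudo sh -c "$(declare -f biggest_files); biggest_files /var/log 5"   # bash only
#   biggest_files /var/log 5    # fine without root if the logs are readable
```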
Before you deleted the log files did you pay any attention to which file(s) were taking up the most space ?
Something is not right unless we are talking about many many years of logs.
Is logrotate setup and working ?
You could have an error or warning writing constantly to a logfile, or maybe the log level is set too high, e.g. debug, for something ?
Even so logrotate should compress most repetition away.
For comparison my /var/log directory is 106 MB
Leave it a few days and see what has grown the most if you don't already know the culprit log. Then look inside it and try to figure out what the messages are and where they are coming from.
Edit: Ninja'd
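For the "look inside it" step, a log that is too big for an editor can still be summarised from the shell. This sketch assumes the classic syslog layout of "Mon DD HH:MM:SS host message"; top_messages is a hypothetical helper name.

```shell
# Count identical messages in a syslog-style file, ignoring the timestamp
# and hostname (the first four whitespace-separated fields), and print the
# most frequent ones. Second argument: how many to show (default 5).
top_messages() {
    awk '{ $1=$2=$3=$4=""; sub(/^ +/, ""); print }' "$1" \
        | sort | uniq -c | sort -rn | head -n "${2:-5}"
}

# Possible usage:
#   top_messages /var/log/kernel.log 10
```

A message that shows up millions of times in the count column is usually the one filling the disk.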
so my root has gone from 12 to 20 gig in a few days. messages.log kernel.log and everything.log are each 2.4 gb in size. when i try to open up any of them they crash geany. here's a pic i managed to capture of what that looks like.
ok so in nano it looks like this repeated ad infinitum
Aug 30 11:00:11 mate-elitedesk syslog-ng[216]: syslog-ng starting up; version='4.2.0'
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: device [8086:a298] error status/mask=00000001/00000000
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: [ 0] RxErr (First)
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: device [8086:a298] error status/mask=00000001/00000000
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: [ 0] RxErr (First)
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: device [8086:a298] error status/mask=00000001/00000000
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: [ 0] RxErr (First)
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: device [8086:a298] error status/mask=00000001/00000000
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: [ 0] RxErr (First)
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: device [8086:a298] error status/mask=00000001/00000000
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: [ 0] RxErr (First)
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Aug 30 11:00:11 mate-elitedesk kernel: pcieport 0000:00:1d.0: AER: can't find device of ID00e8
artix is on a 500gb wd black pcie4 nvme drive connected to a nvme slot on the mb which is pcie3.
windows exists on a faxang 500 gb pcie 3 nvme drive connected to an add in card on a x4 lane.
in case any of that matters.
i also dont have any drives in fstab that aren't mounted. fstab looks just as it should.
so i need to know which item is ID00e8
Check your bios and cpu microcode are up to date.
Looks like this is the device throwing errors
https://devicehunt.com/view/type/pci/vendor/8086/device/A298
Do some searching for the same device and errors. I still think if logrotate was working you wouldn't end up with 35GB of logs.
https://ubuntuforums.org/showthread.php?t=2361702
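The vendor:device pair can also be pulled out of the log and checked locally rather than on a website. A small sketch, assuming the kernel lines contain "device [vvvv:dddd]" as in the excerpt earlier in the thread; aer_device_ids is a made-up name.

```shell
# Read kernel log lines on stdin and print each distinct PCI vendor:device
# ID that appears in a "device [vvvv:dddd]" AER message.
aer_device_ids() {
    grep -o 'device \[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' \
        | sed 's/^device \[\(.*\)\]$/\1/' \
        | sort -u
}

# Possible usage:
#   aer_device_ids < /var/log/kernel.log
#   lspci -nn -d 8086:a298     # look the ID up on this machine
#   lspci -nn -s 00:1d.0       # or look at the port named in the messages
```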
well it's not a drive, it's a bug though, related to my wifi/bluetooth nvme card?
this is what each of those logs looks like now
https://pastebin.com/ecatBEEN
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173
WORKAROUND: add pci=noaer to your kernel command line:
1) edit /etc/default/grub and add pci=noaer to the line starting with GRUB_CMDLINE_LINUX_DEFAULT. It will look like this:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=noaer"
2) run "sudo update-grub"
3) reboot
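Step 1 can be scripted if editing by hand feels error-prone. This is a sketch under a couple of assumptions: the GRUB_CMDLINE_LINUX_DEFAULT line is a single double-quoted value, and the parameter contains no characters that are special to sed or grep. add_kernel_param is a hypothetical helper, not a grub tool.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in the given file,
# unless it is already there.
add_kernel_param() {
    file=$1
    param=$2
    # Already present (surrounded by a space or a quote)? Then do nothing.
    grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]${param}[\" ]" "$file" && return 0
    # Insert the parameter just before the closing quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}

# Possible usage (as root, after backing up the file):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param /etc/default/grub pci=noaer
#   update-grub
```

Running it twice leaves the line unchanged, so it is safe to repeat.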
i'll see in a few days if that fixes it or not i guess
never heard of logrotate b4 this mess. i just used the mate iso from the d/l section. guess i'll be looking into that next? maybe i should let it be because if logrotate had been working i'd have never known about this bug or its workaround?
root is now 12.8 gb lets see if it can stay there for a while?
and just some fun reading on aer as it relates to pci and such.
https://www.kernel.org/doc/html/latest/PCI/pcieaer-howto.html
8.3.2. Frequent Asked Questions
Q:
What happens if a PCIe device driver does not provide an error recovery handler (pci_driver->err_handler is equal to NULL)?
A:
The devices attached with the driver won't be recovered. If the error is fatal, kernel will print out warning messages. Please refer to section 3 for more information.
I'm fairly sure that logrotate should be working as standard on any artix iso but as I haven't installed my system from an iso I don't know for sure.
The artix logrotate installs a cron job into /etc/cron.daily. This runs logrotate based off the configuration in /etc/logrotate.conf which also loads further configurations in /etc/logrotate.d (where other packages put their logrotate configuration rules).
In a nutshell what tends to happen is each most recent log is plain text. Weekly that log gets compressed and named, for example, auth.log.1.gz and a new empty auth.log is created. A week later the process happens again and auth.log.1.gz becomes auth.log.2.gz. Once it gets to auth.log.4.gz it is simply deleted at the next rotation. Hence why you should not be able to gather 35GB of logs if logrotate is installed and working. If you don't have a working cron that would also prevent logrotate from working.
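To make the rotation described above concrete, here is a toy version of one rotation step in plain shell. It is only an illustration of the idea, not how logrotate itself is implemented, and rotate_once is a made-up name; the real tool has many more options.

```shell
# One rotation of <log>: drop the oldest archive, shift the others up by one,
# compress the current log into <log>.1.gz and start a new empty log.
# Second argument: how many archives to keep (default 4, as in the example).
rotate_once() {
    log=$1
    keep=${2:-4}
    # The oldest archive falls off the end.
    rm -f "${log}.${keep}.gz"
    # Shift the rest: .3.gz -> .4.gz, .2.gz -> .3.gz, .1.gz -> .2.gz
    i=$((keep - 1))
    while [ "$i" -ge 1 ]; do
        [ -f "${log}.${i}.gz" ] && mv "${log}.${i}.gz" "${log}.$((i + 1)).gz"
        i=$((i - 1))
    done
    # Compress the current log into slot 1 and truncate the live file.
    gzip -c "$log" > "${log}.1.gz"
    : > "$log"
}

# Possible usage on a scratch copy, never on the live /var/log:
#   cp /var/log/auth.log /tmp/auth.log && rotate_once /tmp/auth.log
```

Because at most four compressed archives plus the live file ever exist, total size stays bounded by roughly five rotation periods of log output, which is why 35GB points at either broken rotation or a very noisy writer.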
You are right that you yourself would not have known about the device issue without your bloated /var/log directory (It does not hurt to check dmesg and logs once in a while looking for errors). But seriously in future if you have huge log files the first thought should be "What is creating all these messages?". Not just deleting them. You live and learn. Your root drive filling up could have ended up worse. Glad you got away with it.
I have logrotate running as a daily cron job, plus rsyslog has rate limiting enabled by default to prevent log flooding, which as well as being annoying is a security hole as it can be used in malware attacks. syslog-ng apparently has a throttle option but I've no idea if it's enabled by default, not in this case by the sound of it.
Incidentally, I get the same sort of errors on my laptop which has an nvme drive, but looking at the time stamps you can see rsyslog stops the log growing too much:
Sep 1 20:27:02 xyz kernel: [ 2427.941856] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:02:00.0
Sep 1 20:27:02 xyz kernel: [ 2427.941874] nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 1 20:27:02 xyz kernel: [ 2427.941881] nvme 0000:02:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Sep 1 20:27:02 xyz kernel: [ 2427.941889] nvme 0000:02:00.0: [ 0] RxErr (First)
Sep 1 20:27:11 xyz kernel: [ 2437.156292] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:02:00.0
Sep 1 20:27:11 xyz kernel: [ 2437.156303] nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 1 20:27:11 xyz kernel: [ 2437.156307] nvme 0000:02:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Sep 1 20:27:11 xyz kernel: [ 2437.156311] nvme 0000:02:00.0: [ 0] RxErr (First)
Sep 1 20:27:52 xyz kernel: [ 2478.116243] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:02:00.0
Sep 1 20:27:52 xyz kernel: [ 2478.116264] nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 1 20:27:52 xyz kernel: [ 2478.116271] nvme 0000:02:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Sep 1 20:27:52 xyz kernel: [ 2478.116280] nvme 0000:02:00.0: [ 0] RxErr (First)
Sep 1 20:28:19 xyz kernel: [ 2504.745594] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:02:00.0
Sep 1 20:28:19 xyz kernel: [ 2504.745615] nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 1 20:28:19 xyz kernel: [ 2504.745623] nvme 0000:02:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Sep 1 20:28:19 xyz kernel: [ 2504.745631] nvme 0000:02:00.0: [ 0] RxErr (First)
Sep 1 20:28:59 xyz kernel: [ 2545.191077] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:02:00.0
Sep 1 20:28:59 xyz kernel: [ 2545.191100] nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 1 20:28:59 xyz kernel: [ 2545.191108] nvme 0000:02:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Sep 1 20:28:59 xyz kernel: [ 2545.191116] nvme 0000:02:00.0: [ 0] RxErr (First)
Sep 1 20:29:29 xyz kernel: [ 2575.276836] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:02:00.0
Sep 1 20:29:29 xyz kernel: [ 2575.276864] nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 1 20:29:29 xyz kernel: [ 2575.276875] nvme 0000:02:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Sep 1 20:29:29 xyz kernel: [ 2575.276889] nvme 0000:02:00.0: [ 0] RxErr (First)
Sep 1 20:29:40 xyz kernel: [ 2585.635256] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:02:00.0
Sep 1 20:29:40 xyz kernel: [ 2585.635283] nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 1 20:29:40 xyz kernel: [ 2585.635298] nvme 0000:02:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Sep 1 20:29:40 xyz kernel: [ 2585.635310] nvme 0000:02:00.0: [ 0] RxErr (First)
I found some advice online to ignore them when I searched a while back, so I did. :D
so just to be clear, i got null errors on my nvme drive because it's pcie4 in a pcie3 slot? i created this problem by using the wrong hardware and the null errors are the result of that hardware mismatch. and while i was led to a workaround in grub, that isn't the ideal option here. the ideal option would be to use a pcie3 nvme drive in a pcie3 slot. is this the correct understanding?
That helps a lot, knowing it is due to mixing PCIE3 and PCIE4. According to this, you should be able to use PCIE3 and PCIE4 together, so this is probably more due to a "Linux" issue, I am sure in the past there have been discussions about NVME suggesting support in Linux was sometimes imperfect, although I guess it keeps improving:
https://www.quora.com/What-happens-if-you-use-a-PCIe-4-0-NVMe-SSD-in-a-PCIe-3-0-M-2-motherboard-slot
noaer would disable my internal wifi card, so I couldn't use that. Another suggestion:
"But the above solution of adding "pci=noaer" to boot I do not think really "solves" anything other than hiding the error. The error is still happening, just not reporting it."
"... try pcie_aspm=off . This seems to disable power management mode which is throwing the error."
https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/
This option may affect sleep, but possibly only on the PCI bus. From a quick test it is working, there are no errors, and the wifi works too, no idea about the long term or sleep which I don't normally use anyway, I just shutdown fully.
TEAMGROUP MP33 2TB SLC Cache 3D NAND TLC NVMe 1.3 PCIe Gen3x4 M.2 2280 Internal Solid State Drive SSD (Read/Write Speed up to 1,800/1,500 MB/s) Compatible with Laptop & PC Desktop TM8FP6002T0C101 https://a.co/d/2HL3cnO
Coming in a few hours
My drive which gives the same errors is a Gen 3.0 x4 like that Teamgroup one too, so I wonder if that will help?
https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/internal-drives/pc-sn720-ssd/data-sheet-pc-sn720-compute.pdf
I put:
GRUB_CMDLINE_LINUX="pcie_aspm=off"
in /etc/default/grub, then ran update-grub. Barring any future issues that occur to persuade me otherwise, that will probably do until either the kernel fixes the issue or I upgrade to a newer laptop sometime in the future. :D
(What these options do is turn on or off kernel driver features, so they are not necessarily bad to use, it is not really any different than choosing config options when building a kernel.)
i had added that to the other gen4 wd black. i havent added it yet to this one. i thought i had d/l and installed mate dinit iso onto the 2tb nvme but it turns out i installed openrc so i guess that's what i'm using this go around? anyway all seems fine so far. and since i know about this error i set root to only 30 gb. if it fills up it will do so much more quickly!