Artix Linux Forum

Artix Linux => Applications & Software => Topic started by: Thats_me on 03 May 2025, 11:52:44

Title: Graphical Tool to find duplicate files
Post by: Thats_me on 03 May 2025, 11:52:44
Hi,

I am looking for a tool that can find files with duplicate content in different folders,
especially PDF files.

Is there a package?

There is an AUR package, for example: https://aur.archlinux.org/packages/fslint
But if I search with Octopi, this AUR package is not available. So I think it is not safe to install it?
Title: Re: Graphical Tool to find duplicate files
Post by: matrixphil on 03 May 2025, 12:25:17
If you're concerned about AUR safety, you can always review the PKGBUILD before installing. Also, make sure your Octopi settings allow AUR search; it might be disabled by default.
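For example, with the fslint package from your link, the usual manual AUR workflow is roughly this (a sketch, assuming you have git and base-devel installed):

Code: [Select]
# fetch the build recipe from the AUR and look at it before building
git clone https://aur.archlinux.org/fslint.git
cd fslint
less PKGBUILD

# build and install the package
makepkg -si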
Title: Re: Graphical Tool to find duplicate files
Post by: Thats_me on 03 May 2025, 12:37:14
Thank you.

Yes, Octopi shows AUR packages. I have installed some Arch packages, but I use Octopi only for searching.
Unfortunately, I am not able to build and test a PKGBUILD myself.

OK, then there is no way to use a graphical tool with Artix. I have also installed MX Linux for emergencies.
I guess there is a tool there.

Title: Re: Graphical Tool to find duplicate files
Post by: ####### on 03 May 2025, 14:25:07
I use rmlint for this; it seems to work well, although with anything like this you should read the instructions carefully to make sure you remove what you intended! It's probably also a good idea to back up your system first, but in fact it tells you what it is going to remove before it does so. There's a graphical interface for it in the AUR as well, but I have never tried that part; it's so simple to use in the terminal that I have never felt any need for a GUI. I find it helpful after running file system recoveries, as part of cleaning up my home directory before moving it to a new machine, or when merging directory trees from different machines that have diverged over time.
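A minimal session might look like this (the directory paths are just examples; rmlint itself deletes nothing, it only writes a removal script for you to run):

Code: [Select]
# scan one or more directories for duplicate files; nothing is removed yet
rmlint ~/Documents ~/Backups

# rmlint writes its findings to rmlint.sh in the current directory;
# review the script, then run it to actually remove the duplicates
less rmlint.sh
sh rmlint.sh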
The AUR is really just a collection of extra packages that are not popular enough to be included in the main repos, run through the build servers on every update, and stored on the mirrors. There is nothing inherently insecure about it: it has oversight, and undesirable things are removed if found, which is exceedingly rare and usually involves obscure, newly added items. The main binary repos are just even more secure and better tested.
Title: Re: Graphical Tool to find duplicate files
Post by: mrbrklyn on 07 May 2025, 23:55:41
I stole this from the internet - this problem screams for a shell script - not an interface...

Code: [Select]
find . -type f -exec stat --printf='%s/%n\0' {} + |
awk '
BEGIN{
        FS = "/"
        RS = ORS = "\0"
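        # \047 is a single quote, used to shell-quote file names passed
        # to md5sum (a name containing a single quote will still break this)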
        q = "\047"
        md5_cmd = "md5sum"
}

{
    # extract the file path from the "size/path" record produced by stat;
    # everything after the first slash belongs to the path
    filePath = substr($0, index($0, "/") +1)

    # group the paths of files having the same size, NUL-delimited
    sizes[$1] = ($1 in sizes? sizes[$1] : "") filePath ORS
}

END {
    for (size in sizes) {

        # split each same-size group back into its individual file paths;
        # the checksum is computed only to confirm real duplicates among
        # files that already have the same size
        filesNr = split(sizes[size], filesName, ORS)

        # call md5sum only if there are at least two files of this size
        # (split on the trailing ORS gives one extra empty field, so
        # filesNr > 2 means two or more real files)
        if (filesNr > 2) {
            for (i = 1; i < filesNr; i++) {
                cmd = md5_cmd " " q filesName[i] q
                if ((cmd | getline md5) > 0) {
                    #split to extract the hash of a file
                    split(md5, hash, " ")

                    # remove the leading backslash md5sum prepends to the hash
                    # when a file name contains a backslash character;
                    # see https://unix.stackexchange.com/q/424628/72456
                    sub(/\\/, "", hash[1])

                    # group the paths of same-sized files by their hash,
                    # again NUL-delimited
                    hashes[hash[1]] = (hash[1] in hashes? hashes[hash[1]] : "") filesName[i] ORS

                    # also record the file size, keyed by the hash
                    fileSize[hash[1]] = size
                }
                # close the pipe; otherwise awk can run out of file
                # descriptors on large directory trees
                close(cmd)
            }
        }
    }
    for (fileName in hashes) {

        # check whether any hash is shared by more than one file;
        # here the hash is the key and the file paths are the values
        # of the hashes[] array (the fileName variable actually holds a hash)
        filesNr = split(hashes[fileName], filesName, ORS)

        # if a hash maps to at least two files (the trailing ORS makes split
        # return one extra empty field, hence > 2), we found duplicates:
        # print the size, the hash, and the paths
        if (filesNr > 2) {
            print fileSize[fileName] " bytes, MD5: " fileName
            for(i=1; i < filesNr; i++)
                print filesName[i]
        }
    }
}'
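If you only care about PDFs, as in the original question, you can scope the scan by changing only the find part; a small, untested variation (the awk program stays exactly as above):

Code: [Select]
# restrict the duplicate scan to PDF files, matching case-insensitively
find . -type f -iname '*.pdf' -exec stat --printf='%s/%n\0' {} + |
awk '...'   # same awk program as in the script above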
Title: Re: Graphical Tool to find duplicate files
Post by: Oltean on 08 May 2025, 09:23:16
I use czkawka-gui-bin from the AUR. It is a graphical tool; maybe it helps you.
Title: Re: Graphical Tool to find duplicate files
Post by: Ambie on 08 May 2025, 10:13:49
Also, there's Double Commander, which can search for duplicates.