[SOLVED] count with python/awk or other script/command


[SOLVED] count with python/awk or other script/command

17 October 2023, 16:15:44

Hello. I have 12 letters, a-l, and I need to group them in pairs like this: ab cd ef gh ij kl.

Next, I'm trying to sort a list of 6-character strings based on the grouping above.

E.g. if the string is 'a a a b b c' I should get the output 5 1, meaning there are 5 letters in the 'ab' group and 1 in the 'cd' group.

The output should be in descending order, like 51 or 42 or 3111 etc.

I know for sure awk can do this like it's a piece of cake, but I'm no awk wizard right now. 😬

Any cool ideas? If not awk, then Python or Perl; I'm not very picky. In the meantime I'm trying to come up with a Python script for this. Tnx for any hints ✌🏻
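For reference, the rule described above can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions: the function name `group_signature` is made up for illustration, letters outside a-l (and spaces) are simply ignored, and the groups are taken to be consecutive pairs ab, cd, ..., kl.

```python
from collections import Counter

def group_signature(s: str) -> str:
    """Return the per-group letter counts of s, largest first, e.g. '51'."""
    counts = Counter()
    for ch in s.lower():
        if "a" <= ch <= "l":
            # a,b -> group 0; c,d -> group 1; ... k,l -> group 5
            counts[(ord(ch) - ord("a")) // 2] += 1
    return "".join(str(n) for n in sorted(counts.values(), reverse=True))

print(group_signature("a a a b b c"))  # 51
```

Joining the counts into one string is only unambiguous while every count is a single digit, which holds for 6-character strings.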

Re: count with awk or other script/command

Reply #1 – 18 October 2023, 08:41:48

I have this awk line that reads from the 'fin' file:

Code: [Select]

awk -F 'a' '{s+=(NF-1)} END {print s}' fin

How can I modify it so it counts not only 'a' but 'b' too, within the 'ab' group? Awk is horribly complicated.

Or I can do it like this:

Code: [Select]

awk -F 'a' '{s+=(NF-1)} END {print s}' fin
awk -F 'b' '{s+=(NF-1)} END {print s}' fin

and sum up the results. Also, I need to make it handle each row of the 'fin' file separately.
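Not an awk answer, but the same idea (count 'a' and 'b' together, one result per row) could look roughly like this in Python. The name `count_group_per_row` and the sample rows are made up for illustration; with a real file you would pass the open file object instead of the list.

```python
def count_group_per_row(lines, group="ab"):
    """For each row, count how many characters belong to `group`."""
    return [sum(row.count(ch) for ch in group) for row in lines]

rows = ["aaabbc", "abcdef", "cdcdcd"]
print(count_group_per_row(rows))        # [5, 2, 0]
print(sum(count_group_per_row(rows)))   # 7, the whole-file total
```

The whole-file total on the last line corresponds to adding up the results of the two awk commands above.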

Re: count with awk or other script/command

Reply #2 – 18 October 2023, 22:42:18

Quote from: Surf3r – on 18 October 2023, 08:41:48

Awk is horribly complicated.

Use Python, unless speed of execution is an issue. Not that I know for sure that awk is fast, though I assume it is faster than Python?
I agree though, as awk is only marginally more understandable to me than brainfuck.

Re: count with awk or other script/command

Reply #3 – 20 October 2023, 10:29:53

Well, I came up with this one; primitive, but it's working. Now I need to make it do this for each row of the fin file, not just one. If anyone can restyle it to be better, that would be awesome.

*I still need to sort the output in descending order.

Code: [Select]

{
awk -F 'a' '{s+=(NF-1)} END {print s}' fin
awk -F 'b' '{s+=(NF-1)} END {print s}' fin
} > ab
{
awk -F 'c' '{s+=(NF-1)} END {print s}' fin
awk -F 'd' '{s+=(NF-1)} END {print s}' fin
} > cd
{
awk -F 'e' '{s+=(NF-1)} END {print s}' fin
awk -F 'f' '{s+=(NF-1)} END {print s}' fin
} > ef
{
awk -F 'g' '{s+=(NF-1)} END {print s}' fin
awk -F 'h' '{s+=(NF-1)} END {print s}' fin
} > gh
{
awk -F 'i' '{s+=(NF-1)} END {print s}' fin
awk -F 'j' '{s+=(NF-1)} END {print s}' fin
} > ij
{
awk -F 'k' '{s+=(NF-1)} END {print s}' fin
awk -F 'l' '{s+=(NF-1)} END {print s}' fin
} > kl
{
awk '{ sum += $1 } END { print sum }' ab
awk '{ sum += $1 } END { print sum }' cd
awk '{ sum += $1 } END { print sum }' ef
awk '{ sum += $1 } END { print sum }' gh
awk '{ sum += $1 } END { print sum }' ij
awk '{ sum += $1 } END { print sum }' kl
} > allc
cat allc | tr -d '\n' > tc4
cat tc4
echo "   "
rm allc tc4 ab cd ef gh ij kl
exec bash

Re: count with awk or other script/command

Reply #4 – 20 October 2023, 12:03:30

Tried this but it didn't work xd

Code: [Select]

while read in; do bash x.sh "$in"; done < fin

I named my script x.sh, and the file it reads from is called fin.

It sums up the results from all the lines into a single line, and for the other lines it gives the error 'command not found', even though that isn't a command; it's just supposed to bloody read from the fin file.

Any ideas how this should properly be done? Tnx

Edit: tried with xargs and at least got 3 lines, but all of them with the same result, so there's a little progress but minuscule.

Code: [Select]

cat fin | xargs -L1 bash x.sh

Re: count with awk or other script/command

Reply #5 – 25 October 2023, 09:19:09

Found a python script

Code: [Select]

word = "mississippi"

counter = {}

for letter in word:
    if letter not in counter:
        counter[letter] = 0
    counter[letter] += 1


print(counter)

But idk how to make it read line by line from a file. I tried

Code: [Select]

word = str(open("fin.txt"))

but this way it counts from the Python script itself and not only from the text file 😬 Does any Python guru have an idea how I can make this script loop through the text file line by line?

Also tried this, but it counts the lines, not the chars inside those lines.

Code: [Select]

with open('fin.txt') as f:
    lines = [line.rstrip() for line in f]
word = lines
counter = {}
for letter in word:
    if letter not in counter:
        counter[letter] = 0
    counter[letter] += 1
print(counter)
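A note on the two attempts above: calling `str()` on an open file object returns a description of the object, not the file's contents, and looping over a list of lines hands the counter whole lines rather than single characters. One possible way around both, as a sketch only (the function names are made up here, and `collections.Counter` stands in for the hand-written dictionary):

```python
from collections import Counter

def count_letters_per_line(lines):
    """Return one Counter of letters for each non-empty line."""
    results = []
    for line in lines:
        letters = [ch for ch in line if ch.isalpha()]  # drops spaces and the newline
        if letters:
            results.append(Counter(letters))
    return results

def count_letters_in_file(path):
    # Iterating over an open file yields its lines one at a time.
    with open(path) as f:
        return count_letters_per_line(f)

print(count_letters_per_line(["mississippi\n", "aab\n"]))
```

Calling `count_letters_in_file("fin.txt")` would then give one counter per row of the file, assuming the file sits in the current directory.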

Re: count with python/awk or other script/command

Reply #6 – 25 October 2023, 13:54:05

Firstly, I am an absolute newbie when it comes to questions like yours; however, these problems fascinate me, so I did a little search and found this:

https://stackoverflow.com/questions/41029735/piping-sed-awk-or-awk-sed

It may or may not help you, but I just thought I would throw it out there anyway

Re: count with python/awk or other script/command

Reply #7 – 25 October 2023, 15:07:36

Think I'm close to finding a solution in Python. Python looks the most friendly for a noobie like me, but make no mistake, Python can be mind-blowingly complicated too; it's just more accessible for everybody, n00b or expert.

Tnx for the link 👍🏻

Re: count with python/awk or other script/command

Reply #8 – 25 October 2023, 16:26:46

I would like to learn some Python, but so far I have not had the time to take it up. Perhaps one day I will be forced into learning it just to solve a problem like the one you have got!

Hope you manage to find a successful solution

Re: count with python/awk or other script/command

Reply #9 – 27 October 2023, 15:34:04

Found another Python script that comes pretty close to what I need. It's just that this one splits lines and counts the length of each string in the file, but doesn't count each char in each string of that file.

Code: [Select]

def fileCount(fname):
    #counting variables
    d = {"lines":0, "words": 0, "lengths":[]}
    #file is opened and assigned a variable
    with open(fname, 'r') as f:
        for line in f:
            # split into words
            spl = line.split()
            # increase count for each line
            d["lines"] += 1
            # add length of split list which will give total words
            d["words"] += len(spl)
            # get the length of each word and sum
            d["lengths"].append(sum(len(word) for word in spl))
    return d

def main():
    fname = input('Enter the name of the file to be used: ')
    data = fileCount(fname)
    print ("There are {lines} lines in the file.".format(**data))
    print ("There are {} characters in the file.".format(sum(data["lengths"])))
    print ("There are {words} words in the file.".format(**data))
    # enumerate over the lengths, outputting char count for each line
    for ind, s in enumerate(data["lengths"], 1):
        print("Line: {} has {} characters.".format(ind, s))
main()

Think I need to find the proper syntax so it gives the letter count, not the word count.

Code: [Select]

d["words"] += len(spl)

and here when I print results..

Code: [Select]

print ("There are {words} words in the file.".format(**data))

Think I'm so close to dodge maybe 2 years worth of python learn lol
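For what it's worth, swapping the word count for a letter count might look something like the sketch below. It keeps the dictionary layout of `fileCount` but takes the lines directly; the name `file_letter_count` and the keys `"letters"` and `"per_line"` are assumptions of mine, not part of the script found above.

```python
def file_letter_count(lines):
    """Like fileCount above, but tallying letters instead of words."""
    d = {"lines": 0, "letters": 0, "per_line": []}
    for line in lines:
        # count letters only, so spaces and the trailing newline are skipped
        n = sum(1 for ch in line if ch.isalpha())
        d["lines"] += 1
        d["letters"] += n
        d["per_line"].append(n)
    return d

data = file_letter_count(["a a a b b c\n", "abc def\n"])
print("There are {letters} letters in {lines} lines.".format(**data))
```

With a file, the same function could be called as `file_letter_count(f)` inside a `with open(fname) as f:` block.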

Re: count with python/awk or other script/command

Reply #10 – 27 October 2023, 21:24:30

Quote from: Surf3r – on 27 October 2023, 15:34:04

Think I'm so close to dodge maybe 2 years worth of python learn lol

Re: count with python/awk or other script/command

Reply #11 – 27 October 2023, 22:32:34

I'm not sure I understand what you're trying to do, but maybe this will be helpful:

Code: [Select]

with open("fin.txt", "r") as file:
    text = file.readlines()
line_list = []

for line in text:
    # keep only the letters of the line, lower-cased
    letter_list = []
    for letter in line.lower():
        if letter.isalpha():
            letter_list.append(letter)
    if len(letter_list) != 0:
        # cut the letters into 6-letter words
        words = []
        for x in range(0, len(letter_list), 6):
            word = letter_list[x: x + 6]
            words.append(word)
        line_list.append(words)

groups = ["ab", "cd", "ef", "gh", "ij", "kl"]

# maps each word (with its letters sorted) to its group counts
gr = {}

for la in line_list:
    for word in la:
        word.sort()
        counter = ""
        for group in groups:
            # letters of this word that fall in the current group
            count = word.count(group[0]) + word.count(group[1])
            if count != 0:
                counter += str(count)
        # store the counts in descending order, e.g. "51"
        gr["".join(word)] = "".join(sorted(counter, reverse=True))

for word, count in gr.items():
    print(word, count)

Re: count with python/awk or other script/command

Reply #12 – 28 October 2023, 04:20:42

Cool, perfect implementation. I'm gonna use this script to try to obtain a math formula for what I'll call 'Split Arrangements', meaning the way things arrange when the total set of letters is split into groups. In this case the groups are of two, but hopefully I'll get a general formula and eventually capture the grouping in it too.

To give you a bit of context, I've already determined a general formula based on repeating elements (I called them Mixed Arrangements) rather than on grouping, as in the case I'm working on now.

After I crack/deduce the formula I'm gonna do another script for it, but till then there's a bit of work.

Ultimately Split arr., same as Mixed arr., can be used to discover strings with the highest mathematically possible entropy.

The grouping doesn't care whether an element repeats or not; it only counts how many of the letters fall within each group.

So e.g. a string like 'aaaa' will be in the same league as 'abba', while when counting repeating elements those two are in quite different leagues.
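To make the 'aaaa' vs 'abba' comparison concrete, here is a small sketch of the two ways of counting. Both function names are made up for illustration, and `mixed_signature` is only my reading of "counting repeating elements"; it is not the formula mentioned above. Lowercase a-l input is assumed.

```python
from collections import Counter

def split_signature(s, group_size=2):
    """Counts per letter group (ab, cd, ...), largest first."""
    counts = Counter((ord(ch) - ord("a")) // group_size for ch in s)
    return "".join(map(str, sorted(counts.values(), reverse=True)))

def mixed_signature(s):
    """Counts per individual letter, largest first."""
    counts = Counter(s)
    return "".join(map(str, sorted(counts.values(), reverse=True)))

for s in ("aaaa", "abba"):
    print(s, split_signature(s), mixed_signature(s))
# aaaa 4 4
# abba 4 22
```

Both strings share the split signature 4, while their per-letter signatures differ (4 vs 22).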

It will be interesting to find where one type of arr. vs the other has strengths and where it has weaknesses.

For combinations these things are easy to do, but when it comes to these arr. things go wild.

And of course thank you very much for the script, it's perfect.

Re: count with python/awk or other script/command

Reply #13 – 28 October 2023, 14:26:17

Tried the script on the real thing and it looks like it somehow crops the output from 2.985.984 strings/lines to just a little over 12k.

I've checked whether it was nano's fault while processing that raw text file, but no, nano is safe and sound.

So from a text file of around 20 MB I end up with just 140 kB.

Is there some bug in Python or is it simply the script?

Think it's because of the file size: as it is now, the script can digest at most 12k lines (12376).

Tnx again for the help 😇

Edit: I'll try to break that big 20 MB text file into smaller ones, maybe it can do it that way.

Re: count with python/awk or other script/command

Reply #14 – 28 October 2023, 15:44:18

Yep, re-verified it. Even if I split that larger file into smaller ones, it still crops the output by about 6x or more. I have no idea why. If you feed it 5000 lines it should print 5000, not fewer.

It loses about 3 lines every 36.
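A possible explanation, going only by the script as posted in Reply #11: it stores its results in a dictionary keyed by each word with its letters sorted, so every rearrangement of the same letters overwrites the same entry. The number of distinct 6-letter multisets over 12 letters is C(17, 6) = 12376, which matches the line count reported above, while 12^6 = 2985984 matches the input size. A sketch that keeps one output per input line instead, assuming one 6-letter string per line (the function names are made up for illustration):

```python
import math

GROUPS = ["ab", "cd", "ef", "gh", "ij", "kl"]

def signature(word):
    """Non-zero group counts of word, largest first, e.g. '51'."""
    counts = [sum(word.count(ch) for ch in g) for g in GROUPS]
    return "".join(str(c) for c in sorted(counts, reverse=True) if c)

def signatures_per_line(lines):
    """One (word, signature) pair per non-empty input line, duplicates kept."""
    out = []
    for line in lines:
        word = "".join(ch for ch in line.lower() if ch.isalpha())
        if word:
            out.append((word, signature(word)))
    return out

lines = ["abcdef", "fedcba", "aaabbc"]
pairs = signatures_per_line(lines)
print(len(pairs))                                   # 3: the two anagrams stay separate
print(len({"".join(sorted(w)) for w, _ in pairs}))  # 2: what a dict keyed by sorted letters keeps
print(math.comb(12 + 6 - 1, 6))                     # 12376
```

Appending to a list keeps the output the same length as the input; whether duplicates should be kept depends on what the formula needs.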