[SOLVED] count with python/awk or other script/command


Re: count with python/awk or other script/command

Reply #15 – 28 October 2023, 19:02:33

My Python knowledge is limited to the basics, so expect bugs or things that don't work as expected.

Try this:

Code: [Select]

import fileinput

line_list = []
line_count = 0
for line in fileinput.input(["fin.txt"]):
    
    letter_list = []
    for letter in line.lower():
        if letter.isalpha():
            letter_list.append(letter)
    if len(letter_list) != 0:
        words = []
        
        for x in range(0, len(letter_list), 6):
            word = letter_list[x: x + 6]
            words.append(word)
        
        
        line_list.append(words)

    line_count += 1
    
groups = ["ab", "cd", "ef", "gh", "ij", "kl"]

gr = {}

for la in line_list:
    
    for word in la:
        word.sort()
        counter = ""
        for group in groups:
            
            count = word.count(group[0]) + word.count(group[1])
            if count != 0:
                counter += str(count)
        gr["".join(word)] = "".join(sorted(counter, reverse=True))
        
for word, count in gr.items():
    print(word, count)
    
print('-'* 30)    
print(f"Line count = {line_count}")

Or this:

Code: [Select]

with open("fin.txt") as file:
    line_list = []
    line_count = 0
    for line in file:
        
        letter_list = []
        for letter in line.lower():
            if letter.isalpha():
                letter_list.append(letter)
        if len(letter_list) != 0:
            words = []
            
            for x in range(0, len(letter_list), 6):
                word = letter_list[x: x + 6]
                words.append(word)
            
            
            line_list.append(words)
        line_count += 1
        
groups = ["ab", "cd", "ef", "gh", "ij", "kl"]

gr = {}

for la in line_list:
    
    for word in la:
        word.sort()
        counter = ""
        for group in groups:
            
            count = word.count(group[0]) + word.count(group[1])
            if count != 0:
                counter += str(count)
        gr["".join(word)] = "".join(sorted(counter, reverse=True))
            
for word, count in gr.items():
    print(word, count)
print('-'* 30)    
print(f"Line count = {line_count}")

It should work better for large files. Check whether line_count is correct. If it is, then the problem must be somewhere else.

Re: count with python/awk or other script/command

Reply #16 – 28 October 2023, 19:54:19

I tried both of the latest versions. Both print the correct line count at the bottom, but the rest of the output is incomplete: about 89% of it is chopped away, and only around 11% of the data is useful.

Anyway, it does something, but it is super unreliable. Tnx for trying, though; maybe the script needs polishing in some parts to be accurate.

Tnx for trying once again. ✌🏻

Re: count with python/awk or other script/command

Reply #17 – 28 October 2023, 20:50:05

One more try.

I think I've found the problem: I saved the words to the dictionary without separating them by line, so if a word was repeated on many lines, it was saved in the result only once. With six-letter words there were many repetitions of the same word.

Code: [Select]

with open("fin.txt") as file:
    line_list = []
    line_count = 0
    for line in file:
        
        letter_list = []
        for letter in line.lower():
            if letter.isalpha():
                letter_list.append(letter)
        if len(letter_list) != 0:
            words = []
            
            for x in range(0, len(letter_list), 6):
                word = letter_list[x: x + 6]
                words.append(word)
            
            
            line_list.append(words)
        line_count += 1
        
groups = ["ab", "cd", "ef", "gh", "ij", "kl"]

result = []

for la in line_list:
    gr = {}
    for word in la:
        word.sort()
        counter = ""
        for group in groups:
            
            count = word.count(group[0]) + word.count(group[1])
            if count != 0:
                counter += str(count)
        gr["".join(word)] = "".join(sorted(counter, reverse=True))
    result.append(gr)

for ind, value in enumerate(result):
    print(f"Line {ind} - {value}")            
    print("-"* 30)

print(f"Line count = {line_count}")
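
The problem described in this reply can be reproduced with a minimal sketch. The sample words below are made up for illustration (they are not taken from fin.txt), and the stored value is only a placeholder for the group counts the real script computes.

```python
# Hypothetical sample data: two "lines", each already split into sorted six-letter words.
lines = [["aabcde", "abcdef"], ["aabcde", "ghijkl"]]

# One shared dictionary: a word seen on several lines is stored only once.
shared = {}
for words in lines:
    for word in words:
        shared[word] = len(word)  # placeholder value, stands in for the group counts

# One dictionary per line: repeats on different lines are kept.
per_line = []
for words in lines:
    per_line.append({word: len(word) for word in words})

print(len(shared))                    # 3 distinct words overall
print(sum(len(d) for d in per_line))  # 4 entries, one per word per line
```

Note that a word repeated within the same line would still be merged by the per-line dictionary; if those repeats matter, a list of (word, counts) pairs would probably be a safer container.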

Re: count with python/awk or other script/command

Reply #18 – 28 October 2023, 23:43:43

Yeah, now it looks like we are onto something good. I'll try to remove those separator lines so the output file stays approximately the same size.

Code: [Select]

#    print("-"* 30)

It looks good at first glance. I'll try to process the file and come back with the results. Good script! Where's that thumbs-up when you need it... ah, here it is 👍🏻

Tnx 🎯

Re: count with python/awk or other script/command

Reply #19 – 29 October 2023, 02:25:46

I managed to process it. Based on the chosen grouping example, it looks like the most numerous pattern is 2 letters (repeated or not) from each of 2 different groups, plus 2 other letters that each belong to a different group (2 2 1 1). Here's the stats image in case anyone is curious.

The sequence at the bottom is not present in the OEIS database, so we have a W. The challenge that remains is to deduce a general formula.
As a side note, you can see that choosing 1 letter from each group does not give the largest number of arrangements.

*I had to split the file into 6 parts and sum them below, in ascending order.

Tnx for the help.

Re: count with python/awk or other script/command

Reply #20 – 29 October 2023, 12:06:59

Now look at this small example. The string highlighted in the green box is artificially 'crafted' so that it fits both criteria (mixed and split arr.) with the highest possible number of variants, hence it has higher entropy. If a string matches only one criterion, that is good, but not as good as scoring on both ✅✅

It may look like it's being singled out, but in fact it isn't. As long as the chosen string lands in those larger piles, the more criteria are used the better, meaning such strings have higher and higher entropy.

Re: count with python/awk or other script/command

Reply #21 – 30 October 2023, 20:30:22

Formula cracked; here's what all the calculations look like. I've also submitted it to OEIS. Hope to have more luck this time.
Every slice, bigger or smaller, is multiplied by G^s (where G is the number of elements within a group and s is the string length). Everything else is the same as in my other formula, except that everything relates to groups and not single elements.

Example, with the groups:
ab cd ef gh ij kl

Notation: A(number of groups, number of groups in the pattern)

Pattern 2 2 1 1:
(2^2) × (2^2) × (2^1) × (2^1) = 2^6, which is G^s
G^s × A(6,4) × 6! / ((2!)^2 × 2! × (6-4)!) = 64 × 360 × 45 = 1,036,800

It also applies to all cases, whether t > s or t < s. The only requirement is that the groups are of equal size, so G has to be one of the divisors of t (that is, t has to be a multiple of G).
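
As a quick check of the arithmetic in this example, here is a rough sketch in Python. The decomposition used in `pattern_count` (letter choices × assignment of pattern parts to groups × ordering of positions) is my own reading and not necessarily how the formula above was derived; the function names and the tiny brute-force comparison are assumptions added for illustration.

```python
from collections import Counter
from itertools import product
from math import factorial, prod

def pattern_count(num_groups, group_size, pattern):
    """Number of strings whose per-group letter counts match `pattern`.

    Assumes equal-sized groups and letters that may repeat.
    """
    s = sum(pattern)  # string length
    k = len(pattern)  # number of groups actually used
    # Ordered choice of k groups, i.e. A(num_groups, k) ...
    assign = factorial(num_groups) // factorial(num_groups - k)
    # ... divided by the repeats of equal parts (e.g. two 2s and two 1s).
    for repeats in Counter(pattern).values():
        assign //= factorial(repeats)
    # Multinomial coefficient: ways to order the s positions among the parts.
    arrange = factorial(s) // prod(factorial(p) for p in pattern)
    return group_size ** s * assign * arrange

def brute_force(groups, length, pattern):
    """Count matching strings by enumeration; only practical for tiny inputs."""
    alphabet = "".join(groups)
    target = sorted(pattern, reverse=True)
    total = 0
    for letters in product(alphabet, repeat=length):
        counts = [sum(letters.count(ch) for ch in g) for g in groups]
        if sorted((c for c in counts if c), reverse=True) == target:
            total += 1
    return total

print(pattern_count(6, 2, (2, 2, 1, 1)))  # 1036800
print(pattern_count(3, 2, (2, 1)), brute_force(["ab", "cd", "ef"], 3, (2, 1)))  # 144 144
```

The first line reproduces the 1,036,800 from the example; the second compares the same function against plain enumeration on a much smaller case (3 groups of 2 letters, strings of length 3).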

Re: [SOLVED] count with python/awk or other script/command

Reply #22 – 03 November 2023, 17:44:34

Hello, is there anywhere I can modify the groups?

I mean, for example, if I want to split the fields into uneven groups like
abcdefg hijkl

I tried to modify the groups in the script:

Code: [Select]

groups = ["abcdefg", "hijkl"]

but it looks like another part of the script doesn't like that. I'm assuming it's this:

Code: [Select]

count = word.count(group[0]) + word.count(group[1])

Thank you if you can take a look. The previous script is excellent for that type of grouping, but from a bird's-eye view there seems to be no easy way for me to make it more versatile. And speaking of birds, a small digression from the main topic: today a crow almost hit me on the head with a walnut she dropped to break it open and, of course, eat it. 😂

Tnx and ✌🏻

Re: [SOLVED] count with python/awk or other script/command

Reply #23 – 03 November 2023, 18:23:29

Change this part of the code:

Code: [Select]

        for group in groups:
            count = word.count(group[0]) + word.count(group[1])
            if count != 0:
                counter += str(count)

to this (the output includes "0"):

Code: [Select]

        for group in groups:
            count = 0
            for n in range(len(group)):
                count += word.count(group[n])
            #if count != 0:
            counter += str(count)

or to this (the output omits "0"):

Code: [Select]

        for group in groups:
            count = 0
            for n in range(len(group)):
                count += word.count(group[n])
            if count != 0:
                counter += str(count)

Does it work?
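
For reference, the same change can be sketched as a small standalone function that works for groups of any length. The function name `group_counts` and the `keep_zeros` flag are made up for this illustration; they are not part of the script above. Unlike the script, this sketch does not sort the resulting digits.

```python
def group_counts(word, groups, keep_zeros=False):
    """Count how many characters of `word` fall into each group."""
    counts = []
    for group in groups:
        # Sum the occurrences of every letter in the group, whatever its length.
        count = sum(word.count(ch) for ch in group)
        if count != 0 or keep_zeros:
            counts.append(str(count))
    return "".join(counts)

groups = ["abcdefg", "hijkl"]
print(group_counts("aabhij", groups))                   # 33
print(group_counts("aabbcc", groups))                   # 6
print(group_counts("aabbcc", groups, keep_zeros=True))  # 60
```

With `keep_zeros=True` the result corresponds to the first variant (output with "0"), and with the default to the second.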

Re: [SOLVED] count with python/awk or other script/command

Reply #24 – 03 November 2023, 21:06:55

I'll check in 1 minute. I'll be back asap.

Re: [SOLVED] count with python/awk or other script/command

Reply #25 – 03 November 2023, 21:28:53

Yes, looks like it's working to perfection.

Amazing. I'll check back with results; I hope I can find it as easily as in the case of even groups. Thanks, you nailed it perfectly 🎯

Re: [SOLVED] count with python/awk or other script/command

Reply #26 – 03 November 2023, 21:48:07

Glad I could help.

Re: [SOLVED] count with python/awk or other script/command

Reply #27 – 04 November 2023, 02:22:09

These are harder to crack, unfortunately. I managed to deduce only 1 term out of 4, and I think it will be an odyssey to fully understand how they scale.

Re: [SOLVED] count with python/awk or other script/command

Reply #28 – 12 November 2023, 09:05:41

Broke them in 1-2 days. I talked closely with one of the main OEIS editors, and he said that arrays of t (total elements) are well known, but he did not say a word about arrays of 's'. Interestingly, they didn't feel the need for 's' arrays, for reasons I don't understand. All I can say is that they don't have them.

Now two things can be done. First, search the 't' and 's' arrays and see for which 't' array and which 's' array we get the most numerous objects (arr.). Second, find every possible 't' array and retain only those strings that always qualify in the most numerous patterns. I would call these 'hyper strings', meaning that no matter how you spin them (while still keeping similar input) they will always be in a numerous pattern. Like a cat that always falls on its feet. Cat strings 😄

Below I added an image showing how the count can be calculated relative not just to one array but to both of them (t and s arrays) simultaneously. These are not abstract things; they reflect the exact numbers that form depending on the user input. It would be a really unique thing to obtain a script that merges both perspectives (t and s arrays). One of the editors said that combinatorics is a really new, less known and little explored domain, which is why I'm somehow quite a pioneer on these rn.

Re: [SOLVED] count with python/awk or other script/command

Reply #29 – 14 November 2023, 12:15:16

Hello @wiezyr, do you have any idea what I should change so that it doesn't count only lowercase letters?

Code: [Select]

       
groups = ["!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~", "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz", "0123456789"]

result = []

I'm trying to use symbols too, along with capital letters and numbers, grouped as above.

I think the script as it is now is restricted by these two lines:

Code: [Select]

        for letter in line.lower():
            if letter.isalpha():

I think I need to find something else instead of line.lower() and letter.isalpha().

Do you think isascii would fix the problem?

And one last question: is this good enough to escape those problematic symbols, as I did below?

Code: [Select]

"!\"#$%&\'()*+,-./:;<=>?@[\]^_`{|}~"

Thanks; it would be awesome if you could take a look.
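
One possible approach to the questions above, as a hedged sketch rather than a tested replacement for the script: build the groups from the constants in Python's `string` module, and keep a character only if it belongs to one of the groups, instead of calling `line.lower()` and `letter.isalpha()`. The helper name `keep_allowed` is made up for this example. `str.isascii()` on its own would probably not be enough, since it is also true for spaces and newlines.

```python
import string

# Assumption: these four groups match the ones listed in the post above.
groups = [
    string.punctuation,      # the 32 ASCII symbols, no manual escaping needed
    string.ascii_uppercase,
    string.ascii_lowercase,
    string.digits,
]
allowed = set("".join(groups))

def keep_allowed(line):
    """Return the characters of `line` that belong to any group, case preserved."""
    return [ch for ch in line if ch in allowed]

print(keep_allowed("Ab1! cD2?\n"))
# ['A', 'b', '1', '!', 'c', 'D', '2', '?']

# Written by hand inside double quotes, only the double quote and the
# backslash itself need escaping:
manual = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
print(manual == string.punctuation)  # True
```

In the hand-escaped version from the post, the backslash before the single quote is harmless, but the one before the closing square bracket is not a recognised escape sequence and newer Python versions warn about it; doubling it, as in `manual` above, avoids that.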