Topic solved
This topic has been marked as solved and requires no further attention.

Counting command or script



 Hello, I need to count some chars from a text file. Is there a command that produces this kind of output, or do I have to write a script?
 Thanks! If anyone has a magic way and can share it, that would be awesome. I tried grep and some scripts, but they gave poor results. I need this for a script that is part of a bigger project of mine about password strength. Thanks in advance  ;)

       
the input looks like this      the output should look like this

  AAAAA                        5
  AAAAB                        4 1
  AAABB                        3 2
  AAABC                        3 1 1
  AAABD                        3 1 1
  BAABE                        2 2 1

Re: Counting command or script

Reply #1
grep -o "character" <<< input | wc -l straight from the input? (grep -c on its own counts matching lines, not occurrences.)
Or as an alternative tr -dc "character" <<< input | wc -c
Iterate over each line with a "while read -r" loop, then over each character with a "for x in a b c d" loop.

If you want to directly enumerate which characters are on which line, I'm afraid you will need awk.
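
For anyone who prefers Python to awk, here is a minimal sketch of the same idea: walk each line and report which characters occur on it and how often. The function names char_counts and format_counts are made up for this example, and it assumes whitespace should be ignored.

```python
from collections import Counter


def char_counts(line):
    """Return (char, count) pairs for one line, most frequent first."""
    # Drop all whitespace so padding around the text is not counted.
    text = "".join(line.split())
    # most_common() sorts by count, descending; ties keep first-seen order.
    return Counter(text).most_common()


def format_counts(line):
    """Render the pairs as 'A=4 B=1' (an illustrative format, not a standard one)."""
    return " ".join(f"{ch}={n}" for ch, n in char_counts(line))


if __name__ == "__main__":
    for sample in ["AAAAA", "AAAAB", "BAABE"]:
        print(sample, "->", format_counts(sample))
```

To process a file, call format_counts on each line read from it.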

Re: Counting command or script

Reply #2

 Cool, thanks, I'll check it out 👍🏻

Re: Counting command or script

Reply #3

 Found a script that worked for me
 
Code: [Select]
sed 's/[^A]//g' INPUT | awk '{ print length }' > OUTPUT

 Had to remove spaces and zeroes with tr

 And for sorting each line in descending or ascending order I tried the sort command, but the way I did it didn't bring any mind-blowing results  :D

Code: [Select]
sort -r -n FILE.txt
or
Code: [Select]
sort -r -g FILE.txt
to no avail.
I'm sure there's a way of using sort in this case, but I have no idea what flags I have to pass to get the desired result.

Now I get rows like 141, 411 or 114, but I need each one in descending order, like 411.
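
A note on why the flags don't help: sort orders the lines of a file relative to each other, it does not reorder the characters inside a line. A minimal Python sketch of the per-line reordering follows. It assumes every count is a single digit (true as long as no line is longer than 9 characters), and sort_digits_desc is a name made up for this example.

```python
def sort_digits_desc(line):
    """Reorder the characters of one line, largest first: '141' -> '411'."""
    # strip() removes the trailing newline so it is not sorted into the result.
    return "".join(sorted(line.strip(), reverse=True))


if __name__ == "__main__":
    for row in ["141", "114", "411", "5"]:
        print(sort_digits_desc(row))
```

With multi-digit counts this breaks, because 20 would be treated as the two characters 2 and 0; in that case the counts need a separator between them.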



Re: Counting command or script

Reply #4
Code: [Select]
iason:[nous]:~% cat t
AAAAA
AAAAB
AAABB
AAABC
AAABD
BAABE
iason:[nous]:~% perl -F'' -le '$c{$_}++ for @F; print %c; %c=();' t
A5
B1A4
A3B2
B1C1A3
A3D1B1
A2B2E1

Re: Counting command or script

Reply #5
Code: [Select]
#!/usr/bin/env python3
if __name__=='__main__':
    import sys
    with open(sys.argv[1],'r') as _:
        lines = _.readlines()

    for line in lines:
        #remove lead/trail whitespace; split on whitespace and rejoin without
        line = "".join(line.strip().split())
        counts = {}
        for c in line:
            counts.setdefault(c,0)
            counts[c] += 1
        print(' '.join((str(_) for _ in reversed(sorted(counts.values())))))

I used this on your sample and it ran like this

$ python312 counter.py counter.data
5
4 1
3 2
3 1 1
3 1 1
2 2 1


Re: Counting command or script

Reply #6
Code: [Select]
% perl -F'' -le '$c{$_}++ for @F; print %c; %c=();' t | sed 's/[^0-9]/ /g'
 5
 4 1
 3 2
 3 1 1
 1 3 1
 2 2 1

Re: Counting command or script

Reply #7
Code: [Select]
% perl -F'' -le '$c{$_}++ for @F; print %c; %c=();' t | sed 's/[^0-9]/ /g'
Not right for large numbers, e.g.

Code: [Select]
$ cat t && perl -F'' -le '$c{$_}++ for @F; print %c; %c=();' t | sed 's/[^0-9]/ /g'
AAAAAAAAAAAAAAAAAAAA
AAAAB
AABBBBBBBBBBBBBBBBBBABB
AAABAAAAAAAAAAAAAAC
AAABBBBBBBBBBBBBBBBBBBD
BAABE
 20
 4 1
 3 20
 1 17 1
 1 3 19
 2 2 1

The python version gives this
Code: [Select]
$ python counter.py t
20
4 1
20 3
17 1 1
19 3 1
2 2 1
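
Comparing the two outputs above line by line, the counts themselves are the same and only their order differs. Here is a minimal Python sketch that takes the labelled form printed by the perl one-liner (for example A3B20) and puts the counts in descending order. It assumes the counted characters are never digits themselves, and counts_desc is a name made up for this example.

```python
import re


def counts_desc(labelled):
    """Extract the counts from a string such as 'A3B20', largest first."""
    # \d+ keeps multi-digit counts such as 20 together.
    numbers = [int(n) for n in re.findall(r"\d+", labelled)]
    # Sorting ints avoids the string ordering in which "3" sorts after "20".
    return sorted(numbers, reverse=True)


if __name__ == "__main__":
    for row in ["A3B20", "B1A17C1", "D1A3B19"]:
        print(" ".join(map(str, counts_desc(row))))
```

The conversion to int matters: sorted as strings, "3" would come before "20" in descending order.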

 

Re: Counting command or script

Reply #8
  Aaa, thank you very much  :) . I'll look through all of them, thanks. In the meantime I found a python script from this website

 It generates arrangements with repetition, so I can take that output and filter it by my requirements, for example print only the strings with a double aa in them.

 The script looks like this; it may be useful for anyone who wants to generate arrangements with repetition. Now I'll look at how I can use your solutions. Thanks again 👍🏻

Code: [Select]
'''Permutations of n elements drawn from k values'''

from itertools import product


# replicateM :: Applicative m => Int -> m a -> m [a]
def replicateM(n):
    '''A functor collecting values accumulated by
       n repetitions of m. (List instance only here).
    '''
    def rep(m):
        def go(x):
            return [[]] if 1 > x else (
                liftA2List(lambda a, b: [a] + b)(m)(go(x - 1))
            )
        return go(n)
    return lambda m: rep(m)


# TEST ----------------------------------------------------
# main :: IO ()
def main():
    '''Permutations of seven elements, drawn from seven values'''
    print(
        fTable(main.__doc__ + ':\n')(repr)(showList)(

            replicateM(7)

        )(['abcdefg'])
    )


# GENERIC FUNCTIONS ---------------------------------------

# liftA2List :: (a -> b -> c) -> [a] -> [b] -> [c]
def liftA2List(f):
    '''The binary operator f lifted to a function over two
       lists. f applied to each pair of arguments in the
       cartesian product of xs and ys.
    '''
    return lambda xs: lambda ys: [
        f(*xy) for xy in product(xs, ys)
    ]


# DISPLAY -------------------------------------------------

# fTable :: String -> (a -> String) ->
#                     (b -> String) -> (a -> b) -> [a] -> String
def fTable(s):
    '''Heading -> x display function -> fx display function ->
                     f -> xs -> tabular string.
    '''
    def go(xShow, fxShow, f, xs):
        ys = [xShow(x) for x in xs]
        w = max(map(len, ys))
        return s + '\n' + '\n'.join(map(
            lambda x, y: y.rjust(w, ' ') + ' -> ' + fxShow(f(x)),
            xs, ys
        ))
    return lambda xShow: lambda fxShow: lambda f: lambda xs: go(
        xShow, fxShow, f, xs
    )


# showList :: [a] -> String
def showList(xs):
    '''Stringification of a list.'''
    return '[' + ','.join(
        showList(x) if isinstance(x, list) else repr(x) for x in xs
    ) + ']'


# MAIN ---
if __name__ == '__main__':
    main()
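
For comparison, a shorter sketch of the same generation step that uses itertools.product with repeat directly and then keeps only the strings containing a double aa. It is not the script above, just one possible alternative; the names arrangements and with_double are made up for this example.

```python
from itertools import product


def arrangements(alphabet, n):
    """Yield every length-n string over alphabet, repetition allowed."""
    for letters in product(alphabet, repeat=n):
        yield "".join(letters)


def with_double(alphabet, n, pair="aa"):
    """Keep only the arrangements that contain pair as a substring."""
    return (s for s in arrangements(alphabet, n) if pair in s)


if __name__ == "__main__":
    # Small case: 3 letters, length 3, so 3**3 = 27 arrangements in total.
    for s in with_double("abc", 3):
        print(s)
```

Because both functions are generators, nothing is held in memory at once. For "abcdefg" with n=7 the full set is 7**7 = 823543 strings, printed one per line.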


Re: Counting command or script

Reply #10

 Tried @replabrobin's script, but it prints only 2 columns. Also, I didn't fully get why you use python312, or is that just a typo?

Maybe it's because you use a different shell than bash, or what else could be the cause? You copied the code wrong I guess, idk  🤔


Re: Counting command or script

Reply #11
Code: [Select]
% perl -F'' -le '$c{$_}++ for @F; print %c; %c=();' t | sed 's/[^0-9]/ /g'
Not right for large numbers
It is right, just unsorted, which is easily fixable. This version also gets rid of sed(1):
Code: [Select]
hyperion:[nous]:/tmp% cat t tt
AAAAA
AAAAB
AAABB
AAABC
AAABD
BAABE
AAAAAAAAAAAAAAAAAAAA
AAAAB
AABBBBBBBBBBBBBBBBBBABB
AAABAAAAAAAAAAAAAAC
AAABBBBBBBBBBBBBBBBBBBD
BAABE
hyperion:[nous]:/tmp% perl -F'' -le '$,=" "; $c{$_}++ for @F; print sort{$b<=>$a} values %c; %c=();' t tt
5
4 1
3 2
3 1 1
3 1 1
2 2 1
20
4 1
20 3
17 1 1
19 3 1
2 2 1

Re: Counting command or script

Reply #12
 
 EDIT: @replabrobin's script worked. I fed it the wrong format, numbers where I should have used letters, and that's why it gave me errors.

 Now my issue is that I have a 24MB text file, but it's one long uninterrupted string of 7x800k letters. What I need now is to break this string, from beginning to end, into strings just 7 chars long. So instead of that huge row I should have a column with 7 characters on each row. After that I can feed that format to the count.py script.

Re: Counting command or script

Reply #13
Quote
the input looks like this      the output should look like this

  AAAAA                        5
  AAAAB                        4 1
  AAABB                        3 2
  AAABC                        3 1 1
  AAABD                        3 1 1
  BAABE                        2 2 1

Quote
Now my issue is that I have a 24MB text file, but it's one long uninterrupted string of 7x800k letters. What I need now is to break this string, from beginning to end, into strings just 7 chars long. So instead of that huge row I should have a column with 7 characters on each row. After that I can feed that format to the count.py script.

You're wasting people's time.

Re: Counting command or script

Reply #14
Quote
You're wasting people's time.

Mmmm NOPE, I deduced something very important/crucial for my project. Solved that string-breaking problem with fold -b7 etc., so no problem with that.
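
If anyone wants to do the splitting inside Python instead of with fold, a minimal sketch is below. The name chunks is made up for this example, and it assumes plain single-byte letters, so counting characters gives the same result as fold's byte counting.

```python
def chunks(text, width=7):
    """Split text into consecutive pieces of width characters."""
    # The last piece is shorter when len(text) is not a multiple of width.
    return [text[i:i + width] for i in range(0, len(text), width)]


if __name__ == "__main__":
    sample = "AAAAAAABBBBBBBCCC"
    print("\n".join(chunks(sample)))
```

Each piece can then go straight into the counting loop without writing an intermediate file.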

The revealing conclusion is that the calculation of a string pattern does not scale as I expected. It's either impossible or just super hard to deduce/demonstrate mathematically. Without crash-testing different string lengths it's impossible to get an accurate picture of how/why/what changed the result.

I'll keep posting updated conclusions in my main thread on this topic for now. Thanks for your patience, and remember guys, nothing is in vain even if it may look and feel like that.  :(