Title: [SOLVED] How to Regex Repetition? (grep)
Post by: TheYellowArchitect on 24 August 2021, 21:32:45

Hello!
I want to make a bash script on Linux, using terminal commands, and I'm stuck at the very beginning.

I have the following text file:

Code:

[start of file]
[tabkey]text1
text2
[tabkey][tabkey]text3
[end of file]

Each piece of text above is on its own line, so there are 3 lines in total. The first line starts with 1 tab, the third with 2 tabs.

If I use

Code:

grep  $'\t'

I get all lines with tabs, but not highlighted ofc.
So I ended up using

Code:

grep $'\t'".*"

to get text1 and text3.
However, how can I get only 1 \t?

I want to get exclusively text1, or exclusively text3, depending on the tab count. I ask because I can't wrap my head around repetition: {N}, which should repeat the previous item, doesn't seem to work even for letters, yet I need it for the tab character.

Title: Re: How to Regex Repetition? (grep)
Post by: VictorBrand on 24 August 2021, 22:02:19

Quote from: TheYellowArchitect – on 24 August 2021, 21:32:45

However, how can I get only 1 \t?

Hello! In your case it's better to use grep's extended regexp syntax, which is the -E option. If you want exactly one \t at the beginning, anchor the pattern to the start of the line with ^, put the \t after it, and then require a character that is not \t:

Code:

grep -E $'^\t[^\t]'

(this will return text1).

If you need a specific number of \t before the first non-\t character, specify that number in {}:

Code:

grep -E $'^\t{2}[^\t]'

(this will return text3).
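
To try both commands, here is a small sketch that rebuilds the three-line sample file from the first post with printf and wraps the pattern in a helper. The function name lines_with_tabs and the temporary file are made up for illustration; they are not from the thread.

```shell
#!/usr/bin/env bash
# Rebuild the sample file from the first post: 1 tab, 0 tabs, 2 tabs.
sample=$(mktemp)
printf '\ttext1\ntext2\n\t\ttext3\n' > "$sample"

# Hypothetical helper: print the lines that start with exactly N tabs
# followed by a non-tab character.
lines_with_tabs() {
    local n=$1 file=$2
    # The pattern is glued together from three pieces:
    #   $'^\t'   -> ^ plus a real tab
    #   "{$n}"   -> the repetition count, e.g. {2}
    #   $'[^\t]' -> any character that is not a tab
    grep -E $'^\t'"{$n}"$'[^\t]' "$file"
}

one_tab=$(lines_with_tabs 1 "$sample")
two_tabs=$(lines_with_tabs 2 "$sample")

printf 'one tab:  %s\n' "$one_tab"
printf 'two tabs: %s\n' "$two_tabs"

rm -f "$sample"
```

Note that grep prints the whole matching line, so the leading tabs are still part of the output.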

Title: Re: How to Regex Repetition? (grep)
Post by: strajder on 25 August 2021, 07:58:04

Quote from: VictorBrand – on 24 August 2021, 22:02:19

If you need a specific number of \t before the first non-\t character, specify that number in {}:

Just to add that this is explained in man grep:

Quote

{n} The preceding item is matched exactly n times.

Edit: Also,

Quote

   Basic vs Extended Regular Expressions
   In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
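
That quote is also the likely reason plain {N} seemed not to work in the first post: without -E, grep reads { as a literal brace. A small sketch of the same repetition written both ways, using a made-up three-line input:

```shell
#!/usr/bin/env bash
# Made-up sample input: only the second line contains three a's in a row,
# and the third line contains the literal text a{3}.
input=$(printf 'aa\nbaaab\na{3}\n')

# Extended syntax (-E): {3} is a repetition count.
ere=$(printf '%s\n' "$input" | grep -E 'a{3}')

# Basic syntax (no -E): the braces must be backslashed to mean repetition.
bre=$(printf '%s\n' "$input" | grep 'a\{3\}')

# Basic syntax with bare braces: { and } are ordinary characters.
literal=$(printf '%s\n' "$input" | grep 'a{3}')

printf '%s matched: %s\n' 'grep -E a{3}  ' "$ere"
printf '%s matched: %s\n' 'grep a\{3\}   ' "$bre"
printf '%s matched: %s\n' 'grep a{3}     ' "$literal"
```

The first two commands select baaab; the last one only selects the line that literally contains a{3}.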

Title: Re: How to Regex Repetition? (grep)
Post by: TheYellowArchitect on 25 August 2021, 21:48:17

That was a swift reply, and it also worked instantly, wow

regex is hard I have to admit, getting a few word variants here and there seems easy, but anything beyond that gets very hard, and I'm kinda lucky I don't want anything more of regex

I was about to post on "why does this work" but I experimented and learned some new things lol
'^\t' is a tab at the start of the line, {N} repeats the previous item (in extended regex), and [^\t] initially seems bloated but is there to rule out further tabs (I was about to write tl;dr "why put this lol" until I put a 3-tab line in the same file; seems like [^...] matches any character except the ones listed after the ^)

It turns out grep -E $'^\t{N}[^\t].*' is perfect. I would never have gotten there on my own, bless.
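
The three-tab experiment described above can be reproduced roughly like this. It is only a sketch: the text4 line stands in for the three-tab line mentioned in the post (its content is an assumption), and the variable names are arbitrary.

```shell
#!/usr/bin/env bash
# Sample file from the first post plus an assumed extra line with three tabs.
sample=$(mktemp)
printf '\ttext1\ntext2\n\t\ttext3\n\t\t\ttext4\n' > "$sample"

# Without [^\t]: the pattern only says "starts with two tabs",
# so the three-tab line matches as well. -c counts matching lines.
loose_count=$(grep -Ec $'^\t{2}' "$sample")

# With [^\t]: the character right after the two tabs must not be a tab,
# so only the line with exactly two leading tabs is left.
strict_count=$(grep -Ec $'^\t{2}[^\t]' "$sample")
strict_line=$(grep -E $'^\t{2}[^\t]' "$sample")

printf 'without [^\\t]: %s lines\n' "$loose_count"   # 2
printf 'with [^\\t]:    %s line\n' "$strict_count"   # 1

rm -f "$sample"
```

The trailing .* does not change which lines are selected; it only extends the matched (highlighted) part to the end of the line.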

Though kinda off-topic, that dollar sign at the start, what is it for?

Title: Re: [SOLVED] How to Regex Repetition? (grep)
Post by: strajder on 25 August 2021, 22:25:01

I suggest reading the grep manpage (man grep).

Title: Re: [SOLVED] How to Regex Repetition? (grep)
Post by: VictorBrand on 25 August 2021, 22:29:22

Quote from: TheYellowArchitect – on 25 August 2021, 21:48:17

That was a swift reply, and it also worked instantly, wow

regex is hard I have to admit, getting a few word variants here and there seems easy, but anything beyond that gets very hard, and I'm kinda lucky I don't want anything more of regex

Glad you are satisfied :)

Regular expressions indeed may be hard to understand when they are written in a complex way, but in fact they are quite simple and simultaneously powerful. The most confusing thing about them is the fact that regexp syntax may somewhat vary from one application to another, although in its core it is the same. There is a good short educational video (https://www.youtube.com/watch?v=bgBWp9EIlMM) on regexps, the guy explains them quite clearly.

The only caveat about regexps is that they are not free: depending on the implementation they are matched by a finite automaton or by a backtracking engine, and backtracking can get very slow on some patterns. This is not an issue when you use them here and there in your scripts, though. But I once saw how one guy wrote a sort of scrap-metal parser with regexps. There were loops of regexps operating on strings extracted in other loops of regexps, and so on and so forth. Apparently this thing was painfully slow.

Quote from: TheYellowArchitect – on 25 August 2021, 21:48:17

Though kinda off-topic, that dollar sign at the start, what is it for?

It's part of the bash syntax (ANSI-C quoting). Inside $'...', backslash escape sequences are interpreted and replaced with the characters they stand for. In our case \t must be turned into a real tab by the shell, because grep's basic and extended regexp syntax does not treat \t as a tab. You can read about that here (https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html#ANSI_002dC-Quoting).
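
A quick way to see the difference is to compare string lengths. This is a minimal sketch; the variable names are arbitrary, and the printf variant at the end is just one alternative way to get a real tab, not something from the thread.

```shell
#!/usr/bin/env bash
# Plain single quotes keep the backslash and the letter t: two characters.
plain='\t'
# ANSI-C quoting turns the escape sequence into one real tab character.
ansi=$'\t'

printf 'plain length: %s\n' "${#plain}"   # 2
printf 'ansi length:  %s\n' "${#ansi}"    # 1

# Alternative without $'...': let printf produce the tab and store it.
tab=$(printf '\t')

# The stored tab can then be used inside an ordinary double-quoted pattern.
line=$(printf '\ttext1\n' | grep -E "^${tab}[^${tab}]")
printf 'matched: %s\n' "$line"
```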

Artix Linux Forum