to get text1 and text3. However, how can I match only one \t?
I want to get only text1, or only text3, depending on the tab count. I'm asking because I can't get my head around repetition: {N}, which should repeat the previous item, doesn't seem to work even for letters, and I need it for the tab character.
Title: Re: How to Regex Repetition? (grep)
Post by: VictorBrand on 24 August 2021, 22:02:19
Hello! In your case it's better to use grep's extended regexp syntax, enabled with the -E option. If you want only one \t at the beginning of the line, anchor the match to the line start with ^ and then specify any character that is not \t:
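A minimal sketch of that pattern, assuming bash and GNU grep. The sample lines (1, 2 and 3 leading tabs) are hypothetical, since the thread never shows the original file:

```shell
#!/usr/bin/env bash
# Hypothetical input: lines with 1, 2 and 3 leading tabs.
sample=$(mktemp)
printf '\ttext1\n\t\ttext2\n\t\t\ttext3\n' > "$sample"

# $'...' : bash ANSI-C quoting turns \t into a real tab before grep runs
# ^\t    : a tab right at the start of the line
# [^\t]  : followed by any single character that is NOT a tab
result=$(grep -E $'^\t[^\t]' "$sample")
printf '%s\n' "$result"   # the text1 line (one leading tab)

rm -f "$sample"
```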
If you need a specific number of \t characters before the first non-\t character, specify that number in {}:
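A sketch of the counted form, again assuming bash and GNU grep with hypothetical sample lines. A four-tab line is included to show why the trailing [^\t] matters:

```shell
#!/usr/bin/env bash
# Hypothetical input: lines with 1, 2, 3 and 4 leading tabs.
sample=$(mktemp)
printf '\ttext1\n\t\ttext2\n\t\t\ttext3\n\t\t\t\ttext4\n' > "$sample"

# \t{3}  : exactly three tabs (interval syntax, needs -E)
# [^\t]  : the next character must not be a tab, so the
#          four-tab text4 line is rejected as well
result=$(grep -E $'^\t{3}[^\t]' "$sample")
printf '%s\n' "$result"   # only the text3 line

rm -f "$sample"
```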
Just to add that this is explained in man grep:
Quote
{n} The preceding item is matched exactly n times.
Edit: Also,
Quote
Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
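To illustrate the quoted rule, here is a sketch assuming bash and GNU grep (the sample lines are hypothetical). It runs the same three-tab pattern in extended and basic syntax, plus the bare-brace form without -E. That last form is likely why {N} seemed not to work at all:

```shell
#!/usr/bin/env bash
# Hypothetical input: lines with 1, 2 and 3 leading tabs.
sample=$(mktemp)
printf '\ttext1\n\t\ttext2\n\t\t\ttext3\n' > "$sample"

# Extended syntax (-E): { and } are interval operators as written.
ere=$(grep -E $'^\t{3}[^\t]' "$sample")
# Basic syntax (no -E): backslash the braces to get the same interval.
# Inside $'...', \\ yields a single backslash, so grep receives \{3\}.
bre=$(grep $'^\t\\{3\\}[^\t]' "$sample")
# Basic syntax with bare braces: "{3}" is matched literally, so no line
# matches; grep then exits 1, and || true keeps that from aborting a set -e script.
literal=$(grep $'^\t{3}[^\t]' "$sample" || true)

printf 'ERE: %s\nBRE: %s\nliteral: [%s]\n' "$ere" "$bre" "$literal"
rm -f "$sample"
```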
Post by: TheYellowArchitect on 25 August 2021, 21:48:17
That was a swift reply, and it also worked instantly, wow
Regex is hard, I have to admit. Matching a few word variants here and there seems easy, but anything beyond that gets very hard, and I'm kind of lucky I don't need anything more from regex.
I was about to post asking "why does this work", but I experimented and learned some new things lol. '^\t' is a tab at the start of the line, {N} repeats the preceding item N times (in extended regex), and [^\t] at first seems bloated but is there to reject further tabs. (I was about to write a tl;dr "why put this lol" until I added a line with 3 tabs to the same file; [^...] matches any single character except the ones listed after the ^.)
So it ends up being grep -E $'^\t{N}[^\t].*' (with N replaced by the number of tabs), which is perfect. I would never have got there on my own, bless.
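If N has to change from run to run, the pattern can be built from a shell variable. A minimal sketch assuming bash and GNU grep; the helper name lines_with_tabs and the sample lines are hypothetical, not from the thread:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines of file $2 that start with exactly $1 tabs.
lines_with_tabs() {
  local n=$1 file=$2
  local tab=$'\t'   # one real tab character via ANSI-C quoting
  # Expands to the ERE  ^<TAB>{n}[^<TAB>] ; the trailing .* is left out
  # because grep prints the whole matching line anyway.
  grep -E "^${tab}{${n}}[^${tab}]" "$file"
}

# Hypothetical input: lines with 1, 2 and 3 leading tabs.
sample=$(mktemp)
printf '\ttext1\n\t\ttext2\n\t\t\ttext3\n' > "$sample"

one=$(lines_with_tabs 1 "$sample")
three=$(lines_with_tabs 3 "$sample")
printf '%s\n' "$one" "$three"

rm -f "$sample"
```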
Though kinda off-topic, that dollar sign at the start, what is it for?
Title: Re: [SOLVED] How to Regex Repetition? (grep)
Post by: strajder on 25 August 2021, 22:25:01
I suggest reading the grep manpage (man grep).
Post by: VictorBrand on 25 August 2021, 22:29:22
Quote
That was a swift reply, and it also worked instantly, wow
Regex is hard, I have to admit. Matching a few word variants here and there seems easy, but anything beyond that gets very hard, and I'm kind of lucky I don't need anything more from regex.
Glad you are satisfied :)
Regular expressions can indeed be hard to understand when they are written in a complex way, but at heart they are quite simple and, at the same time, powerful. The most confusing thing about them is that the syntax varies somewhat from one application to another (grep's basic vs extended syntax is one example), although the core is the same. There is a good short educational video (https://www.youtube.com/watch?v=bgBWp9EIlMM) on regexps; the guy explains them quite clearly.
The only caveat about regexps is performance. A regexp engine either compiles the pattern into a finite automaton or matches it by backtracking, and either way this costs more than a plain string comparison; a badly written pattern can make a backtracking engine very slow. This is not an issue when you use them here and there in your scripts, though. But I once saw a guy write a sort of scrap-metal parser out of regexps: loops of regexps operated on strings extracted by other loops of regexps, and so on and so forth. Unsurprisingly, the thing ran painfully slowly.
Quote
Though kinda off-topic, that dollar sign at the start, what is it for?
It's part of bash syntax, called ANSI-C quoting: inside $'...', backslash escape sequences are replaced as specified by the ANSI C standard, so $'\t' becomes a real tab character before grep ever sees it. In our case \t must be translated, because grep's basic and extended regexp syntax doesn't treat \t as a tab. You can read about that here (https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html#ANSI_002dC-Quoting).
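A quick way to see the difference, assuming bash: compare the length of '\t' in ordinary single quotes with $'\t':

```shell
#!/usr/bin/env bash
plain='\t'   # ordinary single quotes: a backslash and a "t", two characters
ansi=$'\t'   # ANSI-C quoting: bash turns \t into one real tab character
printf '%s\n' "${#plain}" "${#ansi}"   # prints 2, then 1
```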