Help with a complex regex

Hi guys, I am making progress with regex, but I still need help.

I am trying to write a special regex for my file parser.

Basically, I would like to check whether a line contains a dependency in this form. Here is an example:

@ProgrammingTools-Main:Gcc>=14.2.0(Pass1,Lto)

The line can start with whitespace, the comparator can be >=, <=, <, > or =, and the version can be followed either by nothing or by parentheses containing a list of options separated by “,”.

I did this regex, but it does not work properly:

[\s]+@[A-Za-z0-9-]+:[A-Za-z0-9-]+(>=|<=|=|>|<)[A-Za-z0-9.-]+([(][A-Za-z-0-9-,]*[)])?

I tried with different text at this website: https://regexr.com/

@ProgrammingTools-Main:Gcc>=14.2.0
	@ProgrammingTools-Main:Gcc>=14.2.0(Pass1,Pass2,Pass3,Lto)
		@ProgrammingTools-Main:Gcc>=14.2.0Pass1,Pass2,Pass3)

Basically, lines 1 and 2 should match.

str = <<-HEREDOC
  @ProgrammingTools-Main:Gcc>=14.2.0
      @ProgrammingTools-Main:Gcc>=14.2.0(Pass1,Pass2,Pass3,Lto)
          @ProgrammingTools-Main:Gcc>=14.2.0(Pass1,Pass2,Pass3)
  HEREDOC
 
pattern = /^[\s]+(@[\w\-]+):([\w\-]+)(>=|<=|=|>|<)([\w.\-]+)(?:\(([\w,\-]*)\))?/im
 
str.scan(pattern) do |match|
  pp match
end

Thanks a lot for your help! Just 2 questions.

First: is it normal that this regex doesn’t work when I copy it to the regex website?

Second: When I try the code, I am getting this:

Regex::MatchData("    @ProgrammingTools-Main:Gcc>=14.2.0(Pass1,Pass2,Pass3,Lto)"
 1:"@ProgrammingTools-Main"
 2:"Gcc"
 3:">="
 4:"14.2.0"
 5:"Pass1,Pass2,Pass3,Lto")
Regex::MatchData("        @ProgrammingTools-Main:Gcc>=14.2.0(Pass1,Pass2,Pass3)"
 1:"@ProgrammingTools-Main"
 2:"Gcc"
 3:">="
 4:"14.2.0"
 5:"Pass1,Pass2,Pass3")

Is it normal that the matches do not include the first line, the one without parentheses?

I would like it to match as well, because a dependency can have nothing after the version.

If you want to match all 3, just change it to:

/(@[\w\-]+):([\w\-]+)(>=|<=|=|>|<)([\w.\-]+)(?:\(([\w,\-]*)\))?/im
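A line with no leading whitespace can never match `^[\s]+`, since `+` needs at least one character. Dropping the prefix, as above, is one fix; another is to keep an anchor and make the whitespace optional with `\s*`. Here is a minimal sketch in Ruby. The snippets in this thread share its syntax, but the `Regex::MatchData` output suggests Crystal, where this is assumed rather than verified to behave the same. `DEPENDENCY` and the sample lines are names chosen for illustration.

```ruby
# Same pattern as in the post above, with an optional-whitespace prefix.
# \A anchors at the start of the string, so each line is checked on its own.
DEPENDENCY = /\A\s*(@[\w\-]+):([\w\-]+)(>=|<=|=|>|<)([\w.\-]+)(?:\(([\w,\-]*)\))?/

lines = [
  "@ProgrammingTools-Main:Gcc>=14.2.0",                          # no indentation
  "\t@ProgrammingTools-Main:Gcc>=14.2.0(Pass1,Pass2,Pass3,Lto)", # tab-indented
  "Gcc>=14.2.0",                                                 # missing the "@name:" part
]

# true where the line starts with a dependency, false otherwise
results = lines.map { |line| !DEPENDENCY.match(line).nil? }
pp results
```

Because nothing anchors the end of the pattern, trailing text after the version is still ignored by this variant.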

So I tried again with these tests:

str = <<-HEREDOC
  @ProgrammingTools-Main:Gcc>=14.2.0
      @ProgrammingTools-Main:Gcc>=14.2.0(Pass1,Pass2,Pass3,Lto)
          @ProgrammingTools-Main:Gcc>=14.2.0
          @ProgrammingTools-Main:Gcc>=14.2.0Pass1,Pass2,Pass3,Lto)
  HEREDOC

pattern = /(@[\w\-]+):([\w\-]+)(>=|<=|=|>|<)([\w.\-]+)(?:\(([\w,\-]*)\))?/im

str.scan(pattern) do |match|
  pp match
end

It matches the last one, even though the parenthesis was never opened. Is it possible to fix this, please? Thank you again.

Here you go:

(@[\w\-]+):([\w\-]+)(>=|<=|=|>|<)([\d.\-]+)(?:\(?([\w,\-]*)\)?)?

It still matches even if the parenthesis is never opened :x

str = <<-HEREDOC
@ProgrammingTools-Main:Gcc>=14.2.0Pass1,Pass2,Pass3,Lto)
HEREDOC

pattern = /(@[\w\-]+):([\w\-]+)(>=|<=|=|>|<)([\d.\-]+)(?:\(?([\w,\-]*)\)?)?/

str.scan(pattern) do |match|
  pp match
end

Yeah, I thought that’s what you wanted :man_shrugging: I guess two things would come in handy here: clearer communication about the desired outcome, and some energy put into learning regular expressions yourself xD They ain’t that hard :slight_smile:


These regexes all think that 14.2.0Pass1 is itself a valid version. It is up to you to constrain the version format further if you don’t want this to happen.
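One way to constrain it, as a sketch only: assume a version is dot-separated groups of digits (the thread never specifies the format, so this is an assumption), and anchor the end of the line so that leftover text makes the match fail. The code is Ruby; `STRICT` and `valid_dependency?` are hypothetical names, and behaviour in Crystal is assumed rather than verified.

```ruby
# Assumption: versions look like "14", "14.2" or "14.2.0" (digits and dots only).
# \z anchors the end, so anything left over after the optional "(...)" fails.
STRICT = /\A\s*(@[\w\-]+):([\w\-]+)(>=|<=|=|>|<)(\d+(?:\.\d+)*)(?:\(([\w,\-]*)\))?\s*\z/

def valid_dependency?(line)
  !STRICT.match(line).nil?
end

pp valid_dependency?("@ProgrammingTools-Main:Gcc>=14.2.0")
pp valid_dependency?("  @ProgrammingTools-Main:Gcc>=14.2.0(Pass1,Lto)")
# Closing parenthesis without an opening one: rejected
pp valid_dependency?("@ProgrammingTools-Main:Gcc>=14.2.0Pass1,Pass2,Pass3,Lto)")
# Opening parenthesis that is never closed: rejected
pp valid_dependency?("@ProgrammingTools-Main:Gcc>=14.2.0(Pass1")
```

The two changes work together: the digit-only version stops before `Pass1`, and the end anchor then refuses to ignore what follows.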

I will look at this later, but yes, that’s true too.

First I would like to understand how to match a list with a regex, because I don’t understand it.

For example a simple list like:

arm,amd64,x86

How can I build a regex that matches the list if it has one or more items separated by commas, but rejects it if the list starts with a comma, for example?

I tried this simple example:

str = <<-HEREDOC
x86_64,arm
	x86_64,arm
,x86_64,arm
	arm
arm

HEREDOC

pattern = /\s*(\w+)(,\w+)/

str.scan(pattern) do |match|
  pp match
end

Basically, I would like to match lines 1, 2 and 4, but not line 3.

Whitespace is giving me trouble.

After practising a bit more, I think I understand how it works now. Thanks a lot!
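For anyone landing here with the same list question, a common shape is "one item, then zero or more comma-plus-item groups", anchored at both ends so that a leading or trailing comma cannot be skipped over. A minimal Ruby sketch follows; `LIST` and `samples` are names made up for illustration, and leading or trailing whitespace is assumed to be acceptable.

```ruby
# \w+          first item (required)
# (?:,\w+)*    any number of further items, each preceded by a comma
# \A ... \z    anchor both ends so stray commas cannot be ignored
LIST = /\A\s*\w+(?:,\w+)*\s*\z/

samples = [
  "x86_64,arm",    # two items
  "\tx86_64,arm",  # indented with a tab
  ",x86_64,arm",   # starts with a comma
  "\tarm",         # single item, indented
  "arm",           # single item
  "arm,",          # trailing comma
  "",              # empty line
]

accepted = samples.select { |s| LIST.match(s) }
pp accepted
```

A single item such as `arm` is accepted here, in line with "one or more items"; changing `*` to `+` in the repeated group would require at least two.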

I like creating complex regexes, but I have also gotten used to using AI on a daily basis. Did you try asking an AI for this task? Just by copying your first post into Claude AI, I get this one, which seems to fulfill all your requirements:

^\s*@[A-Za-z0-9-]+:[A-Za-z0-9-]+(>=|<=|=|>|<)[A-Za-z0-9.-]+(\([A-Za-z0-9,-]+\))?$

But as Sija already told you, it depends on how complex your “error handling” should be … and if it is needed, it must be defined explicitly.
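A quick way to see what that pattern accepts is to run it over the test lines from earlier in the thread, plus two edge cases added here for illustration. This is a Ruby sketch with each line checked separately. `PATTERN` and `samples` are hypothetical names, and the results are assumed but not verified to be the same in Crystal.

```ruby
PATTERN = /^\s*@[A-Za-z0-9-]+:[A-Za-z0-9-]+(>=|<=|=|>|<)[A-Za-z0-9.-]+(\([A-Za-z0-9,-]+\))?$/

samples = [
  "@ProgrammingTools-Main:Gcc>=14.2.0",                          # plain version
  "\t@ProgrammingTools-Main:Gcc>=14.2.0(Pass1,Pass2,Pass3,Lto)", # with options
  "\t\t@ProgrammingTools-Main:Gcc>=14.2.0Pass1,Pass2,Pass3)",    # parenthesis never opened
  "@ProgrammingTools-Main:Gcc>=14.2.0Pass1",                     # letters glued to the version
  "@ProgrammingTools-Main:Gcc>=14.2.0()",                        # empty option list
]

# One boolean per sample line, in order
verdicts = samples.map { |line| !PATTERN.match(line).nil? }
pp verdicts
```

The first two lines match and the unopened parenthesis is rejected, because the `$` anchor leaves no room for the stray `,Pass2,Pass3)`. The fourth line still matches, since the version class allows letters, so `14.2.0Pass1` counts as a version. The empty `()` is rejected because the option list uses `+` rather than `*`.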