Character classes in Regex

I’ve noticed that [[:word:]] doesn’t seem to work in Crystal and I don’t know why.

Shouldn’t all the classes listed in the pcresyntax specification be supported?

Here is an example in Ruby:

% ruby -e 'p %r{[[[:word:]]\.]+\b}.match("lowercase_snake_case_name UPPERCASE_SNAKE_CASE_NAME CamelCaseName")'
#<MatchData "lowercase_snake_case_name">

And the same regex in Crystal gives this:

% crystal eval 'p %r{[[[:word]]\.]+\b}.match("lowercase_snake_case_name UPPERCASE_SNAKE_CASE_NAME CamelCaseName")'
nil

This, however, works:

% crystal eval 'p %r{[\w\.]+\b}.match("lowercase_snake_case_name UPPERCASE_SNAKE_CASE_NAME CamelCaseName")'
Regex::MatchData("lowercase_snake_case_name")

Looks like you have an extra set of brackets and you’re missing a : from the Crystal example:

➜  ~ crystal eval 'p %r{[[:word:]\.]+\b}.match("lowercase_snake_case_name UPPERCASE_SNAKE_CASE_NAME CamelCaseName")'
Regex::MatchData("lowercase_snake_case_name")

It’s also important to remember that Ruby and Crystal use different regex libraries, so they won’t be perfectly compatible: Ruby uses Oniguruma and Crystal uses PCRE. Their syntaxes are very similar, but not identical.
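One concrete difference that matters here: Oniguruma allows a character class to contain another bracketed class, which is why the extra brackets are harmless in Ruby. A minimal Ruby sketch, assuming Ruby 2.4 or later (for Regexp#match?); the names nested, flat and union are only for illustration:

```ruby
# Ruby's regex engine (Oniguruma, via the Onigmo fork) accepts a bracketed
# class inside another class, so the inner [...] is a nested set.
input = "lowercase_snake_case_name UPPERCASE_SNAKE_CASE_NAME CamelCaseName"

nested = %r{[[[:word:]]\.]+\b} # nested class: word characters, plus "."
flat   = %r{[[:word:]\.]+\b}   # the same set written without nesting

p nested.match(input) # => #<MatchData "lowercase_snake_case_name">
p flat.match(input)   # => #<MatchData "lowercase_snake_case_name">

# Nesting also works with plain ranges: this class is the union of a-c and x-z.
union = /[[a-c][x-z]]/
p union.match?("b") # => true
p union.match?("m") # => false
```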


Argh, I mistyped the code for the Crystal example here. But you are right about the extra brackets.

The original regex I was trying to port from Ruby, where it works, does not work in Crystal:

% crystal eval 'p %r{[[[:word:]]\.]+\b}.match("lowercase_snake_case_name UPPERCASE_SNAKE_CASE_NAME CamelCaseName")'
nil

But after removing the extra [] it does:

% crystal eval 'p %r{[[:word:]\.]+\b}.match("lowercase_snake_case_name UPPERCASE_SNAKE_CASE_NAME CamelCaseName")'
Regex::MatchData("lowercase_snake_case_name")

I recently fixed this, but it seems the fix is going to be included in 1.3, not 1.2: "Regex: use PCRE_UCP" by asterite, Pull Request #11265 in crystal-lang/crystal on GitHub.

You could try to convince them to include it in 1.2. It’s a one-line change and it’s pretty harmless.


I’ve tried matching against [[:word:]] with your PR and unfortunately it still doesn’t work.

What I’ve noticed is that if I escape the outer brackets, it works even with Crystal 1.1.1:

%r{[\[[:word:]\]\.]+\b}.match("snake_case") # => Regex::MatchData("snake_case")

Without escaping the outer brackets, it doesn’t work in Crystal, but it works in Ruby:

%r{[[[:word:]]\.]+\b}.match("snake_case") # => nil

This looks like a syntax difference between PCRE and Oniguruma. If you test your expressions against each regex engine directly, you will see the same behaviour as in Crystal and Ruby, respectively.

Apparently, PCRE doesn’t support nested character classes: inside a class, [[:word:]] is not read as a nested set, and the ] right after [:word:] closes the class, so the outer brackets need to be escaped.
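A closer look at how PCRE tokenizes the pattern explains the nil. Below is my own reading of the PCRE rules (not taken from Crystal's source), written in Ruby so it can be run: the pattern PCRE effectively compiles is spelled out with every literal bracket escaped, so that Oniguruma parses it the same way. The name pcre_reading is hypothetical.

```ruby
# Token by token, PCRE reads %r{[[[:word:]]\.]+\b} as:
#   [         open a character class
#   [         a literal "[" (not followed by ":", so not a POSIX class)
#   [:word:]  the POSIX word class
#   ]         close the class; PCRE has no nested classes
#   \.        a literal "." outside the class
#   ]+        a literal "]" (not special on its own), one or more times
#   \b        a word boundary
# Hypothetical name: the same reading with every literal bracket escaped.
pcre_reading = %r{[\[[:word:]]\.\]+\b}

p pcre_reading.match("snake_case") # => nil (no "." followed by "]")
p pcre_reading.match("x.]y")       # => #<MatchData "x.]">
```

In other words, the pattern only matches a single word character (or "[") followed by a dot and one or more closing brackets, which is why it never matches "snake_case".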

Correct. I’ve found this in the PCRE docs (pcrepattern specification):
An opening square bracket introduces a character class, terminated by a closing square bracket. A closing square bracket on its own is not special by default. However, if the PCRE_JAVASCRIPT_COMPAT option is set, a lone closing square bracket causes a compile-time error. If a closing square bracket is required as a member of the class, it should be the first data character in the class (after an initial circumflex, if present) or escaped with a backslash.

I’ve checked, and indeed you only need to escape the closing bracket inside a character class (unless it is the first character in the class, per the PCRE docs); there is no need to escape the opening one:

%r{[[[:word:]\]\.]+\b}.match("snake_case") # => Regex::MatchData("snake_case")
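One thing worth checking when porting in the other direction: not every form that works in Crystal is valid in Ruby. A small Ruby sketch (my own check against Oniguruma, not something from the thread) suggests that the fully escaped class from the Crystal 1.1.1 example also works in Ruby, while the form that escapes only the closing bracket is rejected by Ruby with a RegexpError. If a pattern has to work in both languages, escaping both inner brackets seems the safer choice.

```ruby
input = "snake_case"

# Escaping both inner brackets keeps one flat class: "[", word characters, "]" and ".".
portable = %r{[\[[:word:]\]\.]+\b}
p portable.match(input) # => #<MatchData "snake_case">

# Escaping only the closing bracket is enough for PCRE, but Oniguruma treats the
# unescaped inner "[" as the start of a nested class, so the outer class is never
# closed. Regexp.new is used so the error is raised at runtime instead of
# aborting the whole script at parse time, as a regex literal would.
begin
  Regexp.new('[[[:word:]\]\.]+\b')
  puts "Oniguruma accepted it"
rescue RegexpError => e
  puts "Oniguruma rejects it: #{e.message}"
end
```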