Hello,
I would like to know if there is a way to interpolate a regex at compile time?
I would like to do this:
macro interpolate_regex(regex)
  /\A#{ {{regex}} }/
end
interpolate_regex(/foo/) # => /\A(?-imsx:foo)/
interpolate_regex(/foo/ix) # => /\A(?ix-ms:foo)/
interpolate_regex(/\n\/foo/) # => /\A(?-imsx:\n\/foo)/
But with interpolation happening at compile time.
I tried this:
macro interpolate_regex(regex)
  /\A{{regex.source.id}}/
end
But this doesn’t work if the regex has options or escapes.
interpolate_regex(/foo/) # => /\Afoo/
interpolate_regex(/foo/ix) # => /\Afoo/
interpolate_regex(/\n\/foo/) # => unknown regex option: f (Expanded /\A\n/foo/ )
The best I could come up with is this:
macro interpolate_regex(regex)
  {%
    str = "(?"
    str += 'i' if regex.options.includes?(:i)
    str += "ms" if regex.options.includes?(:m)
    str += 'x' if regex.options.includes?(:x)
    str += '-'
    str += 'i' unless regex.options.includes?(:i)
    str += "ms" unless regex.options.includes?(:m)
    str += 'x' unless regex.options.includes?(:x)
    str += ':'
    str += regex.source
    str += ')'
  %}
  /\A{{str.id}}/
end
But this still doesn’t work in the case of /\n\/foo/ (the / doesn’t get escaped). I wonder if there is a shorter/cleaner way?
I think it could be nice to have something like RegexLiteral#to_s, which would give a regex ready to interpolate.
FWIW, it might be worth doing some benchmarks to see if this would even be beneficial. PCRE does pattern compilation/caching itself, so the majority of the performance boost might come from that. Plus, LLVM might just be smart enough to optimize it to a more performant regex even before it gets to PCRE.
Yeah, I wouldn’t expect any relevant performance gains from macro expansion. But there might be other reasons than optimization.
RegexLiteral#source returns the raw string describing the regular expression. It is not aware of the syntax for expressing regular expressions in Crystal (delimited by forward slashes). If you want to manually embed it in such a literal, you need to escape the delimiter (a simple .gsub(/\//, "\\/") should probably do).
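As a rough sketch of that suggestion, the macro from the question can escape the delimiter before pasting the source into the literal. This only changes the line that appends regex.source; everything else is the original poster's code, and it assumes the macro-level StringLiteral#gsub treats "\\/" as a literal backslash followed by a slash.

```crystal
# Sketch only: the macro from the question, with `/` escaped in the source.
macro interpolate_regex(regex)
  {%
    str = "(?"
    str += 'i' if regex.options.includes?(:i)
    str += "ms" if regex.options.includes?(:m)
    str += 'x' if regex.options.includes?(:x)
    str += '-'
    str += 'i' unless regex.options.includes?(:i)
    str += "ms" unless regex.options.includes?(:m)
    str += 'x' unless regex.options.includes?(:x)
    str += ':'
    # RegexLiteral#source does not escape the `/` delimiter, so do it here.
    str += regex.source.gsub(/\//, "\\/")
    str += ')'
  %}
  /\A{{str.id}}/
end

# The case that previously failed to compile:
p interpolate_regex(/\n\/foo/).matches?("\n/foo")
# Options are carried over through the inline (?i-msx:...) group:
p interpolate_regex(/foo/i).matches?("FOO")
```

Since the result is a plain regex literal, the compiler can treat it like any other literal regex.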
I suppose the macro language could support interpolation in regex literals as well. Then you could implement it like this:
macro interpolate_regex(regex)
  {{ /\A#{ regex }/ }}
end
This could be a feature request. String interpolation already works like that in the macro language.
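For comparison, here is a small sketch of the string interpolation that the macro language does support today. The macro name anchored_source is made up for illustration; it produces a String at compile time, not a Regex, so it is not a replacement for the proposed feature.

```crystal
# Hypothetical helper: builds the anchored pattern as a String literal
# during macro expansion, using string interpolation inside {{ ... }}.
macro anchored_source(regex)
  {{ "\\A#{regex.source.id}" }}
end

# The expansion is an ordinary string literal containing \Afoo.
p anchored_source(/foo/)
```

Turning that string into a regex would still require Regex.new at run time, which is exactly the cost the question is trying to avoid.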
Thanks for the answers, the .gsub(/\//, "\\/") is really helpful!
The performance difference is huge if you only compare doing the interpolation against doing nothing. However, the difference is still non-negligible if you count a matching phase afterwards.
Anyway, doing the interpolation and calling Regex#to_s adds an extra cost, which may or may not be negligible depending on what else you do.
benchmark:
require "benchmark"

macro runtime(regex)
  /foo#{ {{regex}} }/
end

macro compiletime(regex)
  /foo(?-imsx:{{regex.source.id}})/
end

p runtime(/bar/) # => /foo(?-imsx:bar)/
p compiletime(/bar/) # => /foo(?-imsx:bar)/

Benchmark.ips do |x|
  x.report("runtime interpolation") { runtime(/bar/) }
  x.report("compiletime interpolation") { compiletime(/bar/) }
end
# The difference is huge because it compares almost nothing with something.

short_text = "foobar"
Benchmark.ips do |x|
  x.report("short runtime interpolation + text matching") { runtime(/bar/) =~ short_text }
  x.report("short compiletime interpolation + text matching") { compiletime(/bar/) =~ short_text }
end

long_text = "foobaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaar"
Benchmark.ips do |x|
  x.report("long runtime interpolation + text matching") { runtime(/baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaar/) =~ long_text }
  x.report("long compiletime interpolation + text matching") { compiletime(/baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaar/) =~ long_text }
end
results:
runtime interpolation 120.55k ( 8.30µs) (±24.10%) 9.13kB/op 2384.45× slower
compiletime interpolation 287.45M ( 3.48ns) (±19.54%) 0.0B/op fastest
short runtime interpolation + text matching 99.00k ( 10.10µs) (±22.38%) 9.14kB/op 110.90× slower
short compiletime interpolation + text matching 10.98M ( 91.08ns) (±20.92%) 16.0B/op fastest
long runtime interpolation + text matching 57.96k ( 17.25µs) (±20.53%) 10.0kB/op 116.47× slower
long compiletime interpolation + text matching 6.75M (148.12ns) (±20.68%) 16.0B/op fastest
My original use case (probably for my next shard!) is to provide a parser API in which users can use regexes. For a simple JSON parser, going from /\A{{regex.source.id}}/ to /\A#{ regex }/ makes performance 60x worse.
I don’t know if I’m the only one with this use case. Anyway, with the gsub it becomes possible, and that’s totally fine with me. Crystal is definitely awesome!
Could you cache the interpolated regex? It only works faster if you write /foo/ because the compiler actually caches that regex in a hidden constant. So if you assign that regex somewhere and reuse that, the performance difference should be negligible.
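A minimal sketch of that manual caching, assuming a constant is an acceptable place to keep the regex (the names INNER, CACHED, cached and uncached are invented for illustration):

```crystal
INNER = /bar/

# The interpolation runs once, when the constant is initialized;
# afterwards the same Regex instance is reused.
CACHED = /foo#{INNER}/

# Builds and compiles a new Regex on every call.
def uncached : Regex
  /foo#{INNER}/
end

# Returns the single instance stored in the constant.
def cached : Regex
  CACHED
end

p cached.same?(cached)     # identity check on the cached instance
p uncached.same?(uncached) # identity check on two freshly built instances
p cached.matches?("foobar")
```

The same idea applies to an instance variable or a class-level property if the regex depends on values that are only known at run time.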
Oh, that’s interesting, thanks!
Yes, in effect, once cached there is no performance difference at all.
The benefit of compile-time interpolation would still be having the regex cached automatically (without even knowing it!) instead of doing it manually, but that’s marginal.