Interpolate regex at compile time

I3oris · May 24, 2022, 8:22pm

Hello,

I would like to know if there is a way to interpolate a regex at compile time?

I would like to do this:

macro interpolate_regex(regex)
  /\A#{ {{regex}} }/
end
interpolate_regex(/foo/)     # => /\A(?-imsx:foo)/
interpolate_regex(/foo/ix)   # => /\A(?ix-ms:foo)/
interpolate_regex(/\n\/foo/) # => /\A(?-imsx:\n\/foo)/

But with interpolation happening at compile time.

I tried this:

macro interpolate_regex(regex)
  /\A{{regex.source.id}}/
end

But it doesn’t work if the regex has options or escapes.

interpolate_regex(/foo/) # => /\Afoo/
interpolate_regex(/foo/ix) # => /\Afoo/
interpolate_regex(/\n\/foo/) # => unknown regex option: f (Expanded /\A\n/foo/ )

The best I could come up with is this:

macro interpolate_regex(regex)
  {%
    str = "(?"
    str += 'i' if regex.options.includes?(:i)
    str += "ms" if regex.options.includes?(:m)
    str += 'x' if regex.options.includes?(:x)
    str += '-'
    str += 'i' unless regex.options.includes?(:i)
    str += "ms" unless regex.options.includes?(:m)
    str += 'x' unless regex.options.includes?(:x)
    str += ':'
    str += regex.source
    str += ')'
  %}
  /\A{{str.id}}/
end

But this still doesn’t work in the case of /\n\/foo/ (the / doesn’t get escaped). Is there a shorter/cleaner way?

I think it would be nice to have something like RegexLiteral#to_s that gives a regex ready to interpolate.

Blacksmoke16 · May 25, 2022, 12:39am

FWIW, it might be worth running some benchmarks to see if this would even be beneficial. PCRE does pattern compilation/caching itself, so the majority of the performance boost might come from that. Plus, LLVM might just be smart enough to optimize it to a more performant regex even before it gets to PCRE.

straight-shoota · May 25, 2022, 10:35am

Yeah, I wouldn’t expect any relevant performance gains from macro expansion. But there might be reasons other than optimization.

RegexLiteral#source returns the raw string describing the regular expression. It is not aware of the syntax for expressing regular expressions in Crystal (delimited by forward slashes). If you want to manually embed it in such a literal, you need to escape the delimiter (a simple .gsub(/\//, "\\/") should probably do).
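Combining the option prefix from the first post with the slash escaping suggested here, a macro along these lines should handle both options and escaped delimiters. This is a sketch, assuming the `(?flags-flags:source)` layout that `Regex#to_s` produces at runtime (Crystal's `m` option maps to PCRE's `ms`); it relies only on the macro methods `RegexLiteral#options`, `RegexLiteral#source` and `StringLiteral#gsub`, and has not been checked against every escape sequence.

```crystal
# Sketch: build "\A(?<on>-<off>:<source>)" entirely at macro-expansion time,
# so the result is a plain regex literal with no runtime interpolation.
macro interpolate_regex(regex)
  {%
    opts = regex.options
    str = "(?"
    # Flags that are switched on (Crystal's :m corresponds to PCRE "ms")
    str += "i" if opts.includes?(:i)
    str += "ms" if opts.includes?(:m)
    str += "x" if opts.includes?(:x)
    str += "-"
    # Flags that are switched off
    str += "i" unless opts.includes?(:i)
    str += "ms" unless opts.includes?(:m)
    str += "x" unless opts.includes?(:x)
    # Re-escape the "/" delimiter, which #source returns unescaped
    str += ":" + regex.source.gsub(/\//, "\\/") + ")"
  %}
  /\A{{str.id}}/
end

p interpolate_regex(/foo/ix).source               # => "\\A(?ix-ms:foo)"
p interpolate_regex(/\n\/foo/).matches?("\n/foo") # => true
```

Because the macro expands to an ordinary regex literal, the compiler treats it like any other `/.../` literal.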

I suppose the macro language could support interpolation in regex literals as well. Then you could implement it like this:

macro interpolate_regex(regex)
  {{ /\A#{ regex }/ }}
end

This could be a feature request. String interpolation already works like that in the macro language.

I3oris · May 25, 2022, 6:49pm

Thanks for the answers; the .gsub(/\//, "\\/") is really helpful!

The performance difference is huge if you only compare doing the interpolation against doing nothing. However, the difference is still non-negligible if you include a matching phase afterwards.

In any case, doing the interpolation and calling Regex#to_s adds an extra cost, which may or may not be negligible depending on what else you do.

benchmark:

require "benchmark"

macro runtime(regex)
  /foo#{ {{regex}} }/
end

macro compiletime(regex)
  /foo(?-imsx:{{regex.source.id}})/
end

p runtime(/bar/)     # => /foo(?-imsx:bar)/
p compiletime(/bar/) # => /foo(?-imsx:bar)/

Benchmark.ips do |x|
  x.report("runtime interpolation") { runtime(/bar/) }
  x.report("compiletime interpolation") { compiletime(/bar/) }
end
# the difference is huge because this compares almost nothing with something.

short_text = "foobar"
Benchmark.ips do |x|
  x.report("short runtime interpolation + text matching") { runtime(/bar/) =~ short_text }
  x.report("short compiletime interpolation + text matching") { compiletime(/bar/) =~ short_text }
end

long_text = "foobaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaar"
Benchmark.ips do |x|
  x.report("long runtime interpolation + text matching") { runtime(/baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaar/) =~ long_text }
  x.report("long compiletime interpolation + text matching") { compiletime(/baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaar/) =~ long_text }
end

results:

    runtime interpolation 120.55k (  8.30µs) (±24.10%)  9.13kB/op  2384.45× slower
compiletime interpolation 287.45M (  3.48ns) (±19.54%)    0.0B/op          fastest
    short runtime interpolation + text matching  99.00k ( 10.10µs) (±22.38%)  9.14kB/op  110.90× slower
short compiletime interpolation + text matching  10.98M ( 91.08ns) (±20.92%)   16.0B/op         fastest
    long runtime interpolation + text matching  57.96k ( 17.25µs) (±20.53%)  10.0kB/op  116.47× slower
long compiletime interpolation + text matching   6.75M (148.12ns) (±20.68%)   16.0B/op         fastest

My original use case (probably for my next shard!) is to provide a parser API in which users can use regexes. For a simple JSON parser, switching from /\A{{regex.source.id}}/ to /\A#{ regex }/ makes performance 60× worse.

I don’t know if I’m the only one with this use case; anyway, with the gsub it becomes possible, and that’s totally fine for me. Crystal is definitely awesome!

asterite · May 25, 2022, 7:57pm

Could you cache the interpolated regex? It’s only faster when you write /foo/ because the compiler actually caches that regex in a hidden constant. So if you assign the interpolated regex somewhere and reuse it, the performance difference should be negligible.
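A minimal sketch of that caching idea for the parser use case, assuming the user's regex is known when a parser rule is built: interpolate once, store the result, and reuse it for every match. `ANCHORED_FOO`, `Rule` and `match_at_start?` are hypothetical names for illustration, not part of any library.

```crystal
# Hypothetical constant: the interpolation runs once, not on every use.
ANCHORED_FOO = /\A#{/foo/i}/

# Hypothetical parser rule that keeps its own anchored copy of a user regex.
class Rule
  getter pattern : Regex

  def initialize(regex : Regex)
    # Runtime interpolation (Regex#to_s + PCRE compilation) happens only
    # here, once per rule, instead of once per match attempt.
    @pattern = /\A#{regex}/
  end

  # Returns true if the rule matches at the very start of `text`.
  def match_at_start?(text : String) : Bool
    @pattern.matches?(text)
  end
end

rule = Rule.new(/foo/i)
p rule.match_at_start?("FOObar") # => true
p rule.match_at_start?("barfoo") # => false
```

The trade-off is that the caching is explicit: the caller has to hold on to the `Rule` (or constant) rather than writing the regex inline.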

I3oris · May 25, 2022, 8:26pm

Oh, that’s interesting! Thanks.

Yes, indeed: once cached, there is no performance difference at all.

The remaining benefit of compile-time interpolation would be having the regex cached automatically (without even knowing it!) instead of doing it manually, but it’s marginal.
