Handling the PCRE to PCRE2 transition and versioning

Hey, folks.

This is a summary of the discussion held on the Crystal Discord.

Right to the problem. It often feels that every other minor release is a breaking one. To be fair, that’s probably okay, given the size of the core team, which can’t test every possible case, but this is an example of a library that works with Crystal 1.0, but not 1.7 (I believe that the community can provide even more such examples). This is a situation that ideally should never happen in a stable language.

I think in some ways the community and I in particular have not been paying attention to this problem because:

  • Crystal is a good language.

  • Breakages are usually tiny.

But there’s an even bigger change on the horizon: the PCRE to PCRE2 transition. Overall, it is well explained in the blog post.

What doesn’t feel right is that the blog post literally says, “There are some small differences which can cause breaking changes” but despite this, the change is planned in a minor release (1.8).

This seems even stranger given there is already a huge regression in the standard library. At least in this case there’s a ~50x decrease in performance and a ~1250x increase in memory usage. Doesn’t seem harmless, right? How many such undiscovered cases are there (especially in external libraries)?

Another problem is the set time frame. The above blog post appeared just a month before 1.8, which is supposed to include PCRE2. The transition was first mentioned in the middle of this blog post, but without a clear date, and I really think it should have been a separate blog post from the beginning.

Overall, in my opinion, the situation is very alarming. Crystal is already a relatively niche language. I am currently using it in production, but given that any minor release can potentially be a major release, I may have to reconsider my decision.

Additionally:

Possible solutions:

  • PCRE1 support should remain guaranteed through any 1.x release, but upon 2.0.x it should no longer be guaranteed and the default be switched to PCRE2. Let’s allow some more time for the community to try this change, find what the effects are on their projects and adapt. (Suggested by an Amber maintainer)

  • Crystal should adopt something similar to LTS releases (as Node does) or editions (as Rust does) and clearly state that breaking changes are possible in releases beyond them.

  • ?

IMO this is a bit of fear mongering. Yes, it was a large decrease in performance, however there’s a PR that addresses the bulk of it, though of course there is still more room for improvement. I just think panicking that I now need to test every regex I use in my application for performance regressions is a bit extreme.

EDIT: To be clear, it is possible that particular regexes may have issues going from PCRE to PCRE2, but saying that “this upgrade will make my application/regex 50x slower and use 1250x more memory” just isn’t true.

Thanks again, Sanks, for transferring this from the Discord chat.

I think in this format we can have a more structured approach to the discussion.

Let me start by saying that the plan from Heads up: Crystal is upgrading its Regex engine - The Crystal Programming Language is not set in stone. If we decide it’s necessary to change it, we can do that.

At this point, I don’t see a reason to do so. I think the plan is good.

Some background discussions on this topic can be found in Upgrade `Regex` engine · Issue #11331 · crystal-lang/crystal · GitHub, Add support for PCRE2 · Issue #12790 · crystal-lang/crystal · GitHub, and Updating the syntax of regex literals · Issue #12857 · crystal-lang/crystal · GitHub.

  1. The performance regression (Regex performance regression on PCRE2 · Issue #13144 · crystal-lang/crystal · GitHub) is already fixed. I expect it to be merged into master and available in the nightly builds shortly. This was merely a mistake in the bindings and neither a general problem with PCRE2 nor with the transition of user projects. It’s resolved and that should be everything about it.

  2. I think we have enough time before the release of 1.8 to fix any other issues that might arise. That’s exactly why we have nightly releases, so people can test it and report any problems. If something comes up for which the time frame doesn’t suffice, we will postpone.

    We don’t expect many Crystal projects to face any problems. If there are any, please let us know.
    I imagine most developers won’t even notice the transition.
    And again, that’s why we have nightly releases and a test period.

  3. In some way we’re forced to take action, and I think it’s better sooner than later. libpcre has reached end of life and is already starting to disappear from distributions’ package managers. There’s no good way forward with it unless we take on the burden of maintaining and distributing the library ourselves.

    We intend to maintain backwards compatibility in minor releases. But one could consider this a necessary bug fix for updating a stale and unsupported dependency to a supported version. Bug fixes can break things that depend on the specific behaviour that’s being fixed. Sometimes there’s just no way around that. We expect and hope the impact to be minimal, but we can’t be sure about that at this point.

    Of course, taking this step now, and in the way we’re planning to, is not the only option. We could do it differently. But this is the best course of action we could come up with.

  4. I understand versions are more or less just names. It’s usually a good idea to have some semantics embedded in it so you can get some basic understanding of a version by its name.
    But Crystal does not explicitly follow SemVer. It’s a nice concept, but I don’t think anyone can actually follow it in its strictest sense. And when you’re loosely following it, it’s a question of how loose.

    Would it make any difference if we called the upcoming release 2.0 instead of 1.8? I don’t think so. The result would still be the same: Some (few) programs may experience issues with PCRE2, others will work as before. If you want your broken program to support the new version with PCRE2, you need to update it.
    In the scope of the entire language, the PCRE2 transition is a rather small change. Giving it the name of a 2.0 release doesn’t feel right. It would communicate a major step in the language, when it’s actually just a version update of the regex engine, with (expectedly) only minor consequences. The label Crystal 2.0 has long been understood to mean major changes to the language. Abandoning that would break established expectations: not about specifics of what will be in 2.0 but about the general theme and significance.

    As a reference for other projects that transitioned from PCRE to PCRE2, we examined the case of PHP, which also did this in a minor release: PCRE to PCRE2 migration - PHP 7.3 • PHP.Watch. As far as we know, the migration wasn’t a big hassle for PHP projects. And regular expressions are pretty heavily used in PHP, much more than in typical Crystal projects.

All that being said, I’m open to talking about improving the migration plan. It’s most likely not perfect ;)

And most important is the appeal to everyone: Please test your code with the current nightly builds to get an understanding of PCRE2 compatibility and report any issues on GitHub. Thank you for your help :bowing_man:

I would like to add a few words to the excellent exposition of straight-shoota.

First of all, thanks @Sanks for bringing the discussion. Constructive criticism is what makes us better and stronger.

I want to further explain why we can’t have a major release for every little breaking change: we do want to give proper support for major releases. That is, once we release 2.0.0, we expect to continue maintaining a 1.x branch. This is a significant amount of work. To be frank, just maintaining Crystal’s presence on the myriad supported platforms currently takes a huge amount of work, work that does not go into growing the language and its ecosystem. Adding to that is unthinkable.

Therefore, we want to tag 2.0.0 when a bigger change occurs. We feel this is not one such case, and we base our decision on what PHP did. And staying with PCRE is not an option, because we would end up having to maintain PCRE ourselves.

That said, if we see that more time is needed, we can postpone the change for one more release. The month we have until 1.8 can serve to test. At this point —after the supposedly fixed perf regression— we do not expect many things to break, but we could be wrong.

All of what’s been said here serves to justify the PCRE → PCRE2 move. Looking at the more general trend of the project, I’d like any doubt about the project’s stability to vanish. There are several flags added in the past releases that attest to this. Yet, being human, we might have unintentionally introduced a breaking change. We try to be fast about fixing those once they’re reported.

As a side note, in a rolling release it doesn’t make sense to ship a patch release just before a minor. If the previous minor broke something, pinning the minor before it is the same work as pinning a patch release.

PS: Sanks, I could compile crysterm successfully in master, so probably something got fixed in between.
