Strange output

Code presented here is just test code, not real code.

Crystal 0.34.0 [4401e90f0] (2020-04-06)

LLVM: 8.0.0
Default target: x86_64-unknown-linux-gnu

I'm trying to remove comments from some JavaScript code, but I get some funny characters in the output, so the compressed JavaScript code is invalid. In this example the problem is
the last line, with the rsingleTag statement.

rsingleTag = /^<(\w+)\s*\/?>(?:<\/\1>|)$/,

which incorrectly is written as (on my system at least)

rsingleTag = /^<(w+)s*/?>(?:</>|)$/,

It is difficult to see the difference, but the part near the end should read (?:<\/\1>|)$/,

I think it has to do with Unicode but not sure.

Example code

def ltrim(s : String) : String
  x = s.gsub(/^\s*/, "")
  return x
end

def rtrim(s : String) : String
  x = s.gsub(/\s*$/, "")
  return x
end

def trim(s : String) : String
  x = ltrim(s)
  x = rtrim(x)
  return x
end

def remove_comments(input : Array(String) ) : Array(String)

  multiline = false
  lines = [] of String

  (0..input.size()-1).each do |i|
    line = trim(input[i])

    #
    # multiline comments ?
    #
    if line =~ /^\/\*/ && line !~ /\/\*.+\*\//
      multiline = true
    end

    if line =~ /^\*\//
      multiline = false
    end

    #
    # single line comments
    #
    if multiline == false
      if line =~ /\/\*.+\*\//
 
        x = line.gsub(/\/\*.+\*\//,"")
        if x.size > 0
          lines << x
        end

      # //
      elsif line =~ /^\/\/.+/
        x = line.gsub(/^\/\/.+/,"")
        if x.size > 0
          lines << x
        end

      elsif multiline == false
        # debug
        if line =~ /rsingleTag/
          puts "line #{line}"
        end

        if line.size > 0 && line !~ /^\*\//
          lines << line
        end
      end
    end
  end # each

  return lines
end

s =<<-EOT
/*!
 * Multi line comments
 * Date: 2013-2-4
 */
(function( window, undefined ) {

// Can't do this because several apps including ASP.NET trace
// the stack via arguments.caller.callee and Firefox dies if
// you try to trace through "use strict" call chains. (#13335)
// Support: Firefox 18+
//"use strict";
var
        // Used for splitting on whitespace
        core_rnotwhite = /\S+/g,
        // Make sure we trim BOM and NBSP (here's looking at you, Safari 5.0 and IE)
        rtrim = /^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g,
        // A simple way to check for HTML strings
        // Prioritize #id over <tag> to avoid XSS via location.hash (#9521)
        // Strict HTML recognition (#11290: must start with <)
        rquickExpr = /^(?:(<[\w\W]+>)[^>]*|#([\w-]*))$/,
        // Match a standalone tag
        rsingleTag = /^<(\w+)\s*\/?>(?:<\/\1>|)$/,
EOT

lines = s.split("\n")
newlines = remove_comments(lines)
File.write("javascript.js",newlines.join("\n"))

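A heredoc generally allows interpolation and escapes.

So the backslash sequences in your JavaScript regexes are being processed as string escapes when you don't want them to be: \w and \s lose their backslash, and \1 is read as an octal escape, which is the funny character you are seeing.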

To denote a heredoc without interpolation or escapes, enclose the opening heredoc identifier in single quotes.

Try something like:

s = <<-'EOT'
  ...
EOT

https://play.crystal-lang.org/#/r/9nkx
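
To make the difference concrete, here is a minimal sketch that puts the same rsingleTag line through a regular heredoc and a raw (single-quoted) heredoc. The variable names escaped and raw are only for illustration and are not part of the original code.

```crystal
# Regular heredoc: backslash sequences are processed as string escapes,
# so \w, \s and \/ lose their backslash and \1 becomes the character U+0001.
escaped = <<-EOT
  rsingleTag = /^<(\w+)\s*\/?>(?:<\/\1>|)$/,
  EOT

# Raw heredoc (identifier in single quotes): no escapes, no interpolation.
raw = <<-'EOT'
  rsingleTag = /^<(\w+)\s*\/?>(?:<\/\1>|)$/,
  EOT

p escaped.includes?('\\')      # => false, every backslash was consumed
p escaped.includes?('\u0001')  # => true, this is the "funny character"
p raw.includes?('\\')          # => true, the regex is kept as written
p raw.includes?("\\1")         # => true, the backreference survives
```

If this is right, only the raw version is safe to write back out as JavaScript.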

EDIT: Also, the trim methods already exist, but as rstrip, lstrip, and strip. See String - Crystal 1.10.1.
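
As a rough illustration of the built-in methods, the sketch below applies them to a made-up sample line (the sample string is an assumption, not taken from the original input):

```crystal
# Hypothetical sample line with leading and trailing whitespace.
line = "  \t// Match a standalone tag  \n"

p line.lstrip  # => "// Match a standalone tag  \n"
p line.rstrip  # => "  \t// Match a standalone tag"
p line.strip   # => "// Match a standalone tag"
```

For single lines like these, strip should be usable in place of the hand-written trim, for example line = input[i].strip inside remove_comments.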