Question about the crystal syntax highlighter

I wrote tartrazine (ralsina/tartrazine on GitHub), a Crystal reimplementation of the Pygments/Chroma syntax highlighters, to highlight code. Because Crystal has its own highlighter, I connected the two.

HOWEVER :slight_smile: it fails on some files that are valid Crystal. In particular, it failed on src/crycco.cr from ralsina/crycco (main branch on GitHub).

I have reduced the failing input to this:

/#{l[""]}/
"\n"

It fails when it tries to parse the \n:

> bin/tartrazine -f html crycco.cr
Unhandled exception: unknown token: 'n' (Crystal::SyntaxException)
  from /usr/lib/crystal/compiler/crystal/syntax/lexer.cr:2949:7 in 'raise'
  from /usr/lib/crystal/compiler/crystal/syntax/lexer.cr:2948:5 in 'raise'
  from /usr/lib/crystal/compiler/crystal/syntax/lexer.cr:2941:7 in 'unknown_token'
  from /usr/lib/crystal/compiler/crystal/syntax/lexer.cr:158:14 in 'next_token'
  from /usr/lib/crystal/crystal/syntax_highlighter.cr:87:7 in 'highlight_normal_state'
  from /usr/lib/crystal/crystal/syntax_highlighter.cr:80:11 in 'highlight_normal_state'
  from /usr/lib/crystal/crystal/syntax_highlighter.cr:14:5 in 'highlight'
  from src/lexer.cr:412:7 in 'initialize'
  from src/lexer.cr:406:5 in 'new'
  from src/lexer.cr:423:7 in 'tokenizer'
  from src/lexer.cr:422:5 in 'tokenizer'
  from src/formatters/html.cr:79:19 in 'format_text'
  from src/formatters/html.cr:57:7 in 'format'
  from src/main.cr:112:3 in '__crystal_main'
  from /usr/lib/crystal/crystal/main.cr:118:5 in 'main_user_code'
  from /usr/lib/crystal/crystal/main.cr:104:7 in 'main'
  from /usr/lib/crystal/crystal/main.cr:130:3 in 'main'
  from /usr/lib/libc.so.6 in '??'
  from /usr/lib/libc.so.6 in '__libc_start_main'
  from bin/tartrazine in '_start'
  from ???

I can produce a minimal program that calls the syntax highlighter as a self-contained test if needed.
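A minimal sketch of what such a self-contained test could look like, assuming the standard library's `Crystal::SyntaxHighlighter::HTML` (from `crystal/syntax_highlighter/html`) goes through the same `highlight` code path shown in the backtrace. The variable name `code` is only illustrative, and whether the `rescue` branch is reached depends on the Crystal version in use:

```crystal
require "crystal/syntax_highlighter/html"

# The reduced failing input, in a raw heredoc so that neither the
# interpolation nor the escape sequence is processed here.
code = <<-'CRYSTAL'
  /#{l[""]}/
  "\n"
  CRYSTAL

begin
  # Runs the stdlib highlighter over the input and prints the HTML.
  puts Crystal::SyntaxHighlighter::HTML.highlight(code)
rescue ex
  # On affected versions the lexer is expected to raise here.
  puts "highlighting failed: #{ex.message}"
end
```

Running it with `crystal run` on an affected compiler should take tartrazine out of the picture and show whether the failure comes from the standard library highlighter alone.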

You can also see it in the interpreter. Notice how it doesn’t highlight the bad lines:

[image: interpreter session in which the bad lines are not highlighted]

This is quite bizarre. The parser is obviously able to parse it and interpret the code. But when parsed for the purpose of syntax highlighting, it fails… Both use cases should usually have the same result :thinking:

I am guessing the nesting of weird and unusual states (a string inside brackets inside an interpolation in a regex) is breaking the highlighter's state, but I looked and have zero idea how to fix it :person_shrugging:

Yeah, looks like the syntax highlighter does not properly reset delimiter_state.

This diff fixes it, although I’m not sure whether it’s the correct solution or whether it breaks something else:

diff --git i/src/crystal/syntax_highlighter.cr w/src/crystal/syntax_highlighter.cr
index 1d4abcb60..a7794e96a 100644
--- i/src/crystal/syntax_highlighter.cr
+++ w/src/crystal/syntax_highlighter.cr
@@ -84,6 +84,8 @@ abstract class Crystal::SyntaxHighlighter
     space_before = false

     while true
+      previous_delimiter_state = lexer.token.delimiter_state
+
       token = lexer.next_token

       case token.type
@@ -105,6 +107,7 @@ abstract class Crystal::SyntaxHighlighter
           highlight_token token, last_is_def
         else
           highlight_delimiter_state lexer, token
+          token.delimiter_state = previous_delimiter_state
         end
       when .string_array_start?, .symbol_array_start?
         highlight_string_array lexer, token

Fix: Fix `SyntaxHihglighter` delimiter state by straight-shoota · Pull Request #15104 · crystal-lang/crystal · GitHub
