`Char#ord` at compile time

Is there a method to obtain a Char's codepoint at compile time for use within an Enum?

While it’s possible to assign this to a constant, the same does not appear possible when assigning values to Enum members, either directly or within a macro context.

# Works
A = 'A'.ord
 
# Does not
enum Example
  A = 'A'.ord
end
 
# Also does not
enum Example
  A = {{'A'.ord}}
end

Before I go diving into a way to add support for this, is there something blatantly obvious that I’m missing?

I looked through the docs and didn’t see anything like this. https://crystal-lang.org/api/0.35.1/Crystal/Macros/CharLiteral.html

If push comes to shove, you can use the run macro to do this. You just make a simple Crystal program that takes in a char and prints its codepoint.

puts ARGV[0][0].ord

However, this would be slow, since every run call executes a separate program during compilation.

You may just want to generate this code instead. You can do this with a macro that uses run to call a Crystal program which writes the code to a file, and then require that file. Just make sure the generator program prints nothing to STDOUT and only writes to the file, since run pastes whatever the program prints into your code.

Macro File

require "./generated_file"

{{run "./src/my_code_generator"}}

Code Generator

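# Writes the generated enum to disk. File.open defaults to read mode, so "w" is needed.
# Nothing is printed, because run would paste any STDOUT output into the program.
File.open("./src/generated_file.cr", "w") do |file|
  file.puts "enum MyEnum"
  ('A'..'Z').each do |char| # parentheses needed: 'A'..'Z'.each calls each on 'Z'
    file.puts "  #{char} = #{char.ord}"
  end
  file.puts "end"
end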
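The generator above could also build the enum source as a String first, which makes the output easy to inspect or test before it is written anywhere. This is a minimal sketch: `generate_char_enum` is a hypothetical helper, not part of any library. Since member names must be valid constant names, it only suits chars such as 'A'..'Z', and with the assumed UInt8 base type the codepoints must also stay below 256.

```crystal
# Hypothetical helper: builds the source of an enum whose members are named
# after the given chars and whose values are each char's codepoint.
def generate_char_enum(name : String, chars : Enumerable(Char)) : String
  String.build do |io|
    io << "enum " << name << " : UInt8\n"
    chars.each do |char|
      io << "  " << char << " = " << char.ord << '\n'
    end
    io << "end\n"
  end
end

# Print the generated code so it can be inspected or pasted into the real program.
puts generate_char_enum("MyEnum", 'A'..'C')
```

Passing the same string to File.write would reproduce the file-writing version.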

I didn’t test this exactly, so it may not work 100%, but the general idea is there.

Hi Kim!

Why do you need this?

Ahoy-hoy Ary!

The use case is protocol implementation. There’s a recurring pattern across a number of vendors of using ASCII chars (presumably for readability / typability) as the basis of a socket API. This may be something like:

SOH <command> STX <data>, ... ETX

where <command> is a single byte.

The ideal way of modelling this is an Enum, as it provides neat serialisation and deserialisation while keeping type safety. This is all doable, but requires a manual conversion to the byte value.

enum Command : UInt8
  Foo = 0x61 # 'a'
  Bar = 0x62 # 'b'
  # ...
end
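To illustrate the serialisation and deserialisation such an enum provides, here is a minimal sketch of framing a message as SOH <command> STX <data> ETX. SOH, STX and ETX are the standard ASCII control codes 0x01, 0x02 and 0x03; `encode_frame` and `decode_command` are hypothetical names, and the sketch assumes `data` is already serialised to bytes (the comma-separated `, ...` part is not handled).

```crystal
# Standard ASCII control codes used for framing.
SOH = 0x01_u8
STX = 0x02_u8
ETX = 0x03_u8

enum Command : UInt8
  Foo = 0x61 # 'a'
  Bar = 0x62 # 'b'
end

# Hypothetical encoder: SOH <command> STX <data> ETX.
# Assumes data is already serialised; separators are not handled.
def encode_frame(command : Command, data : Bytes) : Bytes
  io = IO::Memory.new
  io.write_byte SOH
  io.write_byte command.value
  io.write_byte STX
  io.write data
  io.write_byte ETX
  io.to_slice
end

# Hypothetical decoder for the command byte. Returns nil if the frame is too
# short, doesn't start with SOH, or carries an unknown command byte.
def decode_command(frame : Bytes) : Command?
  return nil if frame.size < 2 || frame[0] != SOH
  Command.from_value?(frame[1])
end

frame = encode_frame(Command::Foo, "hi".to_slice)
p frame
p decode_command(frame)
```

Because `Enum.from_value?` returns nil for unknown bytes, an unexpected command shows up as a nil to handle rather than an invalid enum value.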

Not a major issue, but given that all vendor docs refer to the character-based values, it’s just an extra layer of conversion that a human (or external tool) needs to do.

This complexity then increases if you, say, move to a two-byte <command> field that you may want to pack into a UInt16.

enum Command : UInt16
  Foo = 0x6178 # 'a', 'x'
  Bar = 0x6279 # 'b', 'y'
  # ...
end
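Until there is compile-time support, one way to keep hand-written hex values honest is a small runtime check that packs the chars the same way. This is a minimal sketch, assuming the first char goes in the high byte (as in the values above); `pack` is a hypothetical helper.

```crystal
enum Command : UInt16
  Foo = 0x6178 # 'a', 'x'
  Bar = 0x6279 # 'b', 'y'
end

# Hypothetical helper: packs two ASCII chars into a UInt16,
# first char in the high byte, second in the low byte.
def pack(high : Char, low : Char) : UInt16
  raise ArgumentError.new("non-ASCII char") unless high.ascii? && low.ascii?
  (high.ord.to_u16 << 8) | low.ord.to_u16
end

# Check that the hardcoded values match the chars in their comments.
p pack('a', 'x') == Command::Foo.value
p pack('b', 'y') == Command::Bar.value
```

Running checks like these in a spec would catch a typo in one of the hardcoded values without any change to the language.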

A solution that would solve both of these is a CharLiteral#ord that returns a NumberLiteral. This would make it possible to do any bitpacking within a macro context. Alternatively, enum member values could be evaluated the same way as constant expressions elsewhere. Ideally both!

Happy to dive into either of these with a PR, but keen to ensure I’m not missing something first.

I see, that makes sense! I’ll try making all of that possible; after all, chars and integers are very closely related. I guess for the last case, where you need multiple chars -> bytes, you can use a macro.


Actually, never mind. I won’t be adding new features to the language without discussing them first, so this might not happen at all.

Your best bet is to write a separate program that generates the code, which you then paste into the real program. I guess these enums won’t change? Then it’s okay to write a separate, throw-away script for this. Alternatively you can use “macro run”, but maybe that’s a bit too much.


That’s fair. I’ll shift this over to an issue for discussion if / when appropriate.

I think I can see a couple of places where it should be possible to implement this too (for both the macro and the compiler semantics within the Enum). I’m trying to get a little more familiar with the compiler internals, so even if a PR doesn’t end up being the appropriate action, it’s an interesting mental exercise.

The problem is that the more that is added to Macros, the more we are creating a second Crystal inside Crystal.


Agree with @sol.vin. After experimenting with https://github.com/j8r/crystal-object-send, I’m starting to think it might be possible to have a “self-hosted” macro interpreter: a macro interpreter using code generated with macros.


Good point on macro sprawl. I’ve spun up an issue relating to this. To save repeating the information in multiple places: https://github.com/crystal-lang/crystal/issues/9830.

Rather than blowing up the macro language as the solution to anything, can’t we just allow enum Foo : Char here?

enum Foo : Char
  A = 'a'
  B = 'b'

  def ==(other : UInt8)
    value.ord == other
  end
end
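As written, enum Foo : Char is a proposal: Crystal currently requires an enum's base type to be an integer type. A rough approximation that works today is a UInt8 enum with conversion helpers. In this sketch `from_char?` and `to_char` are hypothetical methods built on the existing `Enum.from_value?` and `Int#chr`.

```crystal
# Approximates the proposed `enum Foo : Char` with a UInt8 base type.
enum Foo : UInt8
  A = 0x61 # 'a'
  B = 0x62 # 'b'

  # Hypothetical: looks up a member by its character; nil if none matches.
  def self.from_char?(char : Char) : self?
    from_value?(char.ord)
  end

  # Hypothetical: returns the member's value as a Char.
  def to_char : Char
    value.chr
  end
end

p Foo.from_char?('b')
p Foo::A.to_char
```

The trade-off is that the values are still written as numbers; the helpers only hide the conversion at the call sites.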

Yeah, that’s a good idea. I’ll try to see what problems come from implementing that, but it might work.

That works for the single-byte case, but leads to issues if there’s a need to bitpack multiple chars into a larger unsigned int.

IMHO the elegant solution is the byte literal syntax discussed in the linked issue, as it allows these values to be expressed as number tokens, usable with the existing MathInterpreter implementation as well as standalone.

Can’t you just use a UInt16? Using << and >> to shift bits is pretty simple and useful. You can also always use a BitField implementation like @elorest’s -> https://github.com/elorest/bitfields.
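For the UInt16 case, going the other way is just a shift and a mask. This is a minimal sketch, assuming the first char is stored in the high byte; `unpack` is a hypothetical helper.

```crystal
# Hypothetical helper: splits a UInt16 back into two chars,
# using >> for the high byte and & for the low byte.
def unpack(value : UInt16) : {Char, Char}
  high = (value >> 8).to_u8
  low = (value & 0xFF_u16).to_u8
  {high.chr, low.chr}
end

p unpack(0x6178_u16)
```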

At this point I think the initial use case seems too contrived or unique to merit changing the language just for it, especially when you can hardcode the values once.


I agree, macros are pretty limiting anyway, since we don’t even have macro-level variables, which makes it really hard to manipulate data in the macro. If someone needs more functionality, using run or making a code generator seems to me a better way overall, even if it is going to be slow in certain use cases.

Actually, I said “contrived” but that’s not the correct word, sorry. I guess it’s a pretty unique use case and we have to wait a bit more until more such use cases arise to decide to change the language.

No probs. If additional use cases present themselves and things tip in favour of having language-level support for those literals, let me know and I’ll assemble a PR. Hardcoding with a comment is more than workable until then.

Old thread, but here’s my two cents:

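# The lookup string is printable ASCII (32..126) in order, so index + 32 is the codepoint.
# The backslash must be written as \\ or every char after '[' is off by one.
macro char_enum(name, *defs)
  enum {{name.id}} : UInt8
    {% for d in defs %}
      {% for c, i in " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~".chars %}
        {% if c == d.value %}
          {{d.target}} = {{i + 32}}
        {% end %}
      {% end %}
    {% end %}
  end
  {% debug %}
end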
 
char_enum MyEnum, A='a', B='b'
 
x = MyEnum::A
 
p! x, x.value

or

# Same lookup as above, but the definitions are passed as a begin ... end block.
macro char_enum(name, defs)
  enum {{name.id}} : UInt8
    {% for d in defs.expressions %}
      {% for c, i in " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~".chars %}
        {% if c == d.value %}
          {{d.target}} = {{i + 32}}
        {% end %}
      {% end %}
    {% end %}
  end
  {% debug %}
end
 
char_enum MyEnum, begin
 A='a'
 B='b'
end
 
x = MyEnum::A
 
p! x, x.value

Not as nice as enum MyEnum : Char would be, but usable.
It can be adapted for any power-of-two number of characters fitting in some integer type, by looping over the characters of d.value instead of using it directly.
An improvement would be allowing both char and byte values.
A real implementation would have some sanity checking of the arguments to provide friendly error messages.
edit: fixed obvious error (off by 32…) – and check that string before using.


Nice! I ended up with a similar approach: mapped_enum.cr in spider-gazelle/inactive-support on GitHub.

This allows for enums over arbitrary types with compile-time safety if the values are resolvable within a macro context.
