Crystal equivalent of Ruby's open(url).read[].unpack?

I’m porting some Ruby code to Crystal that uses Ruby’s Kernel#open (an alias for URI#open) to open an image URL, read a few bytes and unpack them.

open(url).read[0x10..0x18].unpack('NN')
open(url).read[6..10].unpack('SS')

I don’t think that code is directly portable to Crystal, though I believe it’s possible.

Any ideas what I should use in Crystal?

Here’s more of the Ruby code for better context:

def img_width_height(url)
  fail ArgumentError, 'url is nil' unless url
  begin
    case url
    when /png\z/
      open(url).read[0x10..0x18].unpack('NN')
    when /gif\z/
      open(url).read[6..10].unpack('SS')
    else
      FastImage.size(url)
    end
  rescue => e
    @logger.warn('BlogLibraryBuilder#img_width_height') { "Unable to get image width/height. error_message=#{e.message}, url=#{url}" }
    nil
  end
end
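For readers unfamiliar with the offsets, here is a minimal Ruby sketch of what those two slices pick out. The header bytes and dimensions are made up for illustration (no real file or URL is read). It assumes the standard layouts: a PNG starts with an 8-byte signature, a 4-byte chunk length and the 4-byte "IHDR" tag, so width and height sit at offset 0x10 as big-endian 32-bit integers; a GIF starts with a 6-byte signature, followed by width and height as little-endian 16-bit integers. The sketch uses 'v' instead of 'S' because 'S' is native-endian and would only give the right answer on little-endian machines.

```ruby
# Hypothetical in-memory stand-ins for the first bytes of a PNG and a GIF file.
png_header = "\x89PNG\r\n\x1A\n".b +     # 8-byte PNG signature
             [13].pack('N') + 'IHDR' +    # chunk length + chunk type (8 bytes)
             [640, 480].pack('NN')        # width, height as big-endian uint32

gif_header = 'GIF89a'.b +                 # 6-byte signature and version
             [320, 200].pack('vv')        # width, height as little-endian uint16

# Exactly 8 bytes from offset 0x10, and exactly 4 bytes from offset 6.
png_size = png_header[0x10, 8].unpack('NN')
gif_size = gif_header[6, 4].unpack('vv')

p png_size # => [640, 480]
p gif_size # => [320, 200]
```

Any Crystal port has to reproduce the same two things: the byte offset and the endianness of each integer.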

See https://github.com/crystal-lang/crystal/wiki/FAQ#user-content-is-there-an-equivalent-to-rubys-arraypackstringunpack.

I’d like someone to edit that page because I don’t think our approach is superior.

In fact, I think we can implement pack and unpack as macros, which has these added benefits:

  • type safe
  • the format is validated at compile-time
  • it’s much more compact to write

I agree that Ruby’s way is cryptic, but it’s usually the case that you write that format string once and you never see it again unless the protocol changes, which is usually rare.

@ejstembler Here’s how you solve your problem for the PNG case:

url = "./hello.png"

File.open(url) do |file|
  file.skip(0x10)
  width = file.read_bytes(UInt32, IO::ByteFormat::NetworkEndian)
  height = file.read_bytes(UInt32, IO::ByteFormat::NetworkEndian)
  p! width, height
end

And here’s a way we could do it by introducing IO.unpack:

class IO
  macro unpack(io, format)
    {
      {% for char in format.chars %}
        {% if char == 'N' %}
          {{io}}.read_bytes(UInt32, IO::ByteFormat::NetworkEndian),
        {% else %}
          {% raise "unknown format char: #{char}" %}
        {% end %}
      {% end %}
    }
  end
end

url = "./hello.png"

File.open(url) do |file|
  file.skip(0x10)
  width, height = IO.unpack(file, "NN")
  p! width, height
end

If someone is up for the challenge, it would be really great to have Ruby’s pack and unpack as IO.pack and IO.unpack macros. It would be really convenient to have the same rules (the same chars) if possible.

open-uri makes an HTTP request.

require "http"
HTTP::Client.get(ARGV[0]) do |response|
  io = response.body_io
  io.skip(0x10)
  width = io.read_bytes(UInt32, IO::ByteFormat::NetworkEndian)
  height = io.read_bytes(UInt32, IO::ByteFormat::NetworkEndian)
  p! width, height
end

I think the unpack macro would regularly trip people up because the format argument has to be a string literal rather than a string passed in from somewhere.

I like that our IO-focused binary decoding interface nudges people towards working with IOs directly. The OP’s example is a great one showing how Ruby’s approach leads to people reading much more data into memory than necessary. Yes, IO.unpack could still nudge towards IO, but then does Tuple#pack or whatever the inverse would be? I prefer Crystal’s current approach: verbosity is not evil here, and it nudges people nicely towards solutions that only read what’s needed into memory.

Well, in OP’s example they were string literals :)

I think the usual case is that there’s a protocol, and how you have to write things is pretty much “hardcoded”, so that string would be hardcoded too.

The OP’s example is a great one showing how Ruby’s approach lead to people reading much more data into memory than necessary

That’s true, although in Ruby you could also read just the necessary amount before unpacking it.
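As a rough Ruby sketch of that point: the StringIO below is a made-up stand-in for an open file or response body (16 filler bytes, then an invented width and height), and only the 8 bytes that are actually needed get read and unpacked.

```ruby
require 'stringio'

# Stand-in for an opened image: 16 filler bytes, the dimensions, then the rest.
io = StringIO.new(("\0" * 0x10 + [800, 600].pack('NN') + 'rest of image').b)

io.seek(0x10)                            # on a non-seekable stream, io.read(0x10) discards instead
width, height = io.read(8).unpack('NN')  # read only the 8 bytes we need

p [width, height] # => [800, 600]
p io.pos          # => 24
```

The rest of the stream is never pulled into memory, which is the same behaviour the Crystal read_bytes version gives.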

Yes, IO.unpack could still nudge towards IO, but then does Tuple#pack or whatever the inverse would be?

It would be:

class IO
  macro pack(io, format, *args)
  end
end

so that you pass the format and then a variable number of arguments that should match the format. There’s no need to even create a tuple or array at runtime!

If I have time I’ll try to play with this idea… though I’m almost sure nobody would like the “cryptic” format… maybe it can work as a shard, though.

https://github.com/j8r/crystalizer can be used to deserialize the Bytes to an object, in the same way as JSON or YAML. Some features/annotations are missing to tell the position of a byte element in the bytes payload (not necessary for integers).

require "crystalizer/byte_format"

record Dimensions, width : UInt32, height : UInt32

io = IO::Memory.new(Bytes[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3])
# io.skip(0x10)
dimensions = Crystalizer::ByteFormat.new(io, IO::ByteFormat::NetworkEndian).deserialize to: Dimensions
puts dimensions  #=> Dimensions(@width=66051, @height=66051)

Well, I said regularly, not most often :) I don’t think it’s far-fetched for a protocol to have a similar structure in multiple spots and for people to be inclined to extract that into constants or local variables, for example. Or to do that just to name things, like you would with magic numbers; they’re very much like magic values!

I think a shard is a great place to explore this. We can always decide to pull a very popular shard into stdlib or ship it with the compiler.

Lo and behold: GitHub - HertzDevil/pack.cr: Crystal compile-time (un)pack macros from Perl / Ruby

Packing into an IO is done by Pack.pack_to, whereas .pack uses a temporary Bytes-based builder that is as compact as possible. Unpacking directly from an IO is not implemented yet; in fact, neither Perl nor Ruby has a similar capability. Note that the X and @ directives require seekable IOs in both directions.

There are currently two huge design differences between this library and Ruby / Perl. The first is that every repeat count or glob will correspond to exactly one argument or return value, so the Crystal values are never flattened:

# Crystal
buffer = Pack.pack("Lc*", 1, Int8[2, 3, 4, 5]) # => Bytes[1, 0, 0, 0, 2, 3, 4, 5]
Pack.unpack(buffer, "Lc*")                     # => {1, Int8[2, 3, 4, 5]}

Pack.pack("Lc4", 1, {2_i8, 3_i8, 4_i8, 5_i8})  # => Bytes[1, 0, 0, 0, 2, 3, 4, 5]
Pack.pack("Lc4", 1, Int8.slice(2, 3, 4, 5))    # => Bytes[1, 0, 0, 0, 2, 3, 4, 5]
Pack.pack("Lc4", 1, (2_i8..))                  # => Bytes[1, 0, 0, 0, 2, 3, 4, 5]

# Ruby
buffer = [1, 2, 3, 4, 5].pack("Lc*") # => "\x01\x00\x00\x00\x02\x03\x04\x05"
buffer.unpack("Lc*")                 # => [1, 2, 3, 4, 5]

[1, [2, 3, 4, 5]].pack("Lc*")        # TypeError (no implicit conversion of Array into Integer)
[1, *[2, 3, 4, 5]].pack("Lc*")       # => "\x01\x00\x00\x00\x02\x03\x04\x05"

For unpacking it’s to avoid creating very long Tuples from simple formats like c256. For packing it’s to maintain round-trip conversions and also to work around the inability to splat arbitrary containers (you can splat arrays in Ruby and you can most certainly “splat” lists in Perl even when you don’t ask for them). This means for us cccc and c4 will represent entirely different things.
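For contrast, a small Ruby sketch of the flattened behaviour described above: in Ruby a repeat count is only shorthand, so cccc and c4 consume the same arguments and return the same flat array. The values are arbitrary.

```ruby
bytes_a = [1, 2, 3, 4].pack('cccc')
bytes_b = [1, 2, 3, 4].pack('c4')

p bytes_a == bytes_b      # => true
p bytes_a.unpack('cccc')  # => [1, 2, 3, 4]
p bytes_a.unpack('c4')    # => [1, 2, 3, 4]
```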

The second difference is that unpacking a / A / Z results in a Bytes instead of String, because they say nothing about the string encoding of the byte sequences. Packing strings directly with those directives will probably still be allowed, via to_slice. In contrast, U produces a Char or a String depending on the count’s presence, and the result is always valid UTF-8.
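For comparison, this is roughly how the same directives behave in Ruby, sketched on a made-up 5-byte field. There, a / A / Z return a String tagged with the binary encoding rather than a byte slice, and U works with integer code points rather than Char values.

```ruby
raw = "abc \0".b                  # made-up 5-byte field: text, a space, a NUL

kept     = raw.unpack1('a5')      # all 5 bytes, unchanged
stripped = raw.unpack1('A5')      # trailing spaces and NULs removed
z_term   = raw.unpack1('Z5')      # everything before the first NUL

p kept.bytes                          # => [97, 98, 99, 32, 0]
p stripped                            # => "abc"
p z_term                              # => "abc "
p kept.encoding == Encoding::BINARY   # => true

codepoints = "\u00e9\u20ac".unpack('U*')  # U decodes UTF-8 into code points
p codepoints                              # => [233, 8364]
p codepoints.pack('U*').encoding == Encoding::UTF_8 # => true
```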