The Crystal Programming Language Forum

File offsets above Int32

Hi folks!

Wonder if someone could help me out a bit. I am evaluating Crystal for a number of systems and I was wondering about large file support. Firstly - I can’t find the exact definition of IO#seek and the size of its offset argument. It seems to be Int32, which effectively limits seeking to just shy of 2GB, which is really not much. I routinely need to output/proxy files which are larger. When I look at reading files, the farthest I could get was the PReader here: https://github.com/crystal-lang/crystal/blob/master/src/file/preader.cr and it seems that the maximum offset I can feed it is also Int32, so I won’t even be able to read from a larger file (at higher offsets) if we are going to use it. Same concern with IO.copy, where the type of the count variable is not annotated. Can I at least seek manually (without PReader) beyond Int32::MAX?

How do people actually deal with larger files / larger offsets at the moment? Or is there something I am missing?

Hi!

It’s a current limitation of the language.

See:

How do people actually deal with larger files / larger offsets at the moment?

So the answer is: they don’t. Crystal is currently not suitable for those kinds of programs.

Ok, that makes it tricky indeed for what I want to do (provide a readable IO over spans/intervals). I do understand that arrays/strings/indexables larger than 2GB cannot be used at the moment (but buffer pointers can be), and that was actually fine for my use case. But the fact that I can’t seek in a larger File or an arbitrary IO seems odd. Where can I find the definition of IO#seek? Are there IO objects in the stdlib which are not files but still have a #seek implementation, and is there a chance that offsets for the native OS primitives will go 64-bit sometime soon?

Or, to simplify the question: what is the minimum viable workaround if I want to provide an IO-ish object whose #seek offsets and #size can exceed Int32::MAX? Or would I be able to bypass the limit if I implement everything on top of my own type definitions (with their own, larger offsets)?

I.e. I believe my question is more related to https://github.com/crystal-lang/crystal/issues/4498#issuecomment-305997822 and also to https://github.com/crystal-lang/crystal/issues/3209#issuecomment-298173102 - what is the actual type IO#pos= and IO#pos accept and return?

I.e. if on my machine sizeof(some_off_t_var) is 8, does it mean that on the Crystal side I can pass an Int64 and it is going to be transparently bridged to lseek(2)?

You can try it and see if it works.
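
For anyone who wants to try it, here is a minimal sketch of such an experiment. It assumes a Crystal version and platform where File#seek accepts an Int64 offset and passes it through to the OS unchanged, which is exactly the thing being tested, so it may fail or raise on some setups. The scratch file name is hypothetical, and the file will be sparse only if the filesystem supports that.

```crystal
# Hypothetical scratch file in the system temp directory.
path = File.tempname("seek_test", ".bin")

# One byte past the Int32 range, held as an Int64.
offset = Int32::MAX.to_i64 + 1

begin
  File.open(path, "w+") do |file|
    file.seek(offset)      # absolute seek from the start of the file
    file.write_byte(42_u8) # writing here extends the file past 2GB
    file.flush

    # If 64-bit offsets are bridged correctly, both values exceed Int32::MAX.
    puts "pos:  #{file.pos}"
    puts "size: #{file.size}"

    # Seek back and read the byte that was just written.
    file.seek(offset)
    puts "byte: #{file.read_byte}"
  end
ensure
  # Clean up the scratch file even if the seek or write raised.
  File.delete(path) if File.exists?(path)
end
```

If the reported position and size come back larger than Int32::MAX and the byte reads back intact, plain File seeking is not limited to 32 bits on that setup. This says nothing about PReader or IO.copy, which would need their own checks.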