Built-in support for mmap?

I was reading through @HertzDevil’s issue “`File.read`ing a file bigger than 2GB results in an `OverflowError`” (Issue #14485, crystal-lang/crystal), and it got me wondering whether mmap support could be baked in underneath the File class. I can imagine an interface like `File.open(..., memory_mapped: true) { |f| ... }`, where the yielded object automatically moves the memory-mapped window as needed. If that’s possible, what would it look like? I feel an interface like this would be very useful and would let more people use memory-mapped files conveniently.

It’s been a long time since I’ve played with mmap’d files, and I had only done it on Windows, so I’m not 100% sure how it would work. I recall the Windows APIs were very easy to use, but I don’t know if the POSIX ones are as easy or not.

The implementation is not hard, as you can see from the examples in “`File.read`ing a file bigger than 2GB results in an `OverflowError`” (Issue #14485, crystal-lang/crystal).

The Windows implementation is even a bit more verbose than on Linux.

Of course there may be more platform-specific details to address for a stable stdlib implementation.

And then there’s the question about designing a proper API.

I don’t think the intention to memory map needs to be expressed in the constructor / open call. You can always map any open file.

A memory mapping cannot be adequately exposed in a File instance. It’s just a pointer and a size (i.e. Bytes). Its lifetime doesn’t even depend on that of the file descriptor. However, we need a special destructor to unmap it (this would probably require registering a custom finalizer for this specific object with `GC.add_finalizer`).
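To make the "pointer and a size plus a destructor" idea concrete, here is a rough, POSIX-only sketch. `MappedFile` and its `bytes` method are hypothetical names, not anything in the stdlib. It relies on `def finalize`, which Crystal registers with the GC automatically for classes that define it, rather than calling `GC.add_finalizer` by hand.

```crystal
# Hypothetical wrapper around a read-only, file-backed mapping (POSIX only).
# Not a stdlib API; just a sketch of "pointer + size + destructor".
class MappedFile
  getter size : Int64
  @pointer : UInt8*

  def initialize(file : File)
    size = file.size
    # mmap rejects zero-length mappings, so fail early with a clearer error.
    raise ArgumentError.new("cannot map an empty file") if size == 0

    ptr = LibC.mmap(nil, LibC::SizeT.new(size), LibC::PROT_READ,
      LibC::MAP_PRIVATE, file.fd, 0)
    raise RuntimeError.from_errno("mmap") if ptr == LibC::MAP_FAILED

    @pointer = ptr.as(UInt8*)
    @size = size
    @closed = false
  end

  # Returns a read-only view of `count` bytes starting at `offset`.
  # The count is an Int32 because Bytes itself is limited to Int32 sizes.
  def bytes(offset : Int64, count : Int32) : Bytes
    raise IO::Error.new("mapping is closed") if @closed
    if offset < 0 || count < 0 || offset + count > @size
      raise IndexError.new("range outside of mapping")
    end
    Slice.new(@pointer + offset, count, read_only: true)
  end

  # Unmaps the region. Safe to call more than once.
  def close : Nil
    return if @closed
    @closed = true
    LibC.munmap(@pointer.as(Void*), LibC::SizeT.new(@size))
  end

  # Called by the GC if the object is collected without an explicit close.
  def finalize
    close
  end
end
```

Since the mapping outlives the file descriptor, something like `map = File.open(path) { |f| MappedFile.new(f) }` followed by `map.bytes(0, 16)` should keep working after the block has closed the file. Any slice handed out becomes dangling once `close` runs, which is one of the things a real API would have to address.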

Then there are many more use cases for memory mapping than just an equivalent to reading a file’s content into memory. Changes can be persisted back. Memory can be shared between processes in different modes. It can be used for huge allocations of virtual memory which only grows as space is needed.
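As an illustration of the "huge allocation of virtual memory" case, here is a small POSIX-only sketch using an anonymous mapping with no backing file. `with_anonymous_map` is a made-up helper name. Whether untouched pages actually cost no physical memory depends on the OS and its overcommit settings, so treat that part as an assumption.

```crystal
# Hypothetical helper: reserves `size` bytes of anonymous, private memory,
# yields it as Bytes, and unmaps it when the block returns or raises.
def with_anonymous_map(size : Int32, &)
  ptr = LibC.mmap(nil, LibC::SizeT.new(size),
    LibC::PROT_READ | LibC::PROT_WRITE,
    LibC::MAP_PRIVATE | LibC::MAP_ANON, -1, 0)
  raise RuntimeError.from_errno("mmap") if ptr == LibC::MAP_FAILED

  begin
    yield Slice.new(ptr.as(UInt8*), size)
  ensure
    LibC.munmap(ptr, LibC::SizeT.new(size))
  end
end

# Reserve 256 MiB but only touch the first and last byte.
with_anonymous_map(256 * 1024 * 1024) do |bytes|
  bytes[0] = 1_u8
  bytes[bytes.size - 1] = 2_u8
  puts bytes[0] + bytes[bytes.size - 1]
end
```

Swapping `MAP_PRIVATE` for `MAP_SHARED` and passing a real file descriptor is what would give the persisted or cross-process variants mentioned above, which is why a single `File`-shaped interface feels too narrow.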

Oh yes, I agree with all of that, which is why I was thinking a defined API for basically all of that would be useful, particularly if it’s built cross-platform. open was more of a starting point for discussion, since that’s a typical use case.

A memory mapping cannot be adequately exposed in a File instance.

I may be reading this statement wrong, but I wasn’t implying that the mapping itself be exposed. In fact, I’d prefer the details to be hidden behind a normal File interface. For example, when a read crosses the view boundary, the view would automatically be moved under the covers.
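For what it's worth, here is roughly how I picture the "moving view" working, as a read-only, POSIX-only sketch. `WindowedMmapIO` is a hypothetical name, and the window size is assumed to be a multiple of the system page size (the 1 MiB default is on common platforms), because mmap offsets must be page-aligned.

```crystal
# Hypothetical read-only IO that maps one window of the file at a time and
# remaps when the read position leaves the current window (POSIX only).
class WindowedMmapIO < IO
  @file_size : Int64
  @pos : Int64 = 0_i64
  @win_start : Int64 = 0_i64
  @win_len : Int64 = 0_i64
  @win : UInt8* = Pointer(UInt8).null

  # Assumption: window_size is a multiple of the page size.
  def initialize(@file : File, @window_size : Int64 = 1_i64 << 20)
    @file_size = @file.size
  end

  def read(slice : Bytes) : Int32
    return 0 if slice.empty? || @pos >= @file_size
    in_window = @win_len > 0 && @pos >= @win_start && @pos < @win_start + @win_len
    remap unless in_window

    offset = @pos - @win_start
    # Never read past the current window; the next call moves it.
    count = Math.min(slice.size.to_i64, @win_len - offset).to_i32
    (@win + offset).copy_to(slice.to_unsafe, count)
    @pos += count
    count
  end

  def write(slice : Bytes) : Nil
    raise IO::Error.new("mapping is read-only")
  end

  def close : Nil
    unmap
  end

  def finalize
    unmap
  end

  # Maps the window that contains the current position.
  private def remap : Nil
    unmap
    start = @pos - (@pos % @window_size)
    len = Math.min(@window_size, @file_size - start)
    ptr = LibC.mmap(nil, LibC::SizeT.new(len), LibC::PROT_READ,
      LibC::MAP_PRIVATE, @file.fd, LibC::OffT.new(start))
    raise RuntimeError.from_errno("mmap") if ptr == LibC::MAP_FAILED
    @win = ptr.as(UInt8*)
    @win_start = start
    @win_len = len
  end

  private def unmap : Nil
    return if @win_len == 0
    LibC.munmap(@win.as(Void*), LibC::SizeT.new(@win_len))
    @win = Pointer(UInt8).null
    @win_len = 0_i64
  end
end
```

Because it is a plain `IO`, callers could use `gets`, `read_bytes`, `IO.copy` and so on without knowing a mapping is involved. Positions are tracked as `Int64`, so this shape at least sidesteps the 2GB limit on a single `Bytes`. Seeking and writing are left out here.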

One issue is that every OS has their own extensions, and that the extensions will usually be pretty useful.

Memory can be shared between processes in different modes.

Yep, including without an actual backing file. One example is `src/buffer/wl_pool.cr` in yxhuvud/wayland_client, where I use a memfd to back a piece of shared memory.

A defined API may be hard. I’d be happy enough with a sanctioned interface to the LibC parts that won’t be hidden away or changed. But then I use it for something that doesn’t need to be portable.

The discussion seems to be converging on an API like `MemoryMap.new(size, **flags)` and `MemoryMap.from(file)`, as well as `File#to_memory_map`?

A good start could be a scratch shard to explore the idea 🙂
