I can’t store 150_000_000 items even into a simple Hash. When I try to create a Hash with preallocated space (via initial_capacity in the constructor), an “Arithmetic overflow (OverflowError)” occurs in the constructor. When I ignore initial_capacity, a “Maximum Hash size reached” error occurs somewhere along the way while filling the Hash.
Is 150_000_000 items really too much today? What about a billion entries? Why is the maximum so low? So should I implement my own Hash?
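A minimal sketch of the failing code, assuming a 64-bit build (the exact key/value types don’t matter here, and the precise failure thresholds may vary by Crystal version and platform):

```crystal
# Preallocating space up front: the constructor overflows while
# computing its internal (Int32-based) sizes.
h = Hash(Int64, Int64).new(initial_capacity: 150_000_000)
# raises Arithmetic overflow (OverflowError)

# Growing incrementally instead: an internal resize eventually
# hits the hard cap.
h = Hash(Int64, Int64).new
150_000_000_i64.times do |i|
  h[i] = i # raises "Maximum Hash size reached" before completing
end
```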
This is unfortunately a limitation of the internal 32-bit index size.
See explanation here in the source code:
I’m afraid the standard library currently offers no alternative collections that use bigger size types.
cf. Data structers for large datasets · Issue #8523 · crystal-lang/crystal · GitHub
This is a long-known problem but a solution that involves changing stdlib’s size type is hard. And apparently this limitation is rarely an issue in practice.
Maybe you could describe your use case or particular problem. There might be a way to model it without a huge hash.
Imagine just a big in-memory index (object_ids → position in a file + some other metadata) or something like this.
It looks like it won’t be a big problem to copy the stdlib Hash and make it based on Int64 (unfortunately, it will not be able to implement …).
Is my math off, or is that already 1.5 GB of data even if each entry takes only 10 bytes (total, including memory used internally by the type)?
If you really need to work with such large indices of stuff, I would probably roll my own data type.
That makes it easier later to do things like lazy loading of data or pagination when you run out of memory on the machine.
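One way to sketch that roll-your-own direction, assuming keys hash uniformly: shard the entries across many smaller stdlib Hashes, so no single shard ever approaches the internal Int32 limit. ShardedIndex and its parameters are made-up names for illustration, not stdlib API:

```crystal
# A hypothetical sharded index: distributes keys across N stdlib Hashes
# so each shard stays far below the internal Int32 size limit.
class ShardedIndex(K, V)
  def initialize(@shard_count : Int32 = 64)
    @shards = Array(Hash(K, V)).new(@shard_count) { Hash(K, V).new }
  end

  private def shard_for(key : K) : Hash(K, V)
    # key.hash is a UInt64; pick a shard by modulo.
    @shards[(key.hash % @shard_count.to_u64).to_i32]
  end

  def []=(key : K, value : V)
    shard_for(key)[key] = value
  end

  def []?(key : K) : V?
    shard_for(key)[key]?
  end

  def size : Int64
    # Sum with an Int64 accumulator so the total itself can exceed Int32.
    @shards.sum(0_i64) { |s| s.size.to_i64 }
  end
end

# e.g. an object_id -> file-position index, as suggested above:
index = ShardedIndex(Int64, String).new
index[123_i64] = "offset=98765"
index[123_i64]? # => "offset=98765"
```

Each shard can also later be persisted or evicted independently, which lines up with the lazy-loading/pagination idea.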
Interesting, is this something that’s feasible to use Redis for?
I can imagine that populating that giant index all at once wouldn’t work for that, but if that mapping is accumulated over time, it might be a decent tradeoff since it can hold 4 billion keys.