I can’t store 150_000_000 items even in a simple Hash(Int32, Int32).
When I try to create the Hash with preallocated space (initial_capacity in the constructor), “Arithmetic overflow (OverflowError)” is raised during Hash creation.
When I leave out initial_capacity, a “Maximum Hash size reached” error occurs while filling the Hash, somewhere around 100_000_000 entries.
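Roughly what I tried (stripped down; my real keys and values aren't just sequential integers, but the errors are the same):

```crystal
# Attempt 1: preallocate space up front (fails right away)
h = Hash(Int32, Int32).new(initial_capacity: 150_000_000)
# => Arithmetic overflow (OverflowError)

# Attempt 2: no initial_capacity (fails while filling)
h = Hash(Int32, Int32).new
150_000_000.times do |i|
  h[i] = i # raises "Maximum Hash size reached" somewhere past 100_000_000 entries
end
```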
What…? Is 150_000_000 items really too much today? What about a billion entries? Why is the maximum so low?
This is a long-known problem, but a solution that involves changing the stdlib’s size type is hard. And apparently this limitation is rarely an issue in practice.
Imagine just a big in-memory index (object_ids → position in a file + some other metadata) or something like this.
It looks like it wouldn’t be a big problem to copy the stdlib Hash and base it on Int64 (unfortunately, it won’t be able to implement Enumerable, because size in Enumerable is Int32, etc.).
Is my math off, or is that already 1.5 GB of data even if each entry takes only 10 bytes (total, including memory used internally by the type)?
If you really need to work with such large indices of stuff, I would probably roll my own datatype.
That also makes it easier to later do things like lazy loading of data or pagination when you run out of memory on the machine.
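For example, even a dumb fixed-capacity open-addressing table over raw pointers gets you Int64-sized indexing. This is only a rough sketch of what "rolling your own" could look like; the name BigIndex, the Int32::MIN sentinel key, the lack of resizing and deletion, and Int32-only keys/values are all simplifying assumptions:

```crystal
# Rough sketch of a hand-rolled fixed-capacity table with linear probing.
# Assumes keys never equal Int32::MIN (used as the "empty slot" sentinel),
# no resizing, no deletion, and that capacity stays larger than the entry count.
class BigIndex
  EMPTY_KEY = Int32::MIN

  getter size : Int64

  def initialize(capacity : Int)
    @capacity = capacity.to_i64
    @keys = Pointer(Int32).malloc(@capacity, EMPTY_KEY) # all slots start empty
    @values = Pointer(Int32).malloc(@capacity)
    @size = 0_i64
  end

  def []=(key : Int32, value : Int32)
    idx = slot_for(key)
    @size += 1 if @keys[idx] == EMPTY_KEY
    @keys[idx] = key
    @values[idx] = value
  end

  def []?(key : Int32) : Int32?
    idx = slot_for(key)
    @keys[idx] == EMPTY_KEY ? nil : @values[idx]
  end

  # Linear probing with Int64 slot indices, so capacity isn't tied to Int32.
  private def slot_for(key : Int32) : Int64
    idx = (key.to_i64 & Int64::MAX) % @capacity
    until @keys[idx] == EMPTY_KEY || @keys[idx] == key
      idx = (idx + 1) % @capacity
    end
    idx
  end
end

# Room for the ~150M entries from the original post.
index = BigIndex.new(200_000_000)
index[42] = 123_456
p index[42]? # => 123456
```

From there you could swap the value array for file offsets, page parts of it to disk, and so on.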
Interesting, is this something that’s feasible to use Redis for?
I can imagine that populating that giant index all at once wouldn’t work well, but if the mapping is accumulated over time, it might be a decent tradeoff, since a single Redis hash can hold about 4 billion keys.
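Something along these lines, assuming the crystal-redis shard and a Redis server on localhost (the key and field names here are made up):

```crystal
require "redis"

# Assumes the stefanwille/crystal-redis shard ("redis" in shard.yml)
# and a Redis server running locally with default settings.
redis = Redis.new

# Accumulate the object_id -> file position mapping over time,
# one field per object id inside a single Redis hash.
redis.hset("file_index", "12345", "987654321")

# Look an entry up later (returns the stored string, or nil if missing).
puts redis.hget("file_index", "12345")
```

Each object_id becomes a field in one Redis hash, so the mapping can grow incrementally instead of being built in one shot.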