Issue reading really large file with read_file

I have a really large file that I want to read into my app. My app still hadn’t finished building after an hour, so I guessed this file was the cause and moved the code into a separate file to test.

Running crystal build --release test.cr builds this file in about 11 seconds for me. The domains.txt file contains 119,261 lines.

# test.cr
DOMAINS = {{ read_file("./domains.txt").split('\n') }}

Now, when I try to iterate over that array, the compile time shoots up from 11 seconds to what seems like infinity (it never finishes building).

DOMAINS = {{ read_file("./domains.txt").split('\n') }}

value = "test@test.com"

found = DOMAINS.find do |domain|
  value.ends_with?(domain)
end

pp found

❯ crystal build --stats --release test.cr -o zzz
Parse:                             00:00:00.000129278 (   0.75MB)
Semantic (top level):              00:00:00.320183866 (  43.35MB)
Semantic (new):                    00:00:00.000973178 (  43.35MB)
Semantic (type declarations):      00:00:00.015073108 (  43.35MB)
Semantic (abstract def check):     00:00:00.003208983 (  43.35MB)
Semantic (ivars initializers):     00:00:00.013159808 (  57.80MB)
Semantic (cvars initializers):     00:00:00.068934745 (  73.80MB)
Semantic (main):                   00:00:00.666560999 ( 234.23MB)
Semantic (cleanup):                00:00:00.007756949 ( 234.23MB)
Semantic (recursive struct check): 00:00:00.001534194 ( 234.23MB)
Codegen (crystal):                 00:00:01.745678227 ( 314.23MB)
Codegen (bc+obj):

It basically sits at this spot…

Has anyone seen anything similar? Anyone have a suggestion for getting around it?

I wonder if it’s not the read_file but just the massive Array? I removed the macro and just dropped in a regular array:

DOMAINS = %w[
...
super
long
...
]

and after 5 minutes, it’s still trying to build. Normally my app builds in ~1 minute.

I tried this:

A = {% begin %}
  %w[
    {% for i in 0..100_000 %}
      {{i.stringify}}
    {% end %}
  ]
{% end %}

puts A.size

It compiles in about 4 seconds and prints 100000 as an output.

Do you have code that we can copy/paste?

You’ll have to just download these two files, then build it with the release flag

Are you sure you compiled with --release?
For me your code hangs at codegen:

$ crystal run --release --stats test-array.cr
Parse:                             00:00:00.000045500 (   0.75MB)
Semantic (top level):              00:00:00.184131000 (  44.82MB)
Semantic (new):                    00:00:00.001422800 (  44.82MB)
Semantic (type declarations):      00:00:00.028435500 (  44.82MB)
Semantic (abstract def check):     00:00:00.005617600 (  44.82MB)
Semantic (ivars initializers):     00:00:00.015214200 (  59.77MB)
Semantic (cvars initializers):     00:00:00.066107500 (  75.77MB)
Semantic (main):                   00:00:00.802865400 ( 204.20MB)
Semantic (cleanup):                00:00:00.006389700 ( 204.20MB)
Semantic (recursive struct check): 00:00:00.000689500 ( 204.20MB)
Codegen (crystal):                 00:00:01.448765600 ( 236.20MB)
Codegen (bc+obj):

There is already an issue on this:

The workaround is to move .split('\n') out of the macro expression, so that the array is created at runtime instead of being embedded as a huge array literal at compile time.
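For reference, here is a minimal sketch of that workaround applied to the test.cr from the first post. It assumes domains.txt still sits next to the source file at compile time. The remove_empty: true argument is my own addition, not part of the original code: it drops the empty string a trailing newline would otherwise leave in the array, which would match every value in ends_with?.

```crystal
# test.cr
# The macro now embeds only a single String literal at compile time.
# The split happens at runtime, so the compiler no longer has to
# generate code for an array literal with ~119k elements.
# remove_empty: true is an addition (assumption): it skips blank lines,
# e.g. the one produced by a trailing newline at the end of the file.
DOMAINS = {{ read_file("./domains.txt") }}.split('\n', remove_empty: true)

value = "test@test.com"

# Same lookup as before, now over a runtime-built Array(String).
found = DOMAINS.find do |domain|
  value.ends_with?(domain)
end

pp found
```

The file contents are still baked into the binary, so nothing needs to ship alongside it; only the array construction moves to program start.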


Ok cool. I’ll give that a try, thanks! At least it’s a known issue.