Memory leak after upgrading to Crystal 0.36?

I just upgraded one of my Lucky apps to the latest 0.26 Lucky with Crystal 0.36.1, and it seems when my app boots, I get this error:

Listening on http://0.0.0.0:8000
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Out of memory - trying to allocate requested amount (336 bytes)...
GC Warning: Out of memory - trying to allocate requested amount (336 bytes)...
GC Warning: Out of memory - trying to allocate requested amount (336 bytes)...
GC Warning: Out of memory - trying to allocate requested amount (336 bytes)...
GC Warning: Out of memory - trying to allocate requested amount (336 bytes)...
GC Warning: Out of memory - trying to allocate requested amount (336 bytes)...
GC Warning: Header allocation failed: dropping block
GC Warning: Failed to expand heap by 16777216 bytes
GC Warning: Failed to expand heap by 262144 bytes
GC Warning: Out of Memory! Heap size: 3824 MiB. Returning NULL!
Invalid memory access (signal 11) at address 0x34
[0x563805012436] ???
[0x563804ea14a2] __crystal_sigfault_handler +514
[0x7fbfa3a74980] ???
[0x563804f80429] ???
[0x563804f89681] ???
[0x563804f7a3fc] ???
[0x563804f5cc22] ???
[0x563804fb068a] ???
[0x563804f45429] ???
[0x563804f436bb] ???
[0x563804f2c44b] ???
[0x563804f30280] ???
[0x563804f40bee] ???
[0x563804f2302c] ???
[0x563804f20dba] ???
[0x563804e9baad] ???
[0x0] ???

The app itself is fairly small, and headless (API only). I can’t really share much code, but what’s the best way to go about debugging this to figure out where this is coming from? Or is there any stats I can provide that might shine more light on this?

I should also note that when the app runs on Crystal 0.35.1 and Lucky 0.25.0, it’s fine.

GC: profiling stats and warning bindings by bcardiff · Pull Request #8281 · crystal-lang/crystal · GitHub Introduced GC callbacks for warnings that should help find the culprit allocation. See the referenced comments for examples.

1 Like

I hope I’m using this correctly. :joy:
I’m slapping this code around the place:

puts "*" * 50
pp GC.prof_stats
puts "*" * 50

I put one right before I boot the app, then put another one after a specific endpoint is hit. I’m just trying to locally first to get a feel for it, but here’s what I got when I boot the app:

web    | GC::ProfStats(
web    |  @bytes_before_gc=0,
web    |  @bytes_reclaimed_since_gc=0,
web    |  @bytes_since_gc=709520,
web    |  @free_bytes=57344,
web    |  @gc_no=1,
web    |  @heap_size=786432,
web    |  @markers_m1=7,
web    |  @non_gc_bytes=0,
web    |  @reclaimed_bytes_before_gc=0,
web    |  @unmapped_bytes=0)

Then after hitting the main endpoint once, I got this:

web    | GC::ProfStats(
web    |  @bytes_before_gc=0,
web    |  @bytes_reclaimed_since_gc=0,
web    |  @bytes_since_gc=958528,
web    |  @free_bytes=36864,
web    |  @gc_no=1,
web    |  @heap_size=1052672,
web    |  @markers_m1=7,
web    |  @non_gc_bytes=0,
web    |  @reclaimed_bytes_before_gc=0,
web    |  @unmapped_bytes=0)

Is this the right way to go about it? And if so, what sort of numbers should I be looking for to see that point me in the right direction?

Ok, I think I’ve tracked down where the issue is coming from, but I’m still not sure what the actual issue is. I have one main endpoint in particular that takes 200ms to run on the old version, and 27seconds on the new version… The code is identical between the two aside from shard and crystal versions.

I guess I need to figure out what is happening to cause so much time difference. :confused:

If you include before the warning is triggered the following code (based on GC Warning: Repeated allocation of very large block · Issue #2104 · crystal-lang/crystal · GitHub)

LibGC.set_warn_proc ->(msg, word) {
  raise String.new(msg)
}

you should see the backtrace of the allocation that caused the warning.

LibGC.set_warn_proc ->(msg, word) {
  raise String.new(msg)
}

x = [] of Pointer(Int32)
loop do
  x << Pointer.malloc(10_000_000, 0)
end
$ bin/crystal build foo.ign.cr 

# GC_MAXIMUM_HEAP_SIZE to force the warning sooner
$ GC_MAXIMUM_HEAP_SIZE=10000000 ./foo.ign
Unhandled exception: GC Warning: Out of Memory!  Trying to continue...
 (Exception)
  from foo.ign.cr:2:3 in '->'
  from GC_collect_or_expand
  from GC_alloc_large
  from GC_generic_malloc
  from GC_malloc_kind_global
  from src/gc/boehm.cr:115:5 in 'malloc_atomic'
  from src/gc.cr:35:3 in '__crystal_malloc_atomic64'
  from src/primitives.cr:153:3 in 'malloc'
  from src/pointer.cr:456:5 in 'malloc'
  from foo.ign.cr:7:3 in '__crystal_main'  # <-- allocation in user code
  from src/crystal/main.cr:110:5 in 'main_user_code'
  from src/crystal/main.cr:96:7 in 'main'
  from src/crystal/main.cr:119:3 in 'main'
2 Likes

Ah! Ok, I see now. I’ll give that a shot. Thanks!

Ok, this isn’t a Crystal specific issue. There’s something else up in my app, but being able to use that set_warn_proc made it easier to track some stuff down.

1 Like