Hi Crystal folks – I have a small Crystal 1.8.0 app (on Ubuntu 20.04) that serves some vector tile map data using http/server, the pg shard and crystal-sqlite3. Encoding the vector tile data is somewhat CPU intensive, so I have this running on a 64-core machine with CRYSTAL_WORKERS set to 40.
This generally works pretty well, but, being used in mapping applications, traffic can be quite bursty: users are likely to make a rapid series of requests all within a short period of time as they pan/zoom around the map.
Every now and then the app freezes for a while and then crashes with the error:
Signals delivery fails constantly at GC #1644
Signals delivery fails constantly
Aborted
I run the app in a while true; do ... loop to restart it when this happens, but is there a better way to handle it, or to prevent it?
From what I can tell, the “Signals delivery fails constantly” error comes from bdwgc (pthread_stop_world.c at commit 9229da044bbc5f5f131741975c0c35522bed227d in ivmai/bdwgc on GitHub), but what to actually do about it is a bit over my head. There does seem to be a GC_RETRY_SIGNALS environment variable that I can set to affect how many times (if at all) lost signals are re-sent, but I really have no idea what’s going on here.
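For context, here is a minimal sketch of the kind of restart loop described above, written as a reusable shell function. It is an illustration rather than the exact script in use: the function name supervise, the RESTART_DELAY variable, and the ./tile_server binary name in the comments are all hypothetical placeholders. Compared with a bare while true loop, it logs the exit status (so an abort can be told apart from a clean exit) and pauses between restarts.

```shell
#!/bin/sh
# Restart-loop sketch. "supervise", RESTART_DELAY and ./tile_server are
# hypothetical names chosen for illustration only.
#
# Usage: supervise MAX_RESTARTS CMD [ARGS...]
#   MAX_RESTARTS = 0 means "restart forever".
supervise() {
  max_restarts=$1
  shift
  restarts=0
  while true; do
    status=0
    "$@" || status=$?

    # A clean exit (status 0) ends the loop instead of restarting.
    if [ "$status" -eq 0 ]; then
      return 0
    fi

    # A status of 134 usually means the process died from SIGABRT (128 + 6).
    echo "$(date): command exited with status $status" >&2

    restarts=$((restarts + 1))
    if [ "$max_restarts" -gt 0 ] && [ "$restarts" -ge "$max_restarts" ]; then
      return "$status"
    fi

    # Short pause so a crash loop does not spin at full speed.
    sleep "${RESTART_DELAY:-1}"
  done
}

# Example invocation (commented out; adjust names to the real app):
#   export CRYSTAL_WORKERS=40
#   export GC_RETRY_SIGNALS=1   # assumption: whether this helps is unclear
#   supervise 0 ./tile_server
```

This only papers over the crash: requests in flight are still lost while the app is frozen and restarting, which is why I'm hoping there is a way to prevent the abort in the first place.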
Any ideas about what I might consider?