I have this project based on Amber. It spawns some background processes that basically just sync data from an external service.
It works fine in general, but recently I discovered that it fails when the process is long-running (a full sync rather than an update), but only in production.
Or rather, it's not so much “fail” as simply exit. No errors, no shutdown message, nothing caught in Sentry by Raven. It just exits and Docker restarts the container.
Doesn’t happen when I run the same docker image locally, with the same parameters.
I’m pretty stumped. The complete lack of error messages makes it seem that some code somewhere is doing a quiet exit, but the fact that it only happens on the production server suggests that some error ought to have happened.
How is the app running in production? Is it possible it’s getting killed because it’s hitting some resource limit or something that local doesn’t have?
A simple docker-compose up. Yeah, I was considering that, but I would expect it to be killed with some sort of clue. In Kubernetes the smoking gun is exit code 137 (the process was killed with SIGKILL, typically by the OOM killer), and I believe the same goes for plain Docker. But Docker reports that it simply exited with exit code 0.
The production server is running CoreOS, so I’m a bit off my turf, but it shouldn’t be set up to be particularly restrictive.
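The 137 mentioned above follows the common shell and Docker convention that a process terminated by signal N is reported as exit code 128 + N, so 137 = 128 + 9, and 9 is SIGKILL on Linux, the signal the kernel's OOM killer sends. A minimal Crystal sketch of that decoding; `describe_exit` is a hypothetical helper for illustration, not part of Docker or any library. Note that 137 only identifies the signal, not who sent it.

```crystal
# Convention used by shells and Docker: death by signal N => exit code 128 + N.
SIGNAL_OFFSET = 128
SIGKILL       =   9 # SIGKILL's signal number on Linux

# Hypothetical helper: turns a reported exit code into a human-readable cause.
def describe_exit(code : Int32) : String
  if code > SIGNAL_OFFSET
    signal = code - SIGNAL_OFFSET
    signal == SIGKILL ? "killed by SIGKILL" : "killed by signal #{signal}"
  elsif code == 0
    "exited cleanly"
  else
    "exited with status #{code}"
  end
end

puts describe_exit(137) # => killed by SIGKILL
puts describe_exit(143) # => killed by signal 15
puts describe_exit(0)   # => exited cleanly
```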
What you can do is run your project without Docker: compile it with debug symbols, push it to your server (--cross-compile may be necessary) and run it under gdb. When it crashes, run backtrace in gdb to see where the program died. That should give you some clue.
That is what I did to find out what was making my program hang.
Before this showed up, I’d extended the little client library the above-mentioned project uses, so that calls fetching entries could take a block. That way you can process entries as they come in, rather than getting them all and then calling .each on the result. As the client already handles paging, this allows for a more “streaming”-like approach.
Turns out, as soon as I changed the project to use a block instead, the problem went away. So I figure it was running out of memory (it's a small server). I'm rather annoyed that there was no clue about it anywhere.
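A minimal Crystal sketch of the two styles described above, assuming a hypothetical paginated client (the names `Client`, `Entry`, `fetch_page`, `all_entries` and `each_entry` are illustrative, not the poster's actual client, and `fetch_page` fakes the HTTP call). The eager method accumulates every page before returning; the block-taking method yields each entry as its page arrives, so only one page is held at a time unless the caller keeps the entries.

```crystal
# Hypothetical entry type returned by the external service.
record Entry, id : Int32

# Hypothetical paginated client; fetch_page stands in for one HTTP request.
class Client
  PAGE_SIZE = 3

  def initialize(@total : Int32)
  end

  # Simulates fetching one page; returns an empty array past the last page.
  private def fetch_page(page : Int32) : Array(Entry)
    first = page * PAGE_SIZE
    last = Math.min(first + PAGE_SIZE, @total)
    return [] of Entry if first >= last
    (first...last).map { |i| Entry.new(i) }
  end

  # Eager: memory grows with the total number of entries.
  def all_entries : Array(Entry)
    all = [] of Entry
    page = 0
    loop do
      batch = fetch_page(page)
      break if batch.empty?
      all.concat(batch)
      page += 1
    end
    all
  end

  # Streaming: yields entries page by page; memory is bounded by PAGE_SIZE.
  def each_entry(& : Entry ->) : Nil
    page = 0
    loop do
      batch = fetch_page(page)
      break if batch.empty?
      batch.each { |entry| yield entry }
      page += 1
    end
  end
end

client = Client.new(total: 7)

# Streaming: process each entry as it arrives.
sum = 0
client.each_entry { |entry| sum += entry.id }
puts sum                     # => 21

# Eager: everything is materialized first.
puts client.all_entries.size # => 7
```

Both calls see the same entries; the difference is only how many are alive at once, which on a full sync against a small server can be the difference between finishing and being killed.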