Compiler release options causing segfaults

My code gets locked up in the sigtrap handler at Signal.cr:156. It blocks while trying to write to the file descriptor, but it only segfaults if I build with --release.

Could you post a reproducible source code example, please?

And crystal --version!

Crystal 0.33.0. The code that triggers it seems to be where I spawn a process via Process.new and wait for termination. I have a few other signal handlers installed (HUP, INT, TERM) to perform cleanup on exit. If I compile without --release it works fine, with zero issues. I'm going to roll back to 0.32.1 and see if the same issue occurs; I suspect it's something in the compiler changes introduced in 0.33.

Yeah, I rolled back to Crystal 0.32.1 and it no longer segfaults. I'm using a native lib that has a C interface and several enums, and for 0.33 I had to change the enums to remove the commas from some of the enum lists. So it definitely appears during the release optimization steps.

If you know how to compile the compiler from source, you could use git bisect to find the commit that introduced the issue.

It definitely seems signal related, and tied to the optimization process. I have multiple UNIXSockets in my app, along with a spawned subprocess and a signal-passing mechanism for a heartbeat. Even in 0.32.1 with the release flag I get random hangs.

Is a reproducible, minimized code sample possible?

Not really. It's all quite involved, and runs a long-running nginx -> kemal -> websocket -> unix_socket -> actioncable handler process.

        # Capture the child's stdout and stderr in one in-memory buffer.
        output = IO::Memory.new
        processor = Process.new(hectate_cmd, args: @run_cmd, env: @run_env, error: output, output: output)
        @child = processor.pid
        @child_id = job_doc[JID].as(String)
        completed = false
        # Heartbeat fiber: touch the job record every 30 seconds while the child runs.
        spawn do
            cntr = 0
            loop do
                break if (child = @child) <= 0 || completed
                sleep 1
                cntr += 1
                chid = @child_id
                break if completed || chid.empty?
                if cntr == 30
                    @db.touchJob(chid)
                    cntr = 0
                end
            end
        end
        # Block this fiber until the child exits, then stop the heartbeat.
        status = processor.wait
        completed = true
        exit_status = status.exit_status

I run ten of these handlers in a fiber. In release mode, after a while, I'll just randomly lose signals being sent and segfault/crash. Again, this happens only if I build with the release flag, so maybe it's a bug in an LLVM optimization pass?

Can you post the backtrace or failure message? What line does it fail on?

I'm tearing apart each section to figure out where it dies. Right now it's dying with:
Invalid memory access (signal 11) at address 0x100000048
[0x10e4196f7] *CallStack::print_backtrace:Int32 +39
[0x10e33b5e7] __crystal_sigfault_handler +487
[0x7fff6b66fb5d] _sigtramp +29
[0x10e313127] *String::Builder#to_s:String +7

Well, I found at least one cause: a bad type cast. I was passing a BSON pointer to a string interop, and I changed it to get the JSON representation instead. Still digging, though.

OK, after a lot of digging, the signal handler hang (while important) was a red herring for the cause of my crash.
It boiled down to (a) an invalid pointer cast, and (b) an array, @run_cmd, being updated across fibers, leaving an invalid reference that was in turn passed to the process as the arguments, leading to the segfault.
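
For anyone hitting something similar, here is a minimal sketch of one way to guard against (b): take a private copy of the argument array before the fiber can yield, and pass the copy to the process. This is not the actual fix from my app. The Handler class, next_job, and the use of echo are hypothetical stand-ins, and it assumes the shared array is mutated in place by another fiber.

```crystal
# Hypothetical stand-in for the real handler: @run_cmd is shared state
# that another fiber may rewrite while a child process is being started.
class Handler
  def initialize(@run_cmd : Array(String))
  end

  # Called from another fiber; mutates the shared array in place.
  def next_job(args : Array(String))
    @run_cmd.clear
    @run_cmd.concat(args)
  end

  # Snapshot the arguments before any point where this fiber can yield,
  # so later changes to @run_cmd cannot reach the child process.
  def run : String
    args = @run_cmd.dup
    Fiber.yield # simulate a reschedule between reading args and spawning
    output = IO::Memory.new
    status = Process.run("echo", args: args, output: output)
    raise "echo failed" unless status.success?
    output.to_s.strip
  end
end

handler = Handler.new(["first"])
spawn { handler.next_job(["second"]) }
puts handler.run
```

Even though the other fiber rewrites @run_cmd during the yield, the child should still receive the arguments that were current when run was called. Array#dup is a shallow copy, which is enough here only because the elements are immutable strings.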

Thanks to the suggestions, I was able to track it down. So thanks!
