Help with Crystal's stack unwinding, C bindings and eBPF uretprobes

Hey all, I’m trying to understand how Crystal, its C bindings and stack unwinding work. I’m a contributor to the Pixie project (https://px.dev), which instruments applications on Kubernetes through eBPF. One of the core parts of our functionality is TLS tracing, which is accomplished by probing the “read” and “write” functions of OpenSSL and other TLS libraries (SSL_read and SSL_write). SSL_read only fills its buffer by the time it returns, so reading the plaintext requires a uretprobe rather than a plain uprobe.

I was recently made aware of a crash that Pixie causes in amqproxy, a Crystal application. The full details are in the amqproxy issue on GitHub (cloudamqp/amqproxy#117, “87 Illegal instruction (core dumped) amqproxy”).

For some additional background, it’s well known that uretprobes are not compatible with Go. A uretprobe works by overwriting the probed function’s return address on the stack with the address of a kernel-provided trampoline, and that modification conflicts with Go’s management of the stack (details in bcc#1320 and go#27077-comment).

What sets Go apart is that it depends on its ability to unwind stacks for GC and stack growth. I assert that uretprobes would break stack unwinding in any language, regardless of calling convention.
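To make that mechanism concrete, here is a toy C model, not kernel or Pixie code. A real unwinder maps each return address to unwind info (e.g. .eh_frame); here that is replaced by a hypothetical table of code ranges, and the addresses and the helpers `lookup`, `unwind` and `attach_uretprobe` are all made up for illustration. The key assumption is that the probed function is still on the stack when unwinding starts, which is exactly the case of a callback that raises: once its saved return address has been replaced by a trampoline address with no unwind info, the walk stops there, whichever language produced the frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code layout, standing in for the unwind info (.eh_frame)
   that a real unwinder consults to step from one frame to the next. */
typedef struct { uintptr_t start, end; const char *name; } CodeRange;

static const CodeRange kCode[] = {
    {0x1000, 0x1100, "main"},
    {0x1100, 0x1200, "crystal_func"},
    {0x2000, 0x2100, "test"},
    {0x7000, 0x7100, "call"},  /* the C library function being probed */
};

/* Hypothetical trampoline address: no code range above covers it. */
#define TRAMPOLINE ((uintptr_t)0x7fffffffe000u)

/* Returns the name of the function containing pc, or NULL if unknown. */
static const char *lookup(uintptr_t pc) {
    for (size_t i = 0; i < sizeof kCode / sizeof kCode[0]; i++)
        if (pc >= kCode[i].start && pc < kCode[i].end) return kCode[i].name;
    return NULL;
}

/* Walks pcs[0..n), innermost frame first, and stops at the first address
   with no unwind info (the analogue of END_OF_STACK). Returns how many
   frames it could name; their names are written to names[]. */
static size_t unwind(const uintptr_t *pcs, size_t n, const char **names) {
    size_t i = 0;
    for (; i < n; i++) {
        const char *name = lookup(pcs[i]);
        if (name == NULL) break;
        names[i] = name;
    }
    return i;
}

/* On function entry, a uretprobe stashes the real return address (the
   kernel keeps it per task) and overwrites the stack slot with the
   trampoline, so that the probe fires when the function returns. */
static uintptr_t attach_uretprobe(uintptr_t *ret_slot) {
    assert(*ret_slot != TRAMPOLINE);
    uintptr_t saved = *ret_slot;
    *ret_slot = TRAMPOLINE;
    return saved;
}
```

Walking `{0x2010, 0x7010, 0x1110, 0x1010}` (test, call, crystal_func, main) names all four frames. After `attach_uretprobe(&pcs[2])`, which hijacks `call`'s return address into `crystal_func`, the same walk names only `test` and `call`.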

For the amqproxy issue, it was suspected that since Crystal also performs stack unwinding, it would have the same issue as Go (see amqproxy issue #117). My understanding is that Crystal uses a separate stack for any native code it calls, so the uretprobes on OpenSSL should not touch any stack frames within Crystal’s “runtime stack”. If Pixie’s eBPF code attached a probe to a Crystal function, rather than a native code symbol, then I would expect it to have the same issue as Go’s uretprobes.

Can you confirm whether my understanding of the issue is correct, or whether Crystal’s stack unwinding conflicts with uretprobes? Is Crystal’s stack unwinding limited to exception handling, or are there other cases where stack unwinding occurs (GC, etc.)?

Any guidance, code links or design documentation that would help me understand how things work in Crystal would be greatly appreciated!

No, this is incorrect. Crystal and native call frames can be heavily interleaved (especially when there are a lot of callbacks), and they execute on the same stacks. Note the plural: each fiber has a stack of its own.
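A rough C analogue of that point, using POSIX `ucontext` as a stand-in for a fiber. This is an assumption made for illustration: Crystal's fibers use their own context-switching code, not `ucontext`. The hypothetical `library_call` plays the native function and `callback` plays the Crystal proc it invokes; both frames land on the single stack allocated for the fiber, so a probe that rewrites a return address on that stack affects any unwinder that walks through it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

typedef int (*Fp)(int);

/* Address of one local in each frame, recorded while the fiber runs. */
static uintptr_t g_lib_local, g_cb_local;
static int g_result;
static ucontext_t g_main_ctx, g_fiber_ctx;

/* Stand-in for the Crystal proc passed as a callback. */
static int callback(int x) {
    volatile int local = x;
    g_cb_local = (uintptr_t)&local;
    return local + 1;
}

/* Stand-in for libcall.so's call(): native code calling back out. */
static int library_call(Fp f) {
    volatile int local = 42;
    g_lib_local = (uintptr_t)&local;
    return f(local);
}

static void fiber_entry(void) {
    g_result = library_call(callback);
    /* Returning here resumes g_main_ctx via uc_link. */
}

/* Runs fiber_entry on its own heap-allocated stack and returns that
   stack (the caller frees it), or NULL on failure. */
static char *run_fiber(size_t size) {
    assert(size >= 16 * 1024);
    char *stack = malloc(size);
    if (stack == NULL) return NULL;
    if (getcontext(&g_fiber_ctx) != 0) { free(stack); return NULL; }
    g_fiber_ctx.uc_stack.ss_sp = stack;
    g_fiber_ctx.uc_stack.ss_size = size;
    g_fiber_ctx.uc_link = &g_main_ctx;
    makecontext(&g_fiber_ctx, fiber_entry, 0);
    if (swapcontext(&g_main_ctx, &g_fiber_ctx) != 0) { free(stack); return NULL; }
    return stack;
}
```

After `run_fiber(64 * 1024)`, `g_result` is 43 and both recorded local addresses fall inside the returned stack buffer, i.e. the "native" and "Crystal" frames shared one stack.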

Hi @yxhuvud, thanks for confirming that my understanding was incorrect. It sounds like Crystal would then exhibit the same issues as Go with uretprobes. To address this, it would be very helpful to reproduce the issue in a simpler use case.

Do you have any guidance on how I should model this case to reproduce it properly? It seems like I need to build a sample application that calls a dynamically linked C library, which then calls back into Crystal code that raises an exception (like the following).

```
# Call stack

main module
  -> crystal func (func1)
      -> c_library (dynamically linked C binding function)
          -> crystal func 2 w/ exception
```

In addition, are there any reliable indicators for determining whether a given binary is a Crystal application? I see that Crystal binaries contain ELF symbols prefixed with __crystal_. While I investigate whether there is a way to support these applications without uretprobes, I was hoping to filter out binaries like amqproxy, where Pixie causes application crashes.

You can reproduce with this simple library:

```c
// call.c
// build: gcc -c -Wall -Werror -fpic call.c
// build: gcc -shared -o libcall.so call.o
typedef int (*Fp)(int);

int call(Fp f) {
  return f(42);
}
```
```crystal
# call.cr
# build: crystal build call.cr --link-flags=$PWD/libcall.so
lib LibProc
  fun call(f : LibC::Int -> LibC::Int) : LibC::Int
end

def test(x)
  raise "Raising from Crystal"
  1
end

puts LibProc.call(->test(LibC::Int))
```

Every Crystal program has a __crystal_main function. I suppose that should work as a signal.
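If a symbol check is enough for filtering, a minimal ELF64 sketch in C (using glibc's `<elf.h>`) could look for `__crystal_main` in `.symtab` and `.dynsym`. The helper names `elf64_has_symbol` and `read_file` are hypothetical and the bounds checks are deliberately loose. One caveat: a stripped binary has no `.symtab`, so I'd expect this check to miss Crystal binaries that were stripped after linking, unless the symbol also appears in `.dynsym`.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if `name` is in .symtab or .dynsym of the ELF64 image
   buf[0..len), 0 if not, -1 if buf does not look like an ELF64 file. */
static int elf64_has_symbol(const unsigned char *buf, size_t len, const char *name) {
    if (len < sizeof(Elf64_Ehdr) || memcmp(buf, ELFMAG, SELFMAG) != 0 ||
        buf[EI_CLASS] != ELFCLASS64)
        return -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
    if (eh->e_shoff == 0 || eh->e_shentsize != sizeof(Elf64_Shdr) ||
        eh->e_shoff + (size_t)eh->e_shnum * sizeof(Elf64_Shdr) > len)
        return -1;
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
    for (size_t i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_SYMTAB && sh[i].sh_type != SHT_DYNSYM) continue;
        if (sh[i].sh_link >= eh->e_shnum) continue;
        const Elf64_Shdr *str = &sh[sh[i].sh_link];  /* linked string table */
        if (sh[i].sh_offset + sh[i].sh_size > len || str->sh_offset + str->sh_size > len)
            continue;  /* section data lies outside the buffer: skip it */
        const Elf64_Sym *syms = (const Elf64_Sym *)(buf + sh[i].sh_offset);
        const char *strtab = (const char *)(buf + str->sh_offset);
        size_t nsyms = sh[i].sh_size / sizeof(Elf64_Sym);
        for (size_t j = 0; j < nsyms; j++) {
            if (syms[j].st_name < str->sh_size &&
                strcmp(strtab + syms[j].st_name, name) == 0)
                return 1;
        }
    }
    return 0;
}

/* Reads a whole file into a malloc'd buffer; returns NULL on failure. */
static unsigned char *read_file(const char *path, size_t *len) {
    assert(len != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;
    unsigned char *buf = NULL;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0) size = ftell(f);
    if (size > 0 && fseek(f, 0, SEEK_SET) == 0 && (buf = malloc((size_t)size)) != NULL) {
        if (fread(buf, 1, (size_t)size, f) == (size_t)size) {
            *len = (size_t)size;
        } else {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    return buf;
}
```

For example, running it against /proc/self/exe of an ordinary, unstripped C program should find `main` but not `__crystal_main`.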

Thanks for the help. With that program and bpftrace, I was able to trigger a stack unwinding failure while a uretprobe on libcall.so's call function was attached.

```
$ sudo bpftrace -e 'uretprobe:/home/ddelnano/code/crystal/libcall.so:call { printf("read a line\n"); }'
Attaching 1 probe...
```

```
$ crystal build call.cr --link-flags=$PWD/libcall.so
ddelnano@pixie-dev:~/code/crystal (master) $ ./call
Failed to raise an exception: END_OF_STACK
[0x5642fdd01ad6] *Exception::CallStack::print_backtrace:Nil +118 in ./call
[0x5642fdce1784] __crystal_raise +52 in ./call
[0x5642fdce1df1] ?? +94845725974001 in ./call
[0x5642fdce1d77] ?? +94845725973879 in ./call
[0x5642fdce1d0e] ?? +94845725973774 in ./call
[0x5642fdcf12fd] ?? +94845726036733 in ./call
[0x5642fdcf12e6] ~procProc(Int32, Int32) +6 in ./call
[0x7f07baa96114] call +27 in /home/ddelnano/code/crystal/libcall.so
[0x7fffffffe000] ???

Tried to raise:: Raising from Crystal (Exception)
  from call.cr:7:3 in 'test'
  from call.cr:11:19 in '->'
  from /home/ddelnano/code/crystal/libcall.so in 'call'
  from ???
```
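A C-only variant of the same experiment, assuming GCC's `<unwind.h>` and the libgcc unwinder. My reading is that Crystal's `raise` goes through `_Unwind_RaiseException`, and that `END_OF_STACK` above corresponds to `_URC_END_OF_STACK`, which libgcc reports when it reaches a return address it has no unwind info for. The `0x7fffffffe000` frame is, I believe, the uprobes trampoline page just below the top of the x86-64 user address space. The sketch walks the stack with `_Unwind_Backtrace` from inside a callback invoked by a local stand-in for `call`; `walk_from_callback`, `record_frame` and `Trace` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unwind.h>

typedef int (*Fp)(int);

/* Instruction pointers recorded by the walk, innermost frame first. */
typedef struct {
    uintptr_t ips[64];
    size_t count;
} Trace;

static Trace g_trace;
static _Unwind_Reason_Code g_walk_result;

/* Called by _Unwind_Backtrace once per frame it can reach. */
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *ctx, void *arg) {
    Trace *t = arg;
    assert(t != NULL);
    if (t->count == sizeof t->ips / sizeof t->ips[0])
        return _URC_NORMAL_STOP;  /* buffer full: stop (the walk then reports an error code) */
    t->ips[t->count++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;
}

/* Plays the Crystal proc: walks the stack while `call` is still live. */
__attribute__((noinline)) static int walk_from_callback(int x) {
    g_trace.count = 0;
    g_walk_result = _Unwind_Backtrace(record_frame, &g_trace);
    return x + 1;
}

/* Local stand-in for libcall.so's call(). noinline (a GCC/Clang attribute)
   and the volatile result keep its frame on the stack during the callback. */
__attribute__((noinline)) static int call(Fp f) {
    volatile int r = f(42);
    return r;
}
```

Without a probe, `call(walk_from_callback)` returns 43 and the walk runs to the real end of the stack. If the same bpftrace uretprobe is attached to this binary's `call`, I'd expect the recorded IPs to end right after `call`, at the trampoline address, mirroring the Crystal backtrace above.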