Invalid memory access (signal 11) on 1.13.1, not present on 1.12.2?

Hi there, just wondering if anyone else has seen an increase in `Invalid memory access (signal 11) at address 0x0` errors on Crystal 1.13?

We’ve rolled back to 1.12.2 in production until I can get to the bottom of this!

I can’t post code just yet because I haven’t had time to make a reduced version, but to give a quick picture: the signal 11 happens within a small internal HTTP microservice, in a GET endpoint served by Kemal. The endpoint takes an optional query parameter. When the query parameter is specified, the code does an INSERT via the crystal-pg shard. We only get the signal 11 when this query parameter is specified; when it’s not, the endpoint returns without issue.

On 1.12.2: everything works fine, with no signal 11, with or without the query parameter.

On 1.13.1: oddly, the signal 11 seems to happen only when built with `--no-debug`; when I build with `--debug`, I can’t reproduce the issue. Hah! 😜

Also, the endpoint works totally fine when tested via `spec-kemal`. It’s only when the app is compiled into a binary and hit by an external TCP connection that we get the signal 11.

I looked through recent PRs and issues and found crystal-lang/crystal#14832 (LLVM 18 breaks 128-bit integers in the interpreter), but that only seems to be relevant to the interpreter. Just wondering if there is anywhere else I should be looking for clues. Thanks ❤️

What OSes and architectures does this happen on?

I feel like it shouldn’t make a difference, because a null-pointer dereference doesn’t seem like it would be OS- or architecture-dependent, but since there’s a difference between `--debug` and `--no-debug`, I’m not sure I want to make that assumption.

amd64.

Debian-based Docker image.

`crystal1.12` or `crystal1.13` package installed via apt from `deb http://download.opensuse.org/repositories/devel:/languages:/crystal/Debian_12/ /`

Thank you!

Nothing’s jumping out at me in the diff between the two versions.

In the meantime, to aid in troubleshooting, when the error happens you should see something like this:

```
Invalid memory access (signal 11) at address 0x0
[0x104ce8ae4] *Exception::CallStack::print_backtrace:Nil +104 in /Users/jamie/.cache/crystal/crystal-run-segv.tmp
[0x104cd3d80] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil)@/opt/homebrew/Cellar/crystal/1.13.1/share/crystal/src/crystal/system/unix/signal.cr:143 +320 in /Users/jamie/.cache/crystal/crystal-run-segv.tmp
[0x1827c3584] _sigtramp +56 in /usr/lib/system/libsystem_platform.dylib
[0x104d7c398] *OMGLOL#quux:UInt8 +16 in /Users/jamie/.cache/crystal/crystal-run-segv.tmp
[0x104d7c380] *OMGLOL#baz:UInt8 +12 in /Users/jamie/.cache/crystal/crystal-run-segv.tmp
[0x104d7c36c] *OMGLOL#bar:UInt8 +12 in /Users/jamie/.cache/crystal/crystal-run-segv.tmp
[0x104d7c358] *OMGLOL#foo:UInt8 +12 in /Users/jamie/.cache/crystal/crystal-run-segv.tmp
[0x104cc79f0] __crystal_main +1004 in /Users/jamie/.cache/crystal/crystal-run-segv.tmp
[0x104d351e4] *Crystal::main_user_code<Int32, Pointer(Pointer(UInt8))>:Nil +12 in /Users/jamie/.cache/crystal/crystal-run-segv.tmp
[0x104d3514c] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +60 in /Users/jamie/.cache/crystal/crystal-run-segv.tmp
[0x104cd0a30] main +12 in /Users/jamie/.cache/crystal/crystal-run-segv.tmp
```
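The code that produced that backtrace:

```crystal
struct OMGLOL
  def foo
    bar
  end

  def bar
    baz
  end

  def baz
    quux
  end

  # Dereferencing a null pointer triggers the signal 11 shown above
  def quux
    Pointer(UInt8).null.value
  end
end

OMGLOL.new.foo
```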

Your backtrace will likely be deeper, but otherwise similar. The top 3 frames and the bottom 4 frames are part of the Crystal runtime, so you can ignore those. The frames in between should help narrow down at least which method is crashing and what’s calling it, so you can reproduce it.

Just wanted to report that I now believe this segfault has been fixed via crystal-lang/crystal#14898 (Invalid memory access in release mode with LLVM 18), which came about from someone else reporting a crystal-pg query segfault: will/crystal-pg#288 (Issue w/ Bool - Invalid memory access (signal 11) at address 0x0).

(I haven’t tracked it down to just that patch, but I confirmed that building our app with the nightly Crystal 1.14.0-dev [f3fb7b648] (2024-08-18) from here has no issues, while building it with 1.13.1 does.)

Thanks all! ❤️

Just to confirm: I checked out the crystal repo at tag 1.13.2 and built the compiler from source. I used that compiler to build our app, and the segfault was gone.

Then I reverted only crystal-lang/crystal#14906 (Fix misaligned store in `Bool` to union upcasts, by HertzDevil) and recompiled Crystal. When I rebuilt our app with that compiler, the segfault came back.

So I’m confident that this fixes our issue. ✔️ Both compilers were built against LLVM 18.1.8.
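For readers unfamiliar with the term in the PR title: a `Bool` to union upcast happens whenever a `Bool` value is stored into a variable, argument, or return value whose static type is a union. The sketch below is a hypothetical helper (the name `parse_flag` and the `Bool | Int32 | Nil` type are illustrative, not from the app in this thread) that turns an optional query-string value into a union, roughly like the optional parameter described above. It only shows the construct; it is not a confirmed reproducer of the LLVM 18 release-mode miscompilation.

```crystal
# Hypothetical helper: converts an optional query-string value into a union.
# Returning a Bool where the declared return type is `Bool | Int32 | Nil`
# makes the compiler upcast the Bool into the union's representation.
def parse_flag(raw : String?) : Bool | Int32 | Nil
  return nil if raw.nil? # parameter absent: the Nil member of the union
  if int = raw.to_i?
    int # numeric string: the Int32 member
  else
    raw == "true" # anything else: a Bool upcast into the union
  end
end

values = [parse_flag(nil), parse_flag("42"), parse_flag("true")]
values.each { |v| puts "#{v.inspect} : #{v.class}" }
# nil : Nil
# 42 : Int32
# true : Bool
```

Whether a given union store like this is miscompiled depends on the compiler version, LLVM version, and optimization level, which is consistent with the `--debug`/`--no-debug` difference reported at the top of the thread.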
