Compilation issue with Crystal 1.6.x

So I have a challenging one, and I’m not sure how to solve it.
Basically, I’m seeing a segfault when compiling.

This is the repo:

It does currently compile on Crystal 1.5.x.
Any pointers on what I can do would be much appreciated. In the meantime I’m pulling out some of the less critical shards (instrumentation etc.) to see if that has any effect.

So the failure happens with 1.6.1?

I’d recommend building the compiler locally and then running with that. It should show more debug info, and a meaningful stack trace could help localize the error.
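A minimal sketch of building the compiler locally and using it on the app. It assumes the compiler is cloned next to the placeos/core checkout (so the app is at `../core`) and that the `llvm-config` path shown points at the LLVM you want; both are assumptions to adjust for your machine.

```shell
# Clone the compiler sources and check out the release that fails.
git clone https://github.com/crystal-lang/crystal.git
cd crystal
git checkout 1.6.1

# Build a non-release compiler into .build/crystal.
# LLVM_CONFIG selects the LLVM to link against (this path is an assumption).
LLVM_CONFIG=/usr/lib/llvm-14/bin/llvm-config make crystal

# Build the app from its own directory, where its shards live in lib/.
# bin/crystal is the wrapper script that runs the freshly built compiler.
cd ../core
../crystal/bin/crystal build ./src/core-app.cr
```

Running the wrapper from the project directory keeps shard resolution the same as a normal `crystal build`, so the only variable that changes is the compiler itself.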

Yeah 1.6.1. Thanks for the pointer :slight_smile:

By chance, I already had 1.6.0-dev compiled from when I was looking into static compilation issues.
Here is the backtrace for that: compiler error · GitHub

Just waiting for 1.6.1 to finish compiling

1.7.0-dev backtrace

This is very likely the result of LLVM phasing out typed pointers. Did you use LLVM 15 or 16?

Using LLVM 11.

Wow… could you try using a more recent LLVM version? Old LLVM versions were full of bugs. The Crystal team even reported a few of them. This is likely an LLVM bug.

What LLVM version do you have for Crystal 1.5.x?

That’s the latest version available for Debian stable when I run apt install llvm

I have the same issue in Docker containers too, such as the official Crystal Alpine images.
It’s simple to replicate:

git clone https://github.com/placeos/core
cd core
shards install
crystal build ./src/core-app.cr
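To compare versions side by side in the official images, something like the following could work. It is a sketch that assumes Docker is installed and that the `crystallang/crystal:1.5.1-alpine` and `crystallang/crystal:1.6.1-alpine` tags are available; the project may also need extra system packages inside the image.

```shell
git clone https://github.com/placeos/core
cd core

# Build the app with each compiler version in a throwaway container,
# mounting the checkout so both runs use the same sources and shard.lock.
for version in 1.5.1 1.6.1; do
  echo "== Crystal ${version} =="
  docker run --rm -v "$PWD:/app" -w /app "crystallang/crystal:${version}-alpine" \
    sh -c 'shards install && crystal build ./src/core-app.cr' \
    || echo "build failed with Crystal ${version}"
done
```

If only the 1.6.1 run crashes, that points at the compiler (or its bundled LLVM) rather than the local environment.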

The release version of Crystal uses the LLVM it comes packaged with.

Some alternatives to figure out the problem:

  1. Try using an LLVM built with assertions enabled. Then, instead of a segfault, we’ll know what went wrong inside LLVM.
  2. Add puts calls to the methods near the end of the backtrace and report back here with what’s printed, so we can see whether we are clearly doing something wrong.
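For point 1, here is a sketch of building LLVM with assertions and relinking the compiler against it. The LLVM tag, the install prefix and the job count are assumptions. A Release build with `LLVM_ENABLE_ASSERTIONS=ON` keeps the assertion checks while avoiding the size and slowness of a full Debug build (Debug builds enable assertions by default).

```shell
# Fetch only the LLVM 14.0.6 sources.
git clone --depth 1 --branch llvmorg-14.0.6 https://github.com/llvm/llvm-project.git

# Configure an optimized build that still has assertions compiled in.
cmake -S llvm-project/llvm -B llvm-build \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=ON

# Build and install into a private prefix (assumed path).
cmake --build llvm-build -j"$(nproc)"
cmake --install llvm-build --prefix "$HOME/llvm-14-asserts"

# Inside the crystal checkout: clean so the LLVM bindings are rebuilt,
# then build the compiler against the assertion-enabled LLVM.
make clean
LLVM_CONFIG="$HOME/llvm-14-asserts/bin/llvm-config" make crystal
```

With this compiler, an invalid IR construction should abort with an LLVM assertion message instead of segfaulting later.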

To expand on point 2, this is the backtrace:

Crystal::CodeGenVisitor#aggregate_index<LLVM::Value, Int32>:LLVM::Value
Crystal::CodeGenVisitor#union_type_id<LLVM::Value>:LLVM::Value
Crystal::CodeGenVisitor#union_type_and_value_pointer<LLVM::Value, Crystal::MixedUnionType>:Tuple(LLVM::Value, LLVM::Value) 
Crystal::CodeGenVisitor#type_id_impl<LLVM::Value, Crystal::MixedUnionType>:LLVM::Value
Crystal::CodeGenVisitor#type_id<LLVM::Value, Crystal::Type+>:LLVM::Value
Crystal::CodeGenVisitor#codegen_dispatch<Crystal::Call, Crystal::ZeroOneOrMany(Crystal::Def+)>:Bool
Crystal::CodeGenVisitor#visit<Crystal::Call>:Bool

So I would put a puts in CodeGenVisitor#visit(node : Call) that prints node, and do the same in type_id, to see which type is causing the issue.


I get the same backtrace when compiling with a self-compiled 1.6.1 compiler:

Crystal 1.6.1 [fc61bd678] (2022-10-21)

LLVM: 14.0.6
Default target: x86_64-pc-linux-gnu

http://dpaste.com/HMPT79QFX

Hi, I checked all the visit methods in src/compiler/crystal/codegen/codegen.cr and there is no CodeGenVisitor#visit(node : Call).

Most of the last couple of hours went into compiling LLVM.

Now it outputs this when I try to compile any Crystal code…

crystal: /llvm-project/llvm/include/llvm/IR/Type.h:389: llvm::Type* llvm::Type::getNonOpaquePointerElementType() const: Assertion `NumContainedTys && "Attempting to get element type of opaque pointer"' failed.
Aborted

That’s after building LLVM with:

cmake -DCMAKE_BUILD_TYPE:STRING=Debug ./llvm
make install

That error is expected for LLVM 15; Crystal doesn’t support that version yet, nor does it use the LLVMContextSetOpaquePointers opt-out.

Older LLVM versions shouldn’t be using opaque pointers, though.

Dang, looks like I’m building LLVM 14 :joy:

I can reproduce the above backtrace using a 1.6.1 compiler built with LLVM 14.0.6, with both the compiler and the app built in the same environment (same LLVM version).

Now that I have LLVM 14 compiled with Debug enabled, regarding point 1 @asterite:
I’m seeing this when I try to compile:

crystal: /llvm-project/llvm/lib/IR/Instructions.cpp:3154: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
Aborted

Part 2:

visit => Emitter.new(@source, severity, exception)
visit => ::String.interpolation("missing edge manager for ", edge_id)
visit => backend.dispatch(dsl.emit(result.to_s))
visit => dsl.emit(result.to_s)
visit => result.to_s
visit => callbacks.try(&.each do |callback|
  manager.debug(module_id, &callback) do |__arg0|
    callback.call(__arg0)
  end
end)
type_id => val:   %499 = load %"Array(Proc(String, Nil))"*, %"Array(Proc(String, Nil))"** %callbacks, align 8, !dbg !36, type: (Array(Proc(String, Nil)) | Nil)
visit => %self.try(&.each do |callback|
  manager.debug(module_id, &callback) do |__arg0|
    callback.call(__arg0)
  end
end)
visit => %self.try(&.each do |callback|
  manager.debug(module_id, &callback) do |__arg0|
    callback.call(__arg0)
  end
end)
visit => __arg5.each do |callback|
  manager.debug(module_id, &callback) do |__arg0|
    callback.call(__arg0)
  end
end
visit => each_index do |i|
  yield unsafe_fetch(i)
end
visit => i < size
visit => size
visit => unsafe_fetch(i)
visit => manager.debug(module_id, &callback) do |__arg0|
  callback.call(__arg0)
end
crystal: /home/steve/projects/client-projects/llvm-project/llvm/lib/IR/Instructions.cpp:3154: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
Aborted

From a bisection, it looks like this is the culprit:


Although that commit made it into Crystal 1.5.1, and there are no issues with that version.

I noticed that most of our services don’t pass their specs when running on Crystal 1.6.1.
I’m seeing issues like:

Invalid memory access (signal 11) at address 0x7ffdbf0650b0
[0x55d680829ca6] print_backtrace at /usr/share/crystal/src/exception/call_stack/libunwind.cr:100:5
[0x55d68068c5e6] -> at /usr/share/crystal/src/signal.cr:151:5
[0x7f6062743c9f] ?? +140051945372831 in /lib/ld-musl-x86_64.so.1

That occurs when running specs with both the official Crystal 1.6.1 image and the 84codes image.
Reverting to Crystal 1.5.1, everything passes (the example above is from our rest-api service).

Although I’m probably conflating two separate problems.

We could revert that commit and see if it fixes the issue. I made that commit in preparation for something else that I never got around to, so reverting it would be harmless.
