Segfault while bootstrapping from Ruby

I’ve been trying to bootstrap the compiler from Ruby using crystal-lang/bootstrap-script (an automated script to bootstrap the Crystal compiler from source), with the goal of updating the script to the latest versions and using containers, but I can’t get past the second stage:

==> Bootstrapping Crystal (001/160)
patching file bin/crystal
patching file bootstrap/crystal/compiler.cr
patching file bootstrap/crystal/program.cr
patching file lib/crystal/compiler.rb
                           user     system      total        real
parse                  0.000000   0.000000   0.000000 (  0.000615)
normalize              2.910000   0.120000   3.040000 (  3.030746)
type inference:       26.740000   1.190000  27.930000 ( 27.953474)
fix empty types        0.350000   0.000000   0.350000 (  0.352627)
afert type inference   0.730000   0.020000   0.750000 (  0.739854)
codegen-llvm          17.450000   0.930000  18.380000 ( 18.401006)
codegen-llc          .crystal//home/geopjr/bootstrap-script/buildroot/src/crystal/bootstrap/crystal.cr/main.s: Assembler messages:
.crystal//home/geopjr/bootstrap-script/buildroot/src/crystal/bootstrap/crystal.cr/main.s:8847: Warning: stand-alone `data16' prefix
.crystal//home/geopjr/bootstrap-script/buildroot/src/crystal/bootstrap/crystal.cr/main.s:8849: Warning: stand-alone `data16' prefix
.crystal//home/geopjr/bootstrap-script/buildroot/src/crystal/bootstrap/crystal.cr/main.s:8850: Warning: stand-alone `data16' prefix
  0.850000   0.170000  54.040000 ( 54.169867)
codegen-clang        /usr/bin/ld: .crystal//home/geopjr/bootstrap-script/buildroot/src/crystal/bootstrap/crystal.cr/main.o: in function `_2A_Crystal_3A__3A_Compiler_23_compile_3C_Crystal_3A__3A_Compiler_3E__3A_Nil':
.crystal//home/geopjr/bootstrap-script/buildroot/src/crystal/bootstrap/crystal.cr/main.bc:(.text+0x11b46): warning: the use of `tmpnam' is dangerous, better use `mkstemp'
/usr/bin/ld: warning: .crystal//home/geopjr/bootstrap-script/buildroot/src/crystal/bootstrap/crystal.cr/main.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
  0.000000   0.110000   0.570000 (  0.576038)
TOTAL:                49.030000   2.540000 105.060000 (105.224228)
==> Bootstrapping Crystal (002/160)
Normalize: 0.618294 seconds
Type inference: 10.2995 seconds
Codegen (crystal): 5.31784 seconds
Codegen (bitcode): 1.07611 seconds
./bootstrap: line 192: 24294 Segmentation fault      (core dumped) "$input" "${crystal_args[@]}" -o "$output" src/compiler/crystal.cr
Normalize: 0.598149 seconds
Type inference: 10.868 seconds
Codegen (crystal): 5.3217 seconds
Codegen (bitcode): 1.17425 seconds
./bootstrap: line 192: 25957 Segmentation fault      (core dumped) "$input" "${crystal_args[@]}" -o "$output" src/compiler/crystal.cr

Using the compiler itself is a bit more verbose:

1-crystal -e "puts \"foo\""

No such file or directory:
_2A_Exception_23_initialize_3C_Errno_2C__20_String_3E__3A_Array_28_String_29_ + [0]
_2A_Errno_23_initialize_3C_Errno_3E__3A_Array_28_String_29_ + [0]
_2A_Errno_3A__3A_new_3C_Errno_3A_Class_3E__3A_Errno + [0]
_2A_File_3A__3A_expand_path_3C_File_3A_Class_2C__20_String_3E__3A_String + [0]
_2A_Crystal_3A__3A_Program_23_require_from_load_path_3C_Crystal_3A__3A_Program_2C__20_String_3E__3A_Nil_20__7C__20_Crystal_3A__3A_ASTNode_2B_ + [0]
_2A_Crystal_3A__3A_Program_23_require_3C_Crystal_3A__3A_Program_2C__20_String_2C__20_Nil_20__7C__20_String_20__7C__20_Crystal_3A__3A_VirtualFile_3E__3A_Nil_20__7C__20_Crystal_3A__3A_ASTNode_2B_ + [0]
_2A_Crystal_3A__3A_Normalizer_23_transform_3C_Crystal_3A__3A_Normalizer_2C__20_Crystal_3A__3A_Require_2B__3E__3A_Crystal_3A__3A_ASTNode_2B_ + [0]
_2A_Crystal_3A__3A_ASTNode_2B__23_transform_3C_Crystal_3A__3A_ASTNode_2B__2C__20_Crystal_3A__3A_Normalizer_3E__3A_Crystal_3A__3A_ASTNode_2B_ + [0]
_2A_Crystal_3A__3A_Normalizer_23_transform_3C_Crystal_3A__3A_Normalizer_2C__20_Crystal_3A__3A_Expressions_2B__3E__3A_Crystal_3A__3A_ASTNode_2B_ + [0]
_2A_Crystal_3A__3A_ASTNode_2B__23_transform_3C_Crystal_3A__3A_ASTNode_2B__2C__20_Crystal_3A__3A_Normalizer_3E__3A_Crystal_3A__3A_ASTNode_2B_ + [0]
_2A_Crystal_3A__3A_Normalizer_23_normalize_3C_Crystal_3A__3A_Normalizer_2C__20_Crystal_3A__3A_ASTNode_2B__3E__3A_Crystal_3A__3A_ASTNode_2B_ + [0]
_2A_Crystal_3A__3A_Program_23_normalize_3C_Crystal_3A__3A_Program_2C__20_Crystal_3A__3A_ASTNode_2B__3E__3A_Crystal_3A__3A_ASTNode_2B_ + [0]
_2A_Crystal_3A__3A_Compiler_23_compile_3C_Crystal_3A__3A_Compiler_3E__3A_Nil + [0]
__crystal_main + [0]
main + [0]
__libc_init_first + [0]
__libc_start_main + [0]
_start + [0]

More info:

  • if you attempt to run the script, it will fail due to pcl’s download location
-- curl -L -o "$downloads"/pcl-1.12.tar.gz http://xmailserver.org/pcl-1.12.tar.gz
++ curl -L -o "$downloads"/pcl-1.12.tar.gz http://www.xmailserver.org/pcl-1.12.tar.gz
  • LLVM 3 requires Python 2 (?), which your distro may have replaced with Python 3 (and removed from its repos entirely). You can run the following to install it in the build env (run it when the script fails, then start the script again after it finishes):
curl -L -o ./downloads/Python-2.7.3.tgz http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz && cd ./buildroot/src/ && rm -Rf Python-2.7.3.tgz && tar xzf ../../downloads/Python-2.7.3.tgz && cd Python-2.7.3 && ./configure --prefix="$(pwd)"/../../ --enable-shared && make && make install && cd ../../../ && rm -rf buildroot/src/llvm-3.3.src/
  • It probably won’t be able to find librt and you might need to symlink it (not sure if this is the best approach but I didn’t want to modify the stage patches yet): ln -s /usr/lib/x86_64-linux-gnu/librt.so.1 ./buildroot/lib/librt.so
  • The script fails on my machine while linking one of the deps, so all my attempts have been on Debian containers and VMs, including Debian versions from before the glibc librt and Python changes, so I doubt the symlink or Python 2 is the issue
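
The pcl download fix from the first bullet can also be applied without hand-editing the script. This is only a minimal sketch: `fix_pcl_url` is a hypothetical helper name (not part of bootstrap-script), and it assumes the old URL appears literally in the file you pass it.

```shell
# Hypothetical helper: rewrite the pcl download host in a script file.
# Assumes the old URL appears literally in the file passed as $1.
fix_pcl_url() {
  file="$1"
  tmp="$file.tmp.$$"
  # Only the bare host is rewritten; an already-fixed www. URL does not match,
  # so running this twice is harmless.
  sed 's#http://xmailserver\.org/#http://www.xmailserver.org/#g' "$file" > "$tmp" &&
    cat "$tmp" > "$file" &&   # overwrite in place to keep the executable bit
    rm -f "$tmp"
}
```

Usage would be something like `fix_pcl_url ./bootstrap` from the repository root.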

Any ideas? :person_shrugging:

what’s the goal?

I mean, you mention updating the script and using the container, but do you have a more specific goal?

do you have a more specific goal?

Yes, kind-of… there’s a certain distro in alpha with an unusual stack and while figuring out what would be the best way to package Crystal for it I went down the bootstrapping from Ruby rabbit hole. Fixing the script is not that important for reaching that goal as there are better ways to port it over but it’d be nice to have it as a working alternative.

However, I’m about to give up, as even using the packages of the time yields the same result, and I don’t think anyone is up for debugging a decade-old build:

# git clone https://github.com/crystal-lang/crystal/ -b ruby
FROM ubuntu:precise
WORKDIR /app
COPY . .

RUN cat llvm-3.3.tar.gz | tar xz --strip-components=1 -C /usr
RUN sed -i -re 's/([a-z]{2}\.)?archive.ubuntu.com|security.ubuntu.com/old-releases.ubuntu.com/g' /etc/apt/sources.list
RUN apt-get update && apt-get install -y  zlib1g-dev ruby1.9.3 build-essential git libpcre3-dev libunwind7-dev wget


WORKDIR /app/crystal
RUN gem install bundler -v '< 1.8' --no-ri --no-rdoc
RUN bundle
# RUN bin/crystal -e "puts 1" # Works
RUN bin/crystal -stats bootstrap/crystal.cr
RUN ./crystal -e "puts 1"
# segfault

Hi, @GeopJr and other readers…

I’ve also tried to get the bootstrap script working and ran into similar issues (on Debian 12 “bookworm”, in my case). Here’s what I did so far:

  1. Like you, I saw I needed Python 2. I built Python 2.7.18 from source and made a symlink so that the python2 command would run this version.
  2. I saw a message about needing to install GraphViz. This is a red herring: GraphViz was already installed, and this message is displayed regardless of whether or not it’s installed. So, you don’t need to worry about this message.
  3. I had an error about “Cannot find llvm-config”. This was displayed immediately after the GraphViz message (which is why I ended up looking into that). I’m still not sure why this happened, but after a bit of investigating, I ended up just doing rm -rf buildroot/src/llvm-3.3.src and re-running the bootstrap script, and it worked that time.
  4. I got the error with librt.so. Now I can share some useful information about this error: it happens because that library has been removed from modern glibc. It was only needed by time.linux.cr, which is easily fixed with the following patch (add this to the end of stage1.patch, and note that the empty-looking line before class Time needs a single space on it to be a valid patch file):
diff --git a/std/time.linux.cr b/std/time.linux.cr
index 8aa07734b..dbe094b9a 100644
--- a/std/time.linux.cr
+++ b/std/time.linux.cr
@@ -1,4 +1,4 @@
-lib Librt("rt")
+lib C
   struct TimeSpec
     tv_sec, tv_nsec : Int64
   end
@@ -7,7 +7,7 @@ end
 
 class Time
   def initialize
-    Librt.clock_gettime(0, out time)
+    C.clock_gettime(0, out time)
     @seconds = time.tv_sec + time.tv_nsec / 1e9
   end
 end

If you search for clock_gettime, you might find yourself here, which states “Link with -lrt (only for glibc versions before 2.17)”; that’s how I figured to just change Librt to C.
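
To see which side of that 2.17 cutoff a given system is on, something like the following could be used. It is a rough sketch: `glibc_needs_lrt` is a hypothetical helper name, and it relies on `sort -V` (version sort) being available, as it is in GNU coreutils.

```shell
# Hypothetical helper: succeed (exit 0) if the given glibc version is older
# than 2.17, i.e. clock_gettime would still need -lrt according to the quote.
glibc_needs_lrt() {
  ver="$1"
  # Older than 2.17 means: not equal to 2.17, and sorts first in version order.
  [ "$ver" != "2.17" ] &&
    [ "$(printf '%s\n' "$ver" 2.17 | sort -V | head -n1)" = "$ver" ]
}
```

On a glibc system the installed version can be fed in with `glibc_needs_lrt "$(getconf GNU_LIBC_VERSION | awk '{print $2}')" && echo "needs -lrt"`.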

  5. I got an error about libunwind.so. I tried the following workaround (I don’t suggest doing this, as it doesn’t seem to work, see below):
# as root
cd /usr/lib/x86_64-linux-gnu
ln -s libunwind.so.8.0.1 libunwind.so

Doing this does allow stage1 to build, but when it tries to actually run it, it gets a segfault - maybe the same segfault that you’re getting…

I did a little research, which revealed that there are multiple different libunwind projects, and maybe we’re linking to the wrong one. My libunwind.so.8.0.1, which is part of my Debian 12 system, comes from “nongnu” libunwind, whatever “nongnu” means. But there’s an LLVM libunwind as well, as discussed here, and I see it’s mentioned here in the LLVM docs. Since Crystal uses LLVM, I’m pondering whether I need to somehow get it to link with the LLVM libunwind instead of the “nongnu” one, and whether that’ll resolve the segfault.

But I’m getting very out of my depth here, so that’s where I stopped. If anyone has some pointers for how to make further progress, I’d really appreciate it!

Hopefully related, I’ve also posted over on the Debian User Forums to try to find out how Debian is compiling Crystal, since I feel sure they must be bootstrapping it somehow.


Hi all, I’ve made some progress…!

I installed Debian 7, 8, 9, 10 and 11 in virtual machines, in addition to my desktop Debian 12 installation.

I have now gotten past Stage 2 without a segfault, on Debian 9 & 10 only. On Debian 11 & 12 it still segfaults. I didn’t test 7 & 8 this far: once I found success on Debian 9, I felt there was no reason to go further on older versions for now. Note that I cancelled the bootstrap process after it got to Stage 3 due to time constraints, so I don’t yet know if there’ll be further trouble later on.

As I suspected, you don’t want to manually create a libunwind.so symlink like I described above. There’s a package for it; it was only a matter of finding it.

From a clean Debian install, as well as ensuring that python2 runs Python 2, you need at least:

apt-get install \
  automake \
  build-essential \
  cmake \
  git \
  libpcre3-dev \
  libunwind-dev \
  libyaml-dev \
  zlib1g-dev
  • libyaml-dev and zlib1g-dev are required in order to get ruby to build in a way that lets you install gems. If either is missing, ruby will build successfully, but you’ll get an error later when it tries to install gems, and that error doesn’t point you at all towards these packages not being installed.
  • libpcre3-dev is what you need for the bootstrap script to work, not libpcre2-dev. Confusingly, although libpcre2-dev means PCRE 2, libpcre3-dev means PCRE 1 and is older. This had me stumped for a while.
  • libunwind-dev is (unsurprisingly) the package which creates libunwind.so. There are various numbered alternatives available as well, depending on the Debian version.
  • cmake is only required later in the bootstrap process. I haven’t actually gotten to the point where it’s required yet, but figured I may as well list it.
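
Before starting a long bootstrap run, a quick preflight check for the command-line tools mentioned above might save a failed attempt. This is just a sketch: `missing_commands` is a hypothetical helper name, and it only checks commands on PATH, so the -dev library packages (libpcre3-dev, libunwind-dev, libyaml-dev, zlib1g-dev) still need to be verified separately.

```shell
# Hypothetical helper: print each named command that is not on PATH,
# one per line. Prints nothing if everything is available.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

For example, `missing_commands python2 git cmake automake make cc` lists whichever of those tools still need installing.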

One basic thing I want to explore is if substituting one of the other available libunwind packages on Debian 11 and/or 12 resolves the segfault there. I suspect it probably won’t, though, in which case I guess the next thing I’ll want to do is check which other system libraries are being picked up, and whether I can replace more of them with older versions compiled from source, and somehow figure out a combination that won’t segfault.
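
For checking which system libraries are being picked up, the output of `ldd` can be reduced to just the resolved name and path pairs. A minimal sketch, where `resolved_libs` is a hypothetical helper name and the binary path in the usage line is a placeholder for whichever stage binary you are inspecting:

```shell
# Hypothetical helper: read `ldd` output on stdin and print "name path"
# for every library that resolved to an absolute path. Entries such as
# linux-vdso and "not found" lines are skipped.
resolved_libs() {
  awk '$2 == "=>" && $3 ~ /^\// { print $1, $3 }'
}
```

Something like `ldd ./crystal | resolved_libs | grep unwind` would then show which libunwind the binary actually loads, which makes it easier to compare a working Debian 9 or 10 build against a segfaulting Debian 11 or 12 one.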

I’ll also let the bootstrap script run further on 9 & 10 when I have more time, and see whether it can get all the way to the end yet or if I have any further trouble beyond stage 3.

I spent a while trying to get this to work again a while ago, on Arch Linux (the same install I wrote the script on…). I’m starting to suspect I may have to make a more substantial container to reproduce the build.

However, the idea was to have something as lightweight as possible, so that the bootstrap could be reproduced in an official Debian package. But the set of things that need to be built into the local prefix is so much larger now, and the failures are harder to debug.


Thanks for the reply!

I’m planning to do some more work on this soon, and happy to help out where I can. I’ve got several changes to the bootstrap script already - once I’ve got further I can certainly make a PR.

Yeah, that’s a total footgun :man_facepalming: Who could come up with such a crooked idea?