Post Mortem: Issues in the Crystal 1.14.1 Release process

Shards build

The build of shards failed. We were aware of issues with the shards build process (Current nightly build of `shards` is broken · Issue #330 · crystal-lang/distribution-scripts · GitHub), but previously the problem was that the binary was linked dynamically instead of statically. In the tarball produced by this release process, the binary didn’t even exist.
I don’t recall whether or how I verified the shards build. Considering we were aware of potential issues, I’m sure I did test it somehow, but I didn’t write that down. It might’ve been the same test as in the Dockerfile, `[ "$(ldd bin/shards 2>&1 | wc -l)" -eq "1" ]`, which unfortunately doesn’t fail when the file is missing entirely. :man_facepalming:
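The quoted test only counts lines of `ldd` output, so it presumably passes for a missing file because the error message is also a single line. A minimal sketch of a stricter check, assuming a POSIX shell and the same one-line `ldd` heuristic as the Dockerfile test; `check_static` is a hypothetical helper name, not something that exists in distribution-scripts:

```shell
# Hypothetical helper (not part of distribution-scripts): succeed only if
# the file exists, is executable, and ldd prints exactly one line for it.
check_static() {
  bin="$1"

  # Fail early when the binary is missing or not executable, which the
  # line-count test alone cannot detect.
  if [ ! -x "$bin" ]; then
    echo "error: $bin is missing or not executable" >&2
    return 1
  fi

  # Same heuristic as the quoted test: a statically linked binary is
  # assumed to yield a single line of ldd output.
  lines="$(ldd "$bin" 2>&1 | wc -l)"
  if [ "$lines" -ne 1 ]; then
    echo "error: $bin appears to be dynamically linked" >&2
    return 1
  fi
}
```

It could be called as `check_static bin/shards` in place of the bare `[ ... ]` test, so that a missing binary fails the build instead of passing silently.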

Anyway, the issue didn’t get noticed and the tarball with the broken shards binary was published. And shortly after we received a bug report: Error: Error: spawn shards ENOENT · Issue #51 · crystal-lang/install-crystal · GitHub

At this point we did the following to contain the botched package:

  • Put the GitHub release back into draft mode, hiding it from public view
  • Reverted the docker tags 1 and 1.14 to point to 1.14.0
  • Promoted the snap packages for 1.14.0 back to stable

The OBS packages were not published at this point anyway.

Then I triggered a rebuild of the release workflow in CircleCI. This didn’t help though, because the build was botched again.
I also built the tarball locally, which succeeded with a correctly linked shards binary. So I proceeded with publishing the locally built tarball as package iteration 2, and removed the original tarballs (iteration 1) from the release.

Later I noticed that I had used the master version of distribution-scripts which was intended to be used for releasing 1.15, not 1.14. It included some relevant changes such as the update to shards 0.19.0 which was not supposed to be rolled out with 1.14.1.
With the correct version of distribution-scripts I was unable to produce a correct build for the generic tarball. It wouldn’t spit out a statically-linked shards binary.
I tried a couple of times, locally and in CircleCI. I cleaned the Docker cache and added debug output to the Dockerfile. There it says the binary is statically linked, but in the exported tarball it’s suddenly dynamically linked.
I can’t see how this could end up the way it does, but somehow it still manages. The build process isn’t super complex, it’s a Dockerfile driven by a Makefile. It’s all a bit messy, but not rocket science.
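Because the binary looked fine inside the Docker build but not in the exported tarball, the check arguably belongs on the final artifact. A minimal sketch, assuming a gzip-compressed tarball and the one-line `ldd` heuristic from the Dockerfile test; `verify_tarball_binary` and the member path in the usage note are hypothetical, and the real tarball layout may differ:

```shell
# Hypothetical helper: extract one member from a .tar.gz into a temporary
# directory and check it the way the Dockerfile test checks bin/shards.
verify_tarball_binary() {
  tarball="$1"  # path to the exported .tar.gz
  member="$2"   # path of the binary inside the tarball (an assumption)

  workdir="$(mktemp -d)" || return 1

  # Extract only the requested member; treat a failed extraction or a
  # missing file as an error instead of letting the ldd test pass.
  if ! tar -xzf "$tarball" -C "$workdir" "$member" 2>/dev/null ||
     [ ! -f "$workdir/$member" ]; then
    echo "error: $member not found in $tarball" >&2
    rm -rf "$workdir"
    return 1
  fi

  # A statically linked binary is assumed to yield one line of ldd output.
  lines="$(ldd "$workdir/$member" 2>&1 | wc -l)"
  rm -rf "$workdir"
  if [ "$lines" -ne 1 ]; then
    echo "error: $member in $tarball appears to be dynamically linked" >&2
    return 1
  fi
}
```

For example, `verify_tarball_binary release.tar.gz pkg/bin/shards`, where both names are placeholders. Running this against the tarball that is about to be uploaded tests what users actually download, not an intermediate build stage.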

Finally, I resorted to building shards manually, using basically the commands from the Dockerfile. I double-checked that everything was okay, put the correctly built shards binaries into the tarball from release iteration 2, published the result as iteration 3, and rebuilt the docker images.

Windows issue

Uploading the Windows artifacts from GitHub Actions to the release is a manual process and involves renaming the files. I messed this up and renamed and uploaded the wrong file (I uploaded the contents of the installer zip as the portable package). Crystal release v1.14.1 Windows portable zip archive missing `crystal.exe` and related binaries · Issue #15331 · crystal-lang/crystal · GitHub
This was fixed easily by re-uploading the correct file.
We don’t use a package iteration number on the Windows artifacts, so I didn’t increase that.
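Since renaming and uploading is a manual step, a quick look at the archive listing before uploading could catch a swapped file. A minimal sketch, assuming Info-ZIP `unzip` is available (its `-Z1` mode lists entry names one per line); `has_entries` and the archive name in the usage note are hypothetical:

```shell
# Hypothetical helper: read archive entry names (one per line) on stdin and
# succeed only if every file name given as an argument is among them.
has_entries() {
  # Reduce each entry to its base name so the check does not depend on
  # the directory layout inside the archive.
  names="$(sed 's|.*/||')"

  for required in "$@"; do
    # -F -x: match the whole line literally, not as a pattern.
    if ! printf '%s\n' "$names" | grep -F -x -- "$required" >/dev/null; then
      echo "missing from archive: $required" >&2
      return 1
    fi
  done
}
```

Usage would look like `unzip -Z1 portable.zip | has_entries crystal.exe`, with `portable.zip` standing in for the renamed artifact. The helper takes the listing on stdin so that it works with any tool that can print entry names.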

How to prevent this in the future

What else I learned

  • There were some concerns about whether package iterations would break some automations somewhere, but we didn’t notice any issues with that. 1.14.1 was the latest release for only a couple of hours until it was replaced by 1.15.0, so some automations might have skipped it. But overall, I’m quite confident that package iterations work fine.
  • I realized that snapcraft releases can be managed quite easily in the web UI. That’s better than managing in the terminal.