Shards build
The build of shards failed. We were aware of issues with the shards build process (Current nightly build of `shards` is broken · Issue #330 · crystal-lang/distribution-scripts · GitHub), but previously the problem was that the binary was linked dynamically instead of statically. In this release's tarball the binary didn't even exist.
I don't recall whether or how I verified the shards build. Considering we were aware of potential issues, I'm sure I tested it somehow, but I didn't write that down. It might have been the same test as in the Dockerfile, `[ "$(ldd bin/shards 2>&1 | wc -l)" -eq "1" ]`, which unfortunately doesn't fail when the file is missing entirely.
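The blind spot is that `ldd` prints a single line both for a statically linked binary ("not a dynamic executable") and for a missing file (an error message), so a pure line-count check passes either way. A minimal sketch of a stricter check (the `check_static` name and paths are illustrative, not what the release scripts actually use):

```shell
#!/bin/sh
# Hypothetical stricter smoke test: verify the file exists before
# inspecting its linkage, since `ldd` emits exactly one line for a
# missing file as well as for a static binary.
check_static() {
  bin="$1"
  [ -f "$bin" ] || { echo "error: $bin is missing"; return 1; }
  if ldd "$bin" 2>&1 | grep -q "not a dynamic executable"; then
    echo "ok: $bin is statically linked"
  else
    echo "error: $bin is dynamically linked"
    return 1
  fi
}
```

Matching on `ldd`'s message instead of its line count also makes the failure mode explicit in the build log.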
Anyway, the issue went unnoticed and the tarball with the broken shards binary was published. Shortly after, we received a bug report: Error: Error: spawn shards ENOENT · Issue #51 · crystal-lang/install-crystal · GitHub
At this point we did the following to contain the botched package:
- Put the GitHub release back into draft mode, hiding it from public view
- Reverted the docker tags `1` and `1.14` to point to `1.14.0`
- Promoted the snap packages for 1.14.0 back to stable
The OBS packages were not published at this point anyway.
Then I triggered a rebuild of the release workflow in CircleCI. This didn't help though: the build failed again in the same way.
I also built the tarball locally which succeeded with a correctly linked shards binary. So I proceeded with publishing the locally built tarball as package iteration 2, and removed the original tarballs (iteration 1) from the release.
Later I noticed that I had used the master version of distribution-scripts, which was intended for releasing 1.15, not 1.14. It included relevant changes such as the update to shards 0.19.0, which was not supposed to be rolled out with 1.14.1.
With the correct version of distribution-scripts, I was unable to produce a correct build of the generic tarball. It wouldn't spit out a statically linked shards binary.
I tried a couple of times, locally and in CircleCI. I cleaned the Docker cache. I added debug output to the Dockerfile. There it says the binary is statically linked, but in the exported tarball it's suddenly dynamically linked.
I can't see how this could end up the way it does, but somehow it still manages. The build process isn't super complex: it's a Dockerfile driven by a Makefile. It's all a bit messy, but not rocket science.
Finally, I resorted to building shards manually, using essentially the commands from the Dockerfile. I double-checked that everything was okay, put the correctly built shards binaries into the tarballs from release iteration 2, published them as iteration 3, and rebuilt the docker images.
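The repacking step above can be sketched roughly as follows. This is a hedged illustration, not the actual release tooling: the `repack_shards` helper and the file names are made up, and it assumes the tarball contains a single top-level directory with a `bin/shards` inside.

```shell
#!/bin/sh
# Hypothetical sketch: swap the shards binary inside a release tarball
# and emit the next package iteration. Names are illustrative.
set -e

repack_shards() {
  tarball="$1"   # existing tarball, e.g. the iteration-2 artifact
  shards="$2"    # the manually built, statically linked shards binary
  out="$3"       # output tarball for the next iteration
  workdir=$(mktemp -d)
  tar -xzf "$tarball" -C "$workdir"
  topdir=$(ls "$workdir")                     # single top-level directory
  cp "$shards" "$workdir/$topdir/bin/shards"  # replace the broken binary
  chmod +x "$workdir/$topdir/bin/shards"
  tar -czf "$out" -C "$workdir" "$topdir"
  rm -rf "$workdir"
}
```

Before repacking for real, the replacement binary should of course pass a linkage check (e.g. `file` reporting "statically linked").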
Windows issue
Uploading the Windows artifacts from GitHub Actions to the release is a manual process and involves renaming the files. I messed this up and renamed and uploaded the wrong file (I uploaded the contents of the installer zip as the portable package). Crystal release v1.14.1 Windows portable zip archive missing `crystal.exe` and related binaries · Issue #15331 · crystal-lang/crystal · GitHub
This was fixed easily by re-uploading the correct file.
We don’t use a package iteration number on the Windows artifacts, so I didn’t increase that.
How to prevent this in the future
- Improve validation of build artifacts before they're published. We should do at least some basic smoke testing in the release process. ([CI] Add check for shards binary in `test_dist_linux_on_docker` by straight-shoota · Pull Request #15394 · crystal-lang/crystal · GitHub)
- Figure out why statically linking shards sometimes fails and fix it (Current nightly build of `shards` is broken · Issue #330 · crystal-lang/distribution-scripts · GitHub). We might want to consider moving to a different build tool (e.g. Nix for cross-platform builds).
- Automate publishing Windows release artifacts (Use versioned file names for build artifacts in Windows CI · Issue #14248 · crystal-lang/crystal · GitHub) and upload them to the release automatically.
What else I learned
- There were some concerns about whether package iterations would break some automations somewhere, but we didn't notice any issues. 1.14.1 was the latest release for only a couple of hours before it was replaced by 1.15.0, so some automations might have skipped it. But overall, I'm quite confident it caused no problems.
- I realized that snapcraft releases can be managed quite easily in the web UI. That's better than managing them from the terminal.