Multiple shards in a single git repo?

Is it possible/feasible[1] to have multiple shards in a single git repo? I’m currently working on a shard to support OpenTelemetry, which is a bit different from other instrumentation frameworks I’ve used in that it prescribes separating the gathering of telemetry (the “API”, used by libraries) from the reporting of that telemetry (the “SDK”, used by applications) into separate dependencies.

So I need to create at least 2 shards for this. Can I put them in the same git repo or do I need a separate repo for each shard?


[1] Specifying “feasible” because I’m not sure that hacks like using different branches for each shard solve the problem in a meaningful way.


Ha, I’ve actually been working on the PHP SDK and have been thinking about how to best handle this in Crystal. Tl;dr, no, Shards doesn’t support monorepos, see Added possibility to install submodules of a git repository or local repository by Thellior · Pull Request #238 · crystal-lang/shards · GitHub. From what I’ve thought of so far you basically have 2 options:

  1. Just maintain separate repos
  2. Include them in the same repo, but have them as separate entry points. I.e. you can require "opentelemetry/api" or require "opentelemetry/sdk", maybe with another that does both.

Also I’m pretty familiar with the context propagation side of things if you want/need to chat thru anything.

Related: Added possibility to install submodules of a git repository or local repository by Thellior · Pull Request #238 · crystal-lang/shards · GitHub

I considered this but I intend to donate it to the OpenTelemetry project (at the request of a member of the governance committee, and I like that idea for several reasons) and they seem to prefer the monorepo approach whenever possible.

I’ve gone down this road, too. Sharing a dependency tree among all of the libraries is unfortunately a non-starter. An app that depends on a library that instruments itself with OpenTelemetry (as recommended by OTel) would then depend on everything the SDK depends on, all the way down to Protobufs for OTLP, not even counting other exporters yet. This is true even if they aren’t using OTel — the API should have no dependencies at all specifically for this scenario. This has the potential to block dependency updates when there are version mismatches downstream, which would be a huge source of friction.

One solution to this is to maintain instrumentation shards separately so the application would need to opt into that dependency. However, that puts a maintenance burden on the maintainers of an instrumentation shard (keep an eye on each release, figure out a support matrix for each version, etc) as well as developers of applications using them (the feature shards and the instrumentation shards may need to be updated in lockstep, but not always and it’s up to you to figure it out).

The other solution is the monorepo, which is why I was asking about it. :smile: It wouldn’t solve all the things (for example, instrumentation shards may still need to exist for shards that won’t accept opentelemetry-api as a dependency or for stdlib code like HTTP::Client) but it would definitely mitigate the sum of them, which may very well be why the OpenTelemetry project prefers monorepos.

Stephanie makes a really good point here. Using git repos rather than a centralized package repository where you can submit snapshots of select files makes this really, really difficult.

Since shards is based off of Go’s dep management, I’ll look into how the OTel Go libraries are distributed and see if I can get some insights there, but I think Go imports let you drill down into the directory structure of the git repo.

I have a feeling it’s gonna end up fanned out across a lot of different repos, unfortunately.

With shards, it’s impossible to have different shards with different dependencies in a single repository.

While I don’t think there’s a chance to support monorepos with multiple shards, it might be an option to allow specifying optional dependency groups (bundler has this, for example). Then you could ask shards to install opentelemetry with the sdk dependency group.
OTOH, more dependencies are just a couple more git repositories being downloaded on installation. It might take some time depending on the size of the dependencies, but it’s not really a huge burden to have unused dependencies lying around.

I don’t think shards is based on go packages. There are of course some similarities, and maybe parts have served as inspiration. But as far as I know about go packages, they’re actually not that much alike (at least not more than many other similar systems).

The standard Python package manager, Pip, lets you specify an optional subdirectory where the top level of the package is located when installing from a Git repository.

Maybe a similar interface can be adapted for Crystal?

It can infer the name of the package from the meta-data once the repository is cloned, but you can also explicitly specify the name of the package on the command line in order to avoid re-cloning the repo just to check the metadata.

This makes it relatively easy to distribute multiple packages in the same Git repo, or to otherwise be flexible about where the top level of a package is located within the repo.
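For reference, here is a rough sketch of what that looks like on the pip side. The repository URL, tag, subdirectory, and package name are all made up for illustration; only the `name @ git+URL@ref#subdirectory=path` shape comes from pip itself. Naming the package up front is what lets pip skip inspecting the repo just to learn the name.

```shell
# Hypothetical repository, tag, subdirectory, and package name (illustration only).
repo_url="https://github.com/example/otel-monorepo.git"
ref="v1.0.0"
subdir="packages/api"
pkg="example-otel-api"

# Direct reference: explicit package name, VCS URL pinned to a ref,
# and the package's top-level directory given in the URL fragment.
requirement="${pkg} @ git+${repo_url}@${ref}#subdirectory=${subdir}"
echo "$requirement"

# To actually install (needs network access and a real repository):
# pip install "$requirement"
```

A second package from the same repo would use the same URL with a different `subdirectory` value and package name.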

Edit: relying on Git tags for versioning makes things more difficult. In Python, the package states its own version, independent of any tag in source control. There are tools that will automatically update that version number for you, based on Git tags, but the tags themselves aren’t relevant for versioning.

shard.yml specifies the current version as well. But that’s not very useful for discovering a versions list (which is essential for dependency resolution). You would have to look at the content of shard.yml in every commit in order to find out which versions are available. Git tags make this really easy.
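To make the point about tags concrete, here is a small sketch using a throwaway local repo. The shard name, versions, and `v`-prefixed tag naming are assumptions for the demo, not a description of how Shards is implemented internally. It only shows that listing tags yields every available version in one command, without reading shard.yml from any commit.

```shell
# Create a throwaway repository with three tagged "releases".
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.email "demo@example.com"
git config user.name "Demo"

for v in 0.1.0 0.2.0 0.10.0; do
  # Each release also records its version in shard.yml (hypothetical shard name).
  printf 'name: demo\nversion: %s\n' "$v" > shard.yml
  git add shard.yml
  git commit --quiet -m "Release $v"
  git tag "v$v"
done

# Version discovery from tags alone, sorted as version numbers
# rather than alphabetically (so 0.10.0 sorts after 0.2.0).
versions="$(git tag --list 'v*' --sort=version:refname)"
echo "$versions"
```

Against a remote, `git ls-remote --tags <url>` gives the same list without cloning. With several shards in one repo, every tag would apply to the whole repo, which is where the per-shard versioning problem comes from.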

Update: Check out How I migrated Athena to a Monorepo...and you can too.
