Awesome, let’s compare notes =)
I’ve already looked at similar projects, particularly https://docs.rs/ and https://godoc.org/ (although the latter is in the process of being replaced by https://pkg.go.dev/, and I’m not sure whether the internal implementation has changed).
As already mentioned, I wouldn’t couple it tightly with the shards database but some parts like discovery of new releases can be shared.
My general idea is to keep the external interface as small as possible. So the doc build server would just react to requests for documentation. A request contains the location of the repository and an optional git ref. When the server encounters a request for a repo it hasn’t seen before, so the docs are not readily available, it would fetch that repo and build the documentation. This should typically only take a few seconds. The response to that first request would be empty and ask the client to try again after a short while.
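To make the request flow concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `DocService` class, the `"HEAD"` default for the optional ref, the `retry_after` value and the shape of the responses are all hypothetical, and the actual fetch and build step is replaced by a stand-in.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (repository location, git ref)


@dataclass
class DocService:
    # Finished builds; the dict values stand in for the real build output.
    builds: Dict[Key, dict] = field(default_factory=dict)
    # Builds that have been requested but are not finished yet.
    queue: List[Key] = field(default_factory=list)

    def request(self, repo: str, ref: str = "HEAD") -> dict:
        """Return the docs if they exist, otherwise schedule a build."""
        key = (repo, ref)
        if key in self.builds:
            return {"status": "ready", "docs": self.builds[key]}
        if key not in self.queue:
            self.queue.append(key)  # schedule each build only once
        # Empty result: the client is asked to try again shortly.
        return {"status": "pending", "retry_after": 5}

    def run_next_build(self) -> None:
        """Stand-in for the worker that fetches the repo and builds docs."""
        if not self.queue:
            return
        repo, ref = self.queue.pop(0)
        self.builds[(repo, ref)] = {"repository": repo, "ref": ref}
```

The first call to `request` returns a pending response and queues the build. Once the worker has run, the same call returns the stored result.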
The build delay can obviously be skipped by requesting the docs on the doc service beforehand, for example while running CI when releasing a new version. That should make the docs already available as the first real requests arrive.
I’d like the docs server internally to only use repositories directly, instead of shard names as on shardbox.org. That gives more freedom and flexibility, because you can also build documentation for repositories not listed on shardbox, for development branches and mirrors.
This basic interface can obviously be wrapped by a more human-friendly front, for example using shard names from the shardbox database to identify repositories. Then you can ask for `kemal` docs instead of `github.com/kemalcr/kemal` docs.
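Such a wrapper could be little more than a lookup in front of the repository-based interface. A rough sketch, where the `SHARD_REPOS` table and the `resolve` function are hypothetical and the table stands in for a query against the shardbox database:

```python
from typing import Dict

# Hypothetical excerpt of a name-to-repository mapping. In practice this
# would come from the shardbox database rather than a hard-coded table.
SHARD_REPOS: Dict[str, str] = {
    "kemal": "github.com/kemalcr/kemal",
}


def resolve(identifier: str, shards: Dict[str, str] = SHARD_REPOS) -> str:
    """Map a shard name to its repository; pass repository locations through."""
    if "/" in identifier:
        # Assumption: anything containing a slash is already a repo location.
        return identifier
    try:
        return shards[identifier]
    except KeyError:
        raise LookupError(f"unknown shard name: {identifier}") from None
```

The docs server itself never sees shard names this way. It only receives the resolved repository location.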
I’m not entirely settled on the data format of the doc build. Storing the completely rendered doc generator output is easy to implement, and you just need to serve static HTML sites. But it also comes with a lot of overhead: all the HTML files of the stdlib docs are about 80MB unzipped, while index.json, which also contains all the content and is still not very optimized in terms of duplicates, is only 10MB.
Storing only the content in a data structure and building the output on the fly offers more flexibility. Transparent HTTP caching can still be used to avoid rebuilding the output on every single request.
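One way the caching part could work is with entity tags derived from the stored content, so that output is only rendered when a client or cache does not already hold the current version. This is just a sketch under assumptions: `etag_for` and `respond` are hypothetical names, and the `(status, headers, body)` tuple is a simplification of a real HTTP response.

```python
import hashlib
import json
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]


def etag_for(content: dict) -> str:
    """Derive a stable entity tag from the stored content."""
    payload = json.dumps(content, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(payload).hexdigest() + '"'


def respond(
    content: dict,
    if_none_match: Optional[str],
    render: Callable[[dict], bytes],
) -> Response:
    """Render output only if the client's cached copy is out of date."""
    tag = etag_for(content)
    if if_none_match == tag:
        # The cached copy is still valid, so rendering is skipped.
        return 304, {"ETag": tag}, b""
    return 200, {"ETag": tag}, render(content)
```

A caching proxy in front of the server can revalidate with the same tag, so repeated requests for unchanged docs never reach the renderer.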
The core functionality should only be serving structured data (JSON). Not just the entire content of each build’s index.json, but also selectively individual namespaces.
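Serving an individual namespace could then amount to walking the stored tree. The sketch below assumes a simplified, hypothetical structure in which every node has a `name` and an optional list of nested `types`. The real index.json layout may differ, and `find_namespace` is an illustrative name.

```python
from typing import Optional


def find_namespace(root: dict, path: str) -> Optional[dict]:
    """Return the node for a path like "Kemal::Config", or None if missing."""
    node: Optional[dict] = root
    for name in path.split("::"):
        # Descend one level by matching the next path segment by name.
        children = node.get("types", [])
        node = next((t for t in children if t["name"] == name), None)
        if node is None:
            return None
    return node


# Hypothetical sample data in the assumed shape.
index = {
    "name": "Top Level Namespace",
    "types": [
        {"name": "Kemal", "types": [{"name": "Config", "doc": "Settings."}]},
    ],
}
```

With this sample data, `find_namespace(index, "Kemal::Config")` returns only the `Config` node, so a client does not have to download the whole index to show a single type.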
Generating the HTML frontend on top of that might actually be considered a separate project. At least it’s very easy to separate when you have a simple JSON API in between. So the implementation would just be fetching content from the docs server and presenting that in HTML. Probably similar to what `crystal docs` currently produces, but it could also be very different. Maybe even a jamstack app that directly talks to the docs API and renders HTML on the client. We don’t need to worry about the specifics right now, because with a simple API it’s easy to integrate any web frontend.
Regarding the build process itself, every build should obviously happen in a sandbox. I’m not yet sure about the best approach here, but a solution based on docker/runc would probably be relatively easy to implement.
For most shards, building docs probably shouldn’t be much more than checking out the source code and running `crystal docs`. But things can get more complicated when a shard needs extra dependencies for building its docs. A couple of standard tools should be available in every sandbox, but that probably won’t suffice for all cases. So for some shards we might need to add extra tools/dependencies to the sandbox. But that could be configured per repository.
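As a rough illustration of a Docker-based sandbox with per-repository configuration, the sketch below only assembles the command line and does not run anything. The `BuildConfig` fields, the `/src` mount point and the idea of expressing extra dependencies as a custom image are assumptions. Disabling the network is just one possible restriction, and a shard that has to download dependencies during the build would need a different setup.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BuildConfig:
    # Hypothetical per-repository settings. The default assumes the public
    # crystallang/crystal image is used as the standard sandbox.
    image: str = "crystallang/crystal"
    command: List[str] = field(default_factory=lambda: ["crystal", "docs"])


def sandbox_command(source_dir: str, config: BuildConfig) -> List[str]:
    """Assemble a `docker run` invocation for building docs in a container."""
    return [
        "docker", "run",
        "--rm",                             # remove the container afterwards
        "--network", "none",                # no network access during the build
        "--volume", f"{source_dir}:/src",   # mount the checked-out source
        "--workdir", "/src",
        config.image,
        *config.command,
    ]
```

A repository that needs extra tools could then point `image` at a custom image containing them, while everything else uses the defaults.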
That’s a few of my thoughts so far. I’ve some more detailed ideas, but that would go too deep for now.