One solution would be to run `crystal docs` in a sandboxed environment: no HTTP connections allowed, just read and write access to the local project directory.
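As a minimal sketch of that idea, assuming a Linux host with util-linux's `unshare` available: running the build in a fresh network namespace cuts off all network access while leaving the project directory readable and writable. Note this only drops the network, not filesystem access, so it is just one piece of a real sandbox:

```crystal
# Run `crystal docs` with no network access by unsharing the network
# namespace. Assumes Linux with user namespaces enabled; `-r` maps the
# current user to root inside the namespace, `-n` drops the network.
status = Process.run("unshare", ["-r", "-n", "crystal", "docs"],
  chdir: "/path/to/project", output: STDOUT, error: STDERR)
puts status.success? ? "docs built" : "build failed"
```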
Yes, that's the general plan with the current behaviour of the docs generator.
But my idea was to maybe change the compiler to not allow the `run` macro (or anything else that could do damage) when generating docs. This would help make sure that when you can build docs locally, they also build on the docs server. It would also incentivise developers to write better docs, because the documented API should really not be influenced by anything except the code that's in the repository.
I think `macro run` can create types and methods. If that's the case, and if they are referenced elsewhere, `crystal docs` will just not work. Maybe not a big deal.
Yes, but I don't think that's good practice, exactly for these reasons. The alternative is using a generator script and storing the generated code in the repo. That's much more reliable and makes it much easier to track changes.
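For illustration, a generator script in this style can be very small; the generated methods below are invented, the point is just that the output is a plain file you commit and diff like any other source:

```crystal
# scripts/generate.cr: run by hand, then commit src/generated.cr.
# The generated methods here are made up for illustration.
methods = %w(red green blue).map do |color|
  <<-CODE
  def #{color}?(value : String) : Bool
    value == "#{color}"
  end
  CODE
end

File.write("src/generated.cr",
  "# Generated by scripts/generate.cr. Do not edit.\n\n" + methods.join("\n\n"))
```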
I seriously doubt the `run` macro is used much anyway, and even less so for generating public API. If it only generates `:nodoc:` code, it doesn't matter for the doc generator.
Agreed that a worker should generate docs behind the scenes and deploy the output to a static file server. I can get some cloud resources to do at least the static file hosting if folks want. Probably the background workers too.
I had been thinking about it over the weekend and wanted to write a proposal down. This is rough and I'm certainly ok with modifying/scrapping it!
CDN
I think Cloudflare is the best choice for a CDN to serve static files. It's not tied to a major cloud provider, has a free tier, and greatly reduces the load on the origin server described below.
Origin for the CDN
The CDN requires an origin server on a miss. @RX14 maybe we can keep your colo server? If you don't want to keep running it, I think the next best thing would be to use a cloud blob storage system like S3/Google Cloud Storage/Azure Blob Storage (disclosure: I work in an MS Azure group; I don't think it matters what we use in this case).
Background workers
This one is the most complex, I think. I would split it up into a crawler, and a processor that runs `crystal docs` and sends the static files to the CDN origin.
Crawler
The simplest solution IMO is to run an endless loop somewhere on a managed platform, so we don't have to deploy to VMs. A few managed platforms I've personally used work well for this kind of background processing.
We discussed above that it checks the existing shardbox database. It can keep a hash of shards it's already seen and send new ones to the processor via an API call.
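A sketch of that loop; the shardbox releases endpoint, the response shape (a JSON array of objects with `shard` and `version`), and the processor's `/build` API are all assumptions for illustration, and polling the database directly would work just as well:

```crystal
# Poll for new releases and hand unseen ones to the processor.
require "http/client"
require "json"

seen = Set(String).new

loop do
  body = HTTP::Client.get("https://shardbox.example/api/releases").body
  JSON.parse(body).as_a.each do |release|
    key = "#{release["shard"]}@#{release["version"]}"
    next if seen.includes?(key)
    seen << key
    HTTP::Client.post("https://processor.internal/build", body: key)
  end
  sleep 5.minutes
end
```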
If we are going to disallow macros, the crawler can just run `crystal docs` and we can ignore the processor section below. Otherwise, read on.
Processor
The processor is a server that handles API calls from the crawler and dispatches them to isolated runtimes that run `crystal docs` and send the static docs to the aforementioned CDN origin.
The Go playground executes arbitrary code with the gVisor secure container runtime and I think we can adopt some of their rough architecture: https://talks.golang.org/2019/playground-v3/playground-v3.slide#24.
We might find some ways to simplify this, but the basic idea of an HTTP server firing up a secure runtime sounds about right to me.
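As a rough shape, not a real implementation: the processor could be a small HTTP server that shells out to whatever isolation tool we settle on. `run-sandboxed` below is a placeholder command standing in for the gVisor (or similar) invocation details:

```crystal
# Processor sketch: one endpoint that triggers a sandboxed docs build.
require "http/server"

server = HTTP::Server.new do |context|
  shard = context.request.query_params["shard"]? || ""
  # `run-sandboxed` is a placeholder, not an existing tool.
  status = Process.run("run-sandboxed", ["build-docs", shard],
    output: STDOUT, error: STDERR)
  context.response.status_code = status.success? ? 200 : 500
  context.response.print(status.success? ? "built #{shard}" : "build failed")
end

server.bind_tcp 8080
server.listen
```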
Let me know what you think.
I think the crawler already exists as part of shardbox. A CDN (Cloudflare) in front of my server would be how I would deploy it. I don't think there's much need to split up the frontend from the documentation building prematurely. Worry about horizontal scaling when you need to scale. I don't anticipate it in the near future.
The first thing I'd work on would be the sandboxing: write a little Crystal script which handles cloning the repo (or pulling), checking out the tag, and building the docs, along with copying them out. Then build an HTTP server to serve these files. Then make it do it on demand. Then make it show a progress bar while it's building. Then integrate with the crawler.
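A rough sketch of what that script could look like; the paths, arguments, and output location are placeholders:

```crystal
# Clone or update a repo, check out a tag, build docs, copy them out.
require "file_utils"

def sh(cmd : String, args : Array(String), chdir : String? = nil)
  status = Process.run(cmd, args, chdir: chdir, output: STDOUT, error: STDERR)
  raise "#{cmd} failed" unless status.success?
end

repo_url = ARGV[0]
tag      = ARGV[1]
workdir  = "/tmp/docs-build/#{File.basename(repo_url, ".git")}"

if Dir.exists?(workdir)
  sh "git", ["fetch", "--tags"], chdir: workdir  # pull if already cloned
else
  sh "git", ["clone", repo_url, workdir]
end
sh "git", ["checkout", tag], chdir: workdir
sh "crystal", ["docs"], chdir: workdir           # writes to ./docs by default
FileUtils.cp_r File.join(workdir, "docs"), "/srv/docs/#{tag}"
```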
This is just how I'd approach it, off the top of my head.
This is getting into the weeds a little bit, though. I think we all agree on the approach. The details of the plan never survive contact with the real world.
Ah ok, great to know.
Sounds good.
I'm not understanding: how is this different from the crawler? Would this be an addition to it?
Sounds about right. Let me know what you're thinking of for next steps.
- the crawler finds new repos and releases
- this docs server, when asked, builds docs for a given repo and release
My personal next steps are largely sorting out my colo server.
I see. I'll start a docs server that can run `crystal docs` on demand for now. It should be fairly easy to hook it up to a background docs builder later on.
Those are already some very detailed ideas. Before thinking about deployment and even a CDN, we'll need to get at least a working prototype. There's no need to artificially boost complexity right from the start.
Also, to put scaling requirements into perspective: on average, shardbox currently sees fewer than 10 new releases per day. The highest number of new releases in a single day was 30. Last week saw a total of 100, which was most likely caused by updates for the new Crystal release.
Even considering a generous growth factor: a typical `crystal docs` run should only take a few seconds, so even 100 releases a day at, say, 10 seconds each amounts to well under 20 minutes of work. There'd really be plenty of room to grow even with the most simple single-threaded worker loop implementation.
I would target optimizations mostly towards performance, to deliver a build result as fast as possible. This will also help with throughput.
Most of the time is probably spent checking out the git repo. But that shouldn't be too bad when the repo is locally cached and only needs to pull the delta. We might also consider keeping the workdir checked out, but I'm not sure whether that would improve things a lot. Maybe for larger repos…
The shardbox worker already pulls the repos to check for new versions, so it would make sense to share the local git cache to speed up checkout time.
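A sketch of what sharing that cache could look like, where `CACHE_DIR` and the layout are assumptions: keep one bare mirror per repo, fetch only the delta on each build, and make cheap local clones from the mirror:

```crystal
# Checkout via a shared bare-mirror git cache.
CACHE_DIR = "/var/cache/docs-server/git"

def cached_checkout(repo_url : String, tag : String, dest : String)
  mirror = File.join(CACHE_DIR, File.basename(repo_url, ".git") + ".git")
  if Dir.exists?(mirror)
    Process.run("git", ["-C", mirror, "fetch", "--tags"])  # delta only
  else
    Process.run("git", ["clone", "--mirror", repo_url, mirror])
  end
  # Cloning from the local mirror avoids hitting the network again.
  Process.run("git", ["clone", "--branch", tag, mirror, dest])
end
```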
The first step from my perspective would be https://github.com/crystal-lang/crystal/issues/6947
The JSON format needs to be revisited and fixed. I don't want to publish a public service when we know the data format it uses is in dire need of refactoring.
Prototypes can work with the current format, so that doesn't block other efforts.
Slightly OT, but just to note: I'm aware, and rebuilding on top of another tool is on my todo list; I just didn't find the motivation yet. Back when I built carc.in, it was actively developed and the maintainer was responsive.
One alternative not mentioned so far that I have on my evaluation list is nsjail. A lot of sandboxing/containerization tools overlook the syscall filter part a bit. I will miss playpen's ability to build up a syscall whitelist from a couple of example runs :/
The other day I reworked crystal-gobject into a `run` macro, so that generates tons and tons of public API through it alone :D Keeping all the generated code in a repository was boring and verbose. This approach also means I no longer worry so much about compatibility of the generated bindings with the actually installed library versions.
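For anyone unfamiliar with the pattern, a tiny example of `run` producing public API at compile time; the file names and generated classes are made up:

```crystal
# generator.cr: a normal Crystal program that prints Crystal source.
%w(window button label).each do |name|
  puts "class #{name.camelcase}; end"
end
```

```crystal
# src/bindings.cr: `run` compiles and executes generator.cr at compile
# time and pastes its stdout here as code, defining Window, Button, Label.
{{ run("./generator") }}
```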
@straight-shoota it's likely that I over-engineered the docs backend. I guess file it under ideas for the future, if they're ever needed.
I can start with a simple service that checks out code, runs `crystal docs`, and has a simple k/v store to cache docs. The problem is the sandbox, so I'll leave the thing as a prototype until we figure that out, along with the format in https://github.com/crystal-lang/crystal/issues/6947. I'll take a look at that issue today.