One solution would be to run `crystal docs` in a sandboxed environment: no HTTP connections allowed, just read and write access to the local project directory.
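As a minimal sketch of that idea, assuming a Linux host with util-linux's `unshare` available: running the build in a fresh network namespace cuts off all network access while leaving the project directory readable and writable. Note this only drops the network, not filesystem access, so it is just one piece of a real sandbox:

```crystal
# Run `crystal docs` with no network access by unsharing the network
# namespace. Assumes Linux with user namespaces enabled; `-r` maps the
# current user to root inside the namespace, `-n` drops the network.
status = Process.run("unshare", ["-r", "-n", "crystal", "docs"],
  chdir: "/path/to/project", output: STDOUT, error: STDERR)
puts status.success? ? "docs built" : "build failed"
```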
Yes, that's the general plan with the current behaviour of the docs generator.
But my idea was to maybe change the compiler to not allow the `run` macro (or anything else that could do damage) when generating docs. This would help make sure that when you can build docs locally, they also build on the docs server. It would also incentivise developers to write better docs, because the documented API should really not be influenced by anything except the code that's in the repository.
I think `macro run` can create types and methods. If that's the case, and if they are referenced elsewhere, `crystal docs` will just not work. Maybe not a big deal.
Yes, but I don't think that's good practice, exactly for these reasons. The alternative is using a generator script and storing the generated code in the repo. That's much more reliable and makes it much easier to track changes.
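For illustration, a generator script in this style can be very small; the generated methods below are invented, the point is just that the output is a plain file you commit and diff like any other source:

```crystal
# scripts/generate.cr: run by hand, then commit src/generated.cr.
# The generated methods here are made up for illustration.
methods = %w(red green blue).map do |color|
  <<-CODE
  def #{color}?(value : String) : Bool
    value == "#{color}"
  end
  CODE
end

File.write("src/generated.cr",
  "# Generated by scripts/generate.cr. Do not edit.\n\n" + methods.join("\n\n"))
```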
I seriously doubt the `run` macro is used much anyway, and even less so for generating public API. If it only generates `:nodoc:` code, it doesn't matter for the doc generator.
Agreed that a worker should generate docs behind the scenes and deploy the output to a static file server. I can get some cloud resources to do at least the static file hosting if folks want. Probably the background workers too.
I had been thinking about it over the weekend and wanted to write a proposal down. This is rough and I'm certainly ok with modifying/scrapping it!
CDN
I think Cloudflare is the best choice for a CDN to serve static files. It's not tied to a major cloud provider, has a free tier, and greatly reduces the load on the origin server described below.
Origin for the CDN
The CDN requires an origin server on a miss. @RX14 maybe we can keep your colo server? If you don't want to keep running it, I think the next best thing would be to use a cloud blob storage system like S3/Google Cloud Storage/Azure Blob Storage (disclosure: I work in an MS Azure group; I don't think it matters what we use in this case).
Background workers
This one is the most complex, I think. I would split it up into a crawler, and a processor that runs `crystal docs` and sends the static files to the CDN origin.
Crawler
The simplest solution IMO is to run an endless loop somewhere on a managed platform, so we don't have to deploy to VMs. A few managed platforms I've personally used work well for this kind of background processing.
We discussed above that it checks the existing shardbox database. It can keep a hash of shards it's already seen and send new ones to the processor via an API call.
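A sketch of that loop; the shardbox releases endpoint, the response shape (a JSON array of objects with `shard` and `version`), and the processor's `/build` API are all assumptions for illustration, and polling the database directly would work just as well:

```crystal
# Poll for new releases and hand unseen ones to the processor.
require "http/client"
require "json"

seen = Set(String).new

loop do
  body = HTTP::Client.get("https://shardbox.example/api/releases").body
  JSON.parse(body).as_a.each do |release|
    key = "#{release["shard"]}@#{release["version"]}"
    next if seen.includes?(key)
    seen << key
    HTTP::Client.post("https://processor.internal/build", body: key)
  end
  sleep 5.minutes
end
```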
If we are going to disallow macros, the crawler can just run `crystal docs` and we can ignore the processor section below. Otherwise, read on.
Processor
The processor is a server that handles API calls from the crawler and dispatches them to isolated runtimes that run `crystal docs` and send the static docs to the aforementioned CDN origin.
The Go playground executes arbitrary code with the gVisor secure container runtime and I think we can adopt some of their rough architecture: https://talks.golang.org/2019/playground-v3/playground-v3.slide#24.
We might find some ways to simplify this, but the basic idea of an HTTP server firing up a secure runtime sounds about right to me.
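As a rough shape, not a real implementation: the processor could be a small HTTP server that shells out to whatever isolation tool we settle on. `run-sandboxed` below is a placeholder command standing in for the gVisor (or similar) invocation details:

```crystal
# Processor sketch: one endpoint that triggers a sandboxed docs build.
require "http/server"

server = HTTP::Server.new do |context|
  shard = context.request.query_params["shard"]? || ""
  # `run-sandboxed` is a placeholder, not an existing tool.
  status = Process.run("run-sandboxed", ["build-docs", shard],
    output: STDOUT, error: STDERR)
  context.response.status_code = status.success? ? 200 : 500
  context.response.print(status.success? ? "built #{shard}" : "build failed")
end

server.bind_tcp 8080
server.listen
```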
Let me know what you think.
I think the crawler already exists as part of shardbox. A CDN (Cloudflare) in front of my server would be how I would deploy it. I don't think there's much need to split up the frontend from the documentation building prematurely. Worry about horizontal scaling when you need to scale. I don't anticipate it in the near future.
The first thing I'd work on would be the sandboxing: write a little Crystal script which handles cloning the repo (or pulling), checking out the tag, and building the docs, along with copying them out. Then build an HTTP server to serve these files. Then make it do it on demand. Then make it show a progress bar while it's building. Then integrate with the crawler.
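A rough sketch of what that script could look like; the paths, arguments, and output location are placeholders:

```crystal
# Clone or update a repo, check out a tag, build docs, copy them out.
require "file_utils"

def sh(cmd : String, args : Array(String), chdir : String? = nil)
  status = Process.run(cmd, args, chdir: chdir, output: STDOUT, error: STDERR)
  raise "#{cmd} failed" unless status.success?
end

repo_url = ARGV[0]
tag      = ARGV[1]
workdir  = "/tmp/docs-build/#{File.basename(repo_url, ".git")}"

if Dir.exists?(workdir)
  sh "git", ["fetch", "--tags"], chdir: workdir  # pull if already cloned
else
  sh "git", ["clone", repo_url, workdir]
end
sh "git", ["checkout", tag], chdir: workdir
sh "crystal", ["docs"], chdir: workdir           # writes to ./docs by default
FileUtils.cp_r File.join(workdir, "docs"), "/srv/docs/#{tag}"
```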
This is just how I'd approach it, off the top of my head.
This is getting into the weeds a little bit, though. I think we all agree on the approach. The details of the plan never survive contact with the real world.
Ah ok, great to know.
Sounds good.
I'm not understanding: how is this different from the crawler? Would this be an addition to it?
Sounds about right. Let me know what you're thinking of for next steps.
- the crawler finds new repos and releases
- this docs server, when asked, builds docs for a given repo and release
My personal next steps are largely sorting out my colo server.
I see. I'll start a docs server that can run `crystal docs` on demand for now. It should be fairly easy to hook it up to a background docs builder later on.
Those are already some very detailed ideas. Before thinking about deployment and even a CDN, we'll need to get at least a working prototype. There's no need to artificially boost complexity right from the start.
Also, to put scaling requirements into perspective: on average, shardbox currently sees fewer than 10 new releases per day. The highest number of new releases in a single day was 30. Last week saw a total of 100, which was most likely caused by updates for the new Crystal release.
Even considering a generous growth factor: a typical `crystal docs` run should only take a few seconds, so even 100 releases a day at, say, 10 seconds each amounts to well under 20 minutes of work. There'd really be plenty of room to grow even with the most simple single-threaded worker loop implementation.
I would target optimizations mostly towards performance, to deliver a build result as fast as possible. This will also help with throughput.
Most of the time is probably spent checking out the git repo. But that shouldn't be too bad when the repo is locally cached and only needs to pull the delta. We might also consider keeping the workdir checked out, but I'm not sure whether that would improve things a lot. Maybe for larger repos…
The shardbox worker already pulls the repos to check for new versions, so it would make sense to share the local git cache to speed up checkout time.
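A sketch of what sharing that cache could look like, where `CACHE_DIR` and the layout are assumptions: keep one bare mirror per repo, fetch only the delta on each build, and make cheap local clones from the mirror:

```crystal
# Checkout via a shared bare-mirror git cache.
CACHE_DIR = "/var/cache/docs-server/git"

def cached_checkout(repo_url : String, tag : String, dest : String)
  mirror = File.join(CACHE_DIR, File.basename(repo_url, ".git") + ".git")
  if Dir.exists?(mirror)
    Process.run("git", ["-C", mirror, "fetch", "--tags"])  # delta only
  else
    Process.run("git", ["clone", "--mirror", repo_url, mirror])
  end
  # Cloning from the local mirror avoids hitting the network again.
  Process.run("git", ["clone", "--branch", tag, mirror, dest])
end
```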
The first step from my perspective would be https://github.com/crystal-lang/crystal/issues/6947
The JSON format needs to be revisited and fixed. I don't want to publish a public service when we know the data format it uses is in dire need of refactoring.
Prototypes can work with the current format, so that doesn't block other efforts.
Slightly OT, but just to note: I'm aware, and rebuilding on top of another tool is on my todo list; I just didn't find the motivation yet. Back when I built carc.in, it was actively developed and the maintainer was responsive.
One alternative not mentioned so far that I have on my evaluation list is nsjail. A lot of sandboxing/containerization tools overlook the syscall filter part a bit. I will miss playpen's ability to build up a syscall whitelist from a couple of example runs :/
The other day I reworked crystal-gobject into a `run` macro, so that generates tons and tons of public API through it alone :D Keeping all the generated code in a repository was boring and verbose. This approach also means I no longer worry so much about compatibility of the generated bindings with the actually installed library versions.
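For anyone unfamiliar with the pattern, a tiny example of `run` producing public API at compile time; the file names and generated classes are made up:

```crystal
# generator.cr: a normal Crystal program that prints Crystal source.
%w(window button label).each do |name|
  puts "class #{name.camelcase}; end"
end
```

```crystal
# src/bindings.cr: `run` compiles and executes generator.cr at compile
# time and pastes its stdout here as code, defining Window, Button, Label.
{{ run("./generator") }}
```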
@straight-shoota it's likely that I over-engineered the docs backend. I guess file it under ideas for the future, if they're ever needed.
I can start with a simple service that checks out code, runs `crystal docs`, and has a simple k/v store to cache docs. The problem is the sandbox, so I'll leave the thing as a prototype until we figure that out, along with the format in https://github.com/crystal-lang/crystal/issues/6947. I'll take a look at that issue today.