Most of you have heard of Mike Perham and his Ruby background job processing library, Sidekiq. Sidekiq is almost seven years old now, and it carries some architectural flaws from its early days – flaws I've felt a lot when using it in my Ruby projects.
But better than I could, Kir Shatrov, a lead engineer at Shopify, has described many of the problems one of the biggest e-commerce platforms faces when using Sidekiq and its siblings in his outstanding article – The State of Background Jobs in 2019. It inspired me to tackle the described issues by creating my own solution.
At the same time, Mike has been accumulating experience too, building a whole new background job processing system from scratch called Faktory. Faktory is polyglot (workers can be written in any language) and is itself written in Go.
However, Faktory still suffers from the same restrictions as Sidekiq – it uses Redis as its job storage and relies on queues. Furthermore, IIRC, Faktory aims to preserve the Sidekiq API in its official Ruby worker implementation, which imposes certain constraints as well.
Both Faktory and Sidekiq are great for a fast development cycle, and they solve common issues. But as both Shopify and I have noted, they have problems with scalability. Sidekiq does not support clustered Redis. Faktory hides Redis under the hood, which makes clustering an issue as well. Both systems rely on named queues, and neither knows anything about multi-tenancy. There is also the known difficulty of ensuring that a job runs exactly once, which leads to a need for locks.
Then comes Crystal
Redis is commonly used in existing background job processing systems because of its speed. But the sad truth is that it is not designed for worker-job use cases. The streams feature shipped in Redis 5.0 is somewhat experimental and lacks obvious features needed for better background job processing. Redis is also single-threaded, which, in addition to its RAM boundaries, makes it highly CPU-dependent.
Crystal has speed comparable to C. So why not build a pure Crystal solution designed exclusively for efficient background job processing? The language has a great Socket API, and it can be parallelized right now (it will be even easier in upcoming releases). It is therefore possible to get rid of Redis and store jobs for immediate processing right within the Crystal application, taking advantage of multi-threading.
But what about scheduling jobs, storing them for analysis, and so on? We need disk storage so we don't depend on RAM alone. Well, there is SQLite. It is embeddable and fast: it allows up to a million bulk inserts per second and has great read performance. Problem solved.
The resulting solution would be a single binary without any external dependencies – no Redis, no third-party SQL database server. Great! That's how the Worcr idea was born.
Worcr is a next-gen background job processing system which runs as a single binary. It stores data both in RAM and in an embedded SQLite database, which allows processing trillions of jobs without hitting the RAM limit. Worcr is multi-threaded and can be clustered for enhanced performance and reliability.
Dumb workers, smart dispatcher
Workers are separate processes, written in any language, which connect to the main Worcr instance and wait for a job to perform. Worcr takes care of proper dispatching, enforcing limits, running scheduled jobs and so on.
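The "dumb worker" idea can be sketched in a few lines of Ruby. The Worcr worker API is not public yet, so every name below (`register_handler`, `work_once`, the job hash shape) is a hypothetical stand-in; the point is only that the worker holds no queue, weight or uniqueness logic – it just runs whatever it is handed:

```ruby
# Hypothetical sketch of a dumb worker: it maps job types to handlers
# and executes whatever the dispatcher hands over, nothing more.
HANDLERS = {}

def register_handler(name, &block)
  HANDLERS[name] = block
end

# One iteration of the worker loop: fetch a dispatched job, run the
# matching handler, collect the result. Returns false when idle.
def work_once(dispatcher, results)
  job = dispatcher.shift            # stand-in for "wait for a job"
  return false unless job
  results << HANDLERS.fetch(job[:type]).call(job[:args])
  true
end
```

All the routing intelligence lives on the dispatcher side; the worker stays trivially portable to any language.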
Existing background job processing solutions rely on named queues to control job execution order. That works nicely in the beginning, but soon you run into the problem of having too many queues: you need more granularity. Kir Shatrov proposes the concept of job weights, and I find it a perfect solution.
What if you have a job which should be processed differently for different users? For example, an ImportGoods job, and premium users who expect it to run immediately?
In Worcr, you can tag both jobs and workers and create rules for tag combinations. In this case, you would create a rule like: if a job has the “premium” tag and a worker has it too, multiply the job's weight by 10. Then you'd run a couple of workers with the “premium” tag, and such a rule would prioritize premium jobs on premium workers. You can also create a rule that completely prohibits running non-premium jobs on premium workers.
You can have an arbitrary number of job and worker tag combinations, as well as rules for them.
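To make the rule mechanics concrete, here is a minimal Ruby sketch of how a dispatcher could evaluate such rules. Worcr's actual rule format isn't published, so the structures and names here are assumptions for illustration only:

```ruby
# Hypothetical sketch of tag-based dispatch rules. Each rule is a
# predicate over (job, worker) plus an effect: a weight multiplier
# or an outright prohibition of the pairing.
Job    = Struct.new(:name, :tags, :weight)
Worker = Struct.new(:name, :tags)

RULES = [
  # If both the job and the worker are tagged "premium", boost weight 10x.
  { when: ->(j, w) { j.tags.include?("premium") && w.tags.include?("premium") },
    multiply: 10 },
  # Never run a non-premium job on a premium worker.
  { when: ->(j, w) { !j.tags.include?("premium") && w.tags.include?("premium") },
    forbid: true }
]

# Effective weight of a job on a worker; nil means the pairing is forbidden.
def effective_weight(job, worker)
  RULES.reduce(job.weight) do |weight, rule|
    next weight unless rule[:when].call(job, worker)
    return nil if rule[:forbid]
    weight * rule[:multiply]
  end
end
```

With these two rules, a premium ImportGoods job lands on a premium worker with ten times its base weight, while a non-premium job is simply never offered to that worker.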
Kir gives a great example of a user with 100k import jobs alongside other users with 100 jobs each. A smart dispatcher distributes the jobs evenly between the users, trying to process them at the same pace for everyone. Worcr is able to do this with multiple strategies.
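The simplest such strategy is round-robin interleaving across users. This is a sketch of the general idea, not Worcr's actual implementation (which isn't public):

```ruby
# Sketch of a fair dispatch strategy: interleave queued jobs round-robin
# by user, so a tenant with 100k jobs cannot starve tenants with 100 jobs.
def interleave_by_user(jobs)
  queues = jobs.group_by { |job| job[:user] }.values
  result = []
  until queues.all?(&:empty?)
    queues.each do |queue|
      job = queue.shift
      result << job if job
    end
  end
  result
end
```

The small tenants finish their hundred jobs early instead of waiting behind someone else's hundred-thousand-job backlog.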
There is a crucial feature of a truly reliable job processing system: job interruptibility. Worcr has a concept of checkpoints, which allows creating milestones within a job. When re-run, the job continues from the latest milestone instead of starting from the beginning.
For example, say you have a daily digest job and a million users. You don't want to send the digest twice to the same user when the job is re-run, so you'd have a userID checkpoint updated on every successfully sent e-mail. Worcr allows an arbitrary number of checkpoints for a single job.
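In Ruby terms, the digest example could look roughly like this. The checkpoint calls are in-memory stand-ins; the real Worcr client API is not public, so `save_checkpoint`/`load_checkpoint` are hypothetical names:

```ruby
# In-memory stand-ins for Worcr's checkpoint storage (hypothetical API --
# a real client would persist these on the Worcr instance).
CHECKPOINTS = Hash.new { |h, k| h[k] = {} }

def save_checkpoint(job_id, key, value)
  CHECKPOINTS[job_id][key] = value
end

def load_checkpoint(job_id, key)
  CHECKPOINTS[job_id][key]
end

# The daily digest job: resumes after the last confirmed user instead of
# re-sending e-mails that already went out on a previous (failed) run.
def send_digest(job_id, user_ids, sent_log)
  last = load_checkpoint(job_id, "userID")
  user_ids.each do |uid|
    next if last && uid <= last        # already handled on a previous run
    sent_log << uid                    # pretend we sent the e-mail here
    save_checkpoint(job_id, "userID", uid)
  end
end
```

A crash between two e-mails costs at most one duplicate send, not a million.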
Progress tracking is complementary to checkpoints. Workers can notify the Worcr instance of a job's progress, which allows displaying it in the UI.
Taking the same digest example, it would be great to track the job's progress – in this case, something like emails sent: 10,000/1,000,000 (1%). You can have an arbitrary number of progress trackers per job.
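Formatting such a progress line is trivial; a sketch (the exact UI string Worcr will render is my assumption):

```ruby
# Sketch: render a job's progress the way it could appear in a UI,
# e.g. "emails sent: 10,000/1,000,000 (1%)".
def format_progress(label, done, total)
  percent = (done * 100) / total
  with_commas = ->(n) { n.to_s.reverse.scan(/\d{1,3}/).join(",").reverse }
  "#{label}: #{with_commas.call(done)}/#{with_commas.call(total)} (#{percent}%)"
end
```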
Worcr also supports job expiration – you can set a job's expires_at value to the desired timestamp right within Worcr, not in your client-facing application.
Worcr supports all the other features expected from a background job processing system: window, bucket and concurrency limits based on tags, job arguments or dynamic functions; CRON-scheduled jobs; job expiration; retrying failed jobs; logging a job's output in one place; and webhooks to notify you about events.
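As one example of these limits, tag-based concurrency limiting could work along these lines. This is a sketch of the general mechanism, with a made-up limit table, not Worcr's actual implementation:

```ruby
# Sketch of tag-based concurrency limiting: at most N jobs carrying a
# given tag may run at once. The limit values are hypothetical examples.
class ConcurrencyLimiter
  def initialize(limits)
    @limits  = limits               # e.g. { "imports" => 2 }
    @running = Hash.new(0)          # currently running jobs per tag
  end

  # Reserves a slot and returns true if none of the job's tags is at
  # its limit; returns false (job stays queued) otherwise.
  def acquire(tags)
    return false if tags.any? { |t| @limits[t] && @running[t] >= @limits[t] }
    tags.each { |t| @running[t] += 1 }
    true
  end

  # Frees the slots when the job finishes or fails.
  def release(tags)
    tags.each { |t| @running[t] -= 1 }
  end
end
```

The same acquire/release shape extends naturally to window and bucket limits by counting per time interval instead of per moment.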
Ease of use
Worcr is a nice alternative to existing background job processing solutions because it is easier to deploy, being a single binary without any external dependencies. Its workers can be written in any language and are relatively simple, because they don't care about a job's queue, weight, uniqueness, etc. – all they do is process the jobs dispatched to them.
Worcr is open-core software. It has an open-source LGPL-licensed version and commercial versions with more features. It does not limit the number of servers or workers you run, and it has no tracking or license-key-checking code built in.
The Pro version gives you access to a private GitHub team, so you keep the GitHub flow you're used to: you can create issues and pull requests in those private repositories.
Pro users also gain access to a private Twist channel used for async communication between all the other Pro users and the Worcr team. You can also contact the Worcr team directly via Twist if you want to.
Enterprise users can have their teammates join a private Twist team with channels dedicated exclusively to their team and the Worcr team, so we can communicate together, privately.
Worcr can (and will) become one of those “killer” projects for Crystal: it doesn't require Crystal to get started (binaries are shipped via OS repositories and Docker), but if a company wants to extend its functionality, it will need to start learning Crystal or hire Crystal developers.
Currently Worcr is in a pre-alpha state. I have most of the functionality thought through, benchmarked and experimented with. I plan to release an alpha version (0.0.*) in May; it will include most of the Pro features and a web UI, as well as Crystal worker code. A beta version (0.*.*) is expected in July, bringing Ruby and Go workers as well as (hopefully) multi-threading. The Enterprise version and the full release (1.*.*) are expected to ship this autumn.
To keep up a consistent speed of development (and, also, to eat), I need support from the Crystal community. I, Vlad Faust, have spent the last two years with Crystal, expanding its ecosystem with cool shards and even a recently released framework.
You can support Worcr development by purchasing a Pro license right now (it can be done online at https://worcr.com), before the code is released. The Pro version will cost $250/mo, but you can subscribe to it today for $50/mo – 80% cheaper. If you subscribe now, the reduced price stays active for you forever, unless you explicitly cancel the subscription. That means that on release, when the price is $250/mo, you will still be paying $50/mo for full Pro access!
This price is available until the alpha release in May. In addition to the discount, you'll get multiple free job postings on Crystal Jobs (posting will become paid after its release) and a permanent place in the Worcr credits.
By supporting Worcr, you also support Onyx development, because framework issues are going to be solved along with Worcr's progress. It also ensures Crystal Jobs development, as its new version relies on Onyx and Worcr as well. Not to mention all the other shards I maintain.
Alternatively, you can become my patron – the pledges will be spent on Crystal work as well.
I’m very thankful to the core team and to the language and ecosystem contributors, because the thing we’ve all created is absolutely awesome. As I said a couple of years ago, Crystal is the number one language for me and, potentially, for millions of other developers who value expressiveness and convenience in their daily work.
We are on the very edge of Crystal bursting into adoption among thousands of companies and millions of solo developers around the world. Only a few steps are left before everything changes for the better: greater developer experience, reduced energy consumption and better code practices.
Thank you, Crystal. And thanks to everyone who supports me.