Announcing Worcr, your help needed

#1

Grandfather standard

Most of you have heard of Mike Perham and his Ruby background job processor, Sidekiq. Sidekiq is almost seven years old now and carries some architectural flaws from its past, flaws I've felt first-hand when using Sidekiq in my own Ruby projects.

But better than I could, Kir Shatrov, a lead engineer at Shopify, has described many of the problems one of the biggest e-commerce platforms faces when using Sidekiq and its siblings in his outstanding article, The State of Background Jobs in 2019. It inspired me to tackle the described issues by creating my own solution.

At the same time, Mike has been accumulating his own experience and building a whole new background job processing system from scratch, called Faktory. Faktory is polyglot (workers can be written in any language) and is itself written in Go.

However, Faktory still suffers from the same restrictions Sidekiq has: it uses Redis as its job storage and relies on queues. Furthermore, IIRC, Faktory aims to preserve the Sidekiq API in its official Ruby worker implementation, which imposes certain constraints as well.

Both Faktory and Sidekiq are great for fast development cycles, and they solve common problems. But as both Shopify and I have noted, they have trouble with scalability. Sidekiq does not support clustered Redis. Faktory hides Redis under the hood, which makes clustering an issue as well. Both systems rely on named queues, and neither knows anything about multi-tenancy. There is also the known difficulty of ensuring that a job runs exactly once, which leads to the need for locks.

Then comes Crystal

Redis is commonly used in existing background job processing systems because of its speed. But the sad truth is that it was not designed for worker-job use cases. The streams feature that recently shipped in 5.0 is somewhat experimental and lacks obvious features needed for better background job processing. Redis is also single-threaded, which, in addition to its RAM limits, makes it highly CPU-bound.

Crystal has speed comparable to C. Why not build a pure Crystal solution designed exclusively for efficient background job processing? The language has a great Socket API, and it can already be parallelized (it will be even easier in upcoming releases). So it is possible to get rid of Redis and store jobs for immediate processing right within the Crystal application, taking advantage of multi-threading.

But what about scheduling jobs and storing them for later analysis? We need disk storage so we are not bound by RAM. Well, there is SQLite. It is embeddable and fast: it can handle up to a million bulk inserts per second and has excellent read performance. Problem solved.
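The bulk-insert claim is easy to try for yourself. Worcr itself is Crystal, but here is a rough, hypothetical Python sketch using the stdlib sqlite3 module; wrapping the whole batch in a single transaction is what makes SQLite fast at this:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # a file path works the same way
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(f"job-{i}",) for i in range(100_000)]

start = time.perf_counter()
with conn:  # one transaction for the whole batch: the key to bulk speed
    conn.executemany("INSERT INTO jobs (payload) VALUES (?)", rows)
elapsed = time.perf_counter() - start

count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(f"inserted {count} rows in {elapsed:.3f}s")
```

The same batch inserted row by row, each in its own transaction, would be orders of magnitude slower, which is why a job store would buffer and flush writes in bulk.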

The resulting solution would be a single binary without any external dependencies: no Redis, no third-party SQL database server. Great! That's how the Worcr idea was born.

(Logo: a factory with three pipes)

Worcr is a next-gen background job processing system which runs as a single binary. It stores data both in RAM and in an embedded SQLite database, which allows processing trillions of jobs without hitting RAM limits. Worcr is multi-threaded and can be clustered for better performance and reliability.

Dumb workers, smart dispatcher

Workers are separate processes, written in any language, which connect to the main Worcr instance and wait for a job to perform. Worcr takes care of proper dispatching, enforcing limits, running scheduled jobs and so on.

Weights

Existing background job processing solutions rely on named queues to control job execution order. That works nicely in the beginning, but soon you'll run into the problem of having too many queues; you need more granularity here. Kir Shatrov introduces the concept of job weights, and I find it a perfect solution.
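Worcr's code isn't public yet, so purely as an illustration in Python: with weights, the dispatcher keeps one priority-ordered pool instead of many named queues, and simply hands out the heaviest runnable job (the names below are made up for the sketch):

```python
import heapq

class Dispatcher:
    """Toy weight-based dispatcher: higher weight runs first, no named queues."""
    def __init__(self):
        self._heap = []      # max-heap, simulated by negating weights
        self._counter = 0    # monotonic counter: FIFO tie-break for equal weights

    def enqueue(self, job, weight):
        heapq.heappush(self._heap, (-weight, self._counter, job))
        self._counter += 1

    def next_job(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

d = Dispatcher()
d.enqueue("send_newsletter", weight=1)
d.enqueue("charge_payment", weight=10)
d.enqueue("import_goods", weight=5)
print(d.next_job())  # charge_payment
```

Instead of deciding up front which queue a job belongs to, you tune a single number per job, and rules (see the tags below) can adjust that number dynamically.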

Tags

What if you have a job which should be processed differently for different users? For example, you have an ImportGoods job, and premium users expect this job to run immediately.

In Worcr, you can tag both jobs and workers and create rules for tag combinations. In this case, you would create a rule like: if a job has the “premium” tag and a worker has it too, multiply the job's weight by 10. Then you'd run a couple of workers with the “premium” tag. Such a rule would prioritize premium jobs on premium workers. You can also create a rule which completely prohibits running non-premium jobs on premium workers.

You can have an arbitrary number of job and worker tag combinations, as well as rules for them.
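Worcr's actual rule syntax isn't published, so here is a hypothetical Python sketch of the semantics described above: one multiplier rule for the premium/premium combination, and one prohibition rule keeping non-premium jobs off premium workers.

```python
RULES = [
    # if the job AND the worker both carry "premium", boost the weight x10
    {"job_tag": "premium", "worker_tag": "premium", "multiply": 10},
    # prohibit non-premium jobs on premium workers
    {"job_tag_absent": "premium", "worker_tag": "premium", "prohibit": True},
]

def effective_weight(base, job_tags, worker_tags, rules=RULES):
    """Apply tag-combination rules; None means the pairing is prohibited."""
    weight = base
    for r in rules:
        if "job_tag" in r and r["job_tag"] not in job_tags:
            continue
        if "job_tag_absent" in r and r["job_tag_absent"] in job_tags:
            continue
        if r["worker_tag"] not in worker_tags:
            continue
        if r.get("prohibit"):
            return None  # this worker must not take this job at all
        weight *= r.get("multiply", 1)
    return weight

print(effective_weight(1, {"premium"}, {"premium"}))  # 10
print(effective_weight(1, set(), {"premium"}))        # None (prohibited)
print(effective_weight(1, set(), set()))              # 1
```

The dispatcher would evaluate these rules for each candidate job/worker pair before handing a job out, so prioritization stays entirely on the server side.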

Multi-tenancy

Kir gives a great example describing a user with 100k import jobs alongside other users with 100 jobs each. A smart dispatcher distributes the jobs evenly between the users, trying to process every user's jobs at the same speed. Worcr can do this with multiple strategies.
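As a minimal, hypothetical Python sketch of one such strategy (plain round-robin across tenants; the real dispatcher presumably combines this with weights):

```python
from collections import OrderedDict, deque

class FairDispatcher:
    """Round-robin across tenants: 100k jobs from one user can't starve the rest."""
    def __init__(self):
        self.queues = OrderedDict()  # tenant -> deque of that tenant's jobs

    def enqueue(self, tenant, job):
        self.queues.setdefault(tenant, deque()).append(job)

    def next_job(self):
        for tenant in list(self.queues):
            job = self.queues[tenant].popleft()
            # rotate the tenant to the back so the others get served first next time
            self.queues.move_to_end(tenant)
            if not self.queues[tenant]:
                del self.queues[tenant]
            return (tenant, job)
        return None

d = FairDispatcher()
for i in range(3):
    d.enqueue("big", f"import-{i}")
d.enqueue("small", "report")
print([d.next_job() for _ in range(4)])
```

The big tenant's imports and the small tenant's single report interleave, so the small tenant isn't stuck behind 100k foreign jobs.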

Checkpoints

There is a crucial feature of a truly reliable job processing system: job interruptibility. Worcr has a concept of checkpoints, which allow creating milestones within a job. When re-run, a job continues from the last milestone instead of starting from the beginning.

For example, say you have a daily digest job and a million users. You don't want to send the digest twice to the same user when the job is re-run, so you'd have a userID checkpoint updated after every successfully sent e-mail. Worcr allows an arbitrary number of checkpoints per job.
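The digest scenario can be sketched like this (hypothetical Python; in Worcr the checkpoint would be persisted by the server rather than held in a local dict):

```python
def send_digest(user_ids, checkpoint, send):
    """Resume from the last checkpointed position instead of starting over."""
    start = checkpoint.get("last_index", -1) + 1
    for i in range(start, len(user_ids)):
        send(user_ids[i])
        checkpoint["last_index"] = i  # milestone: persisted after each e-mail

sent = []
checkpoint = {}
users = list(range(10))

# first run crashes halfway through
try:
    def flaky_send(uid):
        if uid == 5:
            raise RuntimeError("SMTP timeout")
        sent.append(uid)
    send_digest(users, checkpoint, flaky_send)
except RuntimeError:
    pass

# the retry continues from the checkpoint: users 0-4 get no duplicate e-mail
send_digest(users, checkpoint, sent.append)
print(sent)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Without the checkpoint, the retry would re-send the digest to the first five users; with it, each user is reached exactly once across both attempts.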

Progress

This feature is complementary to checkpoints. Workers can notify the Worcr instance of a job's progress, which is then displayed in the UI.

Taking the same digest example, it would be great to track the job's progress. In this case it would look something like emails sent: 10,000/1,000,000 (1%). You can have an arbitrary number of progress counters per job.

Embedded functions

In Worcr, you can have dynamic job limits, retries, expirations, tag rules and more, all calculated within the Worcr instance itself. You can write these functions in embeddable languages (currently JavaScript powered by duktape.cr and Lua powered by lua.cr).

A classic example: you don't want to run a daily digest job once 12 hours have passed since the moment it was enqueued. You can write a simple JavaScript function which sets the job's expires_at value to the desired timestamp, right within Worcr, not in your client-facing application.
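Inside Worcr the rule itself would be JavaScript or Lua; the equivalent logic, sketched in Python with hypothetical field names, is just a one-liner over the enqueue timestamp:

```python
from datetime import datetime, timedelta, timezone

def set_expiry(job):
    """Expire the daily digest 12 hours after it was enqueued."""
    job["expires_at"] = job["enqueued_at"] + timedelta(hours=12)
    return job

job = {"class": "DailyDigest",
       "enqueued_at": datetime(2019, 4, 1, 9, 0, tzinfo=timezone.utc)}
set_expiry(job)
print(job["expires_at"])  # 2019-04-01 21:00:00+00:00
```

The point is where this runs: since the function executes in the Worcr instance, every client in every language gets the same expiration policy for free.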

Other features

Worcr supports all the other features expected from a background job processing system: window, bucket and concurrency limiting based on tags, job arguments or dynamic functions; CRON jobs; job expiration; job retrying in case of failure; logging a job's output in one place; webhooks to notify about events.

Positioning

Ease of use

Worcr is a nice alternative to existing background job processing solutions because it allows easier deployment, being a single binary without any external dependencies. Its workers can be written in any language and are relatively simple, because they don't care about a job's queue, weight, uniqueness etc. All they do is process the jobs dispatched to them.

Licensing

Worcr is open-core software. It has an open-source LGPL-licensed version and commercial versions with more features. It does not limit the number of servers or workers you run, and it has no tracking or license-key-checking code built in.

Support

The Pro version gives you access to a private GitHub team, enabling the GitHub flow you're used to: you can create issues and pull requests in those private repositories.

Pro users also gain access to a private Twist channel used for async communication between all other Pro users and the Worcr team. You can contact the Worcr team directly via Twist if you want to.

Enterprise users can have their teammates join a private Twist team with channels dedicated exclusively to their team and the Worcr team, so we can communicate together, privately.

Current state

Worcr can (and will) become one of those “killer” projects for Crystal: it doesn't require Crystal to get started (binaries are shipped via OS repositories and Docker), but if a company wants to extend its functionality, it will need to start learning Crystal or hire Crystal developers.

Currently Worcr is in a pre-alpha state. I have most of the functionality thought through, benchmarked and experimented with. I plan to release an alpha version (0.0.*) in May. It will include most of the Pro features and the web UI, as well as the Crystal worker code. The beta (0.*.*) is expected in July and will bring Ruby and Go workers as well as multi-threading (hopefully). The Enterprise version and the full release (1.*.*) are expected to ship this autumn.

Worcr is built on top of Onyx Framework (I told you, it’s not exclusively for web) and relies on some of its unreleased features (for example, see Onyx::EDA issues).

Halp

To maintain a consistent pace of development (and, also, to eat), I need support from the Crystal community. I, Vlad Faust, have spent the last two years on Crystal, expanding its ecosystem with cool shards and even a recently released framework.

You can support Worcr development by purchasing a Pro license right now (it can be done online at https://worcr.com), before the code is released. The Pro version will cost $250/mo, but you can subscribe today for $50/mo, which is 80% cheaper. If you subscribe now, that reduced price stays active for you forever, unless you explicitly cancel the subscription. Which means that on release, when the price is $250/mo, you will still be paying $50/mo for full Pro access!

This price will be available until the alpha release in May. In addition to the discount, you'll get multiple free job postings on Crystal Jobs (posting will become paid after it is released) and a perpetual place in the Worcr credits.

By supporting Worcr, you also support Onyx development, because framework issues will be solved alongside Worcr's progress. It also helps Crystal Jobs development, since its new version relies on Onyx and Worcr as well. Not to mention all the other shards I maintain.

Alternatively, you can become my patron; the pledges are spent on Crystal work as well.

Thanks

I'm very thankful to the core team and to the language and ecosystem contributors, because the thing we've all created is absolutely awesome. As I said a couple of years ago, Crystal is the number one language for me, and potentially for millions of other developers who value expressiveness and convenience in their daily work.

We are on the very edge of Crystal bursting into adoption among thousands of companies and millions of solo developers around the world. A few steps remain before everything changes toward a greater developer experience, reduced energy consumption and better code practices.

Thank you, Crystal. And thanks to everyone who supports me.

12 Likes

#2

This looks really cool. The website is stellar and the writeup is fantastic. I love that you’re also trying to monetize so you can keep working on it.

I have a question about SQLite though. It seems this would make it difficult to share across containers/nodes/servers. For example, if I have multiple Heroku dynos, there is no way to share the SQLite instance, right? Also, many platforms like GCP, EC2 and Heroku have ephemeral storage, so how would that work with Worcr and SQLite?

I also wonder how good SQLite is at ensuring consistency. How is the tooling for backups, followers, etc.? And how comfortable will devops people be using SQLite?

I’m sure you’ve thought of this I’m just curious what the answers are.

1 Like

#3

I'm also wondering: why SQLite, what are its advantages? For instance, RocksDB is efficient and can run multiple transactions.

1 Like

#4

@paulcsmith, @j8r, thanks for your questions!

Arbitrary database location

A Worcr instance depends on a single DATABASE_URL variable, which looks something like /opt/worcr/db.sqlite by default. Once a Worcr instance is launched, it looks for an existing database file and creates one if needed. The DATABASE_URL variable can be set to any other SQLite file location. It can also be :memory: for in-memory databases, or a foreign host path.
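The lookup-or-create behaviour could be sketched like this (hypothetical Python; Worcr itself is Crystal and its exact resolution logic is not published):

```python
import os
import sqlite3

DEFAULT_URL = "/opt/worcr/db.sqlite"

def open_database(url=None):
    """Resolve the database location: explicit argument, then the
    DATABASE_URL environment variable, then the default file path.
    sqlite3.connect creates the file if it does not exist yet, and
    the special ":memory:" value yields an in-memory database."""
    url = url or os.environ.get("DATABASE_URL", DEFAULT_URL)
    return sqlite3.connect(url)

conn = open_database(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY)")
```

Because everything funnels through one URL, switching between a local file, an in-memory store and a remote location is purely a configuration change.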

So, answering @paulcsmith's question about ephemeral storage: a developer could spin up an SQLite instance somewhere else and pass the corresponding DATABASE_URL variable to the Worcr instance.

It is also possible to use a third-party SQLite pool solution like RQLite if you need DIY replication. It utilizes the Raft protocol, however, which leads to worse performance.

Summing up, the difficulty of database maintenance depends entirely on the developer. They can have a single file (which can be backed up with a simple copy), in-memory storage or a hosted SQLite instance; Worcr doesn't care much. Of course, workers (those which actually perform jobs) wouldn't know anything about the database, because they connect directly to a Worcr instance via TCP sockets.

Clustering

I've thought about clustering, too. The standard Raft protocol wouldn't work here, because the whole leader-followers architecture brings certain performance penalties. I've designed (on paper) a replication algorithm which should work nicely in this particular case.

Multiple Worcr instances would have the same priority (i.e. no single leader) and be able to synchronize quickly and distribute jobs evenly between connected workers. Each of them would have its own SQLite instance (again, provided via DATABASE_URL with a meaningful default value). If all nodes use a file as their database, they would synchronize with each other with the help of SQLite's WAL files.

Clustering is planned to be an Enterprise-only feature, because a single Worcr instance should be able to handle enough jobs for most businesses.

Of course, there may be issues with a proper replication implementation, but well, I have some time until autumn :smile:

Why SQLite

@j8r, SQLite is quite efficient as well. I've tested that it can insert up to a million entries in bulk on my SSD machine. It also has great read performance and supports multi-threading. It takes some time to master (understanding locks, for example), but that wouldn't be a problem for a Worcr user, as it all happens under the hood.

Furthermore, using embedded SQLite as the data store removes third-party dependencies, so it would be very simple to try Worcr out: just launch its binary and you're all set.

1 Like

#5

I have been waiting for Worcr to be released and am super excited to try it out.

I have a Crystal project in maintenance that I might convert to Worcr once it is released.

I have some feature ideas that I have implemented in other background worker frameworks. Is there a way to request features?

0 Likes

#6

Well, yes, @wontruefree. You can join https://gitter.im/worcr to discuss stuff :smiley:

0 Likes

#7

In addition to my reply to Paul: IIRC, Heroku has a concept of “services” (for example, PostgreSQL is a service). Worcr could be treated as a service as well. As I've mentioned above, it does not require an external database connection by default, so it can be seen as a self-sustaining application which, in theory, could be deployed to Heroku with a single button click.

0 Likes

#8

Hey Vlad, it's great to hear about new ideas for shards improving the Crystal ecosystem and beyond.

However, I’d like to point out some tripping points.

As you already mentioned, job queue worker implementations are not a new thing. There are a bunch of them available, even implemented in Crystal or with Crystal clients. Sidekiq is obviously very famous in the Ruby ecosystem, especially because Mike managed to make a business out of it. That's something you strive for as well. So essentially, you need to convince people to not use Sidekiq with its big community and ecosystem of tutorials, plugins and coverage, and to use Worcr instead. That's a really hard sell; you need some killer features to even enter the competition.

There is also Faktory from the same author, which seems very similar to your description of Worcr. And honestly, I fail to see a major advantage over Faktory, which obviously has a much stronger starting position given its background and the fact that it's already production software.

Some of the disadvantages you mention are rather insignificant, for example the official Ruby client having an API similar to Sidekiq's. That's really understandable and makes it easy for people to switch. And it's really easy to create an alternative Ruby client implementation with a dedicated, improved API. Worcr doesn't even have a single client for any language yet.

You criticize Faktory for relying on (intransparent) Redis and the associated scalability problems. But you propose an SQLite data store instead. That doesn't make any sense: in several ways, SQLite is to be considered less scalable than Redis.
You're right that Redis isn't ideal for queue data management. But that doesn't seem to matter too much when there are very successful, battle-tested implementations built on it.

Besides these design questions, there are certainly a lot of features that you propose for Worcr which are missing in Faktory and/or Sidekiq. But again, those are already working solutions, and such features could very well be added if they're really in demand. Worcr first needs to catch up to the basic feature level before providing anything on top.

I don't want to discourage you from your efforts; I just want to express some reasonable doubts about the feasibility of your endeavour. Since you intend to build a business on this, you should take a look at the risks of getting any success out of it. You certainly seem very convinced about your idea, and that's obviously a good thing!

What really confuses me is that you already have a very detailed product matrix. Having a roadmap of intended features is one thing. But these very specific plans and their pricing don't seem like much more than hot air. Without a single line of code published and still months away from a public release, with a single developer financing it out of his pocket, it doesn't look like a very convincing business plan.

2 Likes

#9

This is far beyond the scope of my knowledge. But @vladfaust, you are a nice and helpful person, so… I hope this is successful. You deserve it.

Good luck!

1 Like

#10

Any feedback is welcome. Thanks to Paul and that guy on Reddit, I'm now convinced that using PostgreSQL as long-term job storage is a far better option than SQLite. Yes, SQLite gives you independence from third-party services, but in return it brings maintenance problems; I see that now. It costs me nothing to switch to PostgreSQL right now.

Criticism shows that you care, and that's wonderful. Yes, Johannes, there are many obstacles waiting for me along the way. But I'll stand up to them and finish what I've started. I always do. I'll do my best to create a well-grounded alternative to existing solutions.

Regarding the “there is not a single line of code yet” point: I thought I had already proved to the community that I keep my promises and am truly dedicated to what I'm working on. Therefore, I was hoping to find some developers who would support me financially on this path (with benefits for them in return). This post is 90% an announcement of my plans with a list of features and 10% crowdfunding.

@girng, thanks for your kind words. These may be just a few characters for you, but they are a big, heartwarming message for me. I appreciate it :pray:

4 Likes

#11

I agree with that. And it's good to see that you question your own design decisions to strive for better results. But you promoted the dependency-free SQLite datastore as a major selling point, and now you're quickly replacing it because of well-known limitations. Given the confidence you expressed previously, this does not inspire much trust in the general design decisions you've made for Worcr.

Maybe you shouldn't be so detailed and over-confident about design specifics this early in the development process. Things tend to change when you realize they don't work out as you initially intended. And nobody expects a fully featured product description this early.

1 Like

#13

I've changed my mind and removed my previous feedback: having re-read your message, I believe it to be in good faith, so to keep the thread from going off-topic I'll leave it removed.

What I do still find a valuable distinction is that I see these posts less as a fixed-and-unwavering plan of attack and more as an RFC: “hey guys, this is what I'm planning… any feedback?” Making changes based on that feedback is an important part of improvement. Because I see this as an attempt to collect feedback rather than a fixed technical spec, I find it a positive that Vlad and yourself have managed to move this in a good direction based on the feedback you've given :slight_smile:

3 Likes

#14

PostgreSQL is also not as scalable as Redis. A simple key-value store (not necessarily Redis) is often more easily scalable. Is SQL really needed? Anyway, you could use RocksDB and have communication directly between Worcr instances. The user wouldn't even have to know what database is used.

0 Likes

#15

Didn't Faktory use RocksDB and then move away from it?

0 Likes

#16

True: https://www.mikeperham.com/2018/10/16/faktory-0.9.0---hello-redis/
TL;DR: replacing RocksDB with Redis was mostly about documentation and ease of use for the maintainers, despite Redis being moderately slower.

0 Likes

#17

A small development log here. As I've mentioned above, I've chosen PostgreSQL as the main storage. The DB stores jobs, schedules, limits, tags, crons and so on. PG also has LISTEN/NOTIFY, which works well for multiple Worcr nodes.

I've also decided to add Redis to the stack. Redis enables locking and heartbeating with a minimal developer and machine resource footprint.

Therefore, a Worcr instance will depend on both Redis and PostgreSQL services. It requires more devops, but in exchange we gain simpler distribution of Worcr nodes. Plus, as you guys have mentioned above, both of these are well-known, and there are already many DBaaS options and best practices.

Regarding the progress itself: well, it's a rather complex task. I've rarely dealt with such levels of abstraction. There are so many things and use cases to keep in mind.

  • I've got an async TCP/RPC Worcr server and client working (async meaning multiple parallel commands over a single connection)
  • I've got job enqueuing with tags and proper distribution among worker queues according to job weight and weight modifiers (also with embedded functions!)
  • I've got abstract limits working with Redis: concurrency, bucket, fixed and sliding window; they utilize Redis-side scripting, of course. Abstract, because there are two types of limits in Worcr: job limits, which apply when attempting to give a job to a worker, and rate limits, which workers acquire from a Worcr node while performing a job
  • I'm currently working on the proper lifecycle of Worcr and its workers (e.g. a worker wanting to shut down)
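The sliding-window variant from the list above can be sketched as follows. In Worcr the check runs as a Redis-side Lua script, so it is atomic across nodes; this hypothetical Python version shows just the window logic, with explicit timestamps to keep it deterministic:

```python
import time
from collections import deque

class SlidingWindowLimit:
    """Allow at most `limit` acquisitions within any `window`-second span."""
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.stamps = deque()  # timestamps of granted acquisitions

    def acquire(self, now=None):
        now = time.monotonic() if now is None else now
        # drop timestamps that have fallen out of the window
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return False  # window is full: the caller must wait
        self.stamps.append(now)
        return True

limit = SlidingWindowLimit(limit=2, window=60.0)
print(limit.acquire(now=0.0))   # True
print(limit.acquire(now=1.0))   # True
print(limit.acquire(now=2.0))   # False: two acquisitions already in the window
print(limit.acquire(now=61.0))  # True: the earlier stamps fell out of the window
```

A fixed-window limit would reset the counter at interval boundaries instead; the sliding window avoids the burst you'd otherwise get at each boundary.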

About weight modifiers and limits: both have match rules, i.e. which job classes, job tags and worker tags they apply to. For example, a weight modifier of x5 which applies to jobs with the "premium" tag and workers with the "premium" tag allows more precise job distribution. Furthermore, apart from static match rules defined at the DB column level, there are also dynamic match rules (both for weight modifiers and job limits), which are essentially JavaScript or Lua functions. Complex rules like (job tag is "premium" AND "e-mail") AND (worker tag is "premium" AND NOT "disallow-email") should be defined with embedded languages.

I've come to an understanding of what I want Worcr to be. Ideally, you'd have simple jobs and dumb workers: just simple processes, in any language, awaiting a payload. All the heavy lifting, like limiting, prioritizing, tenancy, load handling, error handling, retrying, logging, integrating with third parties (via webhooks) and analytics, should be done via a simple GUI without a need to reinvent the wheel. This solution also has to be easily deployable and scalable regardless of your business size, from small job boards with daily e-mail newsletters to gigantic SaaSes.

And I think I’m on the right path here.

1 Like

#18

What? This sounds very weird. What exactly do you need Redis for that PostgreSQL can't do, and that justifies an extra dependency?

0 Likes

#19

I use Redis for worker heartbeats and limit locks (both job and rate limits), which are very frequent and ephemeral tasks. Redis also allows programming in Lua, and it's easier for me to write Lua than PL/pgSQL.

0 Likes

#20

I'm happy with the current Redis + PostgreSQL stack.

I've moved the weighted queues to Redis, which simplified the code structure. Redis allows pushing reliability to the limit. For example, all Worcr nodes LISTEN for PostgreSQL updates. Using Redis makes it possible to smartly lock each update and guarantee that it is handled exactly once across all the nodes (yes, I know what I'm talking about; I also use PostgreSQL locks and transactions in this case). Worcr will support Redis Cluster as well.
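That exactly-once guarantee boils down to an atomic check-and-set: every node sees every update, but only the node that claims the lock handles it. In Worcr the lock lives in Redis so it works across processes; this hypothetical Python sketch shows the same idea within one process, with threads standing in for nodes:

```python
import threading

class OnceDispatcher:
    """Atomic claim of an update id: exactly one caller wins."""
    def __init__(self):
        self._lock = threading.Lock()  # stands in for a Redis-side lock
        self._handled = set()

    def try_claim(self, update_id):
        with self._lock:
            if update_id in self._handled:
                return False  # someone else already took this update
            self._handled.add(update_id)
            return True

dispatcher = OnceDispatcher()
results = []

def node(name):
    # every "node" receives the same notification and races to claim it
    if dispatcher.try_claim("update-42"):
        results.append(name)

threads = [threading.Thread(target=node, args=(f"node-{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 1
```

All four simulated nodes race for the same update, and exactly one of them ends up handling it, which is precisely the property the Redis lock provides across real nodes.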

Using PostgreSQL makes it possible to store an almost infinite number of jobs, unlike when Redis is the main storage. Moreover, it allows storing any metadata needed, including logs and analytics. For instance, workers are able to send logs related to the job they're currently performing directly to a Worcr node.

Having two services, one for speed and another for storage, eliminates the need for nodes to communicate with each other at all, which, again, simplifies the code.

A worker can connect to multiple Worcr nodes in parallel for increased availability. If it loses a connection to a node, it tries to connect to another one. If a worker needs to quit, it yields all its jobs back to the node. If a node needs to quit and there are no other nodes left, a worker either tries to yield the job attempt it's currently performing, or tells the node to wait until it finishes the job.

Instead of only traditional TCP workers, Worcr has two types of workers: the abstract Socket worker (which does not fix the transport; it could be a TCP, WebSocket or UDP worker) and the Lambda worker (the name may change), which invokes a remote worker on demand. That means Worcr has built-in support for serverless workers and even allows creating web workers (in browsers).

Unlike traditional workers, Worcr workers are dumb, as I've said before, which makes it possible to write them in any language with minimal code. They just wait for a job to perform. All distribution logic, including priority, load balancing and limits, happens in Worcr. That moves a solid part of the application logic into a single place. The rules defined in Worcr are extremely flexible and support two embedded languages: Lua and JavaScript.

Why did I write all this? Because it's hard for me to formulate my thoughts properly, and I hope this post helps my potential customers better understand the product. If you're interested and/or have any questions, join https://gitter.im/worcr. I'd be happy to cover more use cases and build a better architecture.

0 Likes