Croupier, a tasks / dataflow library

I posted this on Reddit, but a comment said the forum is where people are nowadays, so here it is :)

I have published the first sort-of-feature-complete version of Croupier at GitHub - ralsina/croupier: A library to create and execute tasks with dependencies

You may wonder what a tasks / dataflow library is.

Well, it lets you define tasks which are procs and connect them in a dependency graph. Some of these tasks may consume/produce files or values from a k/v store.

So, when you tell Croupier to “run” it will examine the state of the whole system and execute exactly the things that are needed, in an order that guarantees their dependencies are ready for them.
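To make the "order that guarantees their dependencies are ready" part concrete, here is a minimal sketch of dependency ordering via depth-first traversal. This is not Croupier's implementation: `visit`, `run_order` and the `deps` hash are hypothetical names made up for illustration, and the sketch ignores the "only what is needed" part entirely.

```crystal
# Hypothetical sketch, NOT Croupier's code: order outputs so that
# every dependency comes before the thing that needs it.

# Maps each output to the inputs it depends on.
# Deliberately listed "out of order" to show the sort does the work.
deps = {
  "fileB" => ["fileA"],
  "fileA" => ["input.txt"],
}

def visit(name : String, deps : Hash(String, Array(String)),
          done : Array(String), in_progress : Set(String)) : Nil
  return if done.includes?(name)
  raise "dependency cycle at #{name}" if in_progress.includes?(name)
  in_progress << name
  # Plain source files (produced by no task) have no entry in deps.
  deps.fetch(name, [] of String).each do |input|
    visit(input, deps, done, in_progress)
  end
  in_progress.delete(name)
  # Only record names that some task actually produces.
  done << name if deps.has_key?(name)
end

def run_order(deps : Hash(String, Array(String))) : Array(String)
  done = [] of String
  in_progress = Set(String).new
  deps.each_key { |name| visit(name, deps, done, in_progress) }
  done
end

p run_order(deps) # prints ["fileA", "fileB"]
```

Even though "fileB" is declared first, "fileA" ends up ahead of it because it is one of its inputs.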

Why? Well, for example, for another project: GitHub - ralsina/nicolino: A not-quite-minimalisting SSG written in Crystal. In an SSG you want to take markdown files, images, CSS, JS, etc., smush them through templates and generate HTML. You also want to classify them into categories, create indexes, and so on.

Doing that in a large site takes time, so you want to build things incrementally when needed.

Using a dataflow library like Croupier, you don’t need to care about how to do that. Just create the tasks, link them via dependencies, declare what files they generate, and ask it to run things.

Here’s a toy example:

require "croupier"

b1 = Croupier::TaskProc.new{
  puts "task1 running"
  File.read("input.txt").downcase
}

Croupier::Task.new(
  output: "fileA",
  inputs: ["input.txt"],
  proc: b1
)

b2 = Croupier::TaskProc.new{
  puts "task2 running"
  File.read("fileA").upcase
}

Croupier::Task.new(
  output: "fileB",
  inputs: ["fileA"],
  proc: b2
)

Croupier::Task.run_tasks

Moving that complexity into a generic library means Nicolino can stay small and each feature is just a small fragment of uncoupled code with defined inputs and outputs.

I know it’s a pretty niche thing, but maybe someone will find it useful.


One thing you may consider doing to improve the DX is to allow Croupier::Task.new to accept a block, vs needing to manually pass it a proc. E.g.

Croupier::Task.new output: "fileA", inputs: ["input.txt"] do 
  puts "task1 running"
  File.read("input.txt").downcase
end

This could be a pretty easy integration since captured blocks are essentially procs, so you would be able to pass the captured block to the existing constructor’s proc parameter.

ref: Capturing blocks - Crystal
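A minimal sketch of what that could look like, using a made-up `MiniTask` class as a stand-in (this is not Croupier's actual class, and the `Proc(String)` signature is an assumption for the example). The second `initialize` captures the block with `&block` and stores it in the same instance variable the proc-based constructor uses:

```crystal
# Hypothetical stand-in for a task class; names and types are illustrative only.
class MiniTask
  getter output : String
  getter inputs : Array(String)
  getter proc : Proc(String)

  # Existing style: the caller builds the proc and passes it in.
  def initialize(@output : String, @inputs : Array(String), @proc : Proc(String))
  end

  # Block style: &block captures the block as a Proc(String),
  # which is stored exactly like a proc passed explicitly.
  def initialize(@output : String, @inputs : Array(String), &block : -> String)
    @proc = block
  end
end

task = MiniTask.new output: "fileA", inputs: ["input.txt"] do
  "hello".upcase
end

puts task.proc.call # prints HELLO
```

Both forms can coexist, so existing callers that pass `proc:` would not need to change.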


Good point, thx!

How similar is this to Rakefiles?

This is not similar at all but it could be used as the basis for something like that. In fact …

GitHub - ralsina/hace: Hace is sort-of-make-like implemented using ralsina/croupier :smiley:

UPDATE: well, I didn’t remember rakefiles correctly, I guess!

Yeah, this could be somewhat like rakefiles, I guess. Of course, with Crystal being compiled, that’s very uncomfortable.

Done: Add support for passing blocks to tasks · ralsina/croupier@e89e2b7 · GitHub


Just posting to mention that I just did a 0.8.0 release which optimizes dependency graph calculation. Combined with 0.7.0, which optimized how the “stale” mark propagates between tasks … well, it makes things about 20x faster for large task graphs :-)


Blame @bcardiff for showing me that paper, but now Croupier 0.10.0 has (optional) early cutoff.

What does it mean?

Here it is in a diagram from the paper: [diagram not reproduced here]

This means things like Hacé or Nicolino which are croupier-based get more efficient for free :-)
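For anyone unfamiliar with the term: early cutoff, as generally used for build systems, means that when a task is re-run and its output turns out to be identical to the previous one, the tasks depending on it don't need to run again. Here is a minimal sketch of the idea. It is not how Croupier implements it; `Step` and `build` are hypothetical names, and comparing whole output strings is a simplification (a real system would more likely compare stored digests).

```crystal
# Hypothetical sketch of early cutoff, NOT Croupier's implementation.
class Step
  getter runs = 0
  @last_output : String? = nil

  def initialize(@body : Proc(String, String))
  end

  # Runs the step and reports whether its output differs from the previous run.
  def run(input : String) : Tuple(String, Bool)
    @runs += 1
    output = @body.call(input)
    changed = output != @last_output
    @last_output = output
    {output, changed}
  end
end

normalize = Step.new(->(s : String) { s.strip.downcase })
render = Step.new(->(s : String) { "<p>#{s}</p>" })

def build(source : String, normalize : Step, render : Step) : Nil
  text, changed = normalize.run(source)
  # Early cutoff: identical upstream output means downstream can be skipped.
  render.run(text) if changed
end

build("Hello", normalize, render)
build("  hello  ", normalize, render) # source changed, normalized text did not

puts normalize.runs # prints 2
puts render.runs    # prints 1
```

The second build still has to re-run `normalize` because its input changed, but `render` is cut off since what it would consume is the same as before.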


I will make a shameless plug for an old experiment.

The main thing there was having a model of resumable tasks and having an atomic expansion of following subtasks. It was aiming to power a web crawler.

Nice! I am thinking of making the task scheduler in Croupier pluggable, to try some of the other possibilities.

Hi @ralsina, can tasks run in parallel if there are no dependencies?

Sure!

It supports -Dpreview_mt for real threads, and the task runner is a work-stealing queue with as many workers as you want, so tasks will run in parallel, dependencies permitting.
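As a rough illustration of independent tasks being handled by several workers, here is a small sketch using plain Crystal fibers and channels. It is not Croupier's runner: it uses a single shared channel rather than a work-stealing queue, and the "tasks" are just numbers to square. Built with -Dpreview_mt, the fibers can be scheduled across multiple threads; without it they run concurrently on one thread.

```crystal
# Hypothetical sketch, NOT Croupier's runner: a fixed pool of fibers
# draining one shared channel (no work stealing).
jobs = Channel(Int32).new
results = Channel(Int32).new
workers = 3

workers.times do
  spawn do
    # receive? returns nil once the channel is closed and drained.
    while n = jobs.receive?
      results.send(n * n)
    end
  end
end

inputs = [1, 2, 3, 4, 5]

# Feed the jobs from a separate fiber so the main fiber can collect results.
spawn do
  inputs.each { |n| jobs.send(n) }
  jobs.close
end

squares = [] of Int32
inputs.size.times { squares << results.receive }

# Completion order is not guaranteed, so sort before printing.
p squares.sort # prints [1, 4, 9, 16, 25]
```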


Right on!!