MT schedulers impact


#1

https://github.com/crystal-lang/crystal/pull/7214 looks promising, but due to lack of deep understanding of how it works, I’d like to ask how it’s going to affect the language (if it’s merged)? From my point of view, the only good thing the PR brings is faster switching between fibers, but fibers themselves will still stay in a single thread, right? So no changes in the existing code are expected, i.e. objects locking etc.


#2

My understanding of the PR is limited as well, but in the description ysbaddaden points out improvements in performance using 1, 4 and 8 threads (all without GC)

switch[1/128]: crystal: 10000000 yields in 288 ms, 34631387 yields per second
switch[4/128]: crystal: 10000000 yields in 128 ms, 77606737 yields per second
switch[8/128]: crystal: 10000000 yields in 91 ms, 108759411 yields per second

So I think this is a great step towards parallelism, crystal in 2019 is gonna be great :)


#3

MT stands for multi-threaded. It does affect the semantic of the language: if you share a variable between threads it will be possible to trigger race conditions.


#4

As far as I understand, only the scheduler logic becomes multi-threaded, leaving all the fibers in a single thread?


#5

No, it runs fibers in all threads. This is the real deal. It will break your code, because there will be race conditions and until the stdlib is audited, all the unsafe parts of the stdlib may cause segfaults. A lot of work to do. For example: should array raise or lock or segfault when accessed concurrently. I’m sure we don’t choose the latter.


#6

I’m sure we don’t choose the latter.

Great news :laughing:

Thanks for the clarification, Chris!


#7

Note that we’ll probably choose the latter. Having Array be thread-safe by default will probably make everything super slow.


#8

We cannot have concurrency issues become segfaults. Java raises a ConcurrentModificationException when you use ArrayList concurrently, and ArrayList in java is fast. We must do the same.

We may profile the fiber locks in crystal and realise the overhead for the very happy case of the lock only ever being held by one fier is small, in that case i’d support making Array thread-safe by default. But we must wait for data.


#9

It seems Java has Vector, which is thread safe (nobody uses it because of the performance penalties) and AtrayList, which isn’t thread safe. Concurrent modification exception happens when you modify an Array while holding an Iterator, or something like that.


#10

That said, I’d like to see the performance of a thread-safe array, but we’ll see.


#11

At the very least, Array should not possibly cause segfaults when used in a data race. I am not convinced this is true, given realloc can change the address of @buffer while another thread is iterating.


#12

At the very least there seems to be a need for a RWLock around resizing arrays.

Also of note is non-atomic memory writes to memory locations containing unions. If the type id of the union is read, then another thread retires writes to the type id and pointer, then the original thread reads the new pointer, then it’s using the wrong type id with the wrong pointer, and we get a segfault because we dispatched to the wrong method.

I can’t imagine how to solve that in a reasonable performance budget.


#13

Only if you’re really running multiple threads. You should always be able to choose how many threads to be spawned and that can be only one.


#14

Race conditions cannot turn into segfaults. They can turn into exceptions, bad behaviour, whatever, but not remote code executions. If that is the cost of parallelism, then its a cost nobody will be willing to pay.


#15

This can be solved by not reading the type id separate from the value. The codegen must read the entire value, then dispatch based on the read type id. If another thread changes the value then it doesn’t matter, the read value is already consistent.


#16

I just checked and we are not doing that right now.


#17

Is it possible? It’d be a 128bit atomic read on 64bit and a 64bit atomic read on 32bit. Not all architectures support that.


#18

Actually, it’d be an unbounded size atomic read, since pointers are already safe: we can save the pointer to a register then read the type ID from the class header. The problem comes from unions of structs. For an Array(Struct1 | Struct2), where Struct1 and Struct2 are large structs, then how can this work? I can’t think of a way apart from boxing all structs which leave the stack.

And if we box all structs which leave the stack, we might as well remove structs and expend the energy on accurate escape anlysis.


#19

That’s why I think that segfaults and other undefined behavior can and will happen in Crystal if you don’t synchronize access to shared resources.


#20

I’d rather disallow shared memory than have random segfaults and undefined behaviour. A crystal program which doesn’t use pointers or other explicitly unsafe constructs should never segfault.