MT schedulers impact

Is it possible? It’d be a 128bit atomic read on 64bit and a 64bit atomic read on 32bit. Not all architectures support that.

Actually, it’d be an unbounded size atomic read, since pointers are already safe: we can save the pointer to a register then read the type ID from the class header. The problem comes from unions of structs. For an Array(Struct1 | Struct2), where Struct1 and Struct2 are large structs, then how can this work? I can’t think of a way apart from boxing all structs which leave the stack.

And if we box all structs which leave the stack, we might as well remove structs and expend the energy on accurate escape anlysis.

That’s why I think that segfaults and other undefined behavior can and will happen in Crystal if you don’t synchronize access to shared resources.

I’d rather disallow shared memory than have random segfaults and undefined behaviour. A crystal program which doesn’t use pointers or other explicitly unsafe constructs should never segfault.

So, I just tried this code:

require "llvm"

ctx = LLVM::Context.new
mod = ctx.new_module("main")
i256 = ctx.int(256)
mod.functions.add("foo", ([] of LLVM::Type), i256) do |func|
  func.basic_blocks.append do |builder|
    x = builder.alloca i256
    ld = builder.load x
    ld.ordering = :sequentially_consistent
    builder.ret ld
  end
end

LLVM.init_x86

triple = String.new(LibLLVM.get_default_target_triple)
target = LLVM::Target.from_triple(triple)
target_machine = target.create_target_machine(triple)
target_machine.emit_obj_to_file(mod, "some.o")
target_machine.emit_asm_to_file(mod, "some.s")

puts mod

It compiles well. The generated LLVM IR is:

; ModuleID = 'main'
source_filename = "main"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

define i256 @foo() {
  %1 = alloca i256, align 8
  %2 = alloca i256
  %3 = bitcast i256* %2 to i8*
  %4 = bitcast i256* %1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 32, i8* %4)
  call void @__atomic_load(i64 32, i8* %3, i8* %4, i32 5)
  %5 = load i256, i256* %1, align 8
  call void @llvm.lifetime.end.p0i8(i64 32, i8* %4)
  ret i256 %5
}

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #0

declare void @__atomic_load(i64, i8*, i8*, i32)

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #0

; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #1

attributes #0 = { argmemonly nounwind }
attributes #1 = { nounwind }

The generated assembly is:

	.section	__TEXT,__text,regular,pure_instructions
	.macosx_version_min 10, 14
	.globl	_foo
	.p2align	4, 0x90
_foo:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	subq	$64, %rsp
	.cfi_def_cfa_offset 80
	.cfi_offset %rbx, -16
	movq	%rdi, %rbx
	leaq	32(%rsp), %rsi
	movq	%rsp, %rdx
	movl	$32, %edi
	movl	$5, %ecx
	callq	___atomic_load
	movq	(%rsp), %rax
	movq	8(%rsp), %rcx
	movq	16(%rsp), %rdx
	movq	24(%rsp), %rsi
	movq	%rsi, 24(%rbx)
	movq	%rdx, 16(%rbx)
	movq	%rcx, 8(%rbx)
	movq	%rax, (%rbx)
	movq	%rbx, %rax
	addq	$64, %rsp
	popq	%rbx
	retq
	.cfi_endproc


.subsections_via_symbols

I don’t understand everything, but there’s that magic __atomic_load function… maybe LLVM already takes care of this? When you do a load you can mark it as atomic and specify an ordering (sequentially consistent has the strongest guarantee, but I didn’t read it completely to understand all the variants). If that’s the case then maybe it’s simpler than what we thought (and we get it for free for different sizes of union types). But we’d have to try it out (or read/investigate).

1 Like

Reading up on __atomic_load, I think it will probably be quite slow. But I think that’s fine, this is probably the best solution.

__atomic_load will just hash the pointer address to point to one of 1024 different spinlocks which it will spin on until it acquires a lock and reads the value.

2 Likes

I guess we can use that and slowly optimize things. For example the optimization is only needed for instance variables holding unions that aren’t read-only. Right now the compiler doesn’t detect read-only instance variables but it could, or we could add some keyword for that (for example readonly). Then, I don’t think instance variables holding unions are that common so it might not be too much a problem.

2 Likes

Actually, it’s only needed for instance variables holding unions which contain structs, which is rare. Unions of classes are stored as simple pointers iirc.

I have a split brain on this topic.

On one side, I hate “we should have safe types in multithreaded environment”. Multithreading IS THE PAIN and it is unavoidable! Even if we remove segfaults our programs will not magically became correct. Multithreaded program could be correct only if you thought about its correctness and used appropriate synchronization. Otherwise it will be crap, even it will not segfault.

On the other side, Go’s builtin map panics if you try to access it in wrong way, ie when you mutate it in one “thread” and perform any access on other “thread”. And this panic could not be recovered, it always lead to process shutdown. And it is good, because otherwise we’d miss at least four bugs in our production this year.

I mean, we should not make every variable or field access safe. It is meaningless. The best effort should be some help to avoid some errors, but only if this help is relatively cheap.

2 Likes

Concerning to scheduler, I’d like to create non-migrating Fibers. I really want to have a way to create some Fibers on the same Thread and to be sure they will not be scheduled on other Thread (at least if I didn’t move it other Thread explicitly).

2 Likes

Is this for C bindings, or for pure mutual exclusion? I can imagine having a fiber ExclusionGroup which you could ass fibers to such that two fibers in a group cannot execute at the same time. But should that be in addition to manually pinning a fiber to a thread or not?

I’m not denying that, but segfaults are RCEs, if you can find a gadget which lines up user-controlled data with a function pointer in a union dispatch, and the language should not permit that. People will mess up concurrency, they will use locks instead of CSP, and they will make mistakes. But those should be data corruption bugs or raised exceptions, not RCEs.

if you can find a gadget which lines up user-controlled data with a function pointer in a union dispatch, and the language should not permit that.

Crystal is intentionally such low level and fast. You cannot sit on both chairs: safety and low level performance. I will wonder if you can.

You cannot sit on both chairs: safety and low level performance. I will wonder if you can.

Ok, there is Rust :-) Then Crystal should became Rust-ish.

Concerning unions: if union tag has free bits (and I bet it has), then spin lock could be implemented using them.

3 Likes

Safety has always been more important to crystal than speed. If safety and performance conflict, the safe behaviour (within reason) must always be the default. There can of course be an unsafe escape hatch.

That’s a really good idea! I think in the future this will definitely be done.

3 Likes

Hey, I just wanted to say that I’m happy the team is so open in discussions here, keep it up! :slight_smile:

2 Likes

I’m glad someone already brought this in the discussion.

Since Crystal is heading towards a work-stealing scheduler (which is great!), I was also wondering if there is any plan to disallow some fibers to be scheduled on other threads than the “original” one.

I have this concern for C bindings, since some libraries requires specific calls to be made on the same thread.

Maybe a solution could be to add a pin flag to fibers, preventing schedulers seating in other threads to steal such marked fibers?

3 Likes

Appears there is a (non documented) same_thread option to spawn Multithreaded Crystal initial thoughts - #3 by Blacksmoke16 I remember there was some discussion of work stealing at some point FWIW.