MT schedulers impact

asterite · December 28, 2018, 3:01pm

So, I just tried this code:

require "llvm"

ctx = LLVM::Context.new
mod = ctx.new_module("main")
i256 = ctx.int(256)
mod.functions.add("foo", ([] of LLVM::Type), i256) do |func|
  func.basic_blocks.append do |builder|
    x = builder.alloca i256
    ld = builder.load x
    ld.ordering = :sequentially_consistent
    builder.ret ld
  end
end

LLVM.init_x86

triple = String.new(LibLLVM.get_default_target_triple)
target = LLVM::Target.from_triple(triple)
target_machine = target.create_target_machine(triple)
target_machine.emit_obj_to_file(mod, "some.o")
target_machine.emit_asm_to_file(mod, "some.s")

puts mod

It compiles well. The generated LLVM IR is:

; ModuleID = 'main'
source_filename = "main"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

define i256 @foo() {
  %1 = alloca i256, align 8
  %2 = alloca i256
  %3 = bitcast i256* %2 to i8*
  %4 = bitcast i256* %1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 32, i8* %4)
  call void @__atomic_load(i64 32, i8* %3, i8* %4, i32 5)
  %5 = load i256, i256* %1, align 8
  call void @llvm.lifetime.end.p0i8(i64 32, i8* %4)
  ret i256 %5
}

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #0

declare void @__atomic_load(i64, i8*, i8*, i32)

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #0

; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #1

attributes #0 = { argmemonly nounwind }
attributes #1 = { nounwind }

The generated assembly is:

	.section	__TEXT,__text,regular,pure_instructions
	.macosx_version_min 10, 14
	.globl	_foo
	.p2align	4, 0x90
_foo:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	subq	$64, %rsp
	.cfi_def_cfa_offset 80
	.cfi_offset %rbx, -16
	movq	%rdi, %rbx
	leaq	32(%rsp), %rsi
	movq	%rsp, %rdx
	movl	$32, %edi
	movl	$5, %ecx
	callq	___atomic_load
	movq	(%rsp), %rax
	movq	8(%rsp), %rcx
	movq	16(%rsp), %rdx
	movq	24(%rsp), %rsi
	movq	%rsi, 24(%rbx)
	movq	%rdx, 16(%rbx)
	movq	%rcx, 8(%rbx)
	movq	%rax, (%rbx)
	movq	%rbx, %rax
	addq	$64, %rsp
	popq	%rbx
	retq
	.cfi_endproc


.subsections_via_symbols

I don’t understand everything, but there’s that magic __atomic_load function… maybe LLVM already takes care of this? When you do a load you can mark it as atomic and specify an ordering (sequentially consistent has the strongest guarantee, but I didn’t read it completely to understand all the variants). If that’s the case then maybe it’s simpler than what we thought (and we get it for free for different sizes of union types). But we’d have to try it out (or read/investigate).

RX14 · December 28, 2018, 3:16pm

Reading up on __atomic_load, I think it will probably be quite slow. But I think that’s fine, this is probably the best solution.

RX14 · December 28, 2018, 3:24pm

github.com

llvm-mirror/compiler-rt/blob/master/lib/builtins/atomic.c

/*===-- atomic.c - Implement support functions for atomic operations.------===
 *
 *                     The LLVM Compiler Infrastructure
 *
 * This file is dual licensed under the MIT and the University of Illinois Open
 * Source Licenses. See LICENSE.TXT for details.
 *
 *===----------------------------------------------------------------------===
 *
 *  atomic.c defines a set of functions for performing atomic accesses on
 *  arbitrary-sized memory locations.  This design uses locks that should
 *  be fast in the uncontended case, for two reasons:
 * 
 *  1) This code must work with C programs that do not link to anything
 *     (including pthreads) and so it should not depend on any pthread
 *     functions.
 *  2) Atomic operations, rather than explicit mutexes, are most commonly used
 *     on code where contended operations are rate.
 * 
 *  To avoid needing a per-object lock, this code allocates an array of

This file has been truncated. show original

__atomic_load will just hash the pointer address to point to one of 1024 different spinlocks which it will spin on until it acquires a lock and reads the value.

asterite · December 28, 2018, 6:46pm

I guess we can use that and slowly optimize things. For example the optimization is only needed for instance variables holding unions that aren’t read-only. Right now the compiler doesn’t detect read-only instance variables but it could, or we could add some keyword for that (for example readonly). Then, I don’t think instance variables holding unions are that common so it might not be too much a problem.

RX14 · December 28, 2018, 10:46pm

Actually, it’s only needed for instance variables holding unions which contain structs, which is rare. Unions of classes are stored as simple pointers iirc.

funny-falcon · December 28, 2018, 11:01pm

I have a split brain on this topic.

On one side, I hate “we should have safe types in multithreaded environment”. Multithreading IS THE PAIN and it is unavoidable! Even if we remove segfaults our programs will not magically became correct. Multithreaded program could be correct only if you thought about its correctness and used appropriate synchronization. Otherwise it will be crap, even it will not segfault.

On the other side, Go’s builtin map panics if you try to access it in wrong way, ie when you mutate it in one “thread” and perform any access on other “thread”. And this panic could not be recovered, it always lead to process shutdown. And it is good, because otherwise we’d miss at least four bugs in our production this year.

I mean, we should not make every variable or field access safe. It is meaningless. The best effort should be some help to avoid some errors, but only if this help is relatively cheap.

funny-falcon · December 28, 2018, 11:02pm

Concerning to scheduler, I’d like to create non-migrating Fibers. I really want to have a way to create some Fibers on the same Thread and to be sure they will not be scheduled on other Thread (at least if I didn’t move it other Thread explicitly).

RX14 · December 29, 2018, 3:29pm

Is this for C bindings, or for pure mutual exclusion? I can imagine having a fiber ExclusionGroup which you could ass fibers to such that two fibers in a group cannot execute at the same time. But should that be in addition to manually pinning a fiber to a thread or not?

RX14 · December 29, 2018, 3:31pm

I’m not denying that, but segfaults are RCEs, if you can find a gadget which lines up user-controlled data with a function pointer in a union dispatch, and the language should not permit that. People will mess up concurrency, they will use locks instead of CSP, and they will make mistakes. But those should be data corruption bugs or raised exceptions, not RCEs.

funny-falcon · December 29, 2018, 4:05pm

if you can find a gadget which lines up user-controlled data with a function pointer in a union dispatch, and the language should not permit that.

Crystal is intentionally such low level and fast. You cannot sit on both chairs: safety and low level performance. I will wonder if you can.

funny-falcon · December 29, 2018, 4:07pm

You cannot sit on both chairs: safety and low level performance. I will wonder if you can.

Ok, there is Rust :-) Then Crystal should became Rust-ish.

funny-falcon · December 29, 2018, 4:10pm

Concerning unions: if union tag has free bits (and I bet it has), then spin lock could be implemented using them.

RX14 · December 29, 2018, 8:26pm

Safety has always been more important to crystal than speed. If safety and performance conflict, the safe behaviour (within reason) must always be the default. There can of course be an unsafe escape hatch.

That’s a really good idea! I think in the future this will definitely be done.

vladfaust · December 29, 2018, 11:32pm

Hey, I just wanted to say that I’m happy the team is so open in discussions here, keep it up!

rumenzu · April 2, 2019, 2:11pm

I’m glad someone already brought this in the discussion.

Since Crystal is heading towards a work-stealing scheduler (which is great!), I was also wondering if there is any plan to disallow some fibers to be scheduled on other threads than the “original” one.

I have this concern for C bindings, since some libraries requires specific calls to be made on the same thread.

Maybe a solution could be to add a pin flag to fibers, preventing schedulers seating in other threads to steal such marked fibers?

rogerdpack · October 14, 2021, 5:46am

Appears there is a (non documented) same_thread option to spawn Multithreaded Crystal initial thoughts - #3 by Blacksmoke16 I remember there was some discussion of work stealing at some point FWIW.

Topic		Replies	Views
Crystal and parallelism Help & Support	7	238	February 10, 2025
Charting the route to multi-threading support Crystal Contrib	4	348	October 23, 2024
MT and Mutex Help & Support	16	2176	November 1, 2019
Blog: Parallelism in Crystal Official	6	947	September 13, 2019
Crystal multithreading support Crystal Contrib	24	1440	December 15, 2024

MT schedulers impact

Related topics