So, I just tried this code:
require "llvm"
ctx = LLVM::Context.new
mod = ctx.new_module("main")
i256 = ctx.int(256)
mod.functions.add("foo", ([] of LLVM::Type), i256) do |func|
func.basic_blocks.append do |builder|
x = builder.alloca i256
ld = builder.load x
ld.ordering = :sequentially_consistent
builder.ret ld
end
end
LLVM.init_x86
triple = String.new(LibLLVM.get_default_target_triple)
target = LLVM::Target.from_triple(triple)
target_machine = target.create_target_machine(triple)
target_machine.emit_obj_to_file(mod, "some.o")
target_machine.emit_asm_to_file(mod, "some.s")
puts mod
It compiles fine. The generated LLVM IR (printed after emitting the object and assembly files) is:
; ModuleID = 'main'
source_filename = "main"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
define i256 @foo() {
  %1 = alloca i256, align 8
  %2 = alloca i256
  %3 = bitcast i256* %2 to i8*
  %4 = bitcast i256* %1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 32, i8* %4)
  call void @__atomic_load(i64 32, i8* %3, i8* %4, i32 5)
  %5 = load i256, i256* %1, align 8
  call void @llvm.lifetime.end.p0i8(i64 32, i8* %4)
  ret i256 %5
}
; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #0
declare void @__atomic_load(i64, i8*, i8*, i32)
; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #0
; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #1
attributes #0 = { argmemonly nounwind }
attributes #1 = { nounwind }
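For what it's worth, that __atomic_load seems to be the generic helper from the compiler's atomic runtime library (libatomic / compiler-rt): x86-64 has no 256-bit atomic load instruction, so LLVM's atomic-expansion pass lowers the oversized seq_cst load into a runtime call during codegen. I suspect that if we did puts mod before calling emit_obj_to_file we'd still see the original "load atomic i256 ... seq_cst" instead of this call. The helper's C signature is void __atomic_load(size_t size, void *src, void *dest, int memorder), so the i64 32 is the size in bytes and the i32 5 is the memory order (5 is seq_cst). Just to illustrate what it is, a Crystal binding for it would look roughly like the sketch below (the lib name is made up for illustration; we never call it ourselves, the linker just has to find it):

# Sketch only: the runtime helper that the lowered IR calls.
# It atomically copies `size` bytes from src to dest; memorder 5 means seq_cst.
lib LibAtomic
  fun __atomic_load(size : LibC::SizeT, src : Void*, dest : Void*, memorder : LibC::Int) : Void
end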
The generated assembly is:
    .section __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 14
    .globl _foo
    .p2align 4, 0x90
_foo:
    .cfi_startproc
    pushq %rbx
    .cfi_def_cfa_offset 16
    subq $64, %rsp
    .cfi_def_cfa_offset 80
    .cfi_offset %rbx, -16
    movq %rdi, %rbx
    leaq 32(%rsp), %rsi
    movq %rsp, %rdx
    movl $32, %edi
    movl $5, %ecx
    callq ___atomic_load
    movq (%rsp), %rax
    movq 8(%rsp), %rcx
    movq 16(%rsp), %rdx
    movq 24(%rsp), %rsi
    movq %rsi, 24(%rbx)
    movq %rdx, 16(%rbx)
    movq %rcx, 8(%rbx)
    movq %rax, (%rbx)
    movq %rbx, %rax
    addq $64, %rsp
    popq %rbx
    retq
    .cfi_endproc
.subsections_via_symbols
I don’t understand everything here, but there’s that magic __atomic_load function… maybe LLVM already takes care of this for us? When you do a load you can mark it as atomic and specify an ordering (sequentially consistent gives the strongest guarantee; I haven’t read enough yet to understand all the other variants). If that’s the case then maybe it’s simpler than we thought, and we get it for free for the different sizes of union types. But we’d have to try it out (or read/investigate further).
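If we want to try it, a next step could be something like the sketch below: mark both the store and the load of an i256 (as a stand-in for a union payload) as sequentially consistent and see what gets generated for various sizes. This is untested and written from memory of the bindings; in particular it assumes builder.store returns the store instruction so that the same ordering= setter can be used on it (it wraps LLVMSetOrdering, which applies to both load and store instructions):

# Untested sketch: atomic store + atomic load of a 256-bit value.
require "llvm"

ctx = LLVM::Context.new
mod = ctx.new_module("main")
i256 = ctx.int(256)

mod.functions.add("roundtrip", ([] of LLVM::Type), i256) do |func|
  func.basic_blocks.append do |builder|
    x = builder.alloca i256

    # atomic write of the whole 256-bit value
    st = builder.store i256.const_int(1), x
    st.ordering = :sequentially_consistent

    # atomic read of the whole 256-bit value
    ld = builder.load x
    ld.ordering = :sequentially_consistent

    builder.ret ld
  end
end

puts mod

For a real union we'd presumably do this on the union's integer representation, and then check what comes out for the small sizes (which should stay lock-free) as well as the big ones (which presumably go through __atomic_load / __atomic_store calls like the one above).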