The Crystal Programming Language Forum

Multi-thread program: Unhandled exception: Cannot allocate new fiber stack: Cannot allocate memory (Errno)

My parallel code, demo.cr, is this:

def test(num : Int32)
  num.times do |line|
    proc = ->(x : Int32) do
      spawn do
        # to do something
      end
    end
    proc.call(line)
  end
end

inf = ARGV[0].to_i
test(inf)

Fiber.yield
$ crystal --version
Crystal 0.31.1 [0e2e1d067] (2019-09-30)

LLVM: 8.0.0
Default target: x86_64-unknown-linux-gnu

After building with --release -Dpreview_mt and running it, I get this error:

$ export CRYSTAL_WORKERS=2; ./demo 100 # no error when the argument is 100
$ export CRYSTAL_WORKERS=2;./demo 1000000
Unhandled exception: Cannot allocate new fiber stack: Cannot allocate memory (Errno)
GC Warning: Failed to expand heap by 2228224 bytes
GC Warning: Failed to expand heap by 262144 bytes
GC Warning: Out of Memory! Heap size: 6 MiB. Returning NULL!
Invalid memory access (signal 11) at address 0x6018
[0x4380b6] ???
[0x437a47] __crystal_sigfault_handler +263
[0x3143a0f130] ???
[0x423415] ???
[0x417fd2] ???
[0x415151] ???
[0x42fb93] ???
[0x42dfce] ???
[0x4108c7] main +55
[0x3142e21b45] __libc_start_main +245
[0x40c719] ???
[0x0] ???

$ export CRYSTAL_WORKERS=3; ./demo 1000000
Unhandled exception: Cannot allocate new fiber stack: Cannot allocate memory (Errno)
  from ???

thanks ~

Regards

What happens with CRYSTAL_WORKERS=1 or if you compile without -Dpreview_mt?

Would also be great to know how much memory the application allocates in total before crashing.

I get the same error:

$ export CRYSTAL_WORKERS=1; ./demo 100000
Unhandled exception: Cannot allocate new fiber stack: Cannot allocate memory (Errno)
GC Warning: Failed to expand heap by 1703936 bytes
GC Warning: Failed to expand heap by 262144 bytes
GC Warning: Out of Memory! Heap size: 4 MiB. Returning NULL!
Invalid memory access (signal 11) at address 0xc018
[0x4380b6] ???
[0x437a47] __crystal_sigfault_handler +263
[0x3143a0f130] ???
[0x423415] ???
[0x417fd2] ???
[0x415151] ???
[0x42fb93] ???
[0x42dfce] ???
[0x4108c7] main +55
[0x3142e21b45] __libc_start_main +245
[0x40c719] ???
[0x0] ???

What OS are you on? 32-bit or 64-bit?

I’m running this on a Mac, no problem.

CentOS release 6.9 (Final), 64-bit.

https://crystal-lang.org/reference/guides/concurrency.html says this:
On a 64-bit machine it lets us spawn millions and millions of fibers.
My situation is that I have an array with more than a million elements, and I wish to process each element independently, in parallel.

But in your example above you’re only spawning 100K fibers and it already runs out of memory. So it would be great if you could tell us how much memory the program actually uses. The number of fibers that can be spawned is strictly limited by the amount of memory available.

Regarding the general use case, when dealing with large numbers of jobs, it’s probably better not to spawn a fiber for each item. It doesn’t make sense when most of them are going to be locked anyway for a long time until it’s their time to process. Instead, use a number of worker fibers which loop through the list.

Thanks for the guide, I understand more now about why the number of fibers is limited. In most situations the memory required by all the tasks varies, so I can't know it in advance.
You said "Instead, use a number of worker fibers which loop through the list". How can I use worker fibers? Sorry, I just don't understand how to do parallelism instead of concurrency.

@orangeSi something like this maybe?

def test(num : Int32, ch)
  num.times do |line|
    ch.send(line)
  end
end

def signal_finished(ch, count)
  count.times do
    ch.send(nil)
  end
end

def process(num : Int32)
  # Do something slow; sleep to simulate a slow operation
  sleep(0.1)
end

def launch_workers(count, ch)
  count.times do
    spawn do
      # ch.receive blocks until the channel has a message available to be read.
      # When the worker gets nil from the channel, its loop ends.
      while line = ch.receive
        process(line)
      end
    end
  end
end

ch = Channel(Int32 | Nil).new
threads = 4
launch_workers(threads, ch)

inf = ARGV[0].to_i

test(inf, ch)
signal_finished(ch, threads)
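As a variant of the same worker-pool pattern (a sketch of my own, not from the posts above), the jobs channel can simply be closed instead of sending nil sentinels: `Channel#receive?` returns nil once a closed channel is drained, and a `done` channel lets the main fiber wait for every worker to exit:

```crystal
# Worker-pool sketch: a fixed number of fibers drain a jobs channel.
# Closing the channel replaces the nil sentinel values.
jobs    = Channel(Int32).new(32) # buffered, so the producer rarely blocks
results = Channel(Int32).new(32)
done    = Channel(Nil).new
workers = 4

workers.times do
  spawn do
    # receive? returns nil once the channel is closed and empty
    while line = jobs.receive?
      results.send(line * 2) # stand-in for the real per-item work
    end
    done.send(nil)
  end
end

spawn do
  100.times { |line| jobs.send(line) }
  jobs.close
end

collected = [] of Int32
100.times { collected << results.receive }
workers.times { done.receive } # block until every worker has exited
puts collected.sum # => 9900 (twice the sum 0 + 1 + ... + 99)
```

Waiting on `done` matters: without it the main fiber can exit while workers are still mid-job.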

How much RAM does the box have that you’re running it on?

Hi all,

I have the same issue. Simple program - concurrent fizzbuzz. Latest crystal (0.34.0), Fedora 32 (beta), 26GB RAM, Intel Core i7 (12 cores).

class Crystalplay
  def self.run(number : String)
    number.to_i.times do |i|
      spawn do
        if i % 15 == 0
          puts "#{i}: fizzbuzz"
        elsif i % 5 == 0
          puts "#{i}: buzz"
        elsif i % 3 == 0
          puts "#{i}: fizz"
        else
          puts i
        end
      end

      i += 1
    end

    Fiber.yield
  end
end

Crystalplay.run ARGV[0]

I run it in the terminal with $ crystal crystalplay.cr with an argument of 100000, and it fails immediately with:

...
1025: buzz
GC Warning: Failed to expand heap by 2658304 bytes
GC Warning: Failed to expand heap by 262144 bytes
GC Warning: Out of Memory! Heap size: 7 MiB. Returning NULL!
Invalid memory access (signal 11) at address 0x2000
[0x459726] *CallStack::print_backtrace:Int32 +118
[0x44bff0] __crystal_sigfault_handler +304
[0x7f3dc85d0a90] ???
[0x459c4e] *Pointer(Pointer(Void)) +14
[0x471406] *Deque(Pointer(Void)) +166
[0x471096] *Fiber::StackPool#release<Pointer(Void)>:Deque(Pointer(Void)) +102
[0x472148] *Fiber#run:(IO::FileDescriptor | Nil) +328
[0x44bb36] ~proc2Proc(Fiber, (IO::FileDescriptor | Nil)) +6
[0x0] ???

When I modify the code a bit to allow spawning in batches (sort of):

...
      end

      i += 1

      sleep Random.new.rand / 1000 if i % 1000 == 0 
    end

    Fiber.yield
  end
end
...

That works well and fast. I'm guessing it obviously would, since it doesn't try to allocate that much all at once.

Then I build it normally (without the --release flag) and it works fine. But when I add the --release flag, it fails with:

Unhandled exception: Cannot allocate new fiber stack: Cannot allocate memory (RuntimeError)
  from ???
  from /usr/share/crystal/src/deque.cr:380:13 in 'checkout'
  from /usr/share/crystal/src/fiber.cr:90:5 in 'spawn'
  from src/crystalplay.cr:4:7 in '__crystal_main'
  from /usr/share/crystal/src/crystal/main.cr:105:5 in 'main'
  from __libc_start_main
  from _start
  from ???

There is no consistent level of RAM usage at which it fails. I've got 13 GB free for the program to use, so I assume that's plenty.

I've run a similar program in Go, and it only failed with an argument of 10,000,000, and even then it had actually exhausted the RAM, using ~13 GB.

(I know about running things with worker queues but I’m just trying to understand the issues here while stretching Crystal’s capabilities).

Thanks in advance for your advice.


Can you share the exact command used to build the code?

Are you running with preview_mt or without?

I suspect the reason the random sleep makes it work is that whenever it occurs, the sleep allows all the previously scheduled fibers to finish.

The build command it works with: crystal build [filename]
The build commands it doesn't: crystal build --release [filename] and crystal build -Dpreview_mt --release [filename]

Re sleep: that's why I use it, to process in batches, sort of. It's not ideal though, so I'm still curious how to deal with this properly.

I don’t see how your program could successfully run with -Dpreview_mt at all. It’s gonna output gibberish (because fibers concurrently write to STDOUT) and suddenly stop (when the main fiber is done spawning new fibers it immediately exits). Without -Dpreview_mt it totally depends on unspecified scheduler behaviour (that the scheduler runs the newly spawned fibers first before resuming the suspended main fiber).
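To address the "suddenly stop" part, one way to make the main fiber actually wait for all spawned fibers (a minimal sketch of my own, not taken from the thread) is a completion channel that each fiber signals when it finishes:

```crystal
# Sketch: each fiber signals completion on a channel; the main fiber
# blocks until it has received one signal per spawned fiber, so the
# program cannot exit while work is still pending.
count = 1_000
done  = Channel(Nil).new

count.times do |i|
  spawn do
    # real work would go here
    done.send(nil)
  end
end

finished = 0
count.times do
  done.receive
  finished += 1
end
puts "#{finished} fibers finished" # => "1000 fibers finished"
```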

Despite that, the memory allocation error obviously looks strange when there’s still memory available. I’m not sure what exactly causes this. But your main fiber spawns 100k fibers at once, that means a lot of memory allocations in a tight loop. So it might just overwhelm the GC…?

The sleep doesn’t make the fibers run in batches. In single thread it just means the previously spawned fiber runs immediately (which makes the use of fibers completely useless). In multi thread it just makes the entire program run slower and reduces pressure on the GC. Btw. you can reach the same result with sleep 0 (which is equivalent to Fiber.yield), the random sleep amount has no effect besides stretching the time the program runs.
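For completeness, the batching idea with an explicit Fiber.yield instead of a random sleep might look like this (a sketch assuming a single-threaded build, i.e. without -Dpreview_mt, since the shared counter is not synchronized):

```crystal
# Sketch: yield to the scheduler every 1000 spawns so previously
# spawned fibers can run to completion and return their stacks to
# the pool, instead of piling up tens of thousands of live fibers.
total    = 10_000
finished = 0

total.times do |i|
  spawn { finished += 1 }      # trivial stand-in for the fizzbuzz body
  Fiber.yield if i % 1000 == 0 # equivalent to `sleep 0`, no random delay needed
end

# drain whatever is still pending before exiting
while finished < total
  Fiber.yield
end
puts finished # => 10000
```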

This is just a synthetic example and I can’t think of any realistic use case that would show such a high pressure behaviour. Usually when spawning lots of fibers you’ll have at least a few cycles doing other things like doing some real setup on the fiber.

I ran the code in Mac, 100k fibers and 1m fibers, it worked fine.

@straight-shoota - thanks for the comprehensive analysis. Do you think we're on to something on Linux then, given that @asterite was able to run it with no problem on the Mac? (Btw @asterite - when you say it ran OK, do you mean run with crystal run... or a built version? It worked for me with crystal run, failed with built versions.)

You’re correct with the multi thread build - it spits out some gibberish, and still fails almost immediately.

I do realise it's a synthetic example, but I wasn't expecting it to fail so fast. I thought the behaviour would be similar to Go's here: it successfully runs until it fills up memory completely, and only then breaks (and because of that I rewrote the Go version to use worker queues, which is why I'm aware they are more appropriate). Still, it's somewhat unclear why Crystal fails so promptly.

@asterite Can you try amping up the number of fibers until it fails to allocate? And see whether it actually filled all your memory. That would be an indicator that this problem might be specific to linux or the linux build of bdwgc.

it worked for me with crystal run, failed with built versions

You never mentioned that before. Can you be more specific? I couldn't find any difference between crystal run and crystal build.

I can reproduce the error on Linux, too, but only with -Dpreview_mt. Whether with --release or not doesn't matter. It fails after about 500k iterations each time.

However, I’m pretty sure yesterday it also failed without -Dpreview_mt and much sooner, as indicated in the original report. That however doesn’t reproduce. I might have rebooted in between.

On a MacBook with 16 GB of RAM it runs until about 9.2M fibers, after consuming 10 GB of RAM (it crashes macOS).

With and without preview_mt the behaviour is about the same.

It works in both cases. crystal run should be the one failing because it will spawn a new process for the actual executable, so in that case you end up with more memory… so it’s pretty strange that it fails on the built version.

Can you try amping up the number of fibers until it fails to allocate?

With 10_000_000 I get the GC warning. Then it froze my machine (or, it became unusable) and I had to restart. The program never crashed.

Anyone else with a linux machine that can try this? On mac it seems to work fine.

@asterite I repeated my test and recorded the process like this:

$head -1 test2.sh 
set -x
$sh test2.sh 
+ cat demo.cr
def test(num : Int32)
  num.times do |line|
    proc = ->(x : Int32) do
      spawn do
        # to do something
        sleep(0.1)
      end
    end
    proc.call(line)
  end
end

test(ARGV[0].to_i)
Fiber.yield
+ echo '# without -Dpreview_mt'
# without -Dpreview_mt
+ crystal build demo.cr -o demo1
+ export CRYSTAL_WORKERS=1
+ CRYSTAL_WORKERS=1
+ ./demo1 100
+ export CRYSTAL_WORKERS=3
+ CRYSTAL_WORKERS=3
+ ./demo1 100
+ export CRYSTAL_WORKERS=1
+ CRYSTAL_WORKERS=1
+ ./demo1 100000
Unhandled exception: Cannot allocate new fiber stack: Cannot allocate memory (Errno)
  from /usr/share/crystal/src/crystal/system/unix/fiber.cr:11:5 in 'allocate_stack'
  from /usr/share/crystal/src/fiber/stack_pool.cr:27:53 in 'checkout'
  from /usr/share/crystal/src/fiber.cr:90:29 in 'initialize'
  from /usr/share/crystal/src/fiber.cr:88:3 in 'new'
  from /usr/share/crystal/src/concurrent.cr:62:3 in 'spawn'
  from demo.cr:4:7 in '->'
  from demo.cr:255:3 in 'test'
  from demo.cr:14:1 in '__crystal_main'
  from /usr/share/crystal/src/crystal/main.cr:106:5 in 'main_user_code'
  from /usr/share/crystal/src/crystal/main.cr:92:7 in 'main'
  from /usr/share/crystal/src/crystal/main.cr:115:3 in 'main'
  from __libc_start_main
  from _start
  from ???
+ export CRYSTAL_WORKERS=3
+ CRYSTAL_WORKERS=3
+ ./demo1 100000
Unhandled exception: Cannot allocate new fiber stack: Cannot allocate memory (Errno)
  from /usr/share/crystal/src/crystal/system/unix/fiber.cr:11:5 in 'allocate_stack'
  from /usr/share/crystal/src/fiber/stack_pool.cr:27:53 in 'checkout'
  from /usr/share/crystal/src/fiber.cr:90:29 in 'initialize'
  from /usr/share/crystal/src/fiber.cr:88:3 in 'new'
  from /usr/share/crystal/src/concurrent.cr:62:3 in 'spawn'
  from demo.cr:4:7 in '->'
  from demo.cr:255:3 in 'test'
  from demo.cr:14:1 in '__crystal_main'
  from /usr/share/crystal/src/crystal/main.cr:106:5 in 'main_user_code'
  from /usr/share/crystal/src/crystal/main.cr:92:7 in 'main'
  from /usr/share/crystal/src/crystal/main.cr:115:3 in 'main'
  from __libc_start_main
  from _start
  from ???
+ echo ''

+ echo '# with -Dpreview_mt'
# with -Dpreview_mt
+ crystal build demo.cr -o demo2 -Dpreview_mt
+ export CRYSTAL_WORKERS=1
+ CRYSTAL_WORKERS=1
+ ./demo2 100
+ export CRYSTAL_WORKERS=3
+ CRYSTAL_WORKERS=3
+ ./demo2 100
+ export CRYSTAL_WORKERS=1
+ CRYSTAL_WORKERS=1
+ ./demo2 100000
Unhandled exception: Cannot allocate new fiber stack: Cannot allocate memory (Errno)
  from /usr/share/crystal/src/crystal/system/unix/fiber.cr:11:5 in 'allocate_stack'
  from /usr/share/crystal/src/fiber/stack_pool.cr:27:53 in 'checkout'
  from /usr/share/crystal/src/fiber.cr:90:29 in 'initialize'
  from /usr/share/crystal/src/fiber.cr:88:3 in 'new'
  from /usr/share/crystal/src/concurrent.cr:62:3 in 'spawn'
  from demo.cr:4:7 in '->'
  from demo.cr:255:3 in 'test'
  from demo.cr:14:1 in '__crystal_main'
  from /usr/share/crystal/src/crystal/main.cr:106:5 in 'main_user_code'
  from /usr/share/crystal/src/crystal/main.cr:92:7 in 'main'
  from /usr/share/crystal/src/crystal/main.cr:115:3 in 'main'
  from __libc_start_main
  from _start
  from ???
+ export CRYSTAL_WORKERS=3
+ CRYSTAL_WORKERS=3
+ ./demo2 100000
Unhandled exception: Cannot allocate new fiber stack: Cannot allocate memory (Errno)
GC Warning: Failed to expand heap by 2691072 bytes
GC Warning: Failed to expand heap by 262144 bytes
GC Warning: Out of Memory! Heap size: 7 MiB. Returning NULL!
Invalid memory access (signal 11) at address 0x6028
[0x559d937ac3e6] *CallStack::print_backtrace:Int32 +118
[0x559d9379efc0] __crystal_sigfault_handler +304
[0x7f16ca42b890] ???
[0x559d937ea8c3] *Pointer(Debug::DWARF::LineNumbers::Row) +227
[0x559d937e9c9d] *Array(Debug::DWARF::LineNumbers::Row) +285
[0x559d937e9b6d] *Array(Debug::DWARF::LineNumbers::Row) +205
[0x559d937e6fcf] *Debug::DWARF::LineNumbers#register_to_matrix<Debug::DWARF::LineNumbers::Sequence, Debug::DWARF::LineNumbers::Register>:Nil +2047
[0x559d937e5070] *Debug::DWARF::LineNumbers#read_statement_program<Debug::DWARF::LineNumbers::Sequence>:Nil +2848
[0x559d937e3970] *Debug::DWARF::LineNumbers#decode_sequences<(UInt32 | UInt64)>:Nil +3856
[0x559d937e2a4c] *Debug::DWARF::LineNumbers#initialize<IO::FileDescriptor+, (UInt32 | UInt64)>:Nil +108
[0x559d937e29a4] *Debug::DWARF::LineNumbers::new<IO::FileDescriptor+, (UInt32 | UInt64)>:Debug::DWARF::LineNumbers +132
[0x559d937a5c08] *CallStack::read_dwarf_sections:(Array(Tuple(UInt64, UInt64, String)) | Nil) +9928
[0x559d937a335d] *CallStack::decode_line_number<UInt64>:Tuple(String, Int32, Int32) +45
[0x559d937a2b6e] *CallStack#decode_backtrace:Array(String) +302
[0x559d937a2a22] *CallStack#printable_backtrace:Array(String) +50
[0x559d937fda9d] *Exception+ +77
[0x559d937d646c] *Crystal::System::print_exception<String, Exception+>:(Int32 | Nil) +220
[0x559d93806b67] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +455
[0x559d9379c066] main +6
[0x7f16c97d9b97] __libc_start_main +231
[0x559d9379057a] _start +42
[0x0] ???
+ echo 3
3

And the Crystal version, Linux version, CPU and memory are as follows:

$ crystal --version
Crystal 0.33.0 [612825a53] (2020-02-14)

LLVM: 8.0.0
Default target: x86_64-unknown-linux-gnu

$cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"

$free -h
              total        used        free      shared  buff/cache   available
Mem:            62G        342M         59G        1.3M        2.5G         61G
Swap:          975M         95M        880M

$lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              12
On-line CPU(s) list: 0-11
Thread(s) per core:  2
Core(s) per socket:  6
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Xeon(R) E-2236 CPU @ 3.40GHz
Stepping:            10
CPU MHz:             3400.617
BogoMIPS:            6816.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            12288K
NUMA node0 CPU(s):   0-11
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts md_clear flush_l1d