This is my first time posting here, so I tried hard to reproduce the problem with as little code as possible; that minimal reproduction is what follows.
TL;DR: this program either succeeds or fails in one of a few specific ways, and I cannot figure out why.
I would expect it either to run fine or to crash the same way every time. That it does a range of different things (even allowing for the random input differing slightly between runs) is not what I would expect.
This comes from a much larger codebase that runs Monte Carlo simulations; I kept cutting code away until this was all that was left. I have sanitized it, and some parts are stand-ins for code that does more in the real program (e.g. here a hash is filled randomly, whereas the real code fills it from a DB call).
I can run this many times in a row, and each run does exactly one of the following:
- runs totally fine (great!)
- hangs: at the spot where the code prints N/size with X and Y, it prints the X line and never the matching Y line (this always happens somewhere between 480 and 490 of the 500). When it hangs, there are no DB locks, no active DB calls and no CPU load, and if I leave it overnight nothing changes. It feels like it might be a thread deadlock, but I have no idea.
- crashes with the following (the number varies, but the text does not):
Duplicate large block deallocation
[2] 33314 abort ./test
- crashes with something like the following. The length of the output always varies (this is the shortest one I have seen; some are much longer), but it always starts with three “Invalid memory access (signal 11) at address 0x8” lines, always with exactly that text and always at 0x8.
Invalid memory access (signal 11) at address 0x8
Invalid memory access (signal 11) at address 0x8
Invalid memory access (signal 11) at address 0x8
[0x109c292e6] *Exception::CallStack::print_backtrace:Int32 +38
[0x109bafd75] __crystal_sigfault_handler +309
[0x7fff2039ad7d] _sigtramp +29
[0x109c0ba5c] *DB::PoolPreparedStatement#build_statement:DB::Statement+ +108
[0x109c263c8] ~procProc(Nil)@test.cr:47 +1288
[0x109baa4bc] *Fiber#run:(IO::FileDescriptor | Nil) +60
I build with this:
crystal build -Dpreview_mt --no-debug --release test.cr
The shard versions:
db (0.10.0)
pg (0.23.1)
dotenv (0.3.0)
crystal -v
Crystal 0.36.1 (2021-02-02)
LLVM: 11.0.1
Default target: x86_64-apple-macosx
I haven’t tested this on Ubuntu recently, but a week or two ago my Linux box showed the same behavior. My Mac has 8 cores and 32 GB of RAM; the Linux box has 64 cores and 192 GB of RAM. Watching with htop on both, I have never seen it come close to saturating even 4 cores, and htop shows no memory pressure on either machine.
I have full control of the DB, so I can adjust any of its parameters, but I see no sign of it being stressed during any of this.
I have experimented with the connection-string parameters, and nothing seems to help.
This is probably stating the obvious, but to show that the DB access (and the resources involved in it) is what matters: if I call boo instead of woo, the program always completes in full in under a second, every time, even though its numeric output is the same.
Code:
require "dotenv"
require "db"
require "pg"

Dotenv.load

DB_CONNECTION_STRING = "postgres://#{ENV["DB_USERNAME"]}:#{ENV["DB_PASSWORD"]}@#{ENV["DB_HOST"]}:#{ENV["DB_PORT"]}/#{ENV["DB_NAME"]}?max_pool_size=200&initial_pool_size=200&max_idle_pool_size=200&retry_attempts=30&checkout_timeout=60"
TEST_DB = DB.open(DB_CONNECTION_STRING)

class A
  def go
    # Build 500 distinct random keys (the real code fills this from a DB call).
    numbers_hash = Hash(Int32, Int32).new
    while numbers_hash.size < 500
      x = (3..12056).sample
      numbers_hash[x] = x * 10
    end

    # Fan out: one fiber per entry, each signalling on the channel when done.
    channel = Channel(UInt8).new
    numbers_hash.each do |a, b|
      spawn do
        B.new.woo(a, b)
        channel.send 1_u8
      end
    end

    # Fan in: wait for one message per spawned fiber.
    counter = 0_u64
    numbers_hash.each {
      puts "X: #{counter += 1}/#{numbers_hash.size}"
      channel.receive
      puts "Y: #{counter}/#{numbers_hash.size}"
    }
  end
end

class B
  # Control case: no DB access at all.
  def boo(a, b)
    test_count = {0, 1}.sample
    1_u8
  end

  # Goes through the shared pool; this SQL only ever returns 0 or 1.
  def woo(a, b)
    test_sql = "select count(*)::smallint as example_count from my_table where id=$1;"
    test_count = TEST_DB.query_one test_sql, a, as: {Int16}
    1_u8
  end
end

A.new.go
The table it queries has about 20k rows, and the random hash builder mimics the range of its id values, so a test table could be built the same way.
That said, after further testing, the following SQL triggers it as well, with no table involved (if boo ran this query instead of sampling, it would fail the same way woo does):
test_sql = "SELECT cast(random()*(1-0)+0 as int);"
test_count = TEST_DB.query_one test_sql, as: {Int32}
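Because the problem only shows up when the fibers go through the DB pool (woo) and never with boo, one way to probe it might be to cap how many workers are in flight at once instead of spawning all 500 together. Below is a minimal, DB-free sketch of that fan-out/fan-in shape, assuming a hypothetical `work` method standing in for `B#woo` and an arbitrary cap `LIMIT = 20`; neither is in the original code. A buffered channel acts as a counting semaphore. This only illustrates the pattern and is not a known fix.

```crystal
# Hypothetical stand-in for B#woo: no DB access, it just yields so other
# fibers get scheduled, then returns the same 1_u8 marker.
def work(a : Int32, b : Int32) : UInt8
  Fiber.yield
  1_u8
end

LIMIT = 20 # assumed cap on concurrently running workers (not from the original)

numbers = (1..500).to_a
slots = Channel(Nil).new(LIMIT) # buffered channel used as a counting semaphore
done = Channel(UInt8).new       # unbuffered, like `channel` in A#go

numbers.each do |n|
  slots.send(nil) # blocks while LIMIT workers are already in flight
  spawn do
    result = work(n, n * 10)
    slots.receive # free the slot before reporting back
    done.send(result)
  end
end

# Fan in: wait for one message per spawned fiber.
completed = 0
numbers.size.times do
  done.receive
  completed += 1
end
puts "completed #{completed}/#{numbers.size}"
```

Swapping `work` for `B.new.woo(n, n * 10)` and varying `LIMIT` could show whether the hang or the crashes depend on how many pool checkouts happen concurrently.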