I'm still using this at the root; when classes call the DB, they use this shared handle:
```crystal
DB_CONNECTION_STRING = "postgres://#{ENV["DB_USERNAME"]}:#{ENV["DB_PASSWORD"]}@#{ENV["DB_HOST"]}:#{ENV["DB_PORT"]}/#{ENV["DB_NAME"]}?max_pool_size=50&retry_attempts=30&checkout_timeout=60"
PG_DB = DB.open(DB_CONNECTION_STRING)
```
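For context, the call sites look roughly like this (simplified; the table, column, and parameter names here are made up for illustration, but the `scalar`/`query` shapes match the stack traces below):

```crystal
require "db"
require "pg"

# The classes hit the shared global handle directly, e.g. an aggregate via scalar:
count = PG_DB.scalar("select count(*) from some_table where id > $1", 100).as(Int64)

# ...and multi-row reads via query:
PG_DB.query("select id, name from some_table where id > $1", 100) do |rs|
  rs.each do
    id = rs.read(Int32)
    name = rs.read(String)
  end
end
```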
I have modified the spawn code from this:
```crystal
channel = Channel(Nil).new
@arr_objs.each do |obj|
  spawn do
    obj.init(p1, p2)
    channel.send(nil)
  end
end
@arr_objs.size.times { channel.receive }
```
To this:
```crystal
workers_count = 5
jobs = Channel(SomeObj).new(workers_count)
channel = Channel(UInt8).new(@arr_objs.size)

workers_count.times do
  spawn do
    while (obj = jobs.receive)
      obj.init(@arr_details, @str_date)
      channel.send(1_u8)
    end
  end
end

@arr_objs.each do |obj|
  jobs.send(obj)
end

counter = 0
@arr_objs.size.times do
  puts "X: #{counter += 1}/#{@arr_objs.size}"
  channel.receive
  puts "Y: #{counter}/#{@arr_objs.size}"
end
```
Then I ran `show max_connections` on my DB; it is 100, so I added `max_pool_size=50` to the connection string.
While the MT part is hitting the DB, I monitor `select * from pg_stat_activity`, and the most connections I ever see there are my DataGrip connection, the first connection from `DB.open`, and 5-8 others from the MT code.
The X/Y loop will frequently stop 1-3 short of the end and hang indefinitely with no errors on the console. When it does, CPU load is near zero (nothing from these fibers, only minor load from other things running on the machine), and the DB shows only the first connection active (the one from the global `DB.open`).
It seems to have settled into a pattern of four possible outcomes per run. The contents of `arr_objs` are identical and in the same order every time I run the program, yet where it fails appears to be entirely random; it does not fail in the same spot each time:
- it works fine
- no exceptions are thrown, but it hangs somewhere in the last 1-3 iterations (e.g. if `arr_objs.size` is 120, it hangs on 119, if it chooses to hang). There is no DB or CPU activity and it sits there indefinitely until killed; it prints the X debug line on the console but never reaches the Y line. It also sometimes stops 3 short. Possibly irrelevant, but I haven't yet seen it stop 2 short; I suspect that's random rather than meaningful.
- it throws an exception saying no results were returned from an aggregate. (Oddly, before some of these adjustments it said the opposite: that multiple results were returned from an aggregate, which isn't possible given the SQL itself.) If I run the SQL by hand, it returns exactly what you would expect, so it seems to be executing it oddly rather than there being a bug in the SQL.
- it throws two exceptions concurrently (see below; the traces are truncated to remove frames from my own functions, as I don't think they are relevant):
```
Unhandled exception in spawn: Expected PQ::Frame::ParseComplete but got PQ::Frame::CommandComplete(@bytes=Bytes[83, 69, 76, 69, 67, 84, 32, 49, 0]) (Exception)
  from /Users/user/.cache/crystal/crystal-run-pga.tmp in 'Exception::CallStack::unwind:Array(Pointer(Void))'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in 'Exception::CallStack#initialize:Array(Pointer(Void))'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in 'Exception::CallStack::new:Exception::CallStack'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in 'raise<Exception>:NoReturn'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in 'raise<String>:NoReturn'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in 'PQ::Connection#expect_frame<PQ::Frame::ParseComplete.class, Nil>:PQ::Frame::ParseComplete'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in 'PQ::Connection#expect_frame<PQ::Frame::ParseComplete.class>:PQ::Frame::ParseComplete'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in 'PG::Statement#perform_query<Tuple(Int32)>:PG::ResultSet'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in 'DB::Statement+@DB::Statement#perform_query_with_rescue<Tuple(Int32)>:PG::ResultSet'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in 'DB::Statement+@DB::Statement#query:args<Int32, Nil>:PG::ResultSet'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in 'DB::PoolStatement+@DB::PoolStatement#query:args<Int32, Nil>:PG::ResultSet'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in 'DB::Database@DB::QueryMethods(Stmt)#query:args<String, Int32, Nil>:PG::ResultSet'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in
  ...
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in 'Fiber#run:(IO::FileDescriptor | Nil)'
  from /Users/user/.cache/crystal/crystal-run-somecode.tmp in '~proc3Proc(Fiber, (IO::FileDescriptor | Nil))@/opt/homebrew/Cellar/crystal/1.13.3_1/share/crystal/src/fiber.cr:95'
```
and
```
Unhandled exception in spawn: Expected PQ::Frame::CommandComplete but got PQ::Frame::ReadyForQuery(@transaction_status=PQ::Frame::ReadyForQuery::Status::Idle) (Exception)
  from lib/pg/src/pq/connection.cr:447:7 in 'expect_frame'
  from lib/pg/src/pq/connection.cr:430:9 in 'read_next_row_start'
  from lib/pg/src/pg/result_set.cr:49:8 in 'move_next'
  from lib/pg/src/pg/result_set.cr:208:20 in 'do_close'
  from lib/db/src/db/disposable.cr:11:7 in 'close'
  from lib/db/src/db/statement.cr:19:23 in 'scalar:args'
  from lib/db/src/db/pool_statement.cr:34:30 in 'scalar:args'
  from lib/db/src/db/query_methods.cr:284:7 in 'scalar'
  ...
```
Given that `max_connections` is 100 and in practice I appear to max out around 8, I don't seem to be overwhelming the connections, or any other resource on the OS or DB for that matter.
So am I "Just Doing It Wrong", or is there something larger at play out of my control, like a bug in the db/pg code around MT?
I am being very lazy about the exceptions and just letting the program fall flat. Should I handle them more gracefully and retry, or do something more sophisticated than that?
Any suggestions on things I should try/test? I would really love not to have to run the DB parts serially, particularly since this will eventually run on a machine with many cores in "production" (this is just personal code that makes me money; nothing commercial).
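For what it's worth, when I say "retry" above I'm imagining a generic wrapper along these lines (a sketch only, not in the code; the attempt count and delay are arbitrary):

```crystal
# Run the block, retrying up to `attempts` times with a growing delay.
# Re-raises the last exception once the retries are exhausted.
def with_retry(attempts = 3, base_delay = 0.5)
  attempts.times do |i|
    begin
      return yield
    rescue ex
      raise ex if i == attempts - 1 # out of retries
      sleep (base_delay * (i + 1)).seconds
    end
  end
end

# It would wrap the DB calls, e.g.:
#   count = with_retry { PG_DB.scalar("select count(*) from t").as(Int64) }

# Quick self-check with a deliberately flaky block:
calls = 0
result = with_retry(attempts: 3, base_delay: 0.01) do
  calls += 1
  raise "flaky" if calls < 3
  "ok"
end
puts result
```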