I’m a bit late to this thread, but this is something I do in a few apps, too. For example, I have a file that patches `Fiber` and `DB::Statement` to collect all of the queries and stick them onto a property of the current fiber:
require "db/statement"
class Fiber
getter query_counts : Hash(String, Int32) do
Hash(String, Int32).new do |h, k|
h[k] = 0
end
end
end
class DB::Statement
def_around_query_or_exec do |asdf|
yield
ensure
Fiber.current.query_counts[command] += 1
end
end
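With that patch loaded, counts accumulate on whichever fiber runs the queries. A minimal sketch of what that looks like, assuming a Postgres database reachable via `DATABASE_URL` (the URL, driver, and `accounts` table here are placeholders, not part of my actual app):

```crystal
require "db"
require "pg" # or whichever driver you use

DB.open(ENV["DATABASE_URL"]) do |db|
  3.times { db.query_all "SELECT id FROM accounts", as: Int64 }
  db.scalar "SELECT COUNT(*) FROM accounts"
end

pp Fiber.current.query_counts
# => something like {"SELECT id FROM accounts" => 3, "SELECT COUNT(*) FROM accounts" => 1}
```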
Then I have an `HTTP::Handler` class that collects this over the course of a request and logs it at the end. I control how often it logs via an env var, because logging can get expensive.
require "http"
require "log"
class DB::QueryCounter
include HTTP::Handler
LOG_PERCENTAGE = ENV.fetch("LOG_QUERY_COUNT_PERCENTAGE", "0").to_f
Log = ::Log.for(self.class)
def call(context)
if rand < LOG_PERCENTAGE
call_with_logging context
else
call_without_logging context
end
end
def call_with_logging(context)
Fiber.current.query_counts.clear
begin
call_without_logging context
ensure
Log.info &.emit "Queries",
total: Fiber.current.query_counts.sum { |_, count| count },
counts: Fiber.current.query_counts,
method: context.request.method,
path: context.request.path
end
end
def call_without_logging(context)
call_next context
end
end
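For context, it slots into the middleware stack like any other `HTTP::Handler`. A minimal sketch (the port and the rest of the stack are placeholders):

```crystal
require "http/server"

server = HTTP::Server.new([
  DB::QueryCounter.new, # early in the stack so it wraps the whole request
  # ... the rest of your handler stack ...
] of HTTP::Handler) do |context|
  context.response.print "ok"
end

server.bind_tcp 8080
server.listen
```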
So each HTTP request outputs something like this (some parts redacted). I can search for these entries in GCP's log explorer and alert when `jsonPayload.data.total` exceeds some threshold for certain endpoints:
```json
{
  "timestamp": "2024-01-17T02:08:45.351099000Z",
  "severity": "info",
  "source": "query_counter",
  "message": "Queries",
  "data": {
    "total": 5,
    "counts": {
      "SELECT accounts.id, accounts.avatar_url, accounts.github_login, accounts.created_at, accounts.updated_at FROM accounts WHERE id = $1 LIMIT $2": 1,
      "SELECT oauth_user_tokens.id, oauth_user_tokens.account_id, oauth_user_tokens.value, oauth_user_tokens.expires_at, oauth_user_tokens.refresh_token, oauth_user_tokens.scope, oauth_user_tokens.created_at, oauth_user_tokens.updated_at FROM oauth_user_tokens WHERE id = $1 LIMIT $2": 1,
      "SELECT [columns] FROM [table] WHERE [column] = $1 LIMIT $2": 1,
      "INSERT INTO github_repositories (\"name\", \"full_name\", \"github_id\", \"node_id\") VALUES ($1, $2, $3, $4) ON CONFLICT (github_id) DO UPDATE SET name = $5, full_name = $6, github_id = $7, node_id = $8 RETURNING github_repositories.id, github_repositories.name, github_repositories.full_name, github_repositories.node_id, github_repositories.github_id, github_repositories.created_at, github_repositories.updated_at": 1,
      "SELECT [columns] FROM [table1] INNER JOIN [table2] AS [rel2] ON [table1].[fk]_id = rel2.id INNER JOIN [table3] AS [rel3] ON [rel2].author_id = [rel3].id INNER JOIN accounts AS enqueued_by ON [rel3].enqueued_by_id = enqueued_by.id WHERE [rel2].[fk]_id = $1": 1
    },
    "method": "GET",
    "path": "/the/path/goes/here"
  }
}
```
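The alerting side is then just a log-based alert on a Logs Explorer filter along these lines (the exact field paths depend on how your logs are ingested, so treat this as a sketch):

```
jsonPayload.data.total > 25
jsonPayload.data.path = "/the/path/goes/here"
```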
This way I can filter my logs for exactly which endpoints are running so many queries, and since the payload includes a count for each individual query, I can also see which queries they’re running so many of. I also have a similar middleware object that does the same for background jobs, roughly like the sketch below.
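The job version isn’t tied to `HTTP::Handler`, so its shape depends on your job framework; this is just the same clear-then-log pattern as a wrapper, where `wrap` and the job names are made up for illustration:

```crystal
module JobQueryCounter
  Log = ::Log.for("job_query_counter")

  # Wrap a job execution the same way the HTTP handler wraps a request:
  # reset the fiber's counts, run the job, and log the totals on the way out.
  def self.wrap(job_name : String, &)
    Fiber.current.query_counts.clear
    begin
      yield
    ensure
      Log.info &.emit "Queries",
        total: Fiber.current.query_counts.sum { |_, count| count },
        counts: Fiber.current.query_counts,
        job: job_name
    end
  end
end

# Hypothetical usage:
# JobQueryCounter.wrap("SyncRepositories") { SyncRepositories.run }
```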