Something like sql_tracker for Crystal

Hi,

The problem:

  • I want a “query counter” with a query threshold, so I can enable it on CI via an environment variable and have the test just explode if we exceed the threshold.

This is good for catching N+1 issues (and others) that pass unnoticed in code review, or just for keeping the number of SQL queries issued under a safe limit.
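
The counter-plus-threshold part can be sketched independently of whatever hook ends up feeding it. A hypothetical sketch (`QueryBudget` and all names here are mine, not from any existing shard), where `record` would be called once per executed query by a DB hook:

```crystal
class QueryBudget
  class Exceeded < Exception
  end

  getter count = 0
  getter limit : Int32

  def initialize(@limit)
  end

  # Called once per executed query; blows up past the threshold.
  def record(query : String) : Nil
    @count += 1
    if @count > @limit
      raise Exceeded.new("#{@count} queries issued (limit #{@limit}); last: #{query}")
    end
  end
end

# On CI the limit could come from an environment variable, e.g.:
# budget = QueryBudget.new(ENV.fetch("MAX_QUERIES", "50").to_i)
```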

Do you guys know any solution for Crystal? Or do I need to write my own shard?

2 Likes

I used this gem extensively in my Rails projects in the past: GitHub - civiccc/db-query-matchers: RSpec matchers for database queries. I haven’t seen one in Crystal, though. But I’d think it should be relatively easy to build with the crystal-db interface, right?

I’m not aware of any matchers, but they could probably be built on top of Add logging for executing queries by bcardiff · Pull Request #134 · crystal-lang/crystal-db · GitHub.
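
Assuming that PR lands and crystal-db emits one log entry per executed query on the "db" source, a counting backend could look something like this sketch (the class name is mine, and this is untested against the real shard):

```crystal
require "log"

# Hypothetical counting backend; assumes crystal-db logs one entry per
# executed query on the "db" source (per the linked PR).
class QueryCountingBackend < Log::Backend
  getter count = 0

  def initialize
    super(Log::DispatchMode::Direct) # count synchronously in the emitting fiber
  end

  def write(entry : Log::Entry)
    @count += 1
  end
end

backend = QueryCountingBackend.new
Log.setup "db", :debug, backend
```

A spec matcher would then just snapshot `backend.count` before and after the block under test.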

2 Likes

The project uses Avram as its ORM, so initially I thought about attaching it to Avram’s query log, but I was thinking more about something that attaches to crystal-db itself, so I can decouple it from both the ORM and the database driver at the same time… this PR on crystal-db will make everything possible. Thanks!

2 Likes

Knowing what line of code triggered the log entry is easy by monkey-patching the Log class, but this is useless on its own, because I would like to know not where the Log entry was created, but where the code that triggered the query lives, and that depends on the ORM being used.

Anyway, this Log monkey patch can be useful for someone.

require "log"

class Log
  {% for method in %w(trace debug info notice warn error fatal) %}
    {% severity = method.id.camelcase %}
    def {{method.id}}(*, exception : Exception? = nil, file = __FILE__, line = __LINE__)
      severity = Severity::{{severity}}
      return unless level <= severity
      return unless backend = @backend

      dsl = Emitter.new(@source, severity, exception)
      result = yield dsl

      case result
      when Entry
        new_context = result.context.extend({file: file, line: line})
        new_entry = Entry.new(result.source, result.severity, result.message, new_context, result.exception, timestamp: result.timestamp)
        backend.dispatch new_entry
      when Nil
        # emit nothing
      else
        backend.dispatch dsl.emit(result.to_s, file: file, line: line)
      end
    end
  {% end %}
end

class Foo
  Log = ::Log.for("Foo")

  def initialize
    Log.info { "ctor" }
    Log.info &.emit("with emit", emit_var: "working")
  end
end

Log.info { "Top namespace" }
Foo.new

Prints:

2024-01-16T22:26:37.932473Z   INFO - Top namespace -- file: "/usr/lib/crystal/log/main.cr", line: 38
2024-01-16T22:26:37.932476Z   INFO - Foo: ctor -- file: "/home/hugo/src/tmp/log_location.cr", line: 32
2024-01-16T22:26:37.932476Z   INFO - Foo: with emit -- file: "/home/hugo/src/tmp/log_location.cr", line: 33

I.e., it’s useful if you ignore that it gets the wrong location for top-level calls and eats the emit args used :sweat_smile:

I’m a bit late to this thread, but this is something I do in a few apps, too. For example, I have a file that patches Fiber and DB::Statement to collect all of the queries and stick them onto a property of the current fiber:

require "db/statement"

class Fiber
  getter query_counts : Hash(String, Int32) do
    Hash(String, Int32).new do |h, k|
      h[k] = 0
    end
  end
end

class DB::Statement
  def_around_query_or_exec do |args|
    yield
  ensure
    Fiber.current.query_counts[command] += 1
  end
end

Then I have an HTTP::Handler class that collects this over the course of a request and logs them out. I control how much it logs via an env var because logging can get expensive.

require "http"
require "log"

class DB::QueryCounter
  include HTTP::Handler

  # Fraction of requests to log, between 0.0 and 1.0
  LOG_PERCENTAGE = ENV.fetch("LOG_QUERY_COUNT_PERCENTAGE", "0").to_f
  Log = ::Log.for(self)

  def call(context)
    if rand < LOG_PERCENTAGE
      call_with_logging context
    else
      call_without_logging context
    end
  end

  def call_with_logging(context)
    Fiber.current.query_counts.clear

    begin
      call_without_logging context
    ensure
      Log.info &.emit "Queries",
        total: Fiber.current.query_counts.sum { |_, count| count },
        counts: Fiber.current.query_counts,
        method: context.request.method,
        path: context.request.path
    end
  end

  def call_without_logging(context)
    call_next context
  end
end

So each HTTP request outputs something like this (some parts redacted), which I can search for in GCP logs and alert on if jsonPayload.data.total exceeds some threshold for certain endpoints:

{
  "timestamp": "2024-01-17T02:08:45.351099000Z",
  "severity": "info",
  "source": "query_counter",
  "message": "Queries",
  "data": {
    "total": 5,
    "counts": {
      "SELECT accounts.id, accounts.avatar_url, accounts.github_login, accounts.created_at, accounts.updated_at FROM accounts WHERE id = $1 LIMIT $2": 1,
      "SELECT oauth_user_tokens.id, oauth_user_tokens.account_id, oauth_user_tokens.value, oauth_user_tokens.expires_at, oauth_user_tokens.refresh_token, oauth_user_tokens.scope, oauth_user_tokens.created_at, oauth_user_tokens.updated_at FROM oauth_user_tokens WHERE id = $1 LIMIT $2": 1,
      "SELECT [columns] FROM [table] WHERE [column] = $1 LIMIT $2": 1,
      "INSERT INTO github_repositories (\"name\", \"full_name\", \"github_id\", \"node_id\") VALUES ($1, $2, $3, $4) ON CONFLICT (github_id) DO UPDATE SET name = $5, full_name = $6, github_id = $7, node_id = $8  RETURNING github_repositories.id, github_repositories.name, github_repositories.full_name, github_repositories.node_id, github_repositories.github_id, github_repositories.created_at, github_repositories.updated_at": 1,
      "SELECT [columns] FROM [table1] INNER JOIN [table2] AS [rel2] ON [table1].[fk]_id = rel2.id  INNER JOIN [table3] AS [rel3] ON [rel2].author_id = [rel3].id  INNER JOIN accounts AS enqueued_by ON [rel3].enqueued_by_id = enqueued_by.id  WHERE [rel2].[fk]_id = $1": 1
    },
    "method": "GET",
    "path": "/the/path/goes/here"
  }
}

This way I can filter my logs for exactly which endpoints are running so many queries, and since the output includes the count of each query executed, I can see which queries they’re running so many of. I also have a similar middleware object that does the same for background jobs.

3 Likes

We can also add some patches to Avram if that would help.

Also, in case you didn’t know, you can subscribe to Avram::Events::QueryEvent to get the queries as well as subscribe to Avram::Events::FailedQueryEvent for the failed queries.

Avram::Events::QueryEvent.subscribe do |event, duration|
  puts event.query
  puts event.args
end

You can store that separately to do some custom tracking if you needed.
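
For instance, a hypothetical tally keyed by SQL string, fed from that subscription (`QUERY_COUNTS` and `record_query` are made-up names, and the Avram hookup itself is sketched in the comment since it needs the shard):

```crystal
QUERY_COUNTS = Hash(String, Int32).new(0)

def record_query(sql : String) : Int32
  QUERY_COUNTS[sql] += 1
end

# Wired up via the subscription from the snippet above:
#
# Avram::Events::QueryEvent.subscribe do |event, duration|
#   record_query(event.query)
# end
```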

2 Likes

All the solutions proposed are nice for discovering which queries fired during a request; however, I was looking for more information: the location in user code that fired the query, which is what the SQLTracker Ruby gem does. It may be possible to do the same in Crystal by inspecting the stack trace; probably slow as hell, but for a development tool it may work.

Ahh, sorry, I misunderstood. Still quite doable with very little code, using the same DB::Statement.def_around_query_or_exec trick as above:

class DB::Statement
  def_around_query_or_exec do |args|
    yield
  ensure
    # `caller` is expensive, so we only want to do this with a
    # verbose logging setting that we likely wouldn't have enabled
    # in production
    if Log.level <= ::Log::Severity::Debug
      # Filter out this stack frame
      caller.each within: 1.. do |location|
        # Only show application code, no library code
        if location.starts_with?("src/")
          Log.debug &.emit "called from", location: location
        end
      end
    end
  end
end

Result:

2024-01-18T22:54:57.503170Z  DEBUG - db: Executing query -- query: "SELECT [columns] FROM [table] WHERE [column] = $1 LIMIT $2", args: ["5509c6f9-e265-48b0-8d03-39b7ec6c8d7e", 1]
2024-01-18T22:54:57.536851Z  DEBUG - db: called from -- location: "src/routes/organizations.cr:48:26 in 'call'"
2024-01-18T22:54:57.540394Z  DEBUG - db: called from -- location: "src/routes/web.cr:32:32 in 'call'"
2024-01-18T22:54:57.582668Z  DEBUG - db: called from -- location: "src/web.cr:44:7 in 'call'"

2 Likes

I don’t like that the location info is added as separate log messages. I think you’d be better off doing something slightly different: maybe wrap the yield in a Log.with_context, do your logic to parse the frames, and set them on Log.context? This way they’ll be part of the Log::Entry the DB shard emits with the query/args.
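
Roughly, the idea could be sketched like this, assuming the shard’s query entry is actually emitted during the yield (the hook itself is untested and lives in the comment since it needs the db shard; the frame filtering is a plain helper with a name I made up):

```crystal
# The hook would look roughly like:
#
#   class DB::Statement
#     def_around_query_or_exec do |args|
#       Log.with_context do
#         Log.context.set query_caller: first_app_frame(caller)
#         yield
#       end
#     end
#   end

# Standalone helper: pick the first application frame out of a backtrace.
def first_app_frame(frames : Array(String)) : String
  frames.find(&.starts_with?("src/")) || "unknown"
end
```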

Related: Improve Exception backtrace API · Issue #10681 · crystal-lang/crystal · GitHub

It depends on the log formatter being used and how you consume your logs. With a JSONFormatter and a log reader that renders that well, that’s probably preferable.

But with the default formatter, when you’re only using this in development and reading it in the terminal output, putting the call stack in the same log entry is not as useful.

This doesn’t set the context before logging. The default implementation logs before yielding. In order to output what I had above, I had to override DB::Statement#emit_log.

1 Like

Ahh gotcha, in that case I wonder if you could call previous_def within a Log.with_context block? :thinking:
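
For what it’s worth, the previous_def mechanics do compose when reopening a class; a generic sketch of the idea (not DB::Statement itself, since that needs the shard):

```crystal
class Greeter
  def greet(name : String) : String
    "hello #{name}"
  end
end

# Reopen the class and wrap the original method via previous_def, the same
# way one could wrap DB::Statement#emit_log in a Log.with_context block.
class Greeter
  def greet(name : String) : String
    previous_def.upcase
  end
end
```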

I have thought about having file = __FILE__ and line = __LINE__ defaults on the log methods and passing those through. But that still wouldn’t help, as they’d point to where the log was emitted, not where the query was made from :confused:. I guess you could do something similar on some sort of repo method, with the actual query being in a Log.context block, but that’s a bit less than ideal, yea :/.

1 Like