Anyone know why `Exception#backtrace` uses so much memory?

I’ve noticed that exception handling uses a very different amount of memory depending on whether I’m using the backtrace. For example, if I simply log an exception, my service holds steady at 5-6MB of RAM in production for days, but if I log the backtrace, it uses ~50MB of RAM.
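
For concreteness, the two patterns I’m comparing look roughly like this (a simplified sketch, not my actual service code; the logger source name and the error are just placeholders):

require "log"

log = Log.for("my_service")

begin
  raise "boom"
rescue ex
  log.error { "request failed: #{ex.message}" }  # logging just the message: stays at ~5-6MB
  # log.error { ex.inspect_with_backtrace }      # logging the backtrace: ~50MB after the first one
end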

I’ve looked at the Exception::CallStack#decode_backtrace method to find out what could be using so much RAM here, but nothing is jumping out at me. When it comes to reporting errors, getting stack traces is useful, but if it uses an order of magnitude more RAM per process as soon as it hits its first error it’s almost not worth it.


libunwind has known memory-usage issues, and I believe that’s what you’re facing here, since Crystal uses libunwind.

There is an old but still open issue on the libunwind repo:

Memory leak and high CPU using unw_step in libunwind-1.1-3

Hope it helps!


I had a look and I’m not sure that function would be the source of the problem I’m seeing. The exception has a call stack whether or not I inspect it, so I think that unw_step function would already have been called by that point, right? Assuming I’m understanding the purpose of that function correctly.

I feel like this may be a separate issue with decoding the stack frames referenced by the addresses in the exception’s call stack array.
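
To illustrate the distinction I mean, a rough sketch (assuming I’m reading it right that Exception#callstack holds the raw frame addresses captured at raise time, and that decoding them only happens when the backtrace is read):

begin
  raise "boom"
rescue ex
  # The raw addresses were captured (via unwinding) when this was raised,
  # whether or not I ever look at them:
  pp ex.callstack.nil? # => false

  # Decoding them into file/line/function names only happens here:
  pp ex.backtrace.first
end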

My hunch (without checking) is that it loads all the DWARF info into an internal structure (@@dwarf_line_numbers); maybe it only triggers that when you display the stack trace?

I have wondered before if some improvements could be made, like only looking up/caching the memory locations that we actually use, but loading everything is what it does today… the only option is CRYSTAL_LOAD_DEBUG_INFO=0.
ref: crystal/kernel.cr at 1c33a0b1d14a1e2c8835bb1b7381789d648624ea · crystal-lang/crystal · GitHub
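
If you want to see it locally, a rough sketch like this might show the difference; run it once as-is and once with CRYSTAL_LOAD_DEBUG_INFO=0 (GC.stats only reflects the GC heap, but that should be enough to spot tens of MB):

before = GC.stats.heap_size

begin
  raise "boom"
rescue ex
  ex.backtrace # first decode; triggers the debug info load unless it is disabled
end

after = GC.stats.heap_size
puts "GC heap grew by #{(after.to_i64 - before.to_i64) // 1024} KiB"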

I thought there were some discussions about it somewhere, but I can’t seem to find them… good luck!


@rogerdpack Nice find, this seems to be exactly what it is. The load_debug_info method is called from several methods invoked inside Exception::CallStack#decode_backtrace to memoize the debug info, so setting CRYSTAL_LOAD_DEBUG_INFO=1 just changes when that method is called and when the data is loaded. I set CRYSTAL_LOAD_DEBUG_INFO=1 and, as soon as the process started, it was using ~50MB of RAM; getting an exception made no meaningful difference.

Unfortunately, this means choosing between using an order of magnitude more memory for small services and making exceptions slower any time the app reads backtraces, since it would reload the data from disk every time. Classic space-vs-time tradeoff.

On the bright side, the monkeypatch to release this memory (on Linux/macOS/BSD, at least) probably looks something like this:

# Reopen the stdlib call stack and drop the memoized DWARF data right after
# each backtrace is decoded, so the GC can (hopefully) reclaim it.
struct Exception::CallStack
  private def decode_backtrace
    previous_def.tap do
      @@dwarf_loaded = false
      @@dwarf_line_numbers = nil
      @@dwarf_function_names = nil
    end
  end
end

I’ll give it a try in the morning.


Is the debug info reloaded from disk every time? I thought we cached that? :thinking:

I guess jgaskins is referring to monkey-patching it to make it release the memory somehow… which is an interesting stopgap measure until something better can be created, like caching only what is useful? :slight_smile:


It is indeed being cached. The cache takes up 10x as much memory as the rest of the program uses. That’s the tradeoff I was mentioning. :slightly_smiling_face:

The monkeypatch I mentioned was to clear that cache to hopefully release that memory. I don’t know if it’ll be a good idea, though. Specifically, I don’t know whether it’ll release enough memory back to the OS to be worth it, or whether the GC will hold onto that memory, since those heap pages may not be empty anymore and Boehm doesn’t compact. It may also be prohibitively slow if an app sees a lot of exceptions that have their call stacks inspected, for example when reporting them to an error tracker like Honeybadger.
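
The quick check I have in mind for the memory side is roughly this, with the monkeypatch above compiled in (just a sketch, and GC.stats only shows the collector’s view of the heap, not what the OS sees):

begin
  raise "boom"
rescue ex
  ex.backtrace # with the patch, the DWARF cache is dropped right after this decode
end

# Force a collection and see how much of the heap the GC considers free again.
GC.collect
stats = GC.stats
puts "heap: #{stats.heap_size // 1024} KiB, free: #{stats.free_bytes // 1024} KiB"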

Ideally, apps should see very few exceptions (they’re supposed to be exceptional), and a slow disk read is probably fine when they’re few and far between, but you also don’t want a buggy deploy to wreck performance. Hard to know. I’ll run some benchmarks on it.
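
The benchmark would be something like this (also a sketch; compile and run the same file once against stock Crystal and once with the cache-clearing monkeypatch, then compare):

require "benchmark"

# Each iteration raises a fresh exception and decodes its backtrace, so with
# the cache-clearing patch compiled in, every iteration also re-reads the debug info.
Benchmark.ips do |x|
  x.report("raise + decode backtrace") do
    begin
      raise "boom"
    rescue ex
      ex.backtrace
    end
  end
end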


Another dimension for trade-offs is reducing the amount of debug information in the binary. If you have less debug info, there’s less to load. But as a result, you have less debug info for debugging.

Then I do wonder if all of this is done optimally. Is there any other language that loads this info at runtime and stores it in memory? It might be the case that the data is arranged in the file in a way that makes access to it fast (for example, if you need to compute a backtrace given the addresses you got from unwinding), so there might never be a need to store the entire thing in memory.


This is a really good question. I just checked this Go code:

package main

import (
  "net/http"
)

func main() {
  http.HandleFunc("/oops", Oops) // panics on every request
  http.HandleFunc("/", Empty)    // does nothing; baseline for memory overhead
  http.ListenAndServe(":8080", nil)
}

func Empty(w http.ResponseWriter, r *http.Request) {
}

func Oops(w http.ResponseWriter, r *http.Request) {
  panic("OOPS!")
}

The root path does nothing so I can get a baseline for memory overhead. The /oops path panics. Both hit a plateau of just under 12MB of RAM after 10k requests, and the panic does show the source location of each stack frame:

2022/12/08 15:45:15 http: panic serving 127.0.0.1:52028: OOPS!
goroutine 34490 [running]:
net/http.(*conn).serve.func1()
	/usr/local/Cellar/go/1.19.3/libexec/src/net/http/server.go:1850 +0xbf
panic({0x1224380, 0x12da040})
	/usr/local/Cellar/go/1.19.3/libexec/src/runtime/panic.go:890 +0x262
main.Oops({0x3b?, 0xffffffffffffffff?}, 0x10af4f3?)
	/Users/jamie/Code/go_panic_example/main.go:17 +0x27
net/http.HandlerFunc.ServeHTTP(0xc0001a8af0?, {0x12dccd0?, 0xc000148e00?}, 0x0?)
	/usr/local/Cellar/go/1.19.3/libexec/src/net/http/server.go:2109 +0x2f
net/http.(*ServeMux).ServeHTTP(0x0?, {0x12dccd0, 0xc000148e00}, 0xc000119000)
	/usr/local/Cellar/go/1.19.3/libexec/src/net/http/server.go:2487 +0x149
net/http.serverHandler.ServeHTTP({0xc0001a15f0?}, {0x12dccd0, 0xc000148e00}, 0xc000119000)
	/usr/local/Cellar/go/1.19.3/libexec/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc00053ea00, {0x12dd0a0, 0xc0000a8d80})
	/usr/local/Cellar/go/1.19.3/libexec/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
	/usr/local/Cellar/go/1.19.3/libexec/src/net/http/server.go:3102 +0x4db

This seems to indicate that there is a more efficient way of looking up this info without holding it all in RAM. I’m looking into the Go panic code to see how it gets this info.
