Invalid memory access when pp a hash of IO::FileDescriptor => SomeObjectInstance

I can reproduce the Invalid memory access error if add a pp @readers or puts @readers when uncomment exactly this line.

following is backtrace, test on Arch linux with Crystal 1.14.0

backtrace
[0x6246abb3d6d6] *Exception::CallStack::print_backtrace:Nil +118 in bin/procodile
[0x6246abb23726] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil) +310 in bin/procodile
[0x7ceca40691d0] ?? +137355806020048 in /usr/lib/libc.so.6
[0x6246abc589cb] *Time::Location#lookup<Int64>:Time::Location::Zone +27 in bin/procodile
[0x6246abc58988] *Time::Location#lookup<Time>:Time::Location::Zone +72 in bin/procodile
[0x6246abc5439e] *Time#zone:Time::Location::Zone +30 in bin/procodile
[0x6246abc54361] *Time#offset:Int32 +17 in bin/procodile
[0x6246abc54321] *Time#offset_seconds:Int64 +17 in bin/procodile
[0x6246abc55eaf] *Time#year_month_day_day_year:Tuple(Int32, Int32, Int32, Int32) +47 in bin/procodile
[0x6246abc56606] *Time#year:Int32 +22 in bin/procodile
[0x6246abd197ee] *Time::Format::Formatter#year:Nil +14 in bin/procodile
[0x6246abd199aa] *Time::Format::Formatter +10 in bin/procodile
[0x6246abd1855e] *Time::Format::Formatter +798 in bin/procodile
[0x6246abd18184] *Time::Format::Formatter +308 in bin/procodile
[0x6246abd17f0b] *Time::Format#format<Time, IO::FileDescriptor>:Nil +203 in bin/procodile
[0x6246abc56cb6] *Time#to_s<IO::FileDescriptor, String>:Nil +70 in bin/procodile
[0x6246abc56b7a] *Time#inspect<IO::FileDescriptor, Bool>:Nil +42 in bin/procodile
[0x6246abc56b47] *Time#inspect<IO::FileDescriptor>:Nil +39 in bin/procodile
[0x6246abd119b8] *Procodile::Supervisor +408 in bin/procodile
[0x6246abd263d1] *Procodile::Instance +1217 in bin/procodile
[0x6246abd378d1] *Hash(IO::FileDescriptor+, Procodile::Instance) +929 in bin/procodile
[0x6246abc368bb] *IO::FileDescriptor +27 in bin/procodile
[0x6246abc3688a] *IO::FileDescriptor +10 in bin/procodile
[0x6246abb2794d] *puts<Hash(IO::FileDescriptor+, Procodile::Instance)>:Nil +45 in bin/procodile
[0x6246abd1176a] *Procodile::Supervisor#log_listener_reader:Nil +74 in bin/procodile
[0x6246abd1382a] *Procodile::Supervisor#watch_for_output:Nil +74 in bin/procodile
[0x6246abd11e7d] *Procodile::Supervisor#start<Proc(Procodile::Supervisor, Nil)>:NoReturn +461 in bin/procodile
[0x6246abc6a23d] *Procodile::CLI::start_supervisor<Procodile::Config, Procodile::CliOptions, &Proc(Procodile::Supervisor, Nil)>:Nil +1965 in bin/procodile
[0x6246abc69920] *Procodile::CLI +2688 in bin/procodile
[0x6246abb26429] ~proc3Proc(Nil) +9 in bin/procodile
[0x6246abc6dbca] *Procodile::CLI#dispatch<String>:Nil +202 in bin/procodile
[0x6246abb1255e] __crystal_main +7070 in bin/procodile
[0x6246abbc3006] *Crystal::main_user_code<Int32, Pointer(Pointer(UInt8))>:Nil +6 in bin/procodile
[0x6246abbc2f7a] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +58 in bin/procodile
[0x6246abb20016] main +6 in bin/procodile
[0x7ceca4051e08] ?? +137355805924872 in /usr/lib/libc.so.6
[0x7ceca4051ecc] __libc_start_main +140 in /usr/lib/libc.so.6
[0x6246abb108e5] _start +37 in bin/procodile
[0x0] ???

this issue happen when read the hash values of @readers, because if i replace it with: pp @readers.keys, no this error, but it happen again when change to pp @readers.values, somethings related to Time location cause this?

Can anyone try reproduce it use linux or macOS ?

following is the process:

  1. git clone GitHub - crystal-china/procodile_cr: 🐊 Run processes in the background (and foreground) on Mac & Linux from a Procfile (for production and/or development environments)
  2. cd procodile_cr && git checkout refactor_2
  3. shards build
  4. run bin/procodile start -d, it should work as exepcted. like following:
 ╰──➤ $ bin/procodile start -d
Removed app1.1.pid because process was not active
02:12:17 system             | Procodile supervisor started with PID 160490
02:12:17 system             | Application root is /home/zw963/Git/procodile_cr
02:12:17 system             | Automatic respawning is disabled
02:12:17 system             | Reloading configuration
02:12:17 control            | Listening at /home/zw963/Git/procodile_cr/pids/procodile.sock
02:12:17 app1.1             | Started with PID 160491
02:12:17 app1.1             | => Fri Nov  8 02:12:17 AM CST 2024 - Hello, World! foo
02:12:18 app1.1             | => Fri Nov  8 02:12:18 AM CST 2024 - Hello, World! foo
02:12:19 app1.1             | => Fri Nov  8 02:12:19 AM CST 2024 - Hello, World! foo
02:12:20 app1.1             | => Fri Nov  8 02:12:20 AM CST 2024 - Hello, World! foo
02:12:21 app1.1             | => Fri Nov  8 02:12:21 AM CST 2024 - Hello, World! foo
02:12:22 app1.1             | => Fri Nov  8 02:12:22 AM CST 2024 - Hello, World! foo
  1. uncomment this line, then run following command again to reproduce this issue.
  2. shards build && bin/procodile start -d

In fact there is another random happened issue probably more hard to investigate, but related to this? it’s seem like this issue only happen when shards build, but never happen when build with shards build --release.

  1. Open two console
  2. Run bin/procodile start -d on first console log should output correctly.
  3. Run bin/procodile restart on second console if work normal, managed process should be restart in first terminal, you can seeing log output.
  4. There is a very high probability second console output Error: Control server disconnected. data: nil, with the procodile supervisor process terminated in first console.
  5. this issue not happen if build with --release.

I trace deeply to code where the issue happen, but, no any clue, because app just exit immediately(in the forked process) without saw any exception backtrace. (rescue on Exception not work), i guess it caused this same Invalid memory access issue.

Please help on tracing this bug in the compiler or my code, all method args and return value add the expected type, you can jump to method/class/struct definition use crystalline.

This is the source of your issue:

@started_at = uninitialized Time

uninitialized literally means the memory is uninitialized. The data is random rubbish and you persuade the compiler to accept it contains a valid instance the first time it’s accessed. But this promise is not fulfilled.
The inspector is trying to access the Time instance and print it. So when it dereference uninitilized data as the address of the Time::Location instance, you get an invalid memory address.

Don’t use uninitialized unless you are aware of the consequences.
For this use case it’s best to designate an unassigned instance var as nil value.

1 Like
@started_at = uninitialized Time

Good catch, thanks a lot for point it out, I almost forgot I wrote this before.

I really should use uninitialized more carefully.


BTW: the secondary issue still not fixed after fix above bug, even it happens when build with --release too, will keeping investigate more on it, although this is a very small use case.

Still could you figure out the above second issue, i removed all unsafe usage, e.g. uninitialized Type, and not_nil!, but still could find out why this issue happen.

I still consider it caused by Invalid memory access, following is the evidence:

1. pull latest code and switch to refactor_3 branch
2. if you open file src/procodile/control_server.cr:33, you will see code with some pp like following screenshot

3. following above quoted steps of from 1 to 4:

1. Open two console
2. Run `bin/procodile start -d` on first console log should output correctly.
3. Run `bin/procodile restart` on second console if work normal, managed process should be restart in first terminal, you can seeing log output.
4. There is a very high probability second console output `Error: Control server disconnected. data: nil`, with the procodile supervisor process terminated in first console.

When above issue not happen, all work as expected, you can saw print like:

line # => "restart {\"processes\":null,\"tag\":null}"
# ...
# many log
# ...
response # => "200 [[{\"description\":\"app1.5\",\"pid\":217120,\"respawns\":0,\"status\":\"stopping\",\"running\":false,\"started_at\":1731054436},{\"description\":\"app1.6\",\"pid\":217449,\"respawns\":0,\"status\":\"running\",\"running\":true,\"started_at\":1731054438}],[{\"description\":\"app2.5\",\"pid\":217121,\"respawns\":0,\"status\":\"stopping\",\"running\":false,\"started_at\":1731054436},{\"description\":\"app2.6\",\"pid\":217448,\"respawns\":0,\"status\":\"running\",\"running\":true,\"started_at\":1731054438}],[{\"description\":\"app3.5\",\"pid\":217119,\"respawns\":0,\"status\":\"stopping\",\"running\":false,\"started_at\":1731054436},{\"description\":\"app3.6\",\"pid\":217450,\"respawns\":0,\"status\":\"running\",\"running\":true,\"started_at\":1731054438}]]"

but once Control server disconnected error happen on second panel, we can only saw the line print on first console, but no the response, like following:

line # => "restart {\"processes\":null,\"tag\":null}"

That is, somethings wrong happened on following code, but, the rescue Exception never work.

         pp! line   # <== code run until here and exited without any backtrace.
          begin
            response = session.receive_data(line.strip)
          rescue Exception
            STDERR.puts "2"*100
            response = "200 ok"
          end
          pp! response

EDIT:

Please ignore above post, I have located the source of this issue.

When add a sleep 1 or wg.wait(or both) here, there’s a chance cause exit app without any backtrace.

If not add sleep 1 or wg.wait here, will cause no log output because @readers never any chance to set correct value, which make listener blocked forever.

Alright, I still have no idea what happens, this issue on happen when run supervisor in the foreground, but never happen when run supervisor in the background.


So, @straight-shoota , could you please help on explain why this happen? Is there any other situation that could cause Exception can’t be rescue?

Oops, there is a global rescue in the outermost layer of code make this exception never raised, A very low-level error.

Thanks, at least, i can confirm, this is not an Invalid Memory Access.

I can 100% confirm, this issue really caused by sleep or wg.wait

1 Like