Intermittent DNS Errors inside Crystal

This happens both when resolving “localhost” to connect to Redis and when making requests to my Imgur API endpoint (api.imgur.com). Most of the time it works, but sometimes the lookup fails and breaks the action.

Error 500 at GET /canvas
Socket::Addrinfo::Error: Hostname lookup for localhost failed: System error
 ??? Redis::Connection#initialize<String, Int32, (String | Nil), (OpenSSL::SSL::Context::Client | Nil), (Time::Span | Nil), (Time::Span | Nil), (Time::Span | Nil)>:Redis::SocketWrapper
 ??? Redis::Connection::new<String, Int32, (String | Nil), (OpenSSL::SSL::Context::Client | Nil), (Time::Span | Nil), (Time::Span | Nil), (Time::Span | Nil)>:Redis::Connection
 ??? Redis#connect:(Array(Redis::RedisValue) | Int64 | Redis::Future | String | Nil)
 ??? Redis#initialize:(Array(Redis::RedisValue) | Int64 | Redis::Future | String | Nil)
 ??? Redis::new:Redis
 ??? LivepixelController#canvas:String
 ??? ~procProc(HTTP::Server::Context, Nil)
 ??? Amber::Route#call<HTTP::Server::Context>:Nil
 ??? HTTP::Server::Context#process_request:Nil
 ??? Amber::Pipe::Controller#call<HTTP::Server::Context>:Nil
 ??? Amber::Pipe::CSRF
 ??? Amber::Pipe::CSRF#call<HTTP::Server::Context>:(Array(Log::Entry) | Bool | Channel(Tuple(Log::Entry, Log::Backend+)) | HTTP::Headers | HTTP::Server::Context | IO+ | Int32 | Int64 | Proc(IO+, Nil) | Nil)
 ??? Amber::Pipe::Flash
 ??? Amber::Pipe::Flash#call<HTTP::Server::Context>:(Array(Log::Entry) | Bool | Channel(Tuple(Log::Entry, Log::Backend+)) | HTTP::Headers | HTTP::Server::Context | IO+ | Int32 | Int64 | Proc(IO+, Nil) | Nil)
 ??? Amber::Pipe::Session
 ??? Amber::Pipe::Session#call<HTTP::Server::Context>:(HTTP::Headers | Nil)
 ??? Amber::Pipe::Logger
 ??? Amber::Pipe::Logger#call<HTTP::Server::Context>:HTTP::Server::Context
 ??? Amber::Pipe::Error
 ??? Amber::Pipe::Error#call<HTTP::Server::Context>:(Array(Log::Entry) | Bool | Channel(Tuple(Log::Entry, Log::Backend+)) | HTTP::Headers | HTTP::Server::Context | IO+ | Int32 | Int64 | Proc(IO+, Nil) | Nil)
 ??? Citrine::I18n::Handler
 ??? Citrine::I18n::Handler#call<HTTP::Server::Context>:(Array(Log::Entry) | Bool | Channel(Tuple(Log::Entry, Log::Backend+)) | HTTP::Headers | HTTP::Server::Context | IO+ | Int32 | Int64 | Proc(IO+, Nil) | Nil)
 ??? Amber::Pipe::Pipeline#call<HTTP::Server::Context>:(Array(Log::Entry) | Bool | Channel(Tuple(Log::Entry, Log::Backend+)) | HTTP::Headers | HTTP::Server::Context | IO+ | Int32 | Int64 | Proc(IO+, Nil) | Nil)
 ??? HTTP::Server::RequestProcessor#process<IO+, IO+>:Nil
 ??? HTTP::Server#handle_client<IO+>:Nil
 ??? ~procProc(Nil)
 ??? Fiber#run:(IO::FileDescriptor | Nil)
 ??? ~proc2Proc(Fiber, (IO::FileDescriptor | Nil))
 ??? ???
Params: room = ""

The OS resolves those domain names fine. It happens on both dev and production, on Ubuntu 20.04. It first appeared when I ported the Imgur uploader for my canvas drawing app: I noticed that the upload would sometimes fail due to DNS lookup issues.

“System error” (EAI_SYSTEM) indicates a failure that is not specific to getaddrinfo; the actual cause is reported in errno. Crystal's stdlib currently doesn't handle that case and only reports the generic message, which prevents further diagnosis of the error.

You can apply this patch to the stdlib’s Addrinfo implementation; it should provide more specific error information:

diff --git i/src/socket/addrinfo.cr w/src/socket/addrinfo.cr
index c3deff908..36fdb4a90 100644
--- i/src/socket/addrinfo.cr
+++ w/src/socket/addrinfo.cr
@@ -137,6 +137,9 @@ class Socket
         raise Error.new(ret, "The requested socket type #{type} protocol #{protocol} is not supported", domain)
       when LibC::EAI_SERVICE
         raise Error.new(ret, "The requested service #{service} is not available for the requested socket type #{type}", domain)
+      when LibC::EAI_SYSTEM
+        errno = Errno.value
+        raise Error.new(errno.value, errno.message, domain)
       else
         raise Error.new(ret, domain)
       end
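Because the failure is intermittent, a small standalone loop can help trigger it faster than waiting for it in the app, and it lets you compare Crystal versions outside of Amber. The following is a minimal sketch, not part of the patch: `stress_resolve` is a hypothetical helper, and the host, port 6379 (the Redis default) and attempt count are assumptions. It only calls `Socket::Addrinfo.tcp` and rescues `Socket::Addrinfo::Error`, the exception from the trace above; with the patch applied, the collected messages should show the errno description instead of just “System error”.

```crystal
require "socket"

# Hypothetical diagnostic helper: resolves `host` `attempts` times and
# returns the message of every failed lookup, so an intermittent
# Socket::Addrinfo::Error can be reproduced outside the Amber app.
def stress_resolve(host : String, port : Int32, attempts : Int32) : Array(String)
  failures = [] of String
  attempts.times do
    begin
      Socket::Addrinfo.tcp(host, port)
    rescue ex : Socket::Addrinfo::Error
      # With the EAI_SYSTEM patch applied, this message should carry the
      # errno description rather than the bare "System error".
      failures << (ex.message || "unknown error")
    end
  end
  failures
end

# Assumed values: Redis' default port and an arbitrary number of attempts.
attempts = 1_000
failures = stress_resolve("localhost", 6379, attempts)

# Count how often each distinct error message occurred.
counts = Hash(String, Int32).new(0)
failures.each { |msg| counts[msg] += 1 }

puts "#{failures.size} of #{attempts} lookups failed"
counts.each { |msg, n| puts "#{n}x #{msg}" }
```

Running the same program once with Crystal 1.0.0 and once with the patched stdlib should show whether the failures persist and what errno actually reports.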

I applied your patch, @straight-shoota. I’m trying to move to 1.1-dev, where the bug seems to be gone entirely.

Oh, that’s weird. The changes regarding network connections since 1.0 are mostly refactorings and some added win32 support. I’m wondering how that would affect your error.

Maybe I didn’t test for long enough. I haven’t gotten to the root of the DNS error yet, since it hits so randomly.

Here is what I’m observing: it happens when using bin/amber watch in the Amber framework with Crystal 1.0.0. Yet when I switch to the compiler I built myself, Crystal 1.1.0-dev, the issue immediately disappears.

Are you aware that you don’t need a custom compiler build to test the addrinfo error patch?

No, I’m not. Is there another way?

You just need to apply that patch to the stdlib sources that you use for building. The stdlib is compiled from source together with your program, so the compiler itself doesn’t need any update for that.