WebSocket and silent disconnections (solved)

Hello everyone!

First, a little bit of context: I use Crystal for a websocket daemon. For that, I use the WebSocket class, but I don’t want to rely on the #run function (I have my own loop). I store WebSocket instances and their file descriptors, which I pass to select (the POSIX function from the C library), since I also keep track of sockets other than the WebSocket ones.

(FYI, I also extended the WebSocket class: I wrote a #read function, which could be added to the upstream class if anyone’s interested.)

Now, the problem is that I have silent disconnections: TCP doesn’t send keep-alive messages, and no websocket ping is sent. At some point my application stops responding and the socket seems closed. I don’t see any option in the WebSocket class to send periodic messages or to use TCP keep-alive messages. What should I do?
Should I use the TCPSocket class and use the #tcp_keepalive_idle function? This seems rather low-level, I’m not sure a web developer should have to dive into this to play with websockets.
Should I force ping messages (from the client or the server)?
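
For reference, here is roughly what the TCP keep-alive option looks like at the socket level, written in C since the loop here already calls the C library’s select. This is only a sketch, assuming Linux (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are not available under those names everywhere); the helper name enable_keepalive is made up for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP keep-alive probes for one socket.
 *   idle_s  - seconds of silence before the first probe is sent
 *   intvl_s - seconds between two probes
 *   count   - unanswered probes before the kernel drops the connection
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int intvl_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 0;
}
```

Something like enable_keepalive(fd, 30, 5, 3) would start probing after 30 seconds of silence. These probes only check the TCP connection itself, so they say nothing about whether the application on the other end is still responsive.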

Besides, since I don’t use the #run function in the WebSocket class, #on_ping, #on_pong and #on_message functions aren’t called. That’s not much of a problem, I could add another function to see if the received message is a ping, pong or a data message. But, before doing anything, I want to know if it’s necessary, or if I can just have standard, RFC-compliant, TCP keep-alive messages.
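
To make the “is it a ping, a pong or a data message” check concrete: in RFC 6455 the low four bits of the first byte of a frame are the opcode. A small sketch in C, with made-up names (ws_kind, ws_classify); in Crystal the same information would come from whatever the class exposes after reading a frame, not from raw bytes.

```c
#include <stdint.h>

/* Hypothetical classification of a WebSocket frame by its opcode. */
enum ws_kind { WS_DATA, WS_PING, WS_PONG, WS_CLOSE, WS_OTHER };

/* first_byte is the first byte of a frame: FIN/RSV bits in the high
 * nibble, opcode in the low nibble (RFC 6455, section 5.2). */
enum ws_kind ws_classify(uint8_t first_byte)
{
    switch (first_byte & 0x0F) {
    case 0x0: /* continuation */
    case 0x1: /* text */
    case 0x2: /* binary */
        return WS_DATA;
    case 0x8:
        return WS_CLOSE;
    case 0x9:
        return WS_PING;
    case 0xA:
        return WS_PONG;
    default:  /* reserved opcodes */
        return WS_OTHER;
    }
}
```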

Thanks for your time.

As far as I understand, some form of activity is required to detect dead connections on the listening side. It can be TCP’s internal keepalives, WebSocket-level keepalives, or explicit data at the application level.

Here’s the top result from a quick search: https://stackoverflow.com/questions/23238319/websockets-ping-pong-why-not-tcp-keepalive, and the point on TCP keepalives being host-to-host, not end-to-end is a very valid one imo.


How are you using select? I thought it wasn’t an option anymore…

Unfortunately I too recently had to dive deep into the entrails of TCP land, but it’s not too ugly. I guess it depends on what you anticipate your workload being. The other thing that comes to mind is read and write timeouts.

My thought/hunch is that this isn’t a problem particular to Crystal but more of a general “socket” type problem… do you wait forever for them to ACK back, or do you set timeouts or keep_alive packets? FWIW… :)

Seems fair. OK, so I have to use ping and pong messages and forget about TCP keep-alives.

I guess I have to write a function that periodically sends ping messages to all clients. I hope the lack of such messages is why my client gets disconnected from the server.
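
For what it’s worth, the frame itself is tiny. Here is a sketch in C of what a server-side ping looks like on the wire according to RFC 6455; the function name ws_build_server_ping is made up. It only covers the server-to-client direction, where frames are not masked (a client would have to mask its frames). A pong has the same layout with 0x8A as the first byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write an unmasked ping frame into out.
 * Control frames may carry at most 125 bytes of payload.
 * Returns the frame length, or 0 if the payload is too long
 * or the output buffer is too small. */
size_t ws_build_server_ping(uint8_t *out, size_t cap,
                            const uint8_t *payload, size_t len)
{
    if (len > 125 || cap < len + 2)
        return 0;
    out[0] = 0x89;          /* FIN bit set, opcode 0x9 (ping) */
    out[1] = (uint8_t)len;  /* MASK bit clear, 7-bit payload length */
    if (len > 0)
        memcpy(out + 2, payload, len);
    return len + 2;
}
```

An empty ping is therefore just the two bytes 0x89 0x00, and the peer is expected to answer with a pong carrying the same payload.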

I’m sure we can improve the standard WebSocket class with a few functions so that a blocking call such as #run isn’t mandatory. I’ve written a #read function; I’ll write some others and see where it goes. I hope to share these with you in the near future.

FWIW, heartbeat checks are normal, you’ll be fine! Just have the client periodically send something every xx seconds (no later than the read/write_timeout value or it will be too late).

For my game servers, I send a heartbeat every 5 seconds and set the read/write_timeout values to 10 seconds. All connections stay alive indefinitely.
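
The arithmetic behind those two numbers can be sketched as a little bookkeeping per connection, here in C. The struct and function names are made up, and the 5 and 10 second values are simply the ones from this post; the only real constraint is that the interval stays below the timeout.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-connection bookkeeping. */
struct peer {
    time_t last_seen; /* last time anything was received from the peer */
    time_t last_ping; /* last time a heartbeat was sent to the peer */
};

/* True once at least interval_s seconds have passed since the last heartbeat. */
bool ping_due(const struct peer *p, time_t now, int interval_s)
{
    return difftime(now, p->last_ping) >= (double)interval_s;
}

/* True once the peer has been silent for more than timeout_s seconds. */
bool peer_timed_out(const struct peer *p, time_t now, int timeout_s)
{
    return difftime(now, p->last_seen) > (double)timeout_s;
}
```

With an interval of 5 and a timeout of 10, a healthy peer gets at least one chance to answer before peer_timed_out becomes true, as long as last_seen is refreshed on every received message.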

I want to implement a periodic ping message, but it seems that I cannot spawn a fiber with #delay alongside the call to select that I use. This is frustrating. How can I send these messages?

My guess is that I have to change my call to select and add a timer. I would rather play with fibers if I can, to avoid modifying a lot of code.

I implemented a timer in my select call, and now I send a PING message every few seconds.
Silent disconnections stopped.
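
In case it helps someone else, the idea of a timer in select can be sketched in C as follows. This is not my actual code: the names (loop_event, loop_step) are made up and it watches a single descriptor for brevity. The timeout argument of select is what acts as the timer.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result of one loop iteration. */
enum loop_event { EV_ERROR = -1, EV_TICK = 0, EV_DATA = 1, EV_CLOSED = 2 };

/* Wait up to tick_ms milliseconds for data on fd.
 * EV_TICK   - the timer expired with nothing to read: time to send a ping
 * EV_DATA   - *got bytes were read into buf
 * EV_CLOSED - the peer closed the connection
 * EV_ERROR  - select or read failed, or fd cannot be used with select */
enum loop_event loop_step(int fd, char *buf, size_t cap, size_t *got,
                          int tick_ms)
{
    fd_set readfds;
    struct timeval tv;

    if (fd < 0 || fd >= FD_SETSIZE)
        return EV_ERROR;

    /* Both the set and the timeout must be rebuilt before every call:
     * select overwrites the set, and may modify the timeout. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = tick_ms / 1000;
    tv.tv_usec = (tick_ms % 1000) * 1000;

    int n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return EV_ERROR;
    if (n == 0)
        return EV_TICK;

    ssize_t r = read(fd, buf, cap);
    if (r < 0)
        return EV_ERROR;
    if (r == 0)
        return EV_CLOSED;
    *got = (size_t)r;
    return EV_DATA;
}
```

Note that the timer restarts on every call, so on a busy connection EV_TICK may come rarely; comparing against a stored “last ping” timestamp avoids that.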

So: I have a #read function that could be added to the WebSocket class. I’ll do a PR.

Thanks everyone.