SSL Error, but only in a Fiber

Weird situation I got here, and I can’t quite figure it out. I would appreciate any ideas. The operating system is Ubuntu Server on an ARM processor; I’m cross-compiling for it from my dev machine. The error is this:

Unhandled exception in spawn: SSL_connect: error:0D0C5006:asn1 encoding routines:ASN1_item_verify:EVP lib (OpenSSL::SSL::Error)
Failed to raise an exception: FAILURE
[0x561f80] ???
[0x4dfbd8] __crystal_raise +36
[0x4ec79c] ???
[0x4e89e0] ???

The code that throws the error is this:

spawn do
  HTTP::Client.get ""
end

Here’s what’s weird: if I remove the spawn and run this directly, it doesn’t throw an error.

HTTP::Client.get ""

What’s even weirder, if I add a puts in front of it, it also doesn’t throw an error:

spawn do
  puts HTTP::Client.get ""
end

Does anyone else have any ideas what could be causing this?

Is ARM already fully supported? If not, some bindings to whatever we are using (libevent, or anything) could be wrong and the stack could be corrupted.

When talking about ARM, let’s always talk about the specific architecture to reduce confusion :)

Here I guess we’re talking about AArch64, aka ARM64; CI seems to be very stable for it and runs the full test suite, except for variadic arguments in C bindings, which should fail the compile if you try to use them. But this probably doesn’t exercise the OpenSSL bindings to a high degree, so there might be another bug lurking in the ABI implementation. Or maybe the fiber context switch implementation doesn’t do quite the right thing yet in all scenarios.

This seems even like two separate errors, one with OpenSSL and the other with libunwind.

@straight-shoota Where do you see the libunwind error?

I should note that I don’t have libunwind-dev installed on this machine.

@jhass it’s a Raspberry Pi 2B. I believe it’s ARMv7.

The first line in the error output tells us about an unhandled exception. After that, the error handler should print the backtrace of that exception. But somehow that fails (Failed to raise an exception: FAILURE).

After that PR, a nightly build might very well be able to print the exception in full. Not that it’s super interesting here, I guess.

But then as Ary says it probably just corrupted the stack in some way, causing both symptoms.


So is this lack of backtrace being caused by a Crystal issue fixed in that PR, or by the lack of libunwind on the host machine?

I should also note that I’m building with --release --no-debug arguments.
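Worth noting: --no-debug strips the debug info Crystal’s runtime uses to symbolicate backtraces, which is likely why the early report showed frames as ???. A rebuild that keeps debug info might make the traces readable (sketch; the file name is a placeholder, and you would keep whatever cross-compile flags you already use):

```shell
# Keep optimizations but retain debug info for readable backtraces.
# "app.cr" stands in for the actual entry file; add your usual
# --cross-compile / --target flags for the ARM build.
crystal build --release app.cr
```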

Is there a possible solution to this, if this is the case?

There should be a solution to this, definitely. But finding it will not be easy, as you need to find exactly what causes the corruption and how.

It’s caused by the stack being corrupted, most likely. The PR just tries more to print the original trace even in a dire situation.


This should be reported to the bug tracker.
It would probably help if we could reduce the failing example even further. The code is already pretty short, but there’s a lot going on in HTTP::Client. Ideally it would be just a couple of LibSSL calls (directly or through the OpenSSL binding) that trigger this.
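A reduction along those lines might skip HTTP::Client entirely and drive the TLS handshake through the stdlib OpenSSL binding inside a fiber. This is just a hypothetical sketch (host and port are placeholders, and the channel is only there so the main fiber waits for the spawned one):

```crystal
# Minimal repro sketch: perform a TLS handshake inside a fiber,
# bypassing HTTP::Client, to see if the crash still occurs.
require "socket"
require "openssl"

done = Channel(Exception?).new

spawn do
  begin
    tcp = TCPSocket.new("example.com", 443)
    # This triggers SSL_connect, where the original error surfaced.
    ssl = OpenSSL::SSL::Socket::Client.new(tcp, hostname: "example.com")
    ssl.close
    tcp.close
    done.send nil
  rescue ex
    done.send ex
  end
end

if ex = done.receive
  puts "handshake failed: #{ex.message}"
else
  puts "handshake ok"
end
```

If this still fails only inside spawn, the next step would be replacing the Socket::Client wrapper with raw LibSSL calls.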

So even though this isn’t an officially supported platform, we can submit this as a bug? I’m not complaining, I just want to be certain I won’t annoy anyone.

I guess I can dig through the HTTP::Client source code and narrow this down further.


Sure, we’ll definitely need AArch64 as a fully supported platform sooner or later anyways :)


… and it’s those bugs that prevent it from being fully supported. If we don’t get to fix them, it can never be.


In that case, I guess I’ve got my work cut out for me. Is the bug tracker just another name for the GitHub issues on the language repo, or is it something else?

Yes, GitHub issues.


@jhass was right. The nightly build gave me a much clearer error:

Unhandled exception in spawn: SSL_connect: error:1417B07B:SSL routines:tls_process_cert_verify:bad signature (OpenSSL::SSL::Error)
  from ../../snap/crystal/539/share/crystal/src/openssl/ssl/ in 'initialize'
  from ../../snap/crystal/539/share/crystal/src/openssl/ssl/ in 'new:context:sync_close:hostname'
  from ../../snap/crystal/539/share/crystal/src/http/ in 'io'
  from ../../snap/crystal/539/share/crystal/src/http/ in 'send_request'
  from ../../snap/crystal/539/share/crystal/src/http/ in 'exec_internal_single'
  from ../../snap/crystal/539/share/crystal/src/http/ in 'exec_internal'
  from ../../snap/crystal/539/share/crystal/src/http/ in 'exec'
  from ../../snap/crystal/539/share/crystal/src/http/ in 'exec'
  from ../../snap/crystal/539/share/crystal/src/http/ in 'exec'
  from ../../snap/crystal/539/share/crystal/src/http/ in 'get'
  from ../clarsen/workspace/test/ in '->'
  from ../../snap/crystal/539/share/crystal/src/ in 'run'
  from ../../snap/crystal/539/share/crystal/src/ in '->'

I don’t think that’s the same error unfortunately. Using the nightly build probably just happened to shuffle things around in the stack enough so you would not hit the corruption or hit it differently.

It’s happening in the same pattern. I did confirm that it only happens inside the fiber.