SSL Error, but only in a Fibre

HCLarsen · November 29, 2020, 7:45am

Weird situation I got here, and I can’t quite figure it out. I would appreciate any ideas. Operating system is Ubuntu Server, processor is ARM. I’m cross compiling it for ARM from my dev machine. The error is this:

Unhandled exception in spawn: SSL_connect: error:0D0C5006:asn1 encoding routines:ASN1_item_verify:EVP lib (OpenSSL::SSL::Error)
Failed to raise an exception: FAILURE
[0x561f80] ???
[0x4dfbd8] __crystal_raise +36
[0x4ec79c] ???
[0x4e89e0] ???

The code that throws the error is this:

spawn do
  HTTP::Client.get "https://restcountries.eu/rest/v2/alpha/can"
end

Here’s what’s weird. If I remove the spawn, so the request runs in the main fibre, it doesn’t throw an error.

HTTP::Client.get "https://restcountries.eu/rest/v2/alpha/can"

What’s even weirder, if I add a puts in front of it, it also doesn’t throw an error:

spawn do
  puts HTTP::Client.get "https://restcountries.eu/rest/v2/alpha/can"
end

Does anyone else have any ideas what could be causing this?

asterite · November 29, 2020, 10:20am

Is ARM already fully supported? If not, some bindings to whatever we are using (libevent, or anything) could be wrong and the stack could be corrupted.

jhass · November 29, 2020, 10:28am

When talking about ARM, let’s always talk about the specific architecture to reduce confusion :)

Here I guess we’re talking about AArch64, aka ARM64; CI seems to be very stable for it and runs the full test suite, sans variadic arguments in C bindings support, which should fail the compile if you try to use it. But then this probably doesn’t exercise the OpenSSL bindings to a high degree, so there might be another bug lurking in the ABI implementation. Or maybe the fiber context switch implementation doesn’t quite do the right thing yet in all scenarios.

straight-shoota · November 29, 2020, 3:07pm

This even seems like two separate errors, one with OpenSSL and the other with libunwind.

HCLarsen · November 29, 2020, 6:16pm

@straight-shoota Where do you see the libunwind error?

I should note that I don’t have libunwind-dev installed on this machine.

HCLarsen · November 29, 2020, 6:36pm

@jhass it’s a Raspberry Pi 2B. I believe it’s ARMv7.

straight-shoota · November 29, 2020, 9:06pm

The first line in the error output tells us about an unhandled exception. After that, the error handler should print the backtrace of that exception. But somehow that fails (Failed to raise an exception: FAILURE).
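One way to sidestep the default handler while debugging is to rescue the exception inside the fiber and hand it back to the main fiber over a channel, printing only the class and message. The following is a minimal sketch, not something tested on the ARM board in question. `capture_fiber_error` is a made-up helper name, not part of the standard library, and if the stack really is corrupted this may well fail in the same way.

```crystal
require "http/client"

# Runs the block in a new fiber and hands any exception it raises back to the
# caller, instead of leaving it to the default "Unhandled exception in spawn"
# handler. `capture_fiber_error` is a hypothetical helper name.
def capture_fiber_error(&block : ->) : Exception?
  result = Channel(Exception?).new
  spawn do
    error = begin
      block.call
      nil
    rescue ex
      ex
    end
    result.send error
  end
  # Blocks the calling fiber until the spawned fiber has finished.
  result.receive
end

# Only touches the network when a URL is passed on the command line.
ARGV.first?.try do |url|
  if error = capture_fiber_error { HTTP::Client.get url }
    STDERR.puts "#{error.class}: #{error.message}"
  else
    puts "request succeeded inside the fiber"
  end
end
```

Run it with the URL from the original post as the only argument. Printing just the class and message avoids asking for a backtrace, which is the step that fails in the output above.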

jhass · November 29, 2020, 9:46pm

After https://github.com/crystal-lang/crystal/pull/9220 a nightly build might very well be able to print the exception in full. Not that that’s super interesting here I guess.

But then as Ary says it probably just corrupted the stack in some way, causing both symptoms.

HCLarsen · November 29, 2020, 9:57pm

So is this lack of backtrace being caused by a Crystal issue fixed in that PR, or by the lack of libunwind on the host machine?

I should also note that I’m building with --release --no-debug arguments.

HCLarsen · November 29, 2020, 10:26pm

Is there a possible solution to this, if this is the case?

jhass · November 30, 2020, 7:33am

There should be a solution to this, definitely. But finding it will not be easy, as you need to find what exactly and how the corruption is caused.

It’s caused by the stack being corrupted, most likely. The PR just tries more to print the original trace even in a dire situation.

straight-shoota · November 30, 2020, 12:12pm

This should be reported to the bug tracker.
It would probably help if we can reduce the failing example even more. The code is already pretty short, but there’s a lot going on in HTTP::Client. So ideally it would just be a couple of LibSSL calls (directly or using OpenSSL binding) that triggers this.
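A reduction along those lines could look roughly like the sketch below: a plain TCPSocket wrapped in OpenSSL::SSL::Socket::Client, so that only the TLS handshake runs and HTTP::Client is out of the picture. The helper names `tls_handshake` and `compare_handshakes` are made up for illustration, and whether this still reproduces the error on the affected board is an assumption that needs checking.

```crystal
require "socket"
require "openssl"

# Opens a TCP connection and performs only the TLS handshake, with no HTTP on
# top. `tls_handshake` is a hypothetical helper name.
def tls_handshake(host : String, port : Int32 = 443) : Nil
  tcp = TCPSocket.new(host, port)
  begin
    # SSL_connect is called inside this constructor.
    ssl = OpenSSL::SSL::Socket::Client.new(tcp, hostname: host)
    ssl.close
  ensure
    tcp.close
  end
end

# Runs the handshake in the main fiber first, then in a spawned fiber.
def compare_handshakes(host : String) : Nil
  tls_handshake(host)
  puts "main fiber: handshake ok"

  done = Channel(Nil).new
  spawn do
    begin
      tls_handshake(host)
      puts "spawned fiber: handshake ok"
    rescue ex
      STDERR.puts "spawned fiber: #{ex.class}: #{ex.message}"
    ensure
      # Always wake the main fiber, even if the handshake failed.
      done.send nil
    end
  end
  done.receive
end

# Only touches the network when a host is passed on the command line.
ARGV.first?.try { |host| compare_handshakes(host) }
```

Passing restcountries.eu as the argument mirrors the original request. If the main fiber line prints ok and the spawned one reports the SSL error, the problem is narrowed down to the handshake itself.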

HCLarsen · December 1, 2020, 12:44am

So even though this isn’t an officially supported platform, we can submit this as a bug? I’m not complaining, I just want to be certain I won’t annoy anyone.

I guess I can dig through the HTTP::Client source code and narrow this down further.

jhass · December 1, 2020, 8:10am

Sure, we’ll definitely need AArch64 as a fully supported platform sooner or later anyways :)

straight-shoota · December 1, 2020, 1:15pm

… and it’s those bugs that prevent it from being fully supported. If we don’t get to fix them, it can never be.

HCLarsen · December 2, 2020, 12:49am

In that case, I guess I’ve got my work cut out for me. Is the bug tracker just another name for the GitHub issues on the language repo, or is it something else?

straight-shoota · December 2, 2020, 1:54am

Yes, GitHub issues.

HCLarsen · December 5, 2020, 9:27pm

@jhass was right. The nightly build gave me a much clearer error:

Unhandled exception in spawn: SSL_connect: error:1417B07B:SSL routines:tls_process_cert_verify:bad signature (OpenSSL::SSL::Error)
  from ../../snap/crystal/539/share/crystal/src/openssl/ssl/socket.cr:34:11 in 'initialize'
  from ../../snap/crystal/539/share/crystal/src/openssl/ssl/socket.cr:3:5 in 'new:context:sync_close:hostname'
  from ../../snap/crystal/539/share/crystal/src/http/client.cr:784:5 in 'io'
  from ../../snap/crystal/539/share/crystal/src/http/client.cr:664:5 in 'send_request'
  from ../../snap/crystal/539/share/crystal/src/http/client.cr:599:5 in 'exec_internal_single'
  from ../../snap/crystal/539/share/crystal/src/http/client.cr:586:5 in 'exec_internal'
  from ../../snap/crystal/539/share/crystal/src/http/client.cr:581:7 in 'exec'
  from ../../snap/crystal/539/share/crystal/src/http/client.cr:706:5 in 'exec'
  from ../../snap/crystal/539/share/crystal/src/http/client.cr:738:7 in 'exec'
  from ../../snap/crystal/539/share/crystal/src/http/client.cr:406:3 in 'get'
  from ../clarsen/workspace/test/ssltest.cr:8:3 in '->'
  from ../../snap/crystal/539/share/crystal/src/primitives.cr:255:3 in 'run'
  from ../../snap/crystal/539/share/crystal/src/fiber.cr:92:34 in '->'

jhass · December 5, 2020, 9:40pm

I don’t think that’s the same error unfortunately. Using the nightly build probably just happened to shuffle things around in the stack enough so you would not hit the corruption or hit it differently.

HCLarsen · December 5, 2020, 10:05pm

It’s happening in the same pattern. I did confirm that it only happens inside of the fibre.
