URI absolute vs relative

I am reading the URI RFC because I did not know the term “absolute”. It seems to me that URI#absolute? should check against a formatter to make sure the URI complies with the scheme specification. Do we need formatters for each scheme, and should we run some check before returning true?

A URI is considered absolute when it has at least a scheme (and technically, a non-empty hier-part).
I don’t see a need to take scheme specifications into account. That’s a complex task and best suited for scheme-specific implementations.
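
To make the rule concrete, here is a minimal sketch in Python (not Crystal), using the standard library’s `urllib.parse`. The helper name `is_absolute` is hypothetical and only illustrates the idea: the check looks for a scheme and does no scheme-specific validation.

```python
from urllib.parse import urlsplit


def is_absolute(uri: str) -> bool:
    """Return True when the URI reference carries a scheme.

    Hypothetical helper for illustration only: it deliberately ignores
    scheme-specific syntax, which is left to scheme-aware code.
    """
    return urlsplit(uri).scheme != ""


print(is_absolute("http://example.com/a"))  # has a scheme
print(is_absolute("mailto:urbu@orbi.va"))   # has a scheme, no authority
print(is_absolute("/a/b"))                  # path-only relative reference
print(is_absolute("//example.com/a"))       # network-path reference, no scheme
```

The first two calls should report an absolute URI and the last two a relative reference. Whether the `mailto` address itself is well formed is a separate question that this check does not try to answer.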

OK, I have re-read this a few times and I think I understand it better now. So URI#absolute? should check for a scheme and possibly a non-empty hier-part, but something like a formatter should be moved to a valid? method or something similar.

Not sure what kind of formatter you’re talking about, but yes, URI#absolute? is just about having a fully qualified URI with a scheme.

I guess the formatter is a solution to the scheme specification. So really what I am asking is: is an absolute URI a URI that complies with its scheme’s specification?

   URI scheme specifications must define their own syntax so that all
   strings matching their scheme-specific syntax will also match the
   <absolute-URI> grammar

Ah, I see. You probably misinterpreted scheme-specific validation as a property of an absolute URI, when it is actually the other way around: a string that is valid under a scheme’s syntax must also match the absolute-URI grammar.

However, I’d add to that quote that relative URIs are not exempt from scheme-specific syntax. They are to be resolved in the context of an absolute base URI, and the resulting absolute URI is supposed to match the scheme-specific syntax. So a relative URI is meant to match a subset of that syntax.
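
A small sketch of that resolution step, again in Python with `urllib.parse.urljoin` rather than Crystal. The base URI and the references are made-up examples.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical absolute base URI, chosen only for illustration.
BASE = "http://example.com/b/c/d?q"

results = {}
for ref in ["g", "../g", "g:h"]:
    # Resolve the reference in the context of the absolute base URI.
    target = urljoin(BASE, ref)
    results[ref] = target
    print(f"{ref!r} -> {target!r} (scheme: {urlsplit(target).scheme!r})")
```

The relative references `g` and `../g` inherit the `http` scheme from the base, so the resolved result is what has to satisfy the `http` syntax. `g:h` already has its own scheme, so it is returned unchanged.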

I have been digging into the Crystal URI parser while trying to add examples and docs. Two results surprised me.

The first is “g:h”, which I think should have a scheme of “g” and an authority of “h”. Crystal’s parser makes “h” a path.
The second is “mailto:urbu@orbi.va”, which I think should have a host of “orbi.va”. Crystal’s parser makes “orbi.va” a path, not a host.

I think the parser in general is biased toward treating things as paths, and that is throwing off my little project. I am trying to find a source of truth for this, but the RFCs are hard to read. I would like a tool like https://regex101.com/ so I can see how the parsing actually works out.

code samples: Carcin

The parser behaves as expected and in full accordance with the RFC.

An authority is always prefixed by a double forward slash after the scheme (and colon). If there is no double forward slash, the URI doesn’t have any authority component and the path follows directly.
The Wikipedia page “Uniform Resource Identifier” has a nice syntax diagram and helpful explanatory notes on the individual components of a URI.
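
The effect of the double slash can be seen with any parser that follows the generic syntax. The sketch below uses Python’s `urllib.parse.urlsplit` as a stand-in (it is not the Crystal parser), and the `g://h` variants are made-up inputs for comparison.

```python
from urllib.parse import urlsplit

parsed = {}
for uri in ["g:h", "g://h", "g://h/i"]:
    parts = urlsplit(uri)
    parsed[uri] = (parts.scheme, parts.netloc, parts.path)
    print(f"{uri!r}: scheme={parts.scheme!r} "
          f"authority={parts.netloc!r} path={parts.path!r}")
```

Only the inputs containing `//` end up with an authority of `h`. For `g:h` the authority is empty and `h` is the path, which matches the behaviour described above.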

In the case of a mailto scheme, it might be confusing that the part after the scheme looks like the userinfo and host components of a URI authority. But there is no hierarchical information in that URI. There is just an opaque path in terms of the generic URI syntax. Only the scheme-specific syntax defines that this path is actually a mail address consisting of a user and a host component.
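
As a rough illustration of that layering, the following Python sketch first applies the generic syntax and then a scheme-specific step on top. The helper `split_mailto` is hypothetical and heavily simplified: it assumes a single address and ignores percent-encoding and header fields.

```python
from urllib.parse import urlsplit


def split_mailto(uri: str) -> tuple:
    """Hypothetical helper: split a simple mailto URI into (user, host)."""
    parts = urlsplit(uri)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URI")
    # Scheme-specific step: interpret the opaque path as a mail address.
    user, sep, host = parts.path.rpartition("@")
    if not sep:
        raise ValueError("no '@' in mailto path")
    return user, host


generic = urlsplit("mailto:urbu@orbi.va")
print(generic.hostname)  # generic syntax: no authority, so no host
print(generic.path)      # the whole address is the path
print(split_mailto("mailto:urbu@orbi.va"))
```

The generic parser reports no host at all. The user and host only appear once code that knows about `mailto` interprets the path.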

That Wikipedia page is the best source I have seen for what the parser should do. Some of the results are not what I expected, so I guess I am still on my journey of learning more about this. Thank you for all the info.