RFC: Extending API Docs and Compiler Semantic Output

nobodywasishere · October 16, 2024, 2:27pm

This is a discussion about this PR which adds support for lib bindings and private/protected objects to show up in API docs, as well as a helper method that enables generating docs for the entire namespace. @straight-shoota recommended there be a deeper discussion which turns into an RFC, before a PR like this is merged, to make sure we don’t miss anything important.

Lib bindings are really useful for libraries (like GitHub - sol-vin/raylib-cr: LibRaylib Bindings for Crystal.) which are meant to be used directly, as well as for the developer themselves to reference. Even if they aren’t “supposed” to be used directly, removing them from the API docs removes documentation for a lot of valid use-cases. Not every lib binding can have or needs a crystallised wrapper over it.

As well, the documentation generator is currently the best method for exporting semantic information from the compiler, which is really useful for tooling. For this use-case, it would be more beneficial if everything could be included in the semantic output, such as protected/private methods/classes/etc. This would enable autocomplete / other IDE-like features (as an example) for everything, not just the publicly accessible methods. Whether this includes :no_doc:, I am personally in favor for completeness.

Some discussion points:

Does this miss any important use-cases?
Is there a better way to fix these issues other than this approach?

kojix2 · October 16, 2024, 2:55pm

I’ve been wanting this feature! Although users probably shouldn’t touch the c binding functions directly, I think it would be useful to be able to generate documentation. Good luck, meow.

straight-shoota · October 16, 2024, 3:24pm

Maybe we should talk very generally about what’s supposed to be in the API docs in the first place.
The primary target are users of the library (e.g. a shard), so it should contain everything that’s important for using it.
Implementation details like helper methods are usually not relevant.
So far lib bindings have been considered an implementation detail in the doc generator. I understand there are reasons why you would want to still expose them, so making more flexible rules seems like a good idea. I suppose it’s up to debate whether that’s configuration in the code or for the doc generator, though.

Flexibility is also useful when it comes to restrictions based on visibility. Currently the docs generator consideres all private or protected methods as undocumented implementation details. But I’m not sure that’s necessarily true. These visibility modifies only declare from which scope the method can be called, i.e. if it’s part of the external interface. Usually external and documented are identical, but there are can be exceptions.
For example for types that are supposed to be inherited, the internal API may include methods that are relevant for inheriting types, but are not exposed in the external API. An example for this in stdlib would be IO::Buffered#unbuffered_read etc. which implementing types usually declare as private. A user of this type should not call these methods directly, but implementors of subtypes explicitly need them.
So I think we might need more flexibility to say for example a private method is considered documented API.

IMO targeting developers of the library is not a very important use case for the API docs. They’re usually interested in more details which are typically better provided in the source code directly. Still, there might be some merit in having an option to generate documentation for the entire codebase without excluding anything.
And this is certainly true when it comes to semantic output that enables autocomplete and other IDE features.

sol.vin · October 16, 2024, 3:41pm

Thank you for referencing my lib specifically, this is a feature I have definitely put input on in the past. Really all I want is a crystal docs --include-lib Raylib style feature and I’m stoked.

A couple points:

When newbies ask for docs, it’s awful to recommend they read the source code. While it isn’t THAT hard, they completely freeze when they see things they skipped in the gitbook cause they didn’t need them.
It’s really nice for me to not have to open my source code and just read and search my docs via the web browser.
Names get crystalized/changed. That means we end up with stuff like fun save_file_data? = SaveFileData : Bool or it uses a Crystal protected keyword (like end/begin/rescue etc) and it may not be apparent what the name should be to a newbie, having docs would fix this significantly.
To expand on some reasons wrappers are not always a good fit

Some libraries are small and don’t need to be wrapped. If it takes only simple data types/hides pointers from the user you don’t really need to wrap anything. If the returned struct references something that isn’t in memory then you probably shouldn’t wrap it.
For example: Raylib has a Texture struct. This struct does not contain any pointers to any data and is entirely simple datatypes, however the id of the struct references something in the GPU memory. I can’t really wrap this, I don’t want it to accidentally fall out of scope and get garbage collected and unloaded when it shouldn’t, and really only the user should say when to unload and load textures anyways. I can’t wrap it with something like a block because you may need to load many textures. Another wrapping method is to wrap the load and unload in a block like:

def load_texture
  t = load_the_texture
  yield t 
  unload_the_texture t
end

load_texture do |t|
  # do texture stuff
end

But you don’t really want to do that either since this is gross when you are loading large amount of textures.

Wrapping creates a much much larger code base. If I were to wrap raylib it would probably quadruple the amount of code I have to maintain. Now, if this wrapping process was more automatic I might consider it (maybe there are some macros that could be written to speed things up), but it’s not. This kind of goes hand in hand with the previous 2 points as well.
Consider the following:

lib MathLib
  fun some_operation = SomeOperation(a : LibC::Int, b : LibC::Int) : LibC::Int
end

Should we wrap this, I would say no, it produces a simple result and takes simple arguments, why complicate something like this? Most of the raylib functions are like this. The only parts that were wrapped was RayMath which contains all the Vector2/Matrix stuff attached directly to the structs for things like operator overloading (v1 + v2 instead of RayMath.vector2_add(v1, v2)) since doing that makes writing anything math related much easier to read. Doing this however ballooned RayMath from a 105 line file to a 661 line file. This is just hooking up 4 structs to the appropriate mathematical functions. Imagine how this looks on the much more massive 1100 line Raylib definition. Now imagine what that would be like for something like vulkan.cr/src/vulkan.cr at master · malte-v/vulkan.cr · GitHub . What a nightmare.

I am 100% on board with this PR.

straight-shoota · October 16, 2024, 3:54pm

According to your description, you would want Raylib to always be part of the API docs. For that purpose it would probably be more benefitial to explicitly declare the lib type as documented in code (:doc: maybe?), rather than a CLI option for the doc generator.

sol.vin · October 16, 2024, 3:59pm

I’m ok with this, however, I don’t know if this would include it in the docs of any project that uses Raylib. There may be a game or something that may not want the raylib docs polluting their games docs.

nobodywasishere · October 16, 2024, 3:59pm

It wouldn’t be included as the docs generator by default doesn’t include docs for shards

nobodywasishere · October 16, 2024, 4:01pm

This may be fine but may be annoying to update every shard that wants to document their libs, and would mean releasing new versions to have updated docs (plus no libs for historical versions).

straight-shoota · October 16, 2024, 4:28pm

What you’re mentioning are specific transition problems. And I’m sure we can find solutions for them (or not if we don’t consider it relevant enough).
But we should not let concerns for legacy behaviour be an obstacle to make the best for the future.

sol.vin · October 16, 2024, 5:02pm

Just wondering what the issue with crystal docs --include-lib Raylib would be? Basically, if ran on raylib-cr it would put the Readme in the docs and include all methods/functions/structs/etc in Raylib in the docs. If there are other files that are including documentation (other classes than Raylib) should just be added to the docs with Raylib

I’m all for allowing us to specify if private/protected methods should be exposed via docs, but the lib thing is entirely different right? I thought it was more of a thing where it should only be included at the running of the docs generator, rather than something the lib author should explicitly say “I’d like this lib’s docs to output EVERYTIME anyone requires it into anything”. I’d much rather just specify what libs should be output to the docs at the command line instead.

I might be misunderstanding what is wrong here though but basically I don’t want my lib to pollute other peoples docs, just document itself in it’s own project.

nobodywasishere · October 16, 2024, 5:19pm

If we go with the --include-lib flag, I would prefer it to be a binary include/exclude for all libs in the current project, versus needing to specify which libs to include (coming from the use-case of crystaldoc.info). Using this flag won’t include docs for libs that are dependencies of the current project, unless you also use the equivalent of --include-all, so I wouldn’t worry about pollution.

straight-shoota · October 16, 2024, 5:30pm

It’s not polluting. If your project considers a lib type as part of its documented, external API, then that’s fine. And the source code should state that. I think it’s exactly the same as with private methods, and the inverse of :nodoc:.

nobodywasishere · October 16, 2024, 5:35pm

Could we use :doc: for private/protected methods as well?

And as far as semantic output for tooling, we could have the --include-all flag bypass the doc/nodoc comments. Another thought is having a separate compiler tool (crystal tool semantic?) that’s more dedicated for this task / is separate from this discussion.

Blacksmoke16 · October 16, 2024, 5:42pm

I’d be a fan of this as well. Mainly for this use case, as it would be nice for readers of the API docs to see that to know they can/have to implement some protected methods too.

It was brought up in the past as maybe calling it :showdoc: would pair well with :nodoc:.

sol.vin · October 16, 2024, 6:39pm

I’m guessing that Crystal would ignore the lib directory for crystal docs? If so I am 100% on board with this method!

bcardiff · October 16, 2024, 6:50pm

having a separate tool would be better IMO. The codebase could be shared but the cli options would be less complex.

I think the discussion for a separate tool would intersect this one though regarding when to suggest/autocomplete private/nodoc definitions. When developing the shard is fine, but private/nodoc of dependencies should not appear probably.

nobodywasishere · October 16, 2024, 8:34pm

I think this is only true unless you’re inheriting from a class in a shard, in which case having those private methods is important. As I don’t expect many people to go around adding :doc: to their private methods, it’s better to be on the safe side and include everything possible in the semantic output, for the final tool to make determinations about what to do. The user may try to go to the definition of a private method in another shard, for example.

ysbaddaden · October 18, 2024, 10:17am

I agree that documenting protected and private methods as well as library bindings is often pointless, but just like we can hide things from being documented (i.e. :nodoc:), we should be able to expose anything we want documented (e.g. :doc:).

A library doesn’t have to only expose a public interface. It may propose a private interface API that can be implemented, swapped or overridden, without having them be accessible publicly.

Here are a couple use cases to give more weight to the proposal:

in my Earl shard we create actor classes by including Earl::Agent that defines an abstract #call method that must only be called internally from the public methods (either #start or #spawn that manipulate the state). It should be protected or even private, but for the method to be documented I must mark the method as public.
If Crystal::Evented::EventLoop from the evloop PR was a shard, then I wouldn’t be able to document the abstract system methods nor the rest of the protected methods that can be overriden by each implementation.
If I recall correctly, when we alias a lib enum into a public crystal type, then the enum values aren’t documented. We could define the enum outside of the lib, but that doesn’t work well when the bindings are autogenerated.

nobodywasishere · October 18, 2024, 10:06pm

Another thing to keep in mind is what happens when we have something with :doc: nested inside a lib (or a different namespace that’s :nodoc:), like:

# doc comments for MyLib
lib MyLib
  
  # :doc:
  fun my_func : Void

  fun my_other_func : Void

  enum MyEnum
    FirstValue
    SecondValue
  end
end

I think my_func should be shown, and this necessitates that MyLib be shown as well, but not my_other_func or MyEnum.

If MyLib is compelled to be shown instead of explicitly being shown, do we include its doc comments? I’m not sure.

straight-shoota · October 19, 2024, 9:29am

Conceptually it would be easier not to transfer doc status from nested elements to their parents. I think this could get quite nasty.

If MyLib.my_func is to be documented, both MyLib and my_func need to be documented. The latter would obviously be implicitly documented when its parent namespace is.

Maybe a lib function is not an ideal example because its nodoc by default and if it gets documented, it probably doesn’t make much sense to expose only a subset of its fun definitions.

Let’s take a different example with explicit :nodoc::

# :nodoc:
module Foo
  # :doc:
  private def self.foo
  end
end

IMO Foo.foo should not be documented because the entire namespace is undocumented.
The doc generator might produce an error or warning on this constellation (or leave it to a linter).

Topic		Replies	Views
How to generate documentation for Crystal's compiler? Help & Support	7	213	December 8, 2023
Crystal Lib Support Community	5	556	August 29, 2022
Generated API docs for the compiler Learning Resources	0	430	April 18, 2021
New documentation site generator: mkdocstrings-crystal News	3	3667	January 13, 2021
Question about git documentation Help & Support	9	478	January 20, 2023

RFC: Extending API Docs and Compiler Semantic Output

Related topics