Figured I would leave an update on this. A lot has changed since I first tried this back in April 2023. Nearly three years later, model capabilities have more than tripled, and because of that radical increase, models have not needed to learn Crystal as precisely as I originally thought. Training methods have also probably just improved. So I do think the models' depth of knowledge about Crystal itself has increased, and not necessarily from any extra effort on our part.
That being said, I currently use Claude Opus 4.5 and am having a radically improved experience, which leads me to believe we are going to see a general increase in the accessibility of all coding agents over the next six months. Generally, Opus is one of the frontier models that leads the way and sets a standard, and within three or four months of its release, other models start catching up.
Even so, I still don't have a formal training data set organized in a way that has been meaningfully successful for fine-tuning a local model. Instead, I have made considerable progress through prompt engineering and agent orchestration.
We have spoken a lot about prompt engineering, but very little is being said about agent orchestration. I think that is a big miss, because it is the next step, and frankly it's going to be the step in 2026 that spreads like wildfire.
The reason for this is simple. By the time you're orchestrating multiple agents, your assistant and coding agent have escaped the chat window. That means you no longer have to sit there and work with them interactively for them to accomplish tasks.
For example, I have set up a process where I upload voice memos and recordings to a folder on my iCloud Drive. Whisper then runs and transcribes each audio file into a raw transcript. Next, Claude runs: an intake agent reads the transcript, breaks it into topics, summarizes each one against my current projects, and performs other organizational steps that consistently and reliably maintain a knowledge base. If the intake agent sees that a task was requested of another agent we have available, it starts that agent on the task it needs to accomplish.
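The watch-and-dispatch part of that pipeline can be sketched roughly like this, in Python rather than Crystal for brevity. To be clear, the folder layout, the intake prompt, and the exact `whisper` and `claude` CLI invocations are illustrative stand-ins for my actual setup, not the real thing:

```python
import subprocess
from pathlib import Path

AUDIO_EXTS = {".m4a", ".mp3", ".wav"}

def pending_recordings(inbox: Path, transcripts: Path) -> list[Path]:
    """Audio files in the watched folder that have no transcript yet."""
    done = {p.stem for p in transcripts.glob("*.txt")}
    return sorted(p for p in inbox.iterdir()
                  if p.suffix.lower() in AUDIO_EXTS and p.stem not in done)

def process(recording: Path, transcripts: Path) -> None:
    """Transcribe one recording, then hand the transcript to the intake agent."""
    # Whisper writes <stem>.txt into the transcripts folder.
    subprocess.run(["whisper", str(recording), "--output_format", "txt",
                    "--output_dir", str(transcripts)], check=True)
    transcript = (transcripts / f"{recording.stem}.txt").read_text()
    # "Intake agent" prompt is a placeholder; swap in however you invoke Claude.
    subprocess.run(["claude", "-p",
                    f"You are the intake agent. File this transcript:\n\n{transcript}"],
                   check=True)
```

A cron job or launchd task then just loops `pending_recordings` and calls `process` on each hit, so nothing runs twice.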
I no longer have to be physically present at my desk for Claude to begin working on a project, and frankly, it's quite wonderful. If any of my agents get stuck and need help, they can use a communication agent that gives me a call, talks through the issue with me, and relays my answer back to the agent working on my local device, which can then hopefully get unstuck and continue on.
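The stuck-agent handoff is roughly this shape. This is only a sketch: `place_call` stands in for whatever telephony integration you wire up, which I won't pretend to specify here:

```python
from typing import Callable

def escalate(agent_name: str, issue: str,
             place_call: Callable[[str], str]) -> str:
    """Relay a blocked agent's issue to a human over the phone and hand
    the human's answer back so the agent can continue.

    `place_call` is a stand-in for a real telephony service: it should
    read the question aloud on a call and return the transcribed reply.
    """
    question = (f"Agent '{agent_name}' is stuck: {issue} "
                f"How should it proceed?")
    answer = place_call(question)
    # The returned guidance gets injected into the blocked agent's context.
    return f"Human guidance for {agent_name}: {answer}"
```

The important design choice is that the blocked agent never talks to me directly; the communication agent owns the call and just hands guidance back as plain text.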
These days, I work almost exclusively in Crystal, especially when I'm using my own coding agent. However, I do periodically branch out to other languages to get a better idea of how effective the more popular ones, with more content out there, really are. And I have to say I'm quite impressed. When a language has good tooling and a lot of content out there, it's kind of dangerous, actually.
The quality of the content the models are trained on is very important. I consistently find that content around languages like Rust, Go, or C is significantly higher quality, and models can tackle harder concepts in them than in JavaScript. JavaScript has been flooded by YouTube bros peddling how to start an agency, and it shows in the quality of the code models generally write.
So I think that leads us down a path where we can each write whatever we want, as long as we understand that what we write and publish is eventually going to end up in these models' memories and influence what they do.
If you set up an agent with access to memory like I have, you can very easily build long, deep preferences that the agent is capable of following nearly indefinitely.
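As a concrete (and very simplified) sketch of what I mean by memory: imagine preferences live in a flat file that gets prepended to every prompt. The path and format here are hypothetical, not my actual layout:

```python
from pathlib import Path

# Hypothetical location; adjust to wherever your agents keep shared state.
MEMORY_FILE = Path("~/agents/memory/preferences.md").expanduser()

def remember(preference: str) -> None:
    """Append a durable preference the agents should follow from now on."""
    MEMORY_FILE.parent.mkdir(parents=True, exist_ok=True)
    with MEMORY_FILE.open("a") as f:
        f.write(f"- {preference}\n")

def build_prompt(task: str) -> str:
    """Prepend every accumulated preference to each task prompt."""
    prefs = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else "(none yet)\n"
    return f"Standing preferences:\n{prefs}\nTask: {task}"
```

Because every prompt carries the whole preference list, a preference you state once keeps applying across sessions for as long as the file exists, which is what "nearly indefinitely" means in practice.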
I think the biggest thing we need to solve is how we distribute the libraries we write so that people and their coding assistants are enabled from the start. I just don't know what that tooling looks like yet.