Prototype Crystal binding for BlingFire: a tokenizer for Crystal

Hello. I have been fascinated with ChatGPT for the past month and have been working on a command-line tool for using ChatGPT from Crystal. I think it is very useful, but it is not finished yet, so I would like to present it another time.

While building this tool, I realized that I had no way to count tokens in advance. So I spent Sunday writing a binding for BlingFire. My goal was to finish it in 2 hours, but it took 6. It is essentially a port of ankane’s blingfire-ruby code to Crystal.

It now passes a minimal set of tests.
With the worldwide ChatGPT boom, there must be demand in the Crystal world for a library that converts between strings and tokens. I have taken the first steps, but I expect this library still has many bugs.

I usually wouldn’t announce such a trivial library, but tokenizers are important and in demand everywhere. I hope someone will develop a more useful library. For example, bindings for huggingface/tokenizers would be nice, but I found them difficult to implement because that library is written in Rust.

Thank you.


I have created a Crystal binding for tiktoken.
Please send me a pull request if you find any bugs.
