I used the one you posted as a screenshot in your previous post.
For the number of threads, on Linux you can get it by just running this command:
nproc
So crimson, just to let you know: basically, what I would like to do (if I can get an example) is send a question to the AI by setting a prompt (which I would like to load from a file), then get the answer and save it in a file, or in a variable inside my Crystal code, and then close. That’s basically what I want to do.
I want that variable or that file to contain just the answer, nothing else.
I don’t know if you get my point?
Okay I had assumed you were using Apple hardware. The way llama.cpp works is very dependent on your hardware, and I haven’t had a chance to thoroughly test on non-Apple hardware.
nproc
lets you know the number of CPU cores available for threads, but you will want to set the threads to the number of threads available for your GPU. Also, depending on the amount of VRAM you have, the model will only get partially loaded into memory and processing gets shared between CPU and GPU. There is a big performance gap between GPU and CPU, so if you’re using CPU only I would recommend using a smaller model like Llama 3 8B.
You can definitely do what you’re trying to do; I think my last code example illustrates that. You’ll just need to read from a file and create the prompt you want, then read from the response and save it to your file. I do this now when creating an OpenAPI spec. I can probably update my example later today or tomorrow morning for you. Do you have an example of a command you want it to run? I can aim to demo that.
I’ll try to see if I can figure out why the Mixtral model is working on my work computer but not my personal computer (both Apple M-series) and see if I can offer some settings to help.
I found how to do it, ah ah. I just passed a context file and redirected the answer to another file, in completely quiet mode, to get the answer without any other text:
./llama-cli -m ../dolphin-2.7-mixtral-8x7b.Q4_K_M.gguf --chat-template vicuna-orca --n-predict 512 --threads 16 --ctx-size 2048 --temp 0.9 --top-k 80 --repeat-penalty 1.1 --no-display-prompt -f context.txt > result.txt
Context file:
The user interact with the assistant named epsilon. If the user ask for the time, you must just answer COMMAND:time --show, nothing else. When you answer, you must never continue any conversation.
### User: what's time is it ?
File with result:
COMMAND:time --show
So basically I just need to use Process.run and that’s it…
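Something like this rough sketch should do it in Crystal (I’m reusing the flags and file names from the command above, so adjust them to your setup; untested):

answer_io = IO::Memory.new

status = Process.run(
  "./llama-cli",
  args: [
    "-m", "../dolphin-2.7-mixtral-8x7b.Q4_K_M.gguf",
    "--chat-template", "vicuna-orca",
    "--n-predict", "512",
    "--threads", "16",
    "--ctx-size", "2048",
    "--temp", "0.9",
    "--top-k", "80",
    "--repeat-penalty", "1.1",
    "--no-display-prompt",
    "-f", "context.txt"
  ],
  output: answer_io,               # capture stdout instead of letting it print
  error: Process::Redirect::Close
)

if status.success?
  answer = answer_io.to_s.strip    # only the model's answer, e.g. "COMMAND:time --show"
  File.write("result.txt", answer)
end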
It’s crazy, but I’m actually thinking that soon most of the Linux distribution components we know will probably become obsolete with AI. Like the init system, the shell… etc.
I have one question. Why does the model take quite a bit of time to answer on my laptop? When I test Llama online it’s very quick, and my laptop is quite powerful…
Oh, I usually try to avoid making negative comments, but I can’t hold back on this topic. In AI programming, I think the most important libraries are those for matrix computation. In Python, this is NumPy. The Ruby community has also struggled a lot in this area.
Ruby has Numo::NArray, and Cumo, which works on GPUs. There used to be a library called NMatrix, but in the end, Numo became the de-facto standard library. In Ruby, ankane and yoshoku have created many AI-related libraries, which are built using C library bindings and Numo.
In Crystal, developing matrix computation libraries is even rarer than in Ruby. Still, there is num.cr.
Maintaining foundational libraries like NumPy or Numo is very difficult. Languages other than Python and C++ have not been very successful in doing this. Even popular languages like Rust seem to face challenges. (It may be hard for statically typed languages to handle matrix computations when the dimensions and sizes of the matrices are not known. rust-ndarray)
Therefore, making bindings for libraries like ONNXRuntime or connecting to API servers provided by tools like Ollama seems to be the practical approach.
However, this is not interesting at all! Of course, wrapping llama.cpp or re-implementing it in Crystal would be much more amazing. But, oh, it would feel as challenging as climbing Everest. With my skills, not only is climbing Everest impossible, but even climbing the hills around me is difficult. Still, I understand how tough it is…
(This text was written with the help of ChatGPT)
Hi @crimson-knight, is it possible to use AMD ROCm (GitHub - rocm-arch/rocm-arch: A collection of Arch Linux PKGBUILDs for the ROCm platform) to speed up the process of talking with the AI?
It is so slow when I run the following code on my Arch Linux machine, with an AMD 7840HS (780M integrated GPU) and 64 GB of memory. It uses all 16 cores of my CPU (temperature up to 86°C), while the GPU stays idle, and only after waiting a long time can I get the answer.
require "llamero"
model = Llamero::BaseModel.new(
  model_root_path: Path.new("/home/zw963/models/"),
  model_name: "Meta-Llama-3-8B-Instruct-Q5_K_M.gguf"
)
puts model.quick_chat([{role: "user", content: "Hi!"}])
╰─ $ ./ai_example
2024-07-19T08:54:48.274588Z INFO - Interacting with the model...
2024-07-19T08:54:48.285018Z INFO - The AI is now processing... please wait
I think I will try to find another AI optimized for Linux. I thought Llama was more for Linux… because we lose a lot of performance.
That one is really impressive: https://pi.ai/desktop
Do you know how I can simply send a request to that AI with an HTTP client in Crystal, and get the response? I am just extremely bad with networking.
Actually, calling HTTP APIs is not very difficult. You can use Crystal’s standard library, but the standard library is often not user-friendly, so I prefer Crest. I have made some command-line tools to call APIs. Basically, you open the API reference page and write the query in Crest as described there. (I couldn’t find the API page for pi.ai right away. Do I need to register for the wishlist?)
I have created several command-line tools that call HTTP APIs. These may not be very useful, but here are the links:
Based on the following Ruby code, it should be easy to write Crystal code to call Ollama’s API.
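For example, a minimal sketch using only the standard library could look like this (it assumes Ollama is running locally on its default port 11434 and that the llama3 model has already been pulled):

require "http/client"
require "json"

# Ask Ollama's /api/generate endpoint for a single, non-streamed completion.
body = {
  model:  "llama3",
  prompt: "Hi!",
  stream: false
}.to_json

response = HTTP::Client.post(
  "http://localhost:11434/api/generate",
  headers: HTTP::Headers{"Content-Type" => "application/json"},
  body: body
)

# The answer text is in the "response" field of the returned JSON.
puts JSON.parse(response.body)["response"].as_s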
This kind of work is not very difficult. What is really important for the Crystal community is developing basic libraries for matrix calculations. Another important task is creating binding libraries for C and C++ libraries. I would greatly appreciate such efforts.
(This text was written with the help of ChatGPT)
This is the documentation. If I could just have one example…
I think I found the documentation:
https://docs.aveva.com/bundle/pi-web-api-reference/page/help.html
So I finally found why it was lagging. You must set the number of threads to the number of CPU cores of ONE of the processors.
For example, in my case, if I look at /proc/cpuinfo, I should use the "cpu cores" value reported for a single processor entry.
In my case it’s 8, not 16.
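If you want to read that value from Crystal instead of checking by hand, a small sketch like this should work on Linux (it just parses the "cpu cores" field of /proc/cpuinfo):

# Physical core count of one CPU, as reported by /proc/cpuinfo.
cores_line = File.read_lines("/proc/cpuinfo").find { |line| line.starts_with?("cpu cores") }
physical_cores = cores_line ? cores_line.split(':').last.strip.to_i : 1
puts physical_cores # => 8 on my machine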
Now the answer is really fast.
So I will go back to llama now!
Look at that post, crimson-knight, for the explanation:
https://www.reddit.com/r/LocalLLaMA/comments/190v426/llamacpp_cpu_optimization/
Over the past few weeks, there’s been a sensation about programming by AI agents. I, too, am amazed by the power of this technology.
Today, I asked Cline (claude-sonnet) to implement either NumPy or Numo::NArray (for those who aren’t familiar, this is the standard library in Ruby equivalent to NumPy) using the Crystal language, and the result was quite impressive.
In just another two years, regardless of how obscure the language may be, we may soon be able to simply request an implementation of a matrix computation library like NumPy.
Before long, I believe we will discover new technological gaps or areas where human expertise is indispensable, but for now, I am simply in awe of this tremendous technological revolution.
(Translated from Japanese by ChatGPT)
I realized something extremely obvious: using a matrix library in Crystal doesn’t provide any performance boost. In Python or Ruby, using NumPy or Numo::NArray naturally results in a speedup, so this feels strange (though it’s not strange at all). It seems that in a language like Crystal, matrix computation libraries exist primarily for readability and ease of writing rather than for performance improvements.
This is because the parts of NumPy like linalg are written in C translated from Fortran code that has been continually improved for over 45 years (BLAS was released in 1979), rather than the textbook implementations that LLMs have no issues regurgitating.
It might sound counterintuitive that Python and Ruby are faster than Crystal, but this is because the fast parts are written in neither Python nor Ruby (nor in this case, C) to begin with. You are better off using something like GitHub - konovod/linalg: Linear algebra library based on LAPACK instead.
I’m not aiming to solve a real-world problem with this, so I prefer a simple, textbook-style library implemented in Crystal rather than a LAPACK binding.
But apart from that, I feel that things that seemed impossible a few months ago—like AI creating a faster BLAS in 20 minutes or LLVM making a huge performance leap—are now quite possible in the future. No one knows what the future holds, though.