Hi guys, I have a question. Maybe it's a very early question for a young language like Crystal, but I would like to know if there is any library to integrate IA in a desktop application (graphical or CLI)?
IA? Do you mean AI? Or?
AI, sorry, I wrote it the French way.
I know @crimson-knight has some stuff he’s been working on, if it’s ready to share.
@Fulgurance I wrote a shard for exactly this
I have more docs and more examples on the way. But for now, if you use Cursor and add the ai_docs for each of the categories, it'll let the AI coding assistant use the lib as pictured in the attached screenshot.
I have one question. When you send a request to Llama like in the example:
require "llamero"
model = Llamero::BaseModel.new(model_name: "meta-llama-3-8b-instruct-Q6_K.gguf")
puts model.quick_chat([{ role: "user", content: "Hey Llama! Tell me your best joke about programming" }])
Is the request just executed locally, or does it need to interact with a server? I mean like ChatGPT, where requests are sent online (if I am not wrong).
This is fantastic! I was just checking this thread last week to see if you’d updated it.
In the interim, I’ve been working on some other Crystal things for hosted AI platforms and tooling instead:
No server needed for that shard. It loads the AI model into your Crystal process. But you do need to have the model file available to your Crystal program at runtime.
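For example, something like this (just a sketch reusing the call from the snippet above; the ~/models location is an assumption, adjust the path to wherever you actually downloaded the .gguf file):

require "llamero"

# Assumption: the model file was downloaded ahead of time, e.g. into ~/models.
model_path = Path.home / "models" / "meta-llama-3-8b-instruct-Q6_K.gguf"
abort "Model file not found: #{model_path}" unless File.exists?(model_path)

# Same call as above -- the model is loaded into this process, no server involved.
model = Llamero::BaseModel.new(model_name: "meta-llama-3-8b-instruct-Q6_K.gguf")
puts model.quick_chat([{role: "user", content: "Say hello from a local model"}])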
Okay, nice! Is it the same for your Claude shard?
I have too many choices now x) So exciting!
I've also created an OpenAI integration shard, crystal-openai.
It's an unofficial Crystal language shard for the OpenAI API and Microsoft Azure endpoints. The shard supports:
- ChatGPT
- GPT-3, GPT-4
- DALL·E 2
- Whisper
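To be clear about where the work happens: unlike the local Llamero setup above, this shard sends requests to OpenAI's (or Azure's) servers. As a rough sketch of the kind of HTTP call it wraps under the hood (not the shard's own API, just the raw endpoint hit with the standard library):

require "http/client"
require "json"

api_key = ENV["OPENAI_API_KEY"]

response = HTTP::Client.post(
  "https://api.openai.com/v1/chat/completions",
  headers: HTTP::Headers{
    "Authorization" => "Bearer #{api_key}",
    "Content-Type"  => "application/json",
  },
  body: {
    model:    "gpt-3.5-turbo",
    messages: [{role: "user", content: "Say hello from Crystal"}],
  }.to_json
)

puts JSON.parse(response.body).dig("choices", 0, "message", "content")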
HIH
Taking my experiments from the original idea in March to the full shard took longer than I expected. Since it’s currently just wrapping llama.cpp, there was a lot of undocumented behavior that I had to reverse engineer.
Long term, I'll be moving the inference entirely off of llama.cpp and into Llamero so it becomes a 100% Crystal application.
I need to push a patch for the tokenizer, and then I can cut a release version for a stable early release. I will do that within a day or two.
I’m currently using Llamero in a production beta app that does static analysis of our code base (for the primary product) to generate an Open API Spec so we can scan with Bright. It’s a legacy Rails monolith that’s been updated and passed through multiple generations of devs over the last decade, so having the AI do the hard work has been convenient.
Grammars are a superpower that is totally being slept on right now. I don't understand why. But with more hardware being developed specifically to run models, and Crystal being a compiled language, we are going to be able to write pure Crystal applications and run them as embedded AI very soon. I'd guess within 18 months or less. It depends on when companies like Etched get their hardware generally available.
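For anyone who hasn't played with them: a grammar constrains which tokens the model is allowed to emit, so you can force structured output instead of free-form text. A tiny example in llama.cpp's GBNF format, here just written out from a Crystal string (how you hand it to the model depends on the tool; llama-cli accepts a --grammar-file flag):

# A minimal GBNF grammar that only allows the model to answer "yes" or "no".
YES_NO_GRAMMAR = <<-GBNF
  root ::= "yes" | "no"
  GBNF

# Write it to disk so it can be passed to llama.cpp, e.g.:
#   ./llama-cli -m model.gguf --grammar-file yes_no.gbnf -p "Is Crystal compiled?"
File.write("yes_no.gbnf", YES_NO_GRAMMAR)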
Just to be clear, there is not a “request” that happens. The prompt is sent directly to the model and executed on your machine.
If you have an M1 Pro or better laptop, you can run some of the medium size models on your local machine and have the AI influence the flow of your app at run time.
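A rough sketch of what I mean by influencing the flow at run time (reusing the quick_chat call from earlier; the YES/NO convention is just something the prompt asks for, not anything built into Llamero):

require "llamero"

model = Llamero::BaseModel.new(model_name: "meta-llama-3-8b-instruct-Q6_K.gguf")

user_report = "The app crashes every time I open the settings page."

# Ask the model a constrained question, then branch on its answer.
reply = model.quick_chat([
  {role: "user", content: "Answer only YES or NO: is this message a bug report? #{user_report}"},
]).to_s

if reply.upcase.includes?("YES")
  puts "Filing this in the bug tracker..."
else
  puts "Filing this under general feedback..."
end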
I wish that were possible. :-D The Claude models aren’t available for download (as far as I know) and probably require far more powerful hardware than any of us have — they’re probably on the order of 400B-parameter models. So my shards require sending requests to remote servers, including providing API keys.
So I did some tests with the llama.cpp version first, because I want to understand how it works.
I am using the model someone suggested to me: TheBloke/Llama-2-13B-chat-GGUF · Hugging Face
I have two problems. First, why do I get a lot of extra text I don't want when I ask a question? It shows a virtual conversation between the bot and a fake user. I guess I need to pass some extra parameter. My second question is probably more tricky: how can I allow Llama to run some bash commands on my system?
I passed this command:
./llama-cli -m llama-2-13b-chat.Q6_K.gguf -cnv --color --chat-template vicuna-orca -p "Your name is Nexus and your goal is to help the user to manage it system" -t 13 --no-display-prompt
Fixed for future reference
Machine-learning models are all basically about inference — they infer missing information based on the information they do have. These are specifically large language models, so their inference is basically “what comes next in this text?”. If you’ve ever heard of Markov chains, you can think of these models as kind of a more powerful implementation of that idea.
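If you want a feel for the "what comes next?" idea without any ML at all, here is a toy word-level Markov chain in plain Crystal. An LLM is doing something conceptually similar, just with learned probabilities over tokens instead of raw counts from a tiny sample:

# Build a "what word tends to follow this word?" table from a tiny sample.
sample = "the cat sat on the mat and the cat ate the fish"
words  = sample.split

followers = Hash(String, Array(String)).new { |hash, key| hash[key] = [] of String }
words.each_cons_pair { |current, next_word| followers[current] << next_word }

# Generate text by repeatedly picking a random follower of the current word.
current = "the"
output  = [current]
10.times do
  break unless candidates = followers[current]?
  current = candidates.sample
  output << current
end

puts output.join(" ") # e.g. "the cat ate the fish" (different on each run)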
Simon Willison gave a great talk at PyCon US this year where he breaks down why LLMs do it this way (link goes directly to that point in the presentation). The entire talk is great and I would recommend watching it to understand this kind of model a bit better.
For the talk I mentioned above, the speaker wrote a script that runs a model in the background to show on his screen the number of times he says “AI” during the talk. That script is available on GitHub (he mentions the repo in the talk) if you want to experiment with running code suggested by the model.
I can't wait! Can you let me know when it will be ready?
So basically, even if I have a model, I need to train it to interact better, basically to do this kind of system maintenance.
I will have a look, because I need to understand how to give the AI access to my system. Actually it's weird: if you run it locally, even when you just ask a simple question like what time it is, it answers wrong.
Large language models only generate text. They don’t have the ability to run commands. To do that, you have to tell the model what tools (“tools” is a generic term for an action to be taken) are available and then the model will tell you, based on information in the prompt, what tools to run and what arguments to pass to them. Then you have to run the tool and, if it makes sense for the given command, you pass the output back to the model.
So if you want it to tell you the current time, you have to tell it that you can figure out the current time for it and pass it back in. For example, with the Claude client I linked above:
require "anthropic"
claude = Anthropic::Client.new # assumes API key is in the ANTHROPIC_API_KEY env var
response = claude.messages.create(
messages: [Anthropic::Message.new("What is today's date in Kansas City? Use a format like 'Tuesday, January 23, 2023'.")],
model: Anthropic.model_name(:haiku),
# We expect this to be a short response so we give it a small token limit
max_tokens: 100,
temperature: 0.1,
tools: Anthropic::ToolHandlers{
# We need to be able to invoke a tool to get the time and date because LLMs
# have no idea what today is or any other current information. They only know
# facts from before they were trained.
GetCurrentTime,
},
system: <<-SYSTEM
Be concise in answering the user's questions. Place the value, and only the value, inside <answer> tags.
SYSTEM
)
if content = response.content.find_first(&.as?(Anthropic::Text))
# We asked Claude to put the answer in <answer>...</answer> tags, so we pull
# that out and print it no matter how much extra text it returned.
puts content.text.match(%r{<answer>(.*)</answer>}).not_nil![1]
else
puts response
end
struct GetCurrentTime < Anthropic::Tool::Handler
# This description is passed to the model so it knows how to select this tool.
def self.description
<<-EOF
Gets the current time and date in the UTC time zone. You can translate the
timestamp to any other time zone.
EOF
end
def call
Time.utc
end
end
This code uses an `Enumerable#find_first` method I monkeypatched into my app. If you'd like to do the same, the code is here:
module Enumerable
  # Yields each element and returns the block's first truthy result
  # (not the element itself).
  def find_first
    each do |item|
      if result = yield item
        return result
      end
    end
  end
end
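For reference, a quick usage example of that helper (it returns the block's first truthy result, not the element itself):

values = ["a", "2", "c"]
puts values.find_first { |v| v.to_i? } # => 2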
What makes it work there is the `tools` parameter. We pass in a `GetCurrentTime` tool, which tells Claude that I know how to get the time, so if it needs that information, it asks my code for it. That code executes inside my program and the current time is then passed back to Claude. This way, it offers the appearance that the model is calling your code. But it isn't actually doing that, it's just generating text. The client is calling it because Claude told it which tool it needs to run to get that information.