Hi guys, I have a question. Maybe it's a very early question for a young language like Crystal, but I would like to know if there is any library to integrate IA in a desktop application (graphical or CLI)?
IA? Do you mean AI? Or?
AI, sorry, I wrote it the French way.
I know @crimson-knight has some stuff he’s been working on, if it’s ready to share.
@Fulgurance I wrote a shard for exactly this
I have more docs and more examples on the way. But for now, if you use Cursor and add the ai_docs for each of the categories, it'll let the AI coding assistant use the lib as pictured in the attached screenshot.
I have one question. When you send a request to Llama like in the example:
require "llamero"
model = Llamero::BaseModel.new(model_name: "meta-llama-3-8b-instruct-Q6_K.gguf")
puts model.quick_chat([{ role: "user", content: "Hey Llama! Tell me your best joke about programming" }])
Is the request just executed locally, or does it need to interact with a server? I mean like ChatGPT, where requests are sent online (if I am not wrong).
This is fantastic! I was just checking this thread last week to see if you’d updated it.
In the interim, I’ve been working on some other Crystal things for hosted AI platforms and tooling instead:
No server needed for that shard. It loads the AI model into your Crystal process. But you do need to have the model file available to your Crystal program at runtime.
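For example, something like this (just a sketch reusing the call from the snippet above; the ~/models location is an assumption, adjust the path to wherever you actually downloaded the .gguf file):

require "llamero"

# Assumption: the model file was downloaded ahead of time, e.g. into ~/models.
model_path = Path.home / "models" / "meta-llama-3-8b-instruct-Q6_K.gguf"
abort "Model file not found: #{model_path}" unless File.exists?(model_path)

# Same call as above -- the model is loaded into this process, no server involved.
model = Llamero::BaseModel.new(model_name: "meta-llama-3-8b-instruct-Q6_K.gguf")
puts model.quick_chat([{role: "user", content: "Say hello from a local model"}])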
Okay, nice! Is it the same for your Claude shard?
I have too many choices now x) So exciting!
I've also created an OpenAI integration shard, crystal-openai.
It's an unofficial Crystal language shard for the OpenAI API and Microsoft Azure endpoints. The shard supports:
- ChatGPT
- GPT-3, GPT-4
- DALL·E 2
- Whisper
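To be clear about where the work happens: unlike the local Llamero setup above, this shard sends requests to OpenAI's (or Azure's) servers. As a rough sketch of the kind of HTTP call it wraps under the hood (not the shard's own API, just the raw endpoint hit with the standard library):

require "http/client"
require "json"

api_key = ENV["OPENAI_API_KEY"]

response = HTTP::Client.post(
  "https://api.openai.com/v1/chat/completions",
  headers: HTTP::Headers{
    "Authorization" => "Bearer #{api_key}",
    "Content-Type"  => "application/json",
  },
  body: {
    model:    "gpt-3.5-turbo",
    messages: [{role: "user", content: "Say hello from Crystal"}],
  }.to_json
)

puts JSON.parse(response.body).dig("choices", 0, "message", "content")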
HIH
Taking my experiments from the original idea in March to the full shard took longer than I expected. Since it’s currently just wrapping llama.cpp, there was a lot of undocumented behavior that I had to reverse engineer.
Long term, I'll be moving the inference entirely off of llama.cpp and into Llamero so it becomes a 100% Crystal application.
I need to push a patch for the tokenizer, and then I can cut a release version for a stable early release. I will do that within a day or two.
I’m currently using Llamero in a production beta app that does static analysis of our code base (for the primary product) to generate an Open API Spec so we can scan with Bright. It’s a legacy Rails monolith that’s been updated and passed through multiple generations of devs over the last decade, so having the AI do the hard work has been convenient.
Grammars are a superpower that is totally being slept on right now. I don't understand why. But with more hardware being developed specifically to run models, and Crystal being a compiled language, we are going to be able to write pure Crystal applications and run them as embedded AI very soon. I'd guess within 18 months or less. It depends on when companies like Etched get their hardware generally available.
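For anyone who hasn't played with them: a grammar constrains which tokens the model is allowed to emit, so you can force structured output instead of free-form text. A tiny example in llama.cpp's GBNF format, here just written out from a Crystal string (how you hand it to the model depends on the tool; llama-cli accepts a --grammar-file flag):

# A minimal GBNF grammar that only allows the model to answer "yes" or "no".
YES_NO_GRAMMAR = <<-GBNF
  root ::= "yes" | "no"
  GBNF

# Write it to disk so it can be passed to llama.cpp, e.g.:
#   ./llama-cli -m model.gguf --grammar-file yes_no.gbnf -p "Is Crystal compiled?"
File.write("yes_no.gbnf", YES_NO_GRAMMAR)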
Just to be clear, there is not a “request” that happens. The prompt is sent directly to the model and executed on your machine.
If you have an M1 Pro or better laptop, you can run some of the medium size models on your local machine and have the AI influence the flow of your app at run time.
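A rough sketch of what I mean by influencing the flow at run time (reusing the quick_chat call from earlier; the YES/NO convention is just something the prompt asks for, not anything built into Llamero):

require "llamero"

model = Llamero::BaseModel.new(model_name: "meta-llama-3-8b-instruct-Q6_K.gguf")

user_report = "The app crashes every time I open the settings page."

# Ask the model a constrained question, then branch on its answer.
reply = model.quick_chat([
  {role: "user", content: "Answer only YES or NO: is this message a bug report? #{user_report}"},
]).to_s

if reply.upcase.includes?("YES")
  puts "Filing this in the bug tracker..."
else
  puts "Filing this under general feedback..."
end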
I wish that were possible. :-D The Claude models aren’t available for download (as far as I know) and probably require far more powerful hardware than any of us have — they’re probably on the order of 400B-parameter models. So my shards require sending requests to remote servers, including providing API keys.
So I did some tests with the llama.cpp version first, because I want to understand how it works.
I am using the model someone suggested to me: TheBloke/Llama-2-13B-chat-GGUF · Hugging Face
I have two problems. First, why do I get a lot of extra text I don't want when I ask a question? It shows a virtual conversation between the bot and a fake user. I guess I need to pass some extra parameter. My second question is probably more tricky: how can I allow Llama to run some bash commands on my system?
I passed this command:
./llama-cli -m llama-2-13b-chat.Q6_K.gguf -cnv --color --chat-template vicuna-orca -p "Your name is Nexus and your goal is to help the user to manage it system" -t 13 --no-display-prompt
Fixed for future reference
Machine-learning models are all basically about inference — they infer missing information based on the information they do have. These are specifically large language models, so their inference is basically “what comes next in this text?”. If you’ve ever heard of Markov chains, you can think of these models as kind of a more powerful implementation of that idea.
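If you want a feel for the "what comes next?" idea without any ML at all, here is a toy word-level Markov chain in plain Crystal. An LLM is doing something conceptually similar, just with learned probabilities over tokens instead of raw counts from a tiny sample:

# Build a "what word tends to follow this word?" table from a tiny sample.
sample = "the cat sat on the mat and the cat ate the fish"
words  = sample.split

followers = Hash(String, Array(String)).new { |hash, key| hash[key] = [] of String }
words.each_cons_pair { |current, next_word| followers[current] << next_word }

# Generate text by repeatedly picking a random follower of the current word.
current = "the"
output  = [current]
10.times do
  break unless candidates = followers[current]?
  current = candidates.sample
  output << current
end

puts output.join(" ") # e.g. "the cat ate the fish" (different on each run)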
Simon Willison gave a great talk at PyCon US this year where he breaks down why LLMs do it this way (link goes directly to that point in the presentation). The entire talk is great and I would recommend watching it to understand this kind of model a bit better.
For the talk I mentioned above, the speaker wrote a script that runs a model in the background to show on his screen the number of times he says “AI” during the talk. That script is available on GitHub (he mentions the repo in the talk) if you want to experiment with running code suggested by the model.
I can't wait! Can you let me know when it will be ready?
So basically, even if I have a model, I need to train it to interact better, basically to do this kind of system maintenance.
I will have a look, because I need to understand how to give the AI access to my system. Actually it's weird: if you run it locally, even when you just ask a simple question like what time it is, it answers wrong.
Large language models only generate text. They don’t have the ability to run commands. To do that, you have to tell the model what tools (“tools” is a generic term for an action to be taken) are available and then the model will tell you, based on information in the prompt, what tools to run and what arguments to pass to them. Then you have to run the tool and, if it makes sense for the given command, you pass the output back to the model.
So if you want it to tell you the current time, you have to tell it that you can figure out the current time for it and pass it back in. For example, with the Claude client I linked above:
require "anthropic"
claude = Anthropic::Client.new # assumes API key is in the ANTHROPIC_API_KEY env var
response = claude.messages.create(
messages: [Anthropic::Message.new("What is today's date in Kansas City? Use a format like 'Tuesday, January 23, 2023'.")],
model: Anthropic.model_name(:haiku),
# We expect this to be a short response so we give it a small token limit
max_tokens: 100,
temperature: 0.1,
tools: Anthropic::ToolHandlers{
# We need to be able to invoke a tool to get the time and date because LLMs
# have no idea what today is or any other current information. They only know
# facts from before they were trained.
GetCurrentTime,
},
system: <<-SYSTEM
Be concise in answering the user's questions. Place the value, and only the value, inside <answer> tags.
SYSTEM
)
if content = response.content.find_first(&.as?(Anthropic::Text))
# We asked Claude to put the answer in <answer>...</answer> tags, so we pull
# that out and print it no matter how much extra text it returned.
puts content.text.match(%r{<answer>(.*)</answer>}).not_nil![1]
else
puts response
end
struct GetCurrentTime < Anthropic::Tool::Handler
# This description is passed to the model so it knows how to select this tool.
def self.description
<<-EOF
Gets the current time and date in the UTC time zone. You can translate the
timestamp to any other time zone.
EOF
end
def call
Time.utc
end
end
This code uses an `Enumerable#find_first` method I monkeypatched into my app. If you'd like to do the same, the code is here:
module Enumerable
  # Yields each element and returns the block's first truthy result
  # (not the element itself).
  def find_first
    each do |item|
      if result = yield item
        return result
      end
    end
  end
end
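For reference, a quick usage example of that helper (it returns the block's first truthy result, not the element itself):

values = ["a", "2", "c"]
puts values.find_first { |v| v.to_i? } # => 2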
What makes it work there is the `tools` parameter. We pass in a `GetCurrentTime` tool, which tells Claude that I know how to get the time, so if it needs that information, it asks my code for it. That code executes inside my program and the current time is then passed back to Claude. This way, it offers the appearance that the model is calling your code. But it isn't actually doing that, it's just generating text. The client is calling it because Claude told it which tool it needs to run to get that information.