Hey Crystal community,
I want to share a project I have been working on called speak. It is a local AI assistant that runs entirely on your computer. No cloud. No subscription. Your data never leaves your machine.
Why I built it
I was tired of paying for ChatGPT and Claude. I was also tired of AI services that forget who I am every time I close the tab. I wanted something that runs on my laptop (not a $10,000 server). Something that remembers things about me across sessions. Something that can read my files. Something that can search the web when I need current info. Something that costs nothing after setup. Something that keeps my conversations private.
So I built speak in Crystal.
What it does
When you open speak, you get a chat interface. You can type normally. Tell it your name once, and it remembers forever. Ask it to read your config.json, and it shows you the content. Ask it to search for something, and it fetches results from the web. Have a long conversation, and memory usage stays flat because the KV cache lives on disk, not in RAM.
Here is an example session.
You type: Hello, who are you?
speak: I am speak, a local AI assistant running on your computer. I can read files, search the web, and remember things about you.
You type: My name is Sarah and I love Python.
speak: I have remembered that your name is Sarah and you love Python.
You type: Read my config.json
speak: (shows the content of config.json)
You type: Search for Python 3.13 features
speak: (shows search results from DuckDuckGo)
You type: What do you know about me?
speak: Your name is Sarah and you love Python.
The agent loop
speak now has an agent loop. It can call tools multiple times to complete a task. For example, if you ask it to read a file and then search based on what it finds, it will do both steps automatically. It plans, executes, observes the result, and decides what to do next. Then it calls the finish tool when it has the final answer.
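To make the plan, execute, observe cycle concrete, here is a minimal sketch of such a loop. It is not the code from the repo: the names Tool, Step, and run_agent are made up for illustration, the step limit is an assumption, and the block passed to run_agent stands in for the model deciding what to do next.

```crystal
# Illustrative only: Tool, Step and run_agent are hypothetical names,
# not identifiers from the speak codebase.
alias Tool = Proc(String, String)

# One decision: which tool to call and with what argument.
record Step, tool : String, arg : String

# The block plays the role of the model. It receives every observation
# so far and returns the next step. The loop ends when the "finish" tool
# is chosen or when max_steps (an assumed safety limit) is reached.
def run_agent(tools : Hash(String, Tool), max_steps : Int32 = 8, &decide : Array(String) -> Step) : String
  observations = [] of String
  max_steps.times do
    step = decide.call(observations)
    return step.arg if step.tool == "finish"
    tool = tools[step.tool]?
    observations << (tool ? tool.call(step.arg) : "unknown tool: #{step.tool}")
  end
  "stopped after #{max_steps} steps without finishing"
end

# Stub tools so the example runs without touching the disk or the network.
tools = {
  "read_file" => ->(path : String) { "topic=crystal" },
  "search"    => ->(query : String) { "results for #{query}" },
}

# A scripted "model": read a file, search for what it contained, then finish.
answer = run_agent(tools) do |obs|
  case obs.size
  when 0 then Step.new("read_file", "config.json")
  when 1 then Step.new("search", obs[0])
  else        Step.new("finish", obs.last)
  end
end

puts answer # prints "results for topic=crystal"
```

Capping the number of steps is one simple way to keep a confused model from looping forever.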
Technical details
speak uses the Nanbeige 4.1 3B model. It runs on CPU only. No GPU required. The entire model at Q4_K_M quantization is 2.5GB. With memory mapping, the model stays mostly on disk. RAM usage is around 500MB to 1GB.
The KV cache is saved to SSD, not RAM. This means you can have a conversation with 10,000 turns and memory usage stays flat. No slowdown. No crash.
speak automatically detects your hardware. It reads total RAM, available RAM, CPU cores, and AVX2 support. It then configures itself to run optimally on your machine. If you have 4GB RAM, it uses the smaller Q2_K model and a smaller context window. If you have 16GB RAM, it uses the larger Q6_K model and a larger context window. You can also override everything by editing config.json.
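The selection logic can be sketched roughly as follows. This is an illustration under assumptions, not the repo's code: Profile, pick_profile, and total_ram_gb are hypothetical names, the context sizes are placeholders, and the middle tier between the 4GB and 16GB cases is a guess based on the default Q4_K_M model. Reading /proc/meminfo only works on Linux.

```crystal
# Illustrative only: names and context sizes are assumptions.
record Profile, quant : String, context : Int32

# Map total RAM to a quantization level and context window.
def pick_profile(total_ram_gb : Float64) : Profile
  if total_ram_gb <= 4
    Profile.new("Q2_K", 2048)
  elsif total_ram_gb >= 16
    Profile.new("Q6_K", 8192)
  else
    Profile.new("Q4_K_M", 4096) # assumed middle tier
  end
end

# Parse the MemTotal line (reported in kB) from /proc/meminfo-style text.
# Returns nil when the line is missing or malformed.
def total_ram_gb(meminfo : String) : Float64?
  meminfo.each_line do |line|
    next unless line.starts_with?("MemTotal:")
    kb = line.split[1]?.try(&.to_f?)
    return kb / (1024.0 * 1024.0) if kb
  end
  nil
end

# On Linux the real input would be File.read("/proc/meminfo");
# a fixed sample keeps the example portable.
sample = "MemTotal:       16777216 kB\nMemFree:         1000000 kB\n"
ram = total_ram_gb(sample) || 8.0 # fall back to an assumed default
profile = pick_profile(ram)
puts "#{System.cpu_count} cores, #{ram} GB RAM -> #{profile.quant}, context #{profile.context}"
```

Keeping the decision in a pure function like pick_profile makes it easy to check every tier without needing machines of each size.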
The tool system includes file reading, web search, and memory. The agent loop handles multi-step tasks. The disk cache persists across sessions, so you can close speak, open it again, and continue the same conversation without reprocessing anything.
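For the memory tool specifically, one simple way to get facts that survive a restart is a small key/value file on disk. The sketch below assumes a JSON file and a hypothetical MemoryStore class. It is not speak's actual storage format, and it does not cover the KV cache.

```crystal
require "json"

# Illustrative only: MemoryStore and its JSON layout are assumptions.
class MemoryStore
  @facts : Hash(String, String)

  # Load existing facts from disk, or start empty on first run.
  def initialize(@path : String)
    @facts = if File.exists?(@path)
               Hash(String, String).from_json(File.read(@path))
             else
               {} of String => String
             end
  end

  # Store a fact and write the whole file back immediately.
  def remember(key : String, value : String) : Nil
    @facts[key] = value
    File.write(@path, @facts.to_json)
  end

  def recall(key : String) : String?
    @facts[key]?
  end
end

path = File.join(Dir.tempdir, "speak_memory_demo.json")
MemoryStore.new(path).remember("name", "Sarah")

# A fresh instance simulates closing and reopening the program.
puts MemoryStore.new(path).recall("name") # prints "Sarah"
```

Rewriting the whole file on every change is fine for a handful of facts. A larger store would want appends or a small database.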
Why Crystal
I chose Crystal because it is fast, memory-efficient, and compiles to a single binary. The syntax is clean and readable. The performance is close to C but without the pain. There is already a great binding called llama.cr that wraps llama.cpp. This gave me a solid foundation to build on.
I also wanted to prove that Crystal is a good language for AI inference. Most people use Python for this. Python is slow and memory-heavy. Crystal is lean. speak uses less than 2GB of RAM total. A Python equivalent would use 6GB or more.
The codebase is 1,689 lines of Crystal code. No bloat. No unnecessary abstractions. Just what is needed.
How to try it
Clone the repo. Run shards install. Build with crystal build src/speak.cr --release -o speak. Then run ./speak. The first run will detect your hardware, create a config file, and download the model (2.5GB). After that, it just works.
The installer supports resumable downloads. If your connection drops, just run ./speak again and it picks up where it left off.
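Resuming usually comes down to an HTTP Range request that starts at the number of bytes already on disk. Here is a minimal sketch of that idea using Crystal's standard HTTP client. The function names are hypothetical, this is not the installer's actual code, and it assumes the server supports range requests and the URL does not redirect (HTTP::Client does not follow redirects on its own).

```crystal
require "http/client"

# Illustrative only: resume_headers and download are hypothetical names.

# Ask for everything after the bytes we already have, if any.
def resume_headers(partial_path : String) : HTTP::Headers
  headers = HTTP::Headers.new
  if File.exists?(partial_path)
    offset = File.size(partial_path)
    headers["Range"] = "bytes=#{offset}-" if offset > 0
  end
  headers
end

# Stream the response body to disk without loading it into memory.
def download(url : String, dest : String) : Nil
  HTTP::Client.get(url, headers: resume_headers(dest)) do |response|
    # 206 Partial Content: the server honored the range, so append.
    # Any other status is treated as a full restart in this sketch;
    # real code should also check for error statuses before writing.
    mode = response.status_code == 206 ? "ab" : "wb"
    File.open(dest, mode) { |file| IO.copy(response.body_io, file) }
  end
end
```

Calling download again with the same destination after an interruption sends the Range header automatically, because the partial file is already there.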
You can find the code at github.com/zendrx/speak
What is next
I want to add more tools to the agent loop. File writing, terminal commands, and calendar integration are on the list. I also want to improve the web search with better result parsing. And I want to add a proper TUI with a status bar and command history.
But even as it is, speak is already useful. I use it every day.
Conclusion
speak is not trying to beat ChatGPT. It is trying to be different. Private. Local. Low RAM. Persistent memory. No subscription. It is for people who want an AI that remembers them and respects their privacy, all on hardware they already own.
Try it. Break it. Tell me what you think.
GitHub: github.com/zendrx/speak