This is charconv, a pure Crystal implementation of libiconv. It is intended as a drop-in replacement with no C dependencies and improved performance.
Crystal's String#encode and IO encoding support rely on the system's libiconv, which has some downsides. Besides not being written in Crystal, it brings:

- FFI overhead
- Platform-dependent behavior (macOS iconv vs GNU libiconv vs musl)
- Performance
- Static linking
Performance

| Conversion | charconv | system iconv | Speedup |
|---|---|---|---|
| ASCII → ASCII | 73 µs | 11.9 ms | 162x |
| ISO-8859-1 → UTF-8 | 2.1 ms | 14.2 ms | 6.9x |
| CP1252 → UTF-8 | 2.5 ms | 17.2 ms | 6.9x |
| UTF-8 → ISO-8859-1 | 3.4 ms | 14.6 ms | 4.3x |
| UTF-16BE → UTF-8 | 3.7 ms | 10.8 ms | 2.9x |
| UTF-8 → UTF-16LE | 4.6 ms | 10.1 ms | 2.2x |
I took some advice from some Casey Muratori videos on iconv and writing a terminal. It worked out pretty well.
Encoding Support
Most encodings are supported; they are all listed in the README.md.
One-shot conversion
```crystal
result = CharConv.convert("Hello wörld", "UTF-8", "ISO-8859-1")
```
Doing Crystal on Windows is certainly easier with fewer dependencies, so as a general rule I'm all for replacing external C dependencies with Crystal libraries. Not that iconv has given me any trouble so far…
On the other hand, the number of maintainers able to fix bugs in such a library will certainly go down (compared to libiconv). But that's always the risk for programming languages with a smaller user base.
I will try this out later today; if it works out, I would be very happy to use it in more projects.
And those results are… not in line with yours. Or well, the Crystal ones are mostly in line. Is the Mac running the thing through Rosetta, or what is going on here?
That is a great question. I will have to run this on a server and figure out why the performance is so different. I am surprised the majority flipped to slower. Definitely worth looking into.
There are a lot of massive array literals of tuples; Slice.literal would work better (but it doesn't support tuples as element types yet, so some kind of AoS-to-SoA rewriting is necessary).
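A sketch of the AoS-to-SoA rewrite this comment suggests, assuming a conversion table stored as an array of byte-to-codepoint tuples. The three CP1252 mapping entries are real, but the `MAP_*` names and `lookup` helper are made up for illustration; splitting the tuple array into parallel `Slice.literal` columns lets the compiler embed the data as read-only static memory.

```crystal
# AoS form this would replace:
#   MAP = [{0x80_u8, 0x20AC_u16}, {0x82_u8, 0x201A_u16}, {0x83_u8, 0x0192_u16}]
# SoA form: one Slice.literal per column (primitive element types only).
MAP_BYTES      = Slice.literal(0x80_u8, 0x82_u8, 0x83_u8)
MAP_CODEPOINTS = Slice.literal(0x20AC_u16, 0x201A_u16, 0x0192_u16)

# Look up a codepoint by indexing both columns at the same position.
def lookup(byte : UInt8) : UInt16?
  if idx = MAP_BYTES.index(byte)
    MAP_CODEPOINTS[idx]
  end
end

puts lookup(0x80_u8) # the euro sign, U+20AC
```

Since such tables are typically sorted by byte value, a binary search over the key column would keep lookups fast even for the larger mappings.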
Could you please give me code to run this benchmark on my (latest) Arch Linux laptop?
If this is real, we really should port this shard into Crystal's standard library.
There are writing systems where a single complex character carries the meaning of a whole word. One of them is Kanji (漢字). The CJK world has many variations — in Japanese, Kanji is mixed with Hiragana (ひらがな), Katakana (カタカナ), and even emoji. Encoding in these cultures is complicated. Vendors have historically shipped subtly different implementations under the same name, and libiconv has accumulated edge-case handling. I'm somewhat skeptical that charconv handles all of this correctly.
I hope it does, though…