Charconv — pure Crystal iconv replacement

This is charconv, a pure Crystal implementation of libiconv. It is intended to be a drop-in replacement with no C dependencies and better performance.

Crystal’s String#encode and IO encoding support rely on the system’s libiconv, which has some downsides. Most of all, it is not written in Crystal, but there is also:

  • FFI overhead
  • Platform-dependent behavior (macOS iconv vs GNU libiconv vs musl)
  • Performance
  • Static linking

Performance

Conversion            charconv   system iconv   Speedup
ASCII → ASCII         73 µs      11.9 ms        162x
ISO-8859-1 → UTF-8    2.1 ms     14.2 ms        6.9x
CP1252 → UTF-8        2.5 ms     17.2 ms        6.9x
UTF-8 → ISO-8859-1    3.4 ms     14.6 ms        4.3x
UTF-16BE → UTF-8      3.7 ms     10.8 ms        2.9x
UTF-8 → UTF-16LE      4.6 ms     10.1 ms        2.2x

I took some advice from some Casey Muratori videos on iconv and on writing a terminal. It worked out pretty well.

Encoding Support

Most of them are supported; I put the full list in the README.md.

One-shot conversion

result = CharConv.convert("Hello wörld", "UTF-8", "ISO-8859-1")

IO streaming

CharConv.convert(input_io, output_io, "Shift_JIS", "UTF-8")

Drop-in stdlib replacement

require "charconv/stdlib"

str = "こんにちは".encode("EUC-JP")
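
For reference, the stdlib calls that the snippets above are meant to stand in for look roughly like this. The sketch uses only Crystal's standard library (String#encode, String.new with an encoding name, and IO#set_encoding), all of which currently go through the system iconv. The sample string and variable names are just for illustration.

```crystal
# Stdlib-only round trip; every conversion here is backed by the system iconv.

# String -> bytes in a target encoding
latin1 = "Hello wörld".encode("ISO-8859-1")
puts latin1.size # 11 bytes: "ö" is a single byte in ISO-8859-1

# Bytes in a source encoding -> String (UTF-8 internally)
puts String.new(latin1, "ISO-8859-1")

# Streaming: decode while reading from an IO
io = IO::Memory.new(latin1)
io.set_encoding("ISO-8859-1")
puts io.gets_to_end
```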

Casey Muratori influence:

  • Compiled jump tables, no virtual dispatch
  • Table-driven codecs
  • Stateful codecs
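
To make the "table-driven" idea concrete, here is a minimal sketch of a single-byte decoder driven by a 256-entry lookup table. It is not charconv's actual code or API: the module name, the constant, and the two CP1252 overrides are hypothetical and chosen only for illustration.

```crystal
# Illustrative sketch only: not charconv's actual tables or API.
module TableCodecSketch
  # 256-entry decode table: byte value -> Unicode code point.
  # Starts as the Latin-1 identity mapping, then overrides two
  # CP1252-specific slots to show where the encodings differ.
  DECODE = begin
    table = StaticArray(UInt16, 256).new { |i| i.to_u16 }
    table[0x80] = 0x20AC_u16 # EURO SIGN
    table[0x99] = 0x2122_u16 # TRADE MARK SIGN
    table
  end

  # Decode with one table lookup per byte; the encoding-specific
  # knowledge lives in the data, not in per-byte method dispatch.
  def self.decode(bytes : Bytes) : String
    String.build(bytes.size) do |io|
      bytes.each { |byte| io << DECODE[byte].chr }
    end
  end
end

puts TableCodecSketch.decode(Bytes[0x80, 0x31, 0x30]) # €10
```

Supporting another single-byte encoding in this style would mean adding another table rather than another code path.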

Does anyone find this interesting?

Would anyone use this for static binaries?
Is there interest in removing C library dependencies from the stdlib?
Anything else I missed?

Doing Crystal on Windows is certainly easier with fewer dependencies, so as a general rule I’m all for replacing external C dependencies with Crystal libs. Not that iconv has given me any trouble so far…

On the other hand, the number of maintainers who are able to fix bugs in such a library will certainly go down (compared to libiconv). But that’s always the risk for programming languages with a smaller user base.

I will try this out later today; if it works out, I would be very happy to use it in more projects.

Hi, can you provide a link? (if there is one in your post I apologize for my blindness)

It’s on GitHub: jkthorne/charconv (libiconv rewrite in crystal)

Please double-check the installation instructions; they currently point to the wrong GitHub account. I’ve sent a pull request to fix that.

The numbers are impressive. However, other members have reported that they do not see the same results in their own tests.

So I checked it out and ran the benchmarks on something that is not a Mac.

First impression: Holy moly, that is a heavy compile time.

Ubuntu 25.10, AMD Ryzen 9 9950X3D:

============================================================
Throughput Benchmarks (1 MB input)
============================================================

--- ASCII → ASCII ---
    charconv  22.06k ( 45.32µs) (± 1.89%)  0.0B/op        fastest
system iconv   1.26k (793.22µs) (± 0.22%)  0.0B/op  17.50× slower

--- UTF-8 → ISO-8859-1 (mixed Latin ~80% ASCII) ---
    charconv 431.33  (  2.32ms) (± 0.37%)  0.0B/op   1.54× slower
system iconv 665.87  (  1.50ms) (± 0.95%)  0.0B/op        fastest

--- ISO-8859-1 → UTF-8 ---
    charconv 658.84  (  1.52ms) (± 0.07%)  0.0B/op   1.54× slower
system iconv   1.01k (987.73µs) (± 2.12%)  0.0B/op        fastest

--- UTF-8 → UTF-8 (mixed widths) ---
    charconv 292.34  (  3.42ms) (± 0.33%)  0.0B/op        fastest
system iconv 254.87  (  3.92ms) (± 0.11%)  0.0B/op   1.15× slower

--- CP1252 → UTF-8 ---
    charconv 626.82  (  1.60ms) (± 0.29%)  0.0B/op   1.29× slower
system iconv 807.84  (  1.24ms) (± 0.72%)  0.0B/op        fastest

--- UTF-8 → CP1252 (mixed Latin ~80% ASCII) ---
    charconv 425.75  (  2.35ms) (± 0.10%)  0.0B/op        fastest
system iconv 376.33  (  2.66ms) (± 0.31%)  0.0B/op   1.13× slower

--- UTF-16BE → UTF-8 (mixed widths) ---
    charconv 327.03  (  3.06ms) (± 0.86%)  0.0B/op   1.08× slower
system iconv 353.37  (  2.83ms) (± 0.40%)  0.0B/op        fastest

--- UTF-8 → UTF-16LE ---
    charconv 318.78  (  3.14ms) (± 0.35%)  0.0B/op   1.24× slower
system iconv 395.69  (  2.53ms) (± 0.30%)  0.0B/op        fastest

============================================================

And those results are... not in line with yours. Or well, the Crystal ones are mostly in line. Is the Mac running the thing through Rosetta, or what is going on here?

That is a great question. I will have to run this on a server and figure out why the performance is so different. I am surprised the majority flipped to slower. Definitely worth looking into.

There are a lot of massive array literals of tuples. Slice.literal will work better, but it doesn’t support tuples as element types yet, so some kind of AoS-to-SoA (array-of-structs to struct-of-arrays) rewriting is necessary.
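
A minimal sketch of what that AoS-to-SoA rewrite could look like, assuming a Crystal version that provides Slice.literal. The module and constant names are hypothetical, and the three entries are a small excerpt of CP1252 used only as sample data, not charconv's actual tables.

```crystal
# Illustrative sketch; names are hypothetical and the data is a tiny excerpt.
# Requires a Crystal version that provides Slice.literal.
module SoaSketch
  # Array-of-structs style: one array literal of {code point, byte} tuples.
  PAIRS = [
    {0x20AC_u16, 0x80_u8}, # EURO SIGN
    {0x201A_u16, 0x82_u8}, # SINGLE LOW-9 QUOTATION MARK
    {0x2122_u16, 0x99_u8}, # TRADE MARK SIGN
  ]

  # Struct-of-arrays style: the same data as two parallel slices of
  # primitive numbers, which is the element kind Slice.literal accepts.
  CODEPOINTS = Slice(UInt16).literal(0x20AC, 0x201A, 0x2122)
  BYTES      = Slice(UInt8).literal(0x80, 0x82, 0x99)

  # Look up the encoded byte for a code point via the parallel slices.
  # A linear search keeps the sketch short; a real table would likely
  # want a faster lookup.
  def self.encode(codepoint : Int32) : UInt8?
    index = CODEPOINTS.index(codepoint)
    index ? BYTES[index] : nil
  end
end

p SoaSketch.encode(0x2122)
p SoaSketch.encode(0x41)
```

The two slices must stay in the same order, since position i in one corresponds to position i in the other.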

Honestly, I had not been paying attention to the compile time. I will have to look into this.

Thanks for the advice. This is a great place to start.
Honestly, I don’t know much about optimizing for compile time.

ASCII → ASCII         73 µs      11.9 ms        162x
ISO-8859-1 → UTF-8    2.1 ms     14.2 ms        6.9x
CP1252 → UTF-8        2.5 ms     17.2 ms        6.9x
UTF-8 → ISO-8859-1    3.4 ms     14.6 ms        4.3x
UTF-16BE → UTF-8      3.7 ms     10.8 ms        2.9x
UTF-8 → UTF-16LE      4.6 ms     10.1 ms        2.2x

Could you please give me the code to run this benchmark on my (up-to-date) Arch Linux laptop?
If this is real, we really should port this shard into the Crystal standard library.

There are writing systems where a single complex character carries the meaning of a whole word. One of them is Kanji (漢字). The CJK world has many variations: in Japanese, Kanji is mixed with Hiragana (ひらがな), Katakana (カタカナ), and even emoji ✒. Encoding in these cultures is complicated. Vendors have historically shipped subtly different implementations under the same name, and libiconv has accumulated edge-case handling. I’m somewhat skeptical that charconv handles all of this correctly.
I hope it does, though…

I have been thinking about and experimenting with compile times. This is still an experiment for me.

You can run the benchmarks with: crystal spec spec/bench_spec.cr --release

============================================================
Throughput Benchmarks (1 MB input)
============================================================

--- ASCII → ASCII ---
    charconv  14.74k ( 67.86µs) (± 3.01%)  0.0B/op         fastest
system iconv  91.34  ( 10.95ms) (± 1.24%)  0.0B/op  161.34× slower

--- UTF-8 → ISO-8859-1 (mixed Latin ~80% ASCII) ---
    charconv 463.12  (  2.16ms) (± 3.44%)  0.0B/op        fastest
system iconv  74.70  ( 13.39ms) (± 1.38%)  0.0B/op   6.20× slower

--- ISO-8859-1 → UTF-8 ---
    charconv   2.38k (419.90µs) (± 1.26%)  0.0B/op        fastest
system iconv  76.02  ( 13.15ms) (± 2.13%)  0.0B/op  31.33× slower

--- UTF-8 → UTF-8 (mixed widths) ---
    charconv 301.04  (  3.32ms) (± 1.46%)  0.0B/op        fastest
system iconv  92.68  ( 10.79ms) (± 1.36%)  0.0B/op   3.25× slower

--- CP1252 → UTF-8 ---
    charconv   1.55k (646.87µs) (± 1.58%)  0.0B/op        fastest
system iconv  63.44  ( 15.76ms) (± 1.57%)  0.0B/op  24.37× slower

--- UTF-8 → CP1252 (mixed Latin ~80% ASCII) ---
    charconv 471.56  (  2.12ms) (± 1.64%)  0.0B/op        fastest
system iconv  72.78  ( 13.74ms) (± 1.43%)  0.0B/op   6.48× slower

--- UTF-16BE → UTF-8 (mixed widths) ---
    charconv 292.18  (  3.42ms) (± 6.26%)  0.0B/op        fastest
system iconv 103.26  (  9.68ms) (± 2.79%)  0.0B/op   2.83× slower

--- UTF-8 → UTF-16LE ---
    charconv 226.22  (  4.42ms) (± 5.60%)  0.0B/op        fastest
system iconv 104.47  (  9.57ms) (± 4.25%)  0.0B/op   2.17× slower

--- UTF-8 → EUC-JP (70% CJK) ---
    charconv 167.29M (  5.98ns) (± 1.30%)  0.0B/op         fastest
system iconv 854.96k (  1.17µs) (± 1.45%)  0.0B/op  195.68× slower

--- UTF-8 → GBK (70% CJK) ---
    charconv 457.14  (  2.19ms) (± 1.29%)  0.0B/op        fastest
system iconv  28.70  ( 34.84ms) (± 1.65%)  0.0B/op  15.93× slower

--- UTF-8 → EUC-KR (70% CJK) ---
    charconv 159.20M (  6.28ns) (± 2.31%)  0.0B/op        fastest
system iconv   1.76M (568.22ns) (± 2.68%)  0.0B/op  90.46× slower

--- UTF-8 → EUC-CN (70% CJK) ---
    charconv 160.09M (  6.25ns) (± 1.39%)  0.0B/op        fastest
system iconv   1.78M (561.09ns) (± 2.14%)  0.0B/op  89.83× slower

============================================================
.

Finished in 2:49 minutes
1 examples, 0 failures, 0 errors, 0 pending
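
One way to sanity-check figures like these is to turn time per iteration into throughput. The sketch below assumes that each iteration converts the full 1 MB input and that 1 MB means 2**20 bytes; both are assumptions, and the helper name is hypothetical. Under those assumptions the ASCII row works out to roughly 15 GB/s, while the nanosecond-scale rows (EUC-JP, EUC-KR, EUC-CN) would imply well over 100 TB/s, which suggests those iterations may not be processing the whole input.

```crystal
# Hypothetical helper: bytes per second, assuming every iteration
# converts the whole input.
def throughput(input_bytes : Int32, seconds_per_iteration : Float64) : Float64
  input_bytes / seconds_per_iteration
end

ONE_MB = 1024 * 1024 # assumption: "1 MB" means 2**20 bytes

ascii  = throughput(ONE_MB, 67.86e-6) # charconv, ASCII → ASCII
euc_jp = throughput(ONE_MB, 5.98e-9)  # charconv, UTF-8 → EUC-JP

puts "ASCII → ASCII:  #{(ascii / 1e9).round(1)} GB/s"
puts "UTF-8 → EUC-JP: #{(euc_jp / 1e12).round(1)} TB/s"
```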

I have been working on the compile times with bin files, but I have not settled on a permanent solution yet.