LLM-friendliness as a metric: porting 20 languages with an LLM

Added an article: LLM-friendliness as a metric: porting 20 languages with an LLM. Spoiler: Crystal is top 1.


@kostya so much valuable information in your article. What a pleasant read! IMHO it deserves its own post here on the forum, so it doesn’t get lost as just another reply in an existing thread.

Thanks, I tried to post on Reddit, but they always ban me. People don’t like AI-related content.


In this case, Reddit is the one losing. Here, in the star constellation of the Crystal community, your excellent work has made this constellation shine brighter.

I apologize because English is not my first language, and perhaps my expression here is not very good.

I appreciate that Crystal comes out on top, but I have a couple of concerns with this analysis.

  1. It’s astonishing that Golang is ranked among the most expressive languages.
    It’s infamous for its verbosity. One of the most obvious examples is the explicit error handling. But the comparison implementation doesn’t use much error handling at all.
    I see that as an indicator that this is not representative of general Golang code bases. And the same probably applies to all other language ports: They are only representative of code bases that primarily implement algorithms.
  2. The introduction states that the original implementation was in Crystal (I presume written by hand?) and an LLM ported it to the other languages. So the result is not necessarily idiomatic or ideal code. It’s a common observation that LLMs produce bloated code of subpar quality. So unless actual developers familiar with the respective languages have reviewed the ported code bases, I don’t think it’s fair to consider them representative of the language. They’re representative of whatever the LLM generates for that language (possibly also influenced by the original Crystal implementation).
  3. The size divided by gzipped size can serve as a heuristic for repetitive texts. But I have doubts about how accurate it is as a metric for boilerplate in source code. Compression works only lexically, and source code varies a lot at the lexical level. And there is no accounting for semantic boilerplate.
  4. A big factor in the comparison is not just the expressiveness of a language, but also the availability of libraries. A language with a big standard library or easily available optional dependencies typically requires less custom code for implementing algorithms and such. Language ecosystems also have different cultures on build vs. buy (or implement vs. import) decisions.
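
For reference, the size-to-gzipped-size heuristic from point 3 can be sketched roughly like this. It is only a minimal illustration in Python, assuming the ratio is taken over UTF-8 bytes with default gzip settings. The article may compute it differently, and the function name and both sample strings are made up for the example.

```python
import gzip


def compression_ratio(source: str) -> float:
    """Raw size divided by gzipped size, both measured in bytes.

    Assumption: UTF-8 encoding and gzip's default compression level.
    """
    raw = source.encode("utf-8")
    if not raw:
        raise ValueError("source must not be empty")
    return len(raw) / len(gzip.compress(raw))


# Hypothetical sample: the same three lines repeated 50 times.
repetitive = "if err != nil {\n    return err\n}\n" * 50

# Hypothetical sample: 50 lines that share a shape but differ lexically.
varied = "".join(
    f"value_{i} = compute_{i * 7919 % 1000}(x{i})\n" for i in range(50)
)

print(f"repetitive: {compression_ratio(repetitive):.1f}")
print(f"varied:     {compression_ratio(varied):.1f}")
```

The repeated snippet scores a much higher ratio than the varied one, even though the varied lines are arguably boilerplate too. That is the limitation raised above: gzip only sees lexical repetition, not semantic repetition.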
  1. Regarding Go, I found it surprising too, but the numbers show exactly that. Yes, the code doesn’t have many if err != nil — simply because the code is more about algorithms rather than real business logic where many subsystems return errors. Here it’s mostly math — get data, compute, get result. There aren’t many if err != nil cases. But this applies to all other languages as well.
  2. This is addressed in the AI Critic section: ‘Topic 2: Lost in Translation?’
  3. Yes, the metric is completely subjective — but I found that it aligns very well with relative comparisons. For example, Java and TypeScript are around zero — average languages, and the progression Java → Kotlin → Scala shows a clear trend.
  4. Perhaps it’s not even the standard libraries, but rather the level of abstraction — the higher it is, the easier it is for an LLM to work with.

Yes, but in different ways because different languages have vastly different approaches to error handling and propagation.
And this applies to basically every other language feature that’s underrepresented in primarily math-focused algorithms.
These different code bases only represent a subset of all language features.

Where do I find that?

LangArena - Programming Languages Benchmark Comparison , AI Critic tab.

I’m surprised by this too. In my view, Go feels like a rather redundant language.

This is pretty cool. However, I’m surprised you didn’t do Ruby, especially considering that it’s Crystal’s spiritual ancestor.

Oh, I just saw that the LangArena repo says:

Languages like Python, Ruby, or PHP are intentionally excluded to maintain a focused comparison within a similar performance bracket.

but then, Python was in the repo anyway. :person_shrugging:

Python was added because it has a fast runtime, PyPy, which is only ~6 times slower than C. If there’s something similar for Ruby, I can add it.

Ah ok, that makes sense… I didn’t notice it was PyPy specifically.

I guess the closest thing on the Ruby side is CRuby with YJIT enabled, or JRuby/TruffleRuby.