I have a toy program that generates the first N 100-smooth numbers (i.e. numbers whose prime factors are all less than 100). The Crystal version runs in about 1.2s, while the Rust version runs in under 0.5s, so the Rust version is more than 2x faster. My other programs that use arrays (sieve, totients, etc.) perform about the same as their Rust versions, but this one doesn't.

Crystal

```
def get_100_smooths(limit : UInt32) : Array(UInt32)
  primes = Array(UInt32){2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97}
  sp_idxs = Array.new(primes.size, 0)
  cands = primes.dup
  smooths = Array.new(limit.to_i32 + 1, 1u32)
  1.upto(to: smooths.size - 1) do |si|
    smooths[si] = cands.min
    0.upto(to: cands.size - 1).select { |ci| cands[ci] == smooths[si] }.each do |ci|
      sp_idxs[ci] += 1
      cands[ci] = smooths[sp_idxs[ci]] * primes[ci]
    end
  end
  smooths
end

elapsed = Time.measure do
  puts get_100_smooths(5_000_000u32).last
end
puts elapsed
```

Rust

```
fn get_100_smooths(lim: u32) -> Vec<u32> {
    const PRIMES: [u32; 25] = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97];
    let mut sp_idxs = [0usize; PRIMES.len()];
    let mut cands = PRIMES;
    let mut smooths = vec![1u32; (lim + 1) as usize];
    (1..smooths.len()).for_each(|si| {
        smooths[si] = *cands.iter().min().unwrap();
        (0..cands.len()).for_each(|ci| {
            if cands[ci] == smooths[si] {
                sp_idxs[ci] += 1;
                cands[ci] = smooths[sp_idxs[ci]] * PRIMES[ci];
            }
        });
    });
    smooths
}

fn main() {
    let timer = std::time::Instant::now();
    let smooths = get_100_smooths(5_000_000);
    println!("{:?}", smooths.last());
    println!("{:?}", timer.elapsed());
}
```
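
To sanity-check that both versions produce the same sequence, the definition of a 100-smooth number can also be tested by brute force. This is only a rough sketch: `is_100_smooth` and `first_smooths_brute_force` are names made up for illustration and are not part of either program above, and it is far too slow for N = 5,000,000, so it is only useful for comparing the first few hundred values.

```rust
// Hypothetical helpers for checking output; not part of the benchmarked code.
const PRIMES: [u32; 25] = [
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
];

/// Returns true if every prime factor of `n` is less than 100.
fn is_100_smooth(mut n: u32) -> bool {
    if n == 0 {
        return false; // 0 has no prime factorization
    }
    // Divide out every prime below 100; a smooth number is reduced to 1.
    for &p in PRIMES.iter() {
        while n % p == 0 {
            n /= p;
        }
    }
    n == 1
}

/// Collects the first `count` 100-smooth numbers by testing 1, 2, 3, ... in order.
fn first_smooths_brute_force(count: usize) -> Vec<u32> {
    (1u32..).filter(|&n| is_100_smooth(n)).take(count).collect()
}
```

Every integer from 1 to 100 is 100-smooth and 101 is prime, so `first_smooths_brute_force(101)` should yield 1 through 100 followed by 102, which is what `get_100_smooths(100)` should return in both languages.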

These were run on aarch64-linux-android through Termux.

Maybe unrelated:

When I compile a Rust version (of this or the other toy programs), compilation takes less than 2s, but with Crystal it takes about 15s, of which the “Codegen (bc + obj)” stage accounts for about 13s. Is that normal? These are small Crystal programs, and 15s seems excessive.