There aren’t really any guides on what “idiomatic” means in Crystal, so I’m just going off of what I do and what I’ve seen in the community (which I guess is what “idiomatic” really ought to mean anyway). In order to provide runnable code (for me and for you and for any future readers), I’m gonna use an IO::Memory
to simulate your file, and I’m gonna do a few different passes/versions of the code to show some idiomatic stuff and some really useful methods.
First, let’s keep the general structure of your code but change some lines to make them more idiomatic (and fix a bug!):
dico = Hash(String, Int32).new
sample_text = <<-TEXT
something something some thing
some thing something arduous
something some some thing
TEXT
# we're pretending this IO is a File
io = IO::Memory.new sample_text
io.each_line do |line|
a = line.split
a.each do |word|
word = word.strip(";,.'?!\n").downcase # we can do both of these operations in one line
if dico[word]? # don't need the != nil; nil is falsey
dico[word] += 1
else
dico[word] = 1 # gotta change this from 0 to 1
end
end
end
a = dico.to_a.sort_by { |key_val| key_val[1] }.reverse # we can do all of this in one line without mutating methods
a.each do |item| # we're deleting the [0,10] because it's not relevant to our example
puts item
end
The if dico[word]?
line is a pretty common pattern in Crystal, since nil
is already falsey. However, the next iteration shows a more convenient way to do what you’re doing with a Hash
.
Putting stuff in one line is totally a preference thing. In fact, moving the key-sorting to one line and using non-mutating methods will technically cause unnecessary memory allocation. In most cases, I think that’s not an optimization worth pursuing, but it’s up to you.
You can actually simplify the hash logic even more:
dico = Hash(String, Int32).new default_value: 0
sample_text = <<-TEXT
something something some thing
some thing something arduous
something some some thing
TEXT
io = IO::Memory.new sample_text
io.each_line do |line|
a = line.split
a.each do |word|
word = word.strip(";,.'?!\n").downcase
dico[word] += 1
end
end
a = dico.to_a.sort_by { |key_val| key_val[1] }.reverse
a.each do |item|
puts item
end
By giving the hash a default value, you make it so that when dico[word] += 1
expands to dico[word] = dico[word] + 1
, the dico[word]
on the right side is just equal to zero. You could also simplify that logic by leaving the hash initialization alone and using
dico[word] ||= 0 # sets dico[word] to 0 if it's falsey (i.e. nil)
dico[word] += 1
However, I think the default value is cleaner and simpler.
Finally, it turns out that the thing you’re trying to do is already a standard library method:
sample_text = <<-TEXT
something something some thing
some thing something arduous
something some some thing
TEXT
io = IO::Memory.new sample_text
dico = io.each_line.flat_map { |line| line.split }.tally_by { |word| word.strip(";,.'?!\n").downcase }
a = dico.to_a.sort_by { |key_val| key_val[1] }.reverse
a.each do |item|
puts item
end
IO#each_line
gets us an Iterator
of each line, then we #flat_map
to iterate over each word, and then we #tally
up the counts of the words after they’ve been stripped and downcased. I’m not totally sure, but I don’t think that should result in much more memory allocation than your code, and it relies on the standard library implementation, which reduces your code’s complexity (and removes the possibility of sneaky off-by-one errors!).
Does that all make sense?