Trying to use the "Cadmium" shard (NLP lib written in Crystal)

Hi,
I am attempting to use the Cadmium shard (cadmiumcr/cadmium on GitHub), a Natural Language Processing (NLP) library for Crystal.

I have created a shard.yml file:

name: tst
version: 0.1.0

authors:
  - serge <serge.hulne@gmail.com>

targets:
  tst:
    main: src/tst.cr

crystal: 1.2.2

license: MIT

dependencies:
  cadmium:
    github: cadmiumcr/cadmium
    branch: master

and here is the snippet of code (from the Cadmium doc) I’m trying to compile:

require "cadmium"

tokenizer = Cadmium.word_punctuation_tokenizer.new
tokenizer.tokenize("my dog hasn't any fleas.")

Yet, I get the error message:

In src/tst.cr:1:1

 1 | require "cadmium"
     ^
Error: can't find file 'cadmium'

NB: When running shards install, Cadmium seemed to install…

Did you run shards install?

Yes, I did and Cadmium seemed to install (I have successfully installed other shards).

It looks like the cadmium shard itself is just a proxy for the other cadmium implementation shards, i.e. there is no cadmium.cr file for it to require. In your case you probably want to require "cadmium_tokenizer", as that's the sub shard that provides the tokenizer.

As pointed out in the readme, they also suggest only installing the shard you are going to use, so it might make more sense to directly install cadmium_tokenizer along with the others you want/need.

EDIT: Maybe @watzon should add a src/cadmium.cr file that requires all the other shards such that installing all of the sub shards is easier in the future.

Thank you, I’ll try that!

In fact, Cadmium installs “cadmium_tokenizer”.

Right, as I mentioned, cadmium is an easy way to install all of the cadmium sub shards. But in practice you probably won’t need all of them, so it makes more sense to only install the sub shards you are going to use.

I tried:

name: tst
version: 0.1.0

authors:
  - serge <serge.hulne@gmail.com>

targets:
  tst:
    main: src/tst.cr

crystal: 1.2.2

license: MIT

dependencies:
  cadmium:
    github: cadmiumcr/cadmium/cadmium_tokenizer
    branch: master

But that does not seem to work (wrong format in the URL in the yml file, I guess).

Yeah, it should be:

dependencies:
  cadmium_tokenizer:
    github: cadmiumcr/tokenizer
    branch: master

As all the sub shards are standalone shards in the same org.
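
For reference, a sketch of what the complete shard.yml might look like with that change applied. It assumes everything else in the original file stays as posted above and only the dependencies block is swapped:

```yaml
name: tst
version: 0.1.0

authors:
  - serge <serge.hulne@gmail.com>

targets:
  tst:
    main: src/tst.cr

crystal: 1.2.2

license: MIT

# The key is the shard name used in `require`,
# while `github` points at the repo inside the cadmiumcr org.
dependencies:
  cadmium_tokenizer:
    github: cadmiumcr/tokenizer
    branch: master
```

After editing the file, run shards install again so the new dependency gets fetched.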

Should I also modify the file which is required?

require "cadmium"

tokenizer = Cadmium.word_punctuation_tokenizer.new
tokenizer.tokenize("my dog hasn't any fleas.")

Yes, that needs to match the name of the sub shard, i.e. cadmium_tokenizer.

Thanks!
I tried:

require "cadmium_tokenizer"

tokenizer = Cadmium.word_punctuation_tokenizer.new
tokenizer.tokenize("my dog hasn't any fleas.")

But it doesn’t work.

In src/tst.cr:3:21

 3 | tokenizer = Cadmium.word_punctuation_tokenizer.new
                         ^-------------------------
Error: undefined method 'word_punctuation_tokenizer' for Cadmium:Module

Looks like the docs aren’t up to date, try:

require "cadmium_tokenizer"

tokenizer = Cadmium::Tokenizer::WordPunctuation.new
tokenizer.tokenize("my dog hasn't any fleas.")
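
To check that it actually works, it may help to print the result. This is only a minimal sketch: it assumes tokenize returns a collection of tokens that can be inspected, and it uses `p` so it does not depend on the exact return type:

```crystal
require "cadmium_tokenizer"

# Class name as suggested above; the README's
# `Cadmium.word_punctuation_tokenizer` form did not compile.
tokenizer = Cadmium::Tokenizer::WordPunctuation.new

# Keep the result in a variable so it can be inspected.
tokens = tokenizer.tokenize("my dog hasn't any fleas.")

# `p` prints the inspected value, whatever its type turns out to be.
p tokens
```

Build and run it with crystal run src/tst.cr after shards install has completed.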

Thank you very much for your help!
It is sincerely appreciated.

I have mentioned it to the author of the library.

Just so anyone coming here in the future knows, the Tokenizer issues are fixed. I’ll be working on improvements to Cadmium in the coming weeks.
