Num.cr v1.0.0 released

Num.cr v1.0.0 has been released, completing a massive overhaul of the library's entire interface. The new design enables device-agnostic numerical computing in Crystal, leveraging both CPU and GPU devices.

Some major highlights:

  • ClTensor(T) and Tensor(T) have been merged into Tensor(T, S), with OpenCL-backed storage becoming a first-class citizen. All creation methods support both storage backends, and the implementation paves the way for zero-copy interop with numerous other libraries (Apache Arrow is the next prime target).
  • Num::NN and Num::Grad feature full GPU support, with almost all layers and gates supporting OpenCL-backed Tensors.
  • Num::Einsum allows for optimized contractions of Tensors, providing functionality identical to Numpy's einsum.
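
To make the einsum notation concrete, here is what the contraction string "ij,jk->ik" denotes, written out in plain Crystal. This only illustrates the notation (summing over the repeated index j, i.e. a matrix product); it is not the Num::Einsum API itself, which I have not shown here.

```crystal
# Spell out the contraction "ij,jk->ik": sum over the repeated index j.
def einsum_ij_jk_ik(a : Array(Array(Float64)), b : Array(Array(Float64)))
  Array.new(a.size) do |i|
    Array.new(b[0].size) do |k|
      (0...b.size).sum { |j| a[i][j] * b[j][k] }
    end
  end
end

einsum_ij_jk_ik([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
# => [[19.0, 22.0], [43.0, 50.0]]
```

The optimized version in Num::Einsum avoids materializing intermediates and picks an efficient contraction order, which is where the real benefit lies.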

Some less flashy highlights:

  • Vastly improved test coverage and stability, as well as a revamped API documentation, which can be found here.
  • OpenCL memory management has been implemented, with JIT-compiled kernels backed by memory-safe caching.
  • Numpy inter-op is supported via reading and writing .npy files.
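
For anyone curious what that inter-op involves, here is a minimal sketch of the .npy v1.0 on-disk layout, written in plain Crystal and independent of num.cr's own reader/writer (whose method names may differ, so check the API docs). The hypothetical write_npy helper below emits a 1-D Float32 array that Numpy's np.load can read.

```crystal
# Minimal .npy v1.0 writer sketch for a 1-D Float32 array.
# Layout: magic (6 bytes) + version (2) + header length (2, little-endian)
# + ASCII header padded to a 64-byte boundary, then raw little-endian data.
def write_npy(path : String, data : Array(Float32))
  header = "{'descr': '<f4', 'fortran_order': False, 'shape': (#{data.size},), }"
  # Pad with spaces so magic + version + length + header is a multiple of 64,
  # terminated by a newline, as the format requires.
  pad = 64 - ((10 + header.bytesize + 1) % 64)
  header += " " * pad + "\n"
  File.open(path, "wb") do |f|
    f.write "\x93NUMPY".to_slice
    f.write_byte 1_u8 # format major version
    f.write_byte 0_u8 # format minor version
    f.write_bytes header.bytesize.to_u16, IO::ByteFormat::LittleEndian
    f.print header
    data.each { |v| f.write_bytes v, IO::ByteFormat::LittleEndian }
  end
end
```

num.cr's own support presumably also handles multi-dimensional shapes and other dtypes on both read and write.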

As always, I am constantly looking for additional contributors to improve documentation, examples, and performance, and to continue expanding the API. I am especially interested in anyone with CUDA experience (and a CUDA-enabled graphics card; not having one is currently blocking my ability to write that storage backend).

If you have a chance to experiment with the library, bug tickets + feedback in the Gitter channel are always appreciated.

@christopherzimmerman I'm working on porting https://github.com/patlevin/face-detection-tflite/blob/main/fdlite/face_detection.py to Crystal.

Just wondering if I could leverage your library?
There are operations like _get_sigmoid_scores(raw_scores) in Python.
In Crystal I have raw_scores as a Slice(Float32), and this is my sigmoid_scores function:

def sigmoid_scores(data : Slice(Float32)) : Slice(Float32)
    # Note: map! mutates the slice in place (and returns it)
    data.map! { |x| 1.0_f32 / (1.0_f32 + Math.exp(-x)) }
end

How would I implement something like that using num.cr?
Would a case like that benefit from the library?
The slices already represent tensors, so I assume it's a good fit.
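
For what it's worth, a tensor version of that sigmoid might look like the sketch below. Array#to_tensor and Tensor#map are assumptions on my part based on the docs, so treat the names as unverified.

```crystal
require "num"

# Hedged sketch -- assumes Array#to_tensor and Tensor#map exist as documented.
def sigmoid_scores_t(data : Slice(Float32))
  t = data.to_a.to_tensor # Slice -> Array -> CPU-backed Tensor
  t.map { |x| 1.0_f32 / (1.0_f32 + Math.exp(-x)) }
end
```

For small slices the in-place map! version is probably already close to optimal; the tensor route would pay off with larger data or OpenCL-backed storage.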

Also, I do a lot of Tensor normalisation so that I can use different NN models with the same code. Is there a way to use num.cr to accelerate something like

# Tensor(UInt8) => Tensor(Float32)
output_layer.as_u8.map { |result| (result.to_f32 / 255.0_f32) }
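
A self-contained version of that normalisation might look like this sketch. It again assumes Array#to_tensor and Tensor#map from the docs, and that map returns a new Tensor typed by the block's return value, so the UInt8 -> Float32 cast and scale happen in one pass.

```crystal
require "num"

# Hedged sketch: one-pass UInt8 -> Float32 cast-and-scale on a Tensor.
raw = [0_u8, 128_u8, 255_u8].to_tensor
normalized = raw.map { |v| v.to_f32 / 255.0_f32 }
```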

Thanks in advance!
