Doing Raytracer in Crystal

Not sure it’s what I would call simplification. Do you think the number of lines is an issue? I think clarity is. Original version is easier to read for me.

Is it to show in how little lines of code Crystal can do it’s job to compare it to other languages?

1 Like

Also, you can replace the clamp proc with the existing Float#clamp method:

def self.toDrawingColor(c)
  r = (c.r.clamp(0.0,1.0)*255).floor
  g = (c.g.clamp(0.0,1.0)*255).floor
  b = (c.b.clamp(0.0,1.0)*255).floor
  {r, g, b}
end
1 Like

Oww…I like that! Didn’t know about Float#clamp, gets rid of the proc.

So here’s my revised version of @asterite’s original code, which was direct translation of the Ruby version. Note, I’ve changed some vector method names in Vector and Color, to make them more clear and consistent. I’ve also done some things to benefit from Crystal’s static typing.

The locs went down from 327 to 259.

Because writing code is just as much an art as a technical skill, the code reflects my preference for oneliners and clearer (more accurate) naming conventions.
However, after every change I do a diff between the *.png from @asterite’s reference version, and mine, and they’ve been identical.

Current code revision

require "stumpy_png"

struct Vector
  getter x, y, z

  def initialize(@x : Float64, @y : Float64, @z : Float64) end

  def self.minus(v1, v2); Vector.new(v1.x - v2.x, v1.y - v2.y, v1.z - v2.z) end

  def self.plus(v1, v2);  Vector.new(v1.x + v2.x, v1.y + v2.y, v1.z + v2.z) end
      
  def self.scale(k, v);   Vector.new(k * v.x, k * v.y, k * v.z) end

  def self.dot(v1, v2); v1.x * v2.x + v1.y * v2.y + v1.z * v2.z end
      
  def self.mag(v); Math.sqrt(Vector.dot(v, v)) end

  def self.norm(v)
    mag = Vector.mag(v)
    div = (mag == 0) ? Float64::INFINITY : 1.0 / mag
    Vector.scale(div, v)
  end

  def self.cross(v1, v2)
    Vector.new(v1.y * v2.z - v1.z * v2.y, v1.z * v2.x - v1.x * v2.z, v1.x * v2.y - v1.y * v2.x)
  end
end

struct Color
  getter r, g, b

  def initialize(@r : Float64, @g : Float64, @b : Float64) end

  def self.scale(k, v);  Color.new(k * v.r, k * v.g, k * v.b) end

  def self.plus(v1, v2); Color.new(v1.r + v2.r, v1.g + v2.g, v1.b + v2.b) end

  def self.mult(v1, v2); Color.new(v1.r * v2.r, v1.g * v2.g, v1.b * v2.b) end
      
  def self.toDrawingColor(c)
    r = (c.r.clamp(0.0, 1.0)*255).floor
    g = (c.g.clamp(0.0, 1.0)*255).floor
    b = (c.b.clamp(0.0, 1.0)*255).floor
    {r, g, b}
  end
end

Color_white        = Color.new(1.0, 1.0, 1.0)
Color_grey         = Color.new(0.5, 0.5, 0.5)
Color_black        = Color.new(0.0, 0.0, 0.0)
Color_background   = Color_black
Color_defaultColor = Color_black

class Camera
  getter pos, forward, right, up

  def initialize(pos : Vector, lookAt)
    down = Vector.new(0.0, -1.0, 0.0)
    @pos = pos
    @forward = Vector.norm(Vector.minus(lookAt, @pos))
    @right = Vector.scale(1.5, Vector.norm(Vector.cross(@forward, down)))
    @up = Vector.scale(1.5, Vector.norm(Vector.cross(@forward, @right)))
  end
end

record Ray, start : Vector, dir : Vector

record Intersection, thing : Thing, ray : Ray, dist : Float64

record Light, pos : Vector, color : Color

abstract class Thing end

class Sphere < Thing
  @radius2 : Float64

  def initialize(@center : Vector, radius : Float64, @_surface : Surface)
    @radius2 = radius*radius
  end

  def normal(pos); Vector.norm(Vector.minus(pos, @center)) end

  def surface; @_surface end

  def intersect(ray)
    eo = Vector.minus(@center, ray.start)
    v  = Vector.dot(eo, ray.dir)
    dist = 0.0
    if (v >= 0)
      disc = @radius2 - (Vector.dot(eo, eo) - v * v)
      dist = v - Math.sqrt(disc) if (disc >= 0)    
    end
    
    return nil if (dist == 0)

    Intersection.new(self, ray, dist)
  end
end

class Plane < Thing
  def initialize(@_norm : Vector, @offset : Float64, @_surface : Surface) end

  def normal(pos); @_norm end

  def surface;  @_surface end

  def intersect(ray)
    denom = Vector.dot(@_norm, ray.dir)
    return nil if denom > 0
    dist = (Vector.dot(@_norm, ray.start) + @offset) / (-denom)
    Intersection.new(self, ray, dist)
  end
end

abstract class Surface end

class ShinySurface < Surface
  def diffuse(pos); Color_white end

  def specular(pos); Color_grey end

  def reflect(pos); 0.7 end

  def roughness; 250 end
end

class CheckerboardSurface < Surface
  def diffuse(pos)  
    return Color_white if ((pos.z).floor + (pos.x).floor) % 2 != 0
    Color_black
  end

  def specular(pos); Color_white end

  def roughness; 250 end

  def reflect(pos) ((pos.z).floor + (pos.x).floor) % 2 != 0 ? 0.1 : 0.7 end
end

Surface_shiny        = ShinySurface.new
Surface_checkerboard = CheckerboardSurface.new

class RayTracer
  MaxDepth = 5

  def intersections(ray, scene)
    closest = Float64::INFINITY
    closestInter = nil
    scene.things.each do |item|
      inter = item.intersect(ray)
      if inter && inter.dist < closest
        closestInter = inter
        closest = inter.dist
      end
    end
    closestInter
  end

  def testRay(ray, scene)
    isect = self.intersections(ray, scene)
    isect ? isect.dist : nil
  end

  def traceRay(ray, scene, depth)
    isect = self.intersections(ray, scene)
    isect.nil? ? Color_background : self.shade(isect, scene, depth)
  end

  def shade(isect : Intersection, scene, depth)
    d = isect.ray.dir
    pos = Vector.plus(Vector.scale(isect.dist, d), isect.ray.start)
    normal = isect.thing.normal(pos)
    reflectDir = Vector.minus(d, Vector.scale(2, Vector.scale(Vector.dot(normal, d), normal)))
    naturalColor = Color.plus(Color_background, self.getNaturalColor(isect.thing, pos, normal, reflectDir, scene))

    reflectedColor = (depth >= MaxDepth) ? Color_grey :
                     self.getReflectionColor(isect.thing, pos, normal, reflectDir, scene, depth)

    Color.plus(naturalColor, reflectedColor)
  end

  def getReflectionColor(thing, pos, normal, rd, scene, depth)
    Color.scale(thing.surface.reflect(pos), self.traceRay(Ray.new(pos, rd), scene, depth + 1))
  end

  def getNaturalColor(thing, pos, norm, rd, scene)
    color = Color_defaultColor
    scene.lights.each { |light| color = self.addLight(color, light, pos, norm, scene, thing, rd) }
    color
  end

  def addLight(col, light, pos, norm, scene, thing, rd)
    ldis = Vector.minus(light.pos, pos)
    livec = Vector.norm(ldis)
    neatIsect = self.testRay(Ray.new(pos, livec), scene)

    isInShadow = false
    isInShadow = neatIsect <= Vector.mag(ldis) if neatIsect

    return col if isInShadow

    illum = Vector.dot(livec, norm)

    lcolor = Color_defaultColor
    lcolor = Color.scale(illum, light.color) if illum > 0

    specular = Vector.dot(livec, Vector.norm(rd))
    scolor = Color_defaultColor
    scolor = Color.scale(specular ** thing.surface.roughness, light.color) if (specular > 0)

    Color.plus(col, Color.plus(Color.mult(thing.surface.diffuse(pos), lcolor), Color.mult(thing.surface.specular(pos), scolor)))
  end

  def getPoint(x : Int32, y : Int32, screenWidth : Int32, screenHeight : Int32, camera)
    recenterX = ->(x : Int32) {  (x - (screenWidth  >> 1)) / (screenWidth  << 1) }
    recenterY = ->(y : Int32) { -(y - (screenHeight >> 1)) / (screenHeight << 1) }
    Vector.norm(Vector.plus(camera.forward, Vector.plus(Vector.scale(recenterX.call(x), camera.right), Vector.scale(recenterY.call(y), camera.up))))
  end

  def render(scene, image, screenWidth, screenHeight)
    screenHeight.times do |y|
      screenWidth.times do |x|
        color = self.traceRay(Ray.new(scene.camera.pos, self.getPoint(x, y, screenWidth, screenHeight, scene.camera)), scene, 0)
        r, g, b = Color.toDrawingColor(color)
        image.set(x, y, StumpyCore::RGBA.from_rgb(r, g, b))
  end end end
end

class DefaultScene
  getter :things, :lights, :camera

  def initialize
    @things = [
      Plane.new(Vector.new(0.0, 1.0, 0.0), 0.0, Surface_checkerboard),
      Sphere.new(Vector.new(0.0, 1.0, -0.25), 1.0, Surface_shiny),
      Sphere.new(Vector.new(-1.0, 0.5, 1.5), 0.5, Surface_shiny),
    ]
    @lights = [
      Light.new(Vector.new(-2.0, 2.5, 0.0), Color.new(0.49, 0.07, 0.07)),
      Light.new(Vector.new(1.5, 2.5, 1.5),  Color.new(0.07, 0.07, 0.49)),
      Light.new(Vector.new(1.5, 2.5, -1.5), Color.new(0.07, 0.49, 0.071)),
      Light.new(Vector.new(0.0, 3.5, 0.0),  Color.new(0.21, 0.21, 0.35)),
    ]
    @camera = Camera.new(Vector.new(3.0, 2.0, 4.0), Vector.new(-1.0, 0.5, 0.0))
  end
end

width, height = 500, 500
image = StumpyCore::Canvas.new(width, height)

t1 = Time.monotonic
rayTracer = RayTracer.new
scene = DefaultScene.new
rayTracer.render(scene, image, width, height)
t2 = Time.monotonic

puts "Completed in #{(t2 - t1).total_milliseconds} ms"

StumpyPNG.write(image, "crystal-raytracer3.png") 

This is a bit of a meaningless comparison unless you use the “standard” formatting, and I thing that would expand your oneliners back to 3 lines.

The website lists 2 parameters for each code version, time and locs, so I gave the loc.
My laptop was busy when I posted it, so it wasn’t in a quiet state to fairly time it.
Subsequently I’ve timed it, and it was about 104 ms compared to the original 106 ms.

However, these are ad hoc timings, and a more rigourous process could be used to standardize them, but it does indicates it’s at least as fast, if not faster.

And like I said already, the code reflects my style, and preferences, of coding.

I am not aware there is some standard way to format Crystal source code, though there may be common practice.

The most important thing is not how it looks, but what it does. You’re free to format it anyway you like for yourself. I don’t complain about other people’s formatting preferences, I try to understand their code.

So if you thnk you can improve its performance, please post it, in any formatting style you like. :relaxed:

I am not aware there is some standard way to format Crystal source code, though there may be common practice .

Run crystal tool format your_file.cr

2 Likes

In fact, all the snippets I submit here, or anywhere, use the standard/preferred formatting, because I have the formatter integrated with my editor. So formatting in any other way is mostly not Crystal-like and should be avoided, especially when publishing code in other websites for a lot of people to see (for example using 4 spaces for indentation is a no-no, or using semicolons instead of newlines.)

4 Likes

Another simplification/speedup.

Original:

  def getPoint(x : Int32, y : Int32, camera, screenWidth, screenHeight)
    recenterX = ->(x : Int32) do
      (x - (screenWidth / 2.0)) / 2.0 / screenWidth
    end
    recenterY = ->(y : Int32) do
      -(y - (screenHeight / 2.0)) / 2.0 / screenHeight
    end
    return Vector.norm(Vector.plus(camera.forward, Vector.plus(Vector.times(recenterX.call(x), camera.right), Vector.times(recenterY.call(y), camera.up))))
  end

To 1:

  def getPoint(x : Int32, y : Int32, screenWidth : Int32, screenHeight : Int32, camera)
    recenterX = ->(x : Int32) {  (x - (screenWidth  >> 1)) / (screenWidth  << 1) }
    recenterY = ->(y : Int32) { -(y - (screenHeight >> 1)) / (screenHeight << 1) }
    Vector.norm(Vector.plus(camera.forward, Vector.plus(Vector.scale(recenterX.call(x), camera.right), Vector.scale(recenterY.call(y), camera.up))))
  end

To 2:

  def getPoint(x : Int32, y : Int32, screenWidth : Int32, screenHeight : Int32, camera)
    recenterX =  (x - (screenWidth  >> 1)) / (screenWidth  << 1)
    recenterY = -(y - (screenHeight >> 1)) / (screenHeight << 1)
    Vector.norm(Vector.plus(camera.forward, Vector.plus(Vector.scale(recenterX, camera.right), Vector.scale(recenterY, camera.up))))
  end

You could extract some of those like:

def recenter(n : Int32, len : Int32)
   (n - (len  >> 1)) / (len  << 1)
end

def getPoint(x : Int32, y : Int32, screenWidth : Int32, screenHeight : Int32, camera)
  recenterX =  recenter(x, screenWidth)
  recenterY = -recenter(y, screenHeight)
  ...
end

… likewise for the -> variant.

Ah yes, that’s a possible refactor.
But in the code, this happens only 1 time in 1 place, so doing that just adds unnecessary code/complexity here.

The real, verifiable, performance gain was to replace the procs with just variables.
The same change likely improves the Ruby version (where it came from).

I’m trying to understand the theory of raytracers so I can assess what really needs to happen, and what value types need to be processed.

For example, the r,g,b colors are, by definition, unsigned 8 bit values [0…255].
And each final pixel has integer coordinates [x, y] with some color.

So reducing math operations between floats and integers could help speed it up.
I’ve been doing some micro benchmarks on snippets options, but because the scene is only 500x500 pixels it probably doesn’t matter for this program. Obviously for larger scene dimensions, it probably would matter to do the most efficient math possible.

2 Likes

Actually, after applying my theory of reducing floats|integers operations, the following version seems the fastest, after some quick preliminary benchmarking.

  def getPoint(x, y, screenWidth, screenHeight, camera)
    recenterX =  (x - (screenWidth  * 0.5)) / (screenWidth  * 2)
    recenterY = -(y - (screenHeight * 0.5)) / (screenHeight * 2)
    Vector.norm(Vector.plus(camera.forward, Vector.plus(Vector.scale(recenterX, camera.right), Vector.scale(recenterY, camera.up))))
  end

Here, I remove inputs (x, y) declared as Int32 (though they are, as well as width and height) to eliminate the (a / b) operation doing a division float conversion between integers (a, b). Since the resultant output is always a float scale factor, it now does all the math with float values, so the compiler doesn’t have to do any conversions.

I also tested with: (screenwidth / 2) but (screenwidth * 0.5) seems faster, as I assumed, since generally multiplication is faster than division for most cpus.

It’s these type of micro operational assessments that need to be made to get the same functional code to perform the fastest.

2 Likes

I think this is probably the last speedup.

In class CheckerboardSurface there are these 2 methods.

def diffuse(pos)  
  return Color_white if ((pos.z).floor + (pos.x).floor) % 2 != 0
  Color_black
end

def reflect(pos); ((pos.z).floor + (pos.x).floor) % 2 != 0 ? 0.1 : 0.7 end
  

All the values here are floats, but the floor makes them whole numbers.
The % 2 != 0 asks is (a + b) an odd number, but using % is expensive.
The measurably faster way to do this is as follows.

def diffuse(pos)  
  return Color_white if ((pos.z).floor + (pos.x).floor).to_i.odd?
  Color_black
end

def reflect(pos); ((pos.z).floor + (pos.x).floor).to_i.odd? ? 0.1 : 0.7 end

Here, we convert the whole number to an integer, and check its lsb; much faster.
We can even write it simpler|shorter, to make the whole class now become:

class CheckerboardSurface < Surface
  def diffuse(pos) 
    ((pos.z).floor + (pos.x).floor).to_i.odd? ? Color_white : Color_black
  end
  
  def reflect(pos); ((pos.z).floor + (pos.x).floor).to_i.odd? ? 0.1 : 0.7 end

  def specular(pos); Color_white end

  def roughness; 250 end
end

And for the curious, doing: ((pos.z).to_i + (pos.x).to_i).odd?
is not only slower, but gives the wrong picture. Try it, and see what image you get.
Here, two float>int conversions are done, vs one, and the sums aren’t the same.

I think I’ve now squeezed all the fat out of the math, and will benchmark to confirm.

4 Likes

There is another possible speedup, but it runs into implementation issues.

The purpose of this code: ((pos.z).floor + (pos.x).floor).to_i.odd?

in class CheckerboardSurface is to determine if you’re in a white or black checker. That’s determined here by checking if the sum of the (x, z) components are odd (white) or even (black), but the sum value is never used.

An equivalent boolean way to do this is to xor the integer values and check if the result is odd|even. If both are even or odd their sum is even, but odd if they are opposites. Here’s the code to do this.

((pos.z).floor.to_i ^ (pos.x).floor.to_i).odd?

Replacing the slower addition with the faster xor is more efficient.
However, to do that here we incur the .floor.to_i penalty, twice.

This code produces the correct image. It’s faster than the version using all floats and % 2 != 0 check, but a bit slower than the above version.

If there is some way upstream to bring those values in as whole numbers, or ints, and eliminate those conversions, this should be faster. There may also be an inherently faster way to make this determination using a different process.

As an aside, before the times of cpus having fast FPUs (floating point units), or systems GPUs, this type of algorithm would have been done with fixed point integers. As an old Forth programmer, on 16-bit systems, this was commonly done, and still is, because it’s still faster (and the preferred|only way) to do in many cases.

As a fun project, for some enterprising (curious) person with time on their hands, performing this with fixed point arithmetic may make it even faster.

Have a look at the raytracer I wrote using OpenGL where I use signed distance fields to render objects
see opengl_with_crystal/raymarch/lesson_05 at master · gummybears/opengl_with_crystal · GitHub
Did not have time to include reflections but that should not be difficult.

Plan is to build a small web interface (WebGL) like ShaderToy.

2 Likes

I think I’m almost there now, after this unexpected result.

I’m trying to make sure all inner loop methods are most efficient.

So just now, I changed this:

def intersect(ray)
    ...
    
   return nil if (dist == 0)
   Intersection.new(self, ray, dist)
end

to this:

def intersect(ray)
    ...
    
   (dist == 0) ? nil : Intersection.new(self, ray, dist)
end

This one change has resulted in an additional ~7.5% reduction in time! :scream:

I had noticed before doing this in other methods was faster, but I had missed this one. The compiler obviously handles optimizing this conditional better than the former.

I think (hope) I’m almost done now.
Now, the current versions is at least 16% faster than the original.

1 Like

I have a hard time believing that, but maybe it’s true, I don’t know. Using return if ... or the ternary operator should be equivalent.

The LLVM IR is actually a bit different:

--- 1/early_return.ll
+++ 2/ternary.ll
-  %5 = load %"(Int32 | Nil)", %"(Int32 | Nil)"* %0, !dbg !12865
-  ret %"(Int32 | Nil)" %5, !dbg !12865
+  br label %exit, !dbg !12865

else:
  # several identical instructions
-  %10 = load %"(Int32 | Nil)", %"(Int32 | Nil)"* %1, !dbg !12866
+  br label %exit, !dbg !12866
+
+exit:                                             ; preds = %else, %then
+  %9 = phi %"(Int32 | Nil)"* [ %0, %then ], [ %1, %else ], !dbg !12866
+  %10 = load %"(Int32 | Nil)", %"(Int32 | Nil)"* %9, !dbg !12866

(I’m using Int32 instead of Intersection)

Hey @asterite, I’m hurt. :frowning_face:
I did multiple timings, and the effect is real.

You (anyone) can check it yourself by copying your original code, make that one change in it, then do a|b timing between them.

My guess is that it has something to do with creating one return point, vs two, and that one return value is nil, which for this instance in the code, the compiler somehow optimizes very well when it’s used.

I guess the empirical thing to do is check the total compiled code for each version and see what the compiler is producing in each case.

Interesting!

It’s just that I thought LLVM would consider these equivalent, or that it would optimize it to the same code. It seems I was wrong.

Maybe the compiler could transform early returns to conditional branches in order to utilize such optimization? It would of course require more investigation into this.