How to sleep for nanoseconds?

Hello!

I am writing an emulator for the 65c816 processor (of Apple IIgs fame) in Crystal. Things have gone well, and I am working on the main loop.

When an instruction is executed, it returns the number of emulated cycles it took. At 8 MHz, each cycle should take 125 nanoseconds. I can time instructions with Time.measure, so say a NOP taking 2 cycles has just run: that should take 250 ns. With the --release flag it takes way less than that, which is good.

So, let’s say my NOP instruction took 20 ns to execute. How do I sleep for the other 230 ns? I could just have a while loop that does nothing, but I wasn’t sure whether the optimizer would remove an empty while loop.
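
For concreteness, here’s a rough sketch of the problem (step is just a stand-in for my real instruction dispatch):

CYCLE_TIME = 125.nanoseconds # 8 MHz => 125 ns per cycle

# Stand-in for the real instruction dispatch: pretend a 2-cycle NOP ran.
def step : Int32
  2
end

cycles = 0
elapsed = Time.measure { cycles = step }
remaining = CYCLE_TIME * cycles - elapsed
puts "still need to wait #{remaining.total_nanoseconds} ns"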

Thanks in advance for any help!

Oooh if it can be used to play Number Munchers, that would be amaaazing!

To answer your question:

sleep 230.nanoseconds

Realistically, the scheduler is a queue, so it could sleep for longer if other fibers are able to get in front and run for longer than expected. It shouldn’t sleep for less time, though. Just something to keep in mind for tuning.

LLVM is ludicrously good at eliminating code, but if you’re spinlocking on Time.monotonic you should be fine: its result is nondeterministic, so the loop can’t be optimized away. At the very least, you have it as an option if the sleep call sleeps for an unexpectedly long time.
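
For example, something along these lines (just a sketch; precise_wait is my own name and the 1 ms of slack is a guess you’d want to tune):

# Sketch: sleep away the coarse part of the wait, then spin on Time.monotonic
# for the final stretch.
def precise_wait(duration : Time::Span)
  deadline = Time.monotonic + duration
  sleep(duration - 1.millisecond) if duration > 1.millisecond
  while Time.monotonic < deadline
  end
end

precise_wait 230.nanoseconds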

Awesome, thank you!

Related: Sleep less than 1ms · Issue #6586 · crystal-lang/crystal · GitHub


Oh fascinating, I didn’t even think about event-loop precision.

I ran a benchmark comparing the precision of sleep with a spinlock (it’s not technically a spinlock, since it isn’t polling a shared resource, but the mechanism is the same even if the goal differs). It turns out sleep (at least on macOS) isn’t precise enough and, surprisingly, asking it to sleep for 100ns makes it sleep longer than asking for 1µs.

Benchmark code
require "benchmark"

Benchmark.ips do |x|
  x.report "sleep 1ms" { sleep 1.millisecond }
  x.report "sleep 1µs" { sleep 1.microsecond }
  x.report "sleep 100ns" { sleep 100.nanoseconds }
  x.report "spinlock 1ms" { spinlock 1.millisecond }
  x.report "spinlock 1µs" { spinlock 1.microsecond }
  x.report "spinlock 100ns" { spinlock 100.nanoseconds }
end

# Busy-wait until `duration` has elapsed on the monotonic clock.
def spinlock(duration : Time::Span)
  start = Time.monotonic
  while Time.monotonic - start < duration
  end
end

➜  Code crystal run --release sleep_precision.cr
     sleep 1ms 826.42  (  1.21ms) (± 6.73%)  0.0B/op  9544.99× slower
     sleep 1µs 358.20k (  2.79µs) (± 4.59%)  0.0B/op    22.02× slower
   sleep 100ns  85.14k ( 11.75µs) (± 1.77%)  0.0B/op    92.65× slower
  spinlock 1ms 999.94  (  1.00ms) (± 0.01%)  0.0B/op  7888.63× slower
  spinlock 1µs 992.42k (  1.01µs) (± 0.49%)  0.0B/op     7.95× slower
spinlock 100ns   7.89M (126.77ns) (± 0.78%)  0.0B/op          fastest

The spinlock approach is pretty precise (within ~27 ns of the 100 ns target) and shows less than half the variance, while sleep is off by about two orders of magnitude once you try to get down that low.

Currently, sleep timeouts even go through the event loop (libevent), or at least interfere with it (IOCP), so there’s a lot of code involved, which can introduce quite a bit of variance.
If you need such precise timing, I guess the best option so far is just to keep the current thread spinning.

That’s what I’ve ended up with, thanks.

Yeah, a sleep won’t trigger before its time has elapsed, but it may take time to resume.

Sleeping involves both the scheduler and the event loop, because we don’t want to block a thread: a sleeping fiber waits on the event loop (which will resume it on the next timer). The scheduler may switch multiple fiber contexts before running the event loop, and busy fibers may prevent the event loop from running at all, which delays the detection of sleeping fibers that are ready to resume.

If blocking a thread is fine, then maybe nanosleep would behave a bit better? It will “only” involve the OS scheduler, and spare an expensive spin lock; spin locks are usually never a good idea, as they misguide the OS scheduler and burn a CPU that could be working on something else (or sleeping).
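
For example, a minimal sketch assuming a 64-bit POSIX system (the LibPosixTime binding and native_nanosleep helper are just names for illustration, not anything from the stdlib):

# Sketch: call nanosleep(2) directly, which blocks the current thread.
# Assumes time_t and long are both 64 bits (common 64-bit platforms).
lib LibPosixTime
  struct Timespec
    tv_sec : Int64  # whole seconds
    tv_nsec : Int64 # nanoseconds (0..999_999_999)
  end

  fun nanosleep(req : Timespec*, rem : Timespec*) : Int32
end

def native_nanosleep(span : Time::Span)
  ts = LibPosixTime::Timespec.new(tv_sec: span.to_i.to_i64, tv_nsec: span.nanoseconds.to_i64)
  LibPosixTime.nanosleep(pointerof(ts), nil)
end

native_nanosleep 230.nanoseconds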

If you need something closer to RT, an alternative could be to use timers, see timer_create(2)… but the signal handling will go through the event loop, so you’re back to square one (unless you call signal or sigaction directly instead of Signal.trap, but we’re going into unsafe Crystal here).

Note: Go checks timers independently from netpoll (aka event loop), and checks them early during goroutine context switches. That may help to resume them in a more timely manner.

EDIT: for an emulator, you likely want to spin, but realtime (RT) timers might also behave nicely (granted you override the signal handler with a raw sigaction). It would be interesting to know how other emulators usually deal with this (they need RT).
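
For what it’s worth, here is a rough sketch of spin-based pacing that tracks an absolute deadline, so timing error doesn’t accumulate across instructions (step is just a stand-in for the real instruction dispatch):

CYCLE = 125.nanoseconds # 8 MHz

# Stand-in for the real instruction dispatch: pretend a 2-cycle NOP ran.
def step : Int32
  2
end

deadline = Time.monotonic
1_000_000.times do
  deadline += CYCLE * step
  # Spin until the absolute deadline; any overshoot on one instruction is
  # automatically compensated on the next.
  while Time.monotonic < deadline
  end
end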