That is, of course, about as accurate as you can get in practice. But it is pretty brute force and, as you've found, harsh on CPU utilization. At an absolute minimum you should call the CPU's pause instruction (_mm_pause) at least once inside the loop to let the CPU know this is a spin wait. For a complete solution you want to do two things:
#1 At startup, run a small timing pass that works out how long the _mm_pause instruction takes. I typically look for the number of calls to the instruction that adds up to about 1000ns. I keep going until I have 5 good samples, meaning that if I see a few outliers, I throw them out and try again. An outlier is generally anything more than 25% different from the prior sample. If it's higher, throw out the current sample; if it's lower, throw out the prior samples.
#2 Change your code to a decay loop:
// Assumes #1 is stored in Spin1000ns.
// Delay time is in HPC ticks, convert externally for your target.
void decayTimer(uint64_t delayTime) {
    uint64_t current = HPCGetCounter();
    uint64_t target = current + delayTime;
    while (current < target) {
        // Convert the remaining delta from HPC ticks to ms.
        uint64_t ms = HPCToMs(target - current);
        if (ms > 2) {
            // Assuming you've called timeBeginPeriod(1) somewhere.
            // Due to all the things Windows manages, it is almost guaranteed this will run
            // longer than you want it to, so we sleep for "half" the time. If you still
            // see occasional spikes, increase the (ms > 2) check; I believe I use 5.
            Sleep(static_cast<DWORD>(ms / 2));
        } else {
            // Spin for roughly 1000ns, hinting to the CPU that this is a spin wait.
            for (uint64_t i = 0; i < Spin1000ns; ++i) {
                _mm_pause();
            }
        }
        // Update current.
        current = HPCGetCounter();
    }
}
This should chop the CPU busy time down significantly without giving up timer accuracy. If you need greater accuracy, you can take this a bit further and use a decay variation of the "Spin1000ns" loop, i.e. start at 1000ns, then use 500ns, then 250ns, and so on until you are close enough. (NOTE: Double check the above, I'm sure I probably added a nice bug for the reader to figure out... :D )
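For what it's worth, here is a rough sketch of how the calibration in #1 and the decaying spin variation could look. Treat it as one possible take under stated assumptions, not a tested drop-in: cpuRelax, timePauses, calibrateSpin1000ns, spinWaitDecay, kBatch and kMaxAttempts are all names and values I made up for illustration, and std::chrono::steady_clock stands in for whatever high performance counter wrapper you already have. The 5 good samples and the 25% outlier rule come from the description above.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#include <immintrin.h>
inline void cpuRelax() { _mm_pause(); }
#else
inline void cpuRelax() {}  // Assumption: no pause hint on non-x86 targets.
#endif

using Clock = std::chrono::steady_clock;  // Stand-in for your HPC wrapper.

// Time `count` back-to-back pause calls and return the total in nanoseconds.
inline double timePauses(uint64_t count) {
    const auto begin = Clock::now();
    for (uint64_t i = 0; i < count; ++i) cpuRelax();
    const auto end = Clock::now();
    return std::chrono::duration<double, std::nano>(end - begin).count();
}

// Estimate how many pause calls add up to roughly 1000ns.
inline uint64_t calibrateSpin1000ns() {
    constexpr uint64_t kBatch = 10000;   // Pauses per measurement (assumed value).
    constexpr int kGoodSamples = 5;      // "5 good samples" from the text.
    constexpr int kMaxAttempts = 1000;   // Safety cap (assumed value).

    double sum = 0.0;
    double prior = 0.0;
    int good = 0;
    for (int attempt = 0; attempt < kMaxAttempts && good < kGoodSamples; ++attempt) {
        const double nsPerPause = timePauses(kBatch) / static_cast<double>(kBatch);
        if (nsPerPause <= 0.0) continue;  // Clock too coarse, try again.
        if (good > 0 && nsPerPause > prior * 1.25) continue;  // Higher: drop current.
        if (good > 0 && nsPerPause < prior * 0.75) {          // Lower: drop prior samples.
            sum = 0.0;
            good = 0;
        }
        sum += nsPerPause;
        prior = nsPerPause;
        ++good;
    }
    if (good == 0) return 1;
    const double average = sum / good;
    return std::max<uint64_t>(1, static_cast<uint64_t>(1000.0 / average));
}

// Spin until `target`, halving the spin batch (1000ns, 500ns, 250ns, ...)
// whenever the current batch is estimated to overshoot the remaining time.
inline void spinWaitDecay(Clock::time_point target, uint64_t spin1000ns) {
    spin1000ns = std::max<uint64_t>(1, spin1000ns);
    uint64_t spins = spin1000ns;
    for (;;) {
        const auto now = Clock::now();
        if (now >= target) return;
        const auto remainingNs = static_cast<uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(target - now).count());
        while (spins > 1 && spins * 1000 / spin1000ns > remainingNs) spins /= 2;
        for (uint64_t i = 0; i < spins; ++i) cpuRelax();
    }
}
```

You would call calibrateSpin1000ns() once at startup, keep the result around as Spin1000ns, and then swap spinWaitDecay in for the fixed spin in the else branch once you are within a couple of ms of the target. The calibration can still be thrown off by frequency scaling or a context switch in the middle of a batch, so it is probably worth sanity checking the number it gives you on your target hardware.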