While checking out something'n'other via Compiler Explorer I noticed that MSVC does
mov [temp], eax
movss xmm0, [temp]
… instead of just …
movd xmm0, eax
Won't really matter performance-wise for me, but still - is there some way to stop MSVC from being an absolute idiot and make it do the sane thing like every other darn compiler I can find? _castu32_f32 seems to be a perfect fit - but it is not available in MSVC. Is there something else that could work?
--------------------------------------------
Thought about it a bit more before pressing "post" … and got an idea to solve it. Hm, why not include it and post it anyway - what do you think of it?
#include <immintrin.h>

float _castu32_f32(unsigned int val) {
    float res;
    // load val into the low 32 bits, relabel as float, store the low lane
    _mm_store_ss(&res, _mm_castsi128_ps(_mm_loadu_si32(&val)));
    return res;
}
1. _mm_loadu_si32 - moves the int into the low 32 bits of an __m128i (the low 32 bits of an XMM register are where floats actually live - let's face it, the ancient x87 FPU floats are so horrible that no one uses them)
2. _mm_castsi128_ps - does nothing (just switches the type sugar coating for the compiler)
3. _mm_store_ss - does nothing (moves the low lane of the __m128 to a float - where the latter is internally actually also an __m128 anyway)
Works perfectly in release, but is terrible (of course) when optimizations are not enabled. Ignoring all the function call code and its extensive checking - this gem remains:
movd xmm0,dword ptr [val] // do the thing (_mm_loadu_si32)
movdqa xmmword ptr [rsp+50h],xmm0 // store it in preparation of doing nothing
movaps xmm0,xmmword ptr [rsp+50h] // load to do nothing (_mm_castsi128_ps)
movaps xmmword ptr [rsp+60h],xmm0 // store it in preparation of doing nothing
movaps xmm0,xmmword ptr [rsp+60h] // load to do nothing (_mm_store_ss)
movss dword ptr [res],xmm0 // store it in result after doing nothing
Amusing. Also, shows/confirms what the compiler sees.
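For comparison, here is a sketch of a variant of the function above that never takes the address of the input, plus a plain memcpy version to check it against. The names bits_to_float_sse and bits_to_float_memcpy are made up for this example, they are not vendor intrinsics. _mm_cvtsi32_si128 and _mm_cvtss_f32 are the register-to-register counterparts of the load/store intrinsics used above, so I would expect this to come out as a single movd in an optimized build, but I have not verified what MSVC actually emits for it - worth pasting into Compiler Explorer before relying on it.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h> /* SSE2; also pulls in the SSE header that declares _mm_cvtss_f32 */

/* Hypothetical helper: reinterpret the 32 bits of val as a float. */
static inline float bits_to_float_sse(uint32_t val) {
    /* Put the integer into the low 32 bits of an XMM register (upper bits are zeroed).
       The cast to int only changes the type; this assumes the usual two's complement
       wrap-around for values above INT_MAX. */
    __m128i v = _mm_cvtsi32_si128((int)val);
    /* Relabel the register as packed floats (no instruction) and return the low lane. */
    return _mm_cvtss_f32(_mm_castsi128_ps(v));
}

/* Portable reference version: copy the bytes, let the optimizer sort it out. */
static inline float bits_to_float_memcpy(uint32_t val) {
    float res;
    memcpy(&res, &val, sizeof res);
    return res;
}
```

Usage is the same as for the function above, e.g. bits_to_float_sse(0x3F800000u) gives 1.0f, since 0x3F800000 is the IEEE 754 single-precision bit pattern for 1.0.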