I can't resist an optimization challenge
This last bit of code is good, but if you use MMX you can cut the memory stores from 9 to 5 (four 8-byte MMX stores plus one 4-byte store, instead of nine 4-byte stores), nearly doubling the effective speed of the code. I think this should work:
        mov       eax, 255                    ; load value in eax
        mov       edx, DWORD PTR _mine$[esp]  ; load destination address
        movd      mm0, eax                    ; mov into MMX register
        punpckldq mm0, mm0                    ; copy into high 32 bits of MMX register
        test      edx, 4                      ; test to see if destination is 4-byte or 8-byte aligned
        jz        alignqword
        mov       [edx], eax                  ; 4-byte aligned
        movq      [edx+4], mm0
        movq      [edx+12], mm0
        movq      [edx+20], mm0
        movq      [edx+28], mm0
        ret
    alignqword:
        movq      [edx], mm0                  ; 8-byte aligned
        movq      [edx+8], mm0
        movq      [edx+16], mm0
        movq      [edx+24], mm0
        mov       [edx+32], eax
        ret
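For readers who don't speak MMX, here is a rough portable C analogue of the same idea. It is a sketch, not a drop-in replacement: the name `fill9` and its store-count return value are hypothetical, and it assumes the destination is at least 4-byte aligned (as the assembly does). It writes nine 32-bit values of 255 (36 bytes) with one 4-byte store and four 8-byte stores, picking the order from bit 2 of the address just like `test edx, 4`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: sets dst[0..8] to 255 using 5 source-level stores
   instead of 9. Assumes dst is at least 4-byte aligned. Returns the number
   of stores issued, purely for illustration. */
static int fill9(uint32_t *dst)
{
    const uint32_t one = 255;
    /* Two copies of 255 packed into 64 bits, like punpckldq mm0, mm0.
       Both halves are equal, so byte order does not matter. */
    const uint64_t two = ((uint64_t)one << 32) | one;
    unsigned char *p = (unsigned char *)dst;
    int stores = 0;

    if ((uintptr_t)dst & 4) {
        /* 4-byte aligned: one 32-bit store moves p+4 onto an 8-byte boundary. */
        memcpy(p, &one, sizeof one);
        stores++;
        for (int off = 4; off < 36; off += 8) {
            memcpy(p + off, &two, sizeof two);  /* 8-byte store */
            stores++;
        }
    } else {
        /* 8-byte aligned: four 64-bit stores, then a trailing 32-bit store. */
        for (int off = 0; off < 32; off += 8) {
            memcpy(p + off, &two, sizeof two);  /* 8-byte store */
            stores++;
        }
        memcpy(p + 32, &one, sizeof one);
        stores++;
    }
    return stores;
}
```

Fixed-size `memcpy` is used as a type-safe way to express a single store; compilers typically lower it to one `mov`, but the five-store count is a source-level count, not a guarantee about the generated code (a compiler may also merge or vectorize these stores on its own).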
Of course, you have to remember to throw in an EMMS instruction before you can use normal floating-point code again, because the MMX registers are aliased onto the x87 floating-point registers. One thing I learned from Michael Abrash: There Ain't No Such Thing As The Fastest Code. Modern compilers can generate very good code, but not necessarily the best code. Remember, the compiler only knows how to make generic optimizations. The best optimizer for your project is your own brain.
Note: I am not saying everything should be hand optimized, but for that critical 5% of the code, it may be just what your project needs.
- Democritus
* Truth is universal *