quote: Original post by Jumpster
// Old version...
cTexel[0] = (unsigned char) (imageData[iIndex + 0] * fLowTexWeight);
cTexel[1] = (unsigned char) (imageData[iIndex + 1] * fLowTexWeight);
cTexel[2] = (unsigned char) (imageData[iIndex + 2] * fLowTexWeight);
// New version...
cTexel[0] = (unsigned char) (imageData[iIndex++] * fLowTexWeight);
cTexel[1] = (unsigned char) (imageData[iIndex++] * fLowTexWeight);
cTexel[2] = (unsigned char) (imageData[iIndex++] * fLowTexWeight);
I am not sure, but I believe this will also help you out a little bit. If I am not mistaken, the asm code for iIndex + 0, iIndex + 1 and iIndex + 2 equates to something like:
mov eax, [iIndex]
add eax, 0
...
mov eax, [iIndex]
add eax, 1
...
mov eax, [iIndex]
add eax, 2
...
It seems to me, although I have not checked this out yet, that the iIndex++ would translate to:
inc [iIndex]
...
inc [iIndex]
...
inc [iIndex]
and provide the same results. Again, I am not sure; this is just what I think would happen.
At any rate, it doesn't hurt to try it, right?
The first one will translate into statements of the type:
mov esi,imageData
mov edi,cTexel
mov eax,[esi]
mov ebx,[esi+1]
...
mov [edi],eax
mov [edi+1],ebx
...
That's fairly fast, and a good compiler will schedule it to fill both Pentium pipes. Note that string instructions could be used to do this, but they are slower. Also, when moving bytes, the compiler will probably move them in 32-bit chunks if possible.
whereas the second variant would go something like:
mov esi,imageData
mov edi,cTexel
mov eax,[esi]
inc esi
mov [edi],eax
inc edi
...repeat...
Uh oh, this isn't good. The added complexity makes it harder to fill both pipes, because each memory access now has to wait for the inc that updates its address register. Holy macaroni, this backfired.
Note, though, that an offset or index can sometimes add one cycle to the address calculation, which is the same cost as an inc reg. But when you know the number of items being moved, the first version is actually better.
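If you want to check this instead of guessing, here is a minimal sketch in C that wraps both variants in functions so you can compare their results and look at the generated assembly (for example with gcc -S -O2). The function names scale_texel_offsets and scale_texel_increment are made up for illustration, and the assumption is that imageData holds at least iIndex + 3 bytes. An optimizing compiler may well emit the same code for both.

```c
#include <stddef.h>

/* Variant 1: constant offsets from one base index.
   iIndex itself is never modified by the three reads. */
void scale_texel_offsets(const unsigned char *imageData, size_t iIndex,
                         float fLowTexWeight, unsigned char cTexel[3])
{
    cTexel[0] = (unsigned char)(imageData[iIndex + 0] * fLowTexWeight);
    cTexel[1] = (unsigned char)(imageData[iIndex + 1] * fLowTexWeight);
    cTexel[2] = (unsigned char)(imageData[iIndex + 2] * fLowTexWeight);
}

/* Variant 2: post-increment the index after every read.
   Returns the advanced index (iIndex + 3) so the caller can keep going. */
size_t scale_texel_increment(const unsigned char *imageData, size_t iIndex,
                             float fLowTexWeight, unsigned char cTexel[3])
{
    cTexel[0] = (unsigned char)(imageData[iIndex++] * fLowTexWeight);
    cTexel[1] = (unsigned char)(imageData[iIndex++] * fLowTexWeight);
    cTexel[2] = (unsigned char)(imageData[iIndex++] * fLowTexWeight);
    return iIndex;
}
```

Both functions write the same three bytes into cTexel; the only difference is that the second one also hands back the advanced index. Timing them in your real loop is the only way to know which is faster on your compiler and CPU.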
cheers!