
Bit Shifting... Yay or Nay? :)

I think it's another thing that newer Intel chips are just slow at doing... There have been other weird things like that, such as PUSHA and POPA being slower than pushing and popping all the registers manually. I think another one had to do with the LOOP instruction actually being slower than manually looping.
On the other hand, check out: http://www.geocities.com/SiliconValley/Way/6422/optimizations.htm
It says a shift is 1 clock. I'm assuming a multiply is more. I'm also assuming you had optimizations off, but you might want to see what code the compiler made for you, and check that it didn't substitute an ADD for the multiply. (Try multiplying by 64 and shifting << by 6.) I'd be interested to see what you get from doing that. It's also possible that the multiplication was better suited to the dual pipeline, but probably not if it used a MUL.
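
For reference, the suggested comparison is just your original loops with the constants swapped, something like this (temp and the constant are the ones from your test):

code:
for(DWORD i=0; i<10000000; i++)
    temp = 6545345 * 64;     // multiply by 64

for(DWORD i=0; i<10000000; i++)
    temp = 6545345 << 6;     // shift left by 6 (also x64)
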
-ns


I think you have too much loop overhead to get meaningful results. Unroll the loop about 32 times (just copy/paste) and see if you get different results. Yeah, and disable optimizations like the guy said.

A few other things: do it in asm to make sure the compiler doesn't touch it, and don't assign to a variable in memory every time; that dilutes any results you might get. Go for the least loop overhead, like this:

code:
_asm {
    mov eax, 4566436
    mov ecx, 10000000/32
looptop:
    shl eax, 4
    shl eax, 4
    shl eax, 4
    shl eax, 4
    ; (repeat 32 times)
    dec ecx
    jnz looptop
}

and
code:
_asm {
    mov eax, 4566436
    mov ecx, 10000000/32
looptop:
    imul eax, 16
    imul eax, 16
    imul eax, 16
    imul eax, 16
    ; (repeat 32 times)
    dec ecx
    jnz looptop
}


Hey

I just tried your suggested test (*64, << 6) and got exactly the same framerates (equal). I just redid the first test and found them to be pretty equal as well, only varying by 0.01 of a fps. Pretty odd. And yes, I turned all optimisations off.

I guess it's still better to use shifting though, if only to aid any slower processors (although my P200 isn't exactly modern).

Thanks for the reply,

Russ

Hey. I was just playing around with generating assembly listings for console projects in VC++ 6, and seeing what C code would generate what assembly code. I had "i" as an integer variable. Anyway, I looked and saw that
code:
i = i * 2;
i *= 2;
i = i * 4;
i *= 4;

generated
code:
shl eax,1
shl eax,1
shl eax,2
shl eax,2

(I'm not sure if eax was the correct register or not, though.) So the compiler uses bit shifts when you multiply by a power of 2 anyway. I just thought I'd mention that.

------------------
http://qoy.tripod.com


Trivial shifts (like multiplication by a power of 2) are still quicker, but it depends. If there are a lot of independent instructions, the PPro and above will overlap the short-latency instructions with the long-latency multiply.

When you get into two or more shifts and adds, multiplying is quicker.
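
As a rough illustration of that trade-off (the function names here are just made up for the example), multiplying by a constant that isn't a power of two needs a chain of shifts and adds, and at that point the single multiply tends to win on newer chips:

code:
// 10x = 8x + 2x, so the "shift" version needs two shifts and an add
int TimesTenMul(int x)   { return x * 10; }
int TimesTenShift(int x) { return (x << 3) + (x << 1); }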

If you're unsure, use QueryPerformanceCounter() around the shifting version and multiplying version of a function and see which is quicker.
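
Something along these lines, roughly (just a sketch; the volatile variable and the assignments inside the loops are my own additions to stop the compiler throwing the work away):

code:
#include <windows.h>
#include <stdio.h>

int main()
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);

    volatile DWORD temp = 6545345;

    QueryPerformanceCounter(&start);
    for (DWORD i = 0; i < 10000000; i++)
        temp = temp << 1;               // shifting version
    QueryPerformanceCounter(&stop);
    printf("shift:    %f s\n",
           (double)(stop.QuadPart - start.QuadPart) / (double)freq.QuadPart);

    QueryPerformanceCounter(&start);
    for (DWORD i = 0; i < 10000000; i++)
        temp = temp * 2;                // multiplying version
    QueryPerformanceCounter(&stop);
    printf("multiply: %f s\n",
           (double)(stop.QuadPart - start.QuadPart) / (double)freq.QuadPart);

    return 0;
}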

Which is faster? Directly bit shifting a variable, or accessing an array that has precalculated bit-shifted values?

Also, does the length of the register (32-bit, 64-bit) have anything to do with how long it takes to bit shift?
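
For what it's worth, the two approaches I mean look something like this (the table size and names are made up for the example):

code:
// Illustration only: shifting directly vs. looking up a precalculated table
unsigned shiftTable[256];                   // shiftTable[x] holds x << 4

void InitShiftTable()
{
    for (unsigned x = 0; x < 256; x++)
        shiftTable[x] = x << 4;
}

unsigned ShiftDirect(unsigned x) { return x << 4; }               // one shift, no memory access
unsigned ShiftLookup(unsigned x) { return shiftTable[x & 0xFF]; } // one memory load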

Dubious

Are you doing this test without optimizations? 'tis possible that the compiler may be changing the *2 to << 1.
--Shannon Schlomer, BLAZE Technologies, Inc.
My guess would be that since you're multiplying constants, it just pre-calculates the answer and hard codes it. There is no reason to calculate it every pass, and even without optimizations it should do this. A shift is as fast as any instruction, and should certainly beat an integer multiply (at the asm level).

Make sure to check the asm code to see whether it actually generated shifts and multiplies. It shouldn't have in your example.

Rock

Ah yes, I forgot, the compiler is probably just multiplying the 2 constants and then hardcoding the results. Shift or multiply a variable instead, and don't use any optimizations or the variable may be taken out of the loop and replaced with a constant if the compiler detects its value doesn't change.
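
Something like this dropped into the original test would do it (GetTickCount is just one convenient way to get a value the compiler can't predict; any runtime source would work, and the final printf is there so the result can't be discarded):

code:
DWORD value = GetTickCount() | 1;    // runtime value the compiler can't fold away
DWORD temp  = 0;

for (DWORD i = 0; i < 10000000; i++)
    temp += value << 1;              // shifting a variable

for (DWORD i = 0; i < 10000000; i++)
    temp += value * 2;               // multiplying a variable

printf("%lu\n", temp);               // use the result so it isn't optimized out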

------------------
--Shannon Schlomer, BLAZE Technologies, Inc.

Hi,

A few mins ago I was reading some reviews of Andre's new book on Amazon. Some guy was denouncing all of Andre's claims, and one of them was that 'bit shifting is faster than multiplying'. After years of being told to _always_ bit shift, I laughed.

I thought I'd check it out though, having never actually compared normal multiplication and bit shifting, just taking it on faith that shifting is faster... anyway.

I used the following code for bit shifting:

for(DWORD i=0; i<10000000; i++)
temp = 6545345 << 1;

And this for normal multiplication:

for(DWORD i=0; i<10000000; i++)
temp = 6545345 * 2;

10,000,000 seemed a large enough number of multiplications.

Upon checking the frame rates, I got 3.15fps for the standard multiplication, and about 3.12fps for the bit shifting... HMM.

Is this just me? Bear in mind I'm on a lowly P200 MMX, hence the slow fps. But in comparison, how come bit shifting is actually slower than standard multiplication??

Is this all just me?

Comments/Advice/Flames...

--
Russ
