quaternion to matrix, < 13 mults?
oops: 1 << (5) == 32
With integers, add/sub is way faster than multiply, but with floats I believe the relative difference is smaller.
Is multiplication way slower than add/sub? To test that assumption I ran three tests. Each allocated a 1,000,000 by 3 array. The first filled the first two columns with random numbers, read the time, set the third column to the product of the other two one hundred times, and read the time again. That is, it swept the table from start to end 100 times, as opposed to doing the first row 100 times, then the second row, and so on. The second test did exactly the same thing with an add instead, and the third did nothing. I got 1.92, 1.76 and 0.55 seconds. After subtracting the 0.55 s of loop overhead, that says multiply took about 13.2% longer. That isn't "way slower" to me; I would call it marginal. Admittedly the test is memory bound, and it has plenty of other limitations, but those are the results I got. Multiplication lends itself well to being done in parallel. When transistors were scarce and expensive there was a huge difference, but it seems pretty marginal now.
As for testing whether using an array matters: use the CPU view in the debugger to see the generated assembly. Assuming you have debug information, there will be comments showing you the source line followed by the generated assembly. If you have optimization turned on, there isn't necessarily a nice, neat correspondence between the source and the generated code.
I don't think it is going to make an improvement. I recently changed versions of my development tool, so I don't know how much is the version change and how much is befuddled memory, but I tested a matrix multiply using two 4 x 4 arrays versus a structure. The 4 x 4 arrays actually produced the extra instructions. I would swear I had the opposite result before. Oh well, that's why assumptions are only good for designing a test and not for predicting its result.
Keys to success: Ability, ambition and opportunity.
On many modern processors FP add and multiply are as quick as each other, so minimising the number of multiplications can make things worse if it increases the number of additions and other FP operations.
Another factor is that many processors also have a multiply-add instruction that is as fast as a single multiply. This greatly accelerates much vector, matrix and quaternion code, as much of it relies on sums of products done as quickly as possible. But it does require some thought (or hand-coded assembly) to take full advantage of it.
But the biggest speedup can be got from taking advantage of the parallel/vector/SSE units in the processors in all PCs (and all 'next gen' game consoles). These can perform such operations an order of magnitude faster than code on an FPU, and so are the only way to go for performance-critical code. Unfortunately this is far harder to do, as it means hand-coding assembler for each processor architecture, but the benefits are usually well worth it.
John Blackburne, Programmer, The Pitbull Syndicate
quote:
Original post by johnb
Another factor is that many processors also have a multiply-add instruction that is as fast as a single multiply.
What is it called? I can't find any documentation about this instruction.
"take a look around" - limp bizkit
www.google.com
If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia
My Page davepermen.net | My Music on Bandcamp and on Soundcloud
Try the PMADDWD instruction.
Hm, not that useful for floats, is it? MMX is integer math, if I'm not mistaken.
"take a look around" - limp bizkit
www.google.com

"take a look around" - limp bizkit
www.google.com
If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia
My Page davepermen.net | My Music on Bandcamp and on Soundcloud
This topic is closed to new replies.