
quaternion to matrix, < 13 mults?

Started by May 25, 2002 05:19 AM
15 comments, last by jitspoe 22 years, 8 months ago
'Scuse me if this has been covered already. I'm new here. I saw lots of stuff on the topic of quaternions, but I didn't see anything covering optimizing a quat-to-matrix function. I'm too sleepy to keep looking right now, so I'll just post this. I managed to optimize the function down to 13 multiplies. I was curious whether it's possible to optimize it any more than that. Here's my code:

void pquat_to_matrix(pvector4f qinp, GLfloat m[4][4])
{
	pquat q;
	float q00, q01, q11, q12, q22, q13, q03, q23, q20;

	/* Since everything gets multiplied by 2, scale the quat
	 * by sqrt(2), that way when 2 values are multiplied,
	 * the end result is the same. */
	pvector4f_copy(qinp,q);
	pvector4f_scale(q,SQRT2,q);

	/* Each mult. gets used twice, so just do one mult
	 * for each and store in a temp var */
	q00 = q[0]*q[0];                  q20 = q[2]*q[0];
	q01 = q[0]*q[1]; q11 = q[1]*q[1];
	                 q12 = q[1]*q[2]; q22 = q[2]*q[2];
	q03 = q[0]*q[3]; q13 = q[1]*q[3]; q23 = q[2]*q[3];

	/* Create the matrix */
	m[0][0]=1-(q11+q22); m[1][0]=  (q01-q23); m[2][0]=  (q20+q13); m[3][0]=0;
	m[0][1]=  (q01+q23); m[1][1]=1-(q22+q00); m[2][1]=  (q12-q03); m[3][1]=0;
	m[0][2]=  (q20-q13); m[1][2]=  (q12+q03); m[2][2]=1-(q11+q00); m[3][2]=0;
	m[0][3]=0          ; m[1][3]=0          ; m[2][3]=0          ; m[3][3]=1;
}
 
[edited by - jitspoe on May 25, 2002 6:22:03 AM]
___________________________________Digital Paint: Paintball 2.0jitspoe's joint
You have to look at what it generates, but you might want to use an array with hardcoded subscripts for the intermediate work variables. My compiler uses an extra load for the address rather than just loading an address once and then using an offset. So that MIGHT eliminate 20 or so instructions.
Keys to success: Ability, ambition and opportunity.
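For anyone wondering what the array suggestion might look like, here is a minimal sketch. The names (quat_to_matrix_array, the enum indices) are made up for illustration, and it assumes the quaternion is stored as {x, y, z, w} with the same matrix indexing as the function in the first post. Whether it actually saves any instructions depends entirely on the compiler, so check the generated assembly.

```c
/* Hypothetical sketch, not from the original post.
 * Assumes q = {x, y, z, w} and the same m[][] indexing as the
 * original pquat_to_matrix. */
#define SQRT2 1.41421356f

/* Hardcoded subscripts for the intermediate products. */
enum { XX, XY, YY, YZ, ZZ, ZX, XW, YW, ZW, NUM_PRODUCTS };

void quat_to_matrix_array(const float qin[4], float m[4][4])
{
    float q[4];
    float p[NUM_PRODUCTS];
    int i;

    /* Scale by sqrt(2) so every product below comes out doubled. */
    for (i = 0; i < 4; i++)
        q[i] = qin[i] * SQRT2;

    /* All intermediates live in one array, addressed by constant offsets. */
    p[XX] = q[0] * q[0]; p[XY] = q[0] * q[1]; p[YY] = q[1] * q[1];
    p[YZ] = q[1] * q[2]; p[ZZ] = q[2] * q[2]; p[ZX] = q[2] * q[0];
    p[XW] = q[0] * q[3]; p[YW] = q[1] * q[3]; p[ZW] = q[2] * q[3];

    m[0][0] = 1.0f - (p[YY] + p[ZZ]);
    m[1][0] = p[XY] - p[ZW];
    m[2][0] = p[ZX] + p[YW];
    m[3][0] = 0.0f;

    m[0][1] = p[XY] + p[ZW];
    m[1][1] = 1.0f - (p[ZZ] + p[XX]);
    m[2][1] = p[YZ] - p[XW];
    m[3][1] = 0.0f;

    m[0][2] = p[ZX] - p[YW];
    m[1][2] = p[YZ] + p[XW];
    m[2][2] = 1.0f - (p[YY] + p[XX]);
    m[3][2] = 0.0f;

    m[0][3] = 0.0f; m[1][3] = 0.0f; m[2][3] = 0.0f; m[3][3] = 1.0f;
}
```

The multiply count is unchanged (4 for the scale plus 9 products); only the storage of the temporaries differs.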
Here's the function from the Allegro library.
I counted 10 multiplies (not including *2). I don't know how fast this is all up, or if it misses anything yours does.


/* quat_to_matrix:
 *  Constructs a rotation matrix from a quaternion.
 */
void quat_to_matrix(AL_CONST QUAT *q, MATRIX_f *m)
{
   float ww;
   float xx;
   float yy;
   float zz;
   float wx;
   float wy;
   float wz;
   float xy;
   float xz;
   float yz;

   /* This is the layout for converting the values in a quaternion to a
    * matrix (rows follow the first index of m->v, matching the
    * assignments below).
    *
    *  | ww + xx - yy - zz       2xy + 2wz             2xz - 2wy     |
    *  |     2xy - 2wz       ww - xx + yy - zz         2yz + 2wx     |
    *  |     2xz + 2wy           2yz - 2wx         ww - xx - yy + zz |
    */

   ww = q->w * q->w;
   xx = q->x * q->x;
   yy = q->y * q->y;
   zz = q->z * q->z;
   wx = q->w * q->x * 2;
   wy = q->w * q->y * 2;
   wz = q->w * q->z * 2;
   xy = q->x * q->y * 2;
   xz = q->x * q->z * 2;
   yz = q->y * q->z * 2;

   m->v[0][0] = ww + xx - yy - zz;
   m->v[1][0] = xy - wz;
   m->v[2][0] = xz + wy;

   m->v[0][1] = xy + wz;
   m->v[1][1] = ww - xx + yy - zz;
   m->v[2][1] = yz - wx;

   m->v[0][2] = xz - wy;
   m->v[1][2] = yz + wx;
   m->v[2][2] = ww - xx - yy + zz;

   m->t[0] = 0.0;
   m->t[1] = 0.0;
   m->t[2] = 0.0;
}

Edit1:
I looked at yours a bit and saw that you avoided one multiply by not getting zz, and using its relationship with the other values, using a few extra subtractions.

Allegro's fn assumes it's normalized, which you should probably do if you want some serious speed. Normal quaternions multiply to produce more normal ones, and I can't think of a reason why you'd want anything else. Your normalizing, I guess, accounts for the other 4 multiplies.


[edited by - AndyMan on May 25, 2002 5:47:39 PM]
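A side note on how the two versions relate: the diagonal terms in the first post and in the Allegro function are only equal when the quaternion has unit length. A short derivation, using the w, x, y, z names from the Allegro code:

```latex
\begin{aligned}
w^2 + x^2 + y^2 + z^2 &= 1 \\
w^2 + x^2 - y^2 - z^2 &= 1 - 2(y^2 + z^2) \\
w^2 - x^2 + y^2 - z^2 &= 1 - 2(x^2 + z^2) \\
w^2 - x^2 - y^2 + z^2 &= 1 - 2(x^2 + y^2)
\end{aligned}
```

The right-hand sides never use the square of w, which is where the one saved multiply comes from.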
It's 16 with the *2's tho... or do multiplications by 2 get optimized and not count?
___________________________________Digital Paint: Paintball 2.0jitspoe's joint
A += A is the same as A *= 2, so it's much quicker.

Also, multiplying/dividing by any power of 2 can be done as a bit-shift operation, which is faster.

The compiler will use the fastest operation when it has the choice.

Edit: oic, it's scaling not normalising. Well, using *2 rather than scaling will be faster.



[edited by - AndyMan on May 25, 2002 6:02:01 PM]
quote:
Original post by AndyMan
Also, multiplying/dividing by any power of 2 can be done as a bit-shift operation, which is faster.

That's only true with integers.
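To make the distinction concrete, here is a small sketch with made-up helper names. The shift form only applies to integer types; for floats, C has no shift operator at all, so doubling is usually written as an addition (or just as *2, leaving the choice to the compiler).

```c
#include <stdint.h>

/* Hypothetical helper: multiply an unsigned integer by 2^k with a shift.
 * Valid for k < 32; bits shifted past the top are discarded. */
uint32_t mul_pow2_shift(uint32_t x, unsigned k)
{
    return x << k;
}

/* Hypothetical helper: double a float with an addition instead of a
 * multiply. a + a and a * 2.0f give the same result. */
float twice(float a)
{
    return a + a;
}
```

For example, mul_pow2_shift(7, 3) gives 56, and twice(1.25f) gives 2.5f.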

With the way floats are stored, you can multiply by 32, for instance, by adding 3 to the exponent section, no?
I expect a good instruction set would include this.
"With the way floats are stored, you can multiply by 32, for instance, by adding 3 to the exponent section, no?"

Wouldn't it be 5?

And even then it only works when the numbers are normalized.
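A rough sketch of the exponent trick, assuming float is a 32-bit IEEE-754 single (true on most current hardware, but not guaranteed by the C standard). The function names are invented for this example. The naive version simply adds k to the exponent field; the standard ldexpf from math.h does the same job but also handles zero, denormals and overflow.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: multiply x by 2^k by adding k to the biased
 * exponent field. Assumes 32-bit IEEE-754 floats and a small
 * non-negative k. Only correct when x and the result are both
 * normalized numbers. */
float scale_by_exponent_add(float x, int k)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);   /* reinterpret without aliasing issues */
    bits += (uint32_t)k << 23;        /* exponent field starts at bit 23 */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* The portable way: ldexpf(x, k) returns x * 2^k for any float x. */
float scale_by_ldexp(float x, int k)
{
    return ldexpf(x, k);
}
```

With k = 5, both functions turn 1.5f into 48.0f. Zero is the simplest counterexample for the naive version: its exponent field is 0, so adding 5 produces a tiny non-zero number instead of zero, while ldexpf returns zero as expected.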
Cool, thanks for the input guys. I've gotten it down to 9 multiplies now (and 21 adds/subtracts).
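The 9-multiply version was never posted, so the following is only one possible way to get there, not necessarily the poster's. It replaces the sqrt(2) scaling with three additions (x + x and so on), as suggested earlier in the thread. The function name is invented, and it assumes a unit quaternion stored as {x, y, z, w} with the same matrix indexing as the first post. This variant comes out at 9 multiplies and 15 adds/subtracts.

```c
/* Hypothetical sketch: 9 multiplies, no sqrt(2) scaling.
 * Assumes a unit quaternion q = {x, y, z, w}. */
void quat_to_matrix_9mul(const float q[4], float m[4][4])
{
    const float x = q[0], y = q[1], z = q[2], w = q[3];

    /* Doubling by addition: 3 adds instead of 4 scale multiplies. */
    const float x2 = x + x, y2 = y + y, z2 = z + z;

    /* Each product already carries the factor of 2. */
    const float xx = x * x2, xy = x * y2, xz = x * z2;
    const float yy = y * y2, yz = y * z2, zz = z * z2;
    const float wx = w * x2, wy = w * y2, wz = w * z2;

    m[0][0] = 1.0f - (yy + zz);
    m[1][0] = xy - wz;
    m[2][0] = xz + wy;
    m[3][0] = 0.0f;

    m[0][1] = xy + wz;
    m[1][1] = 1.0f - (xx + zz);
    m[2][1] = yz - wx;
    m[3][1] = 0.0f;

    m[0][2] = xz - wy;
    m[1][2] = yz + wx;
    m[2][2] = 1.0f - (xx + yy);
    m[3][2] = 0.0f;

    m[0][3] = 0.0f; m[1][3] = 0.0f; m[2][3] = 0.0f; m[3][3] = 1.0f;
}
```

A quick sanity check is a 90 degree rotation about z, q = {0, 0, 0.70710678f, 0.70710678f}: m[0][1] should come out close to 1, m[1][0] close to -1, and m[2][2] exactly 1.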

About the array comment -- where would I read up on that?
___________________________________Digital Paint: Paintball 2.0jitspoe's joint
about the adding to exponent thing:
this always works, as the value will remain normalized (only if you act on the mantissa might it have to be changed)

as for the quat func: how about sse?
then you'd have even faster code..

and btw, if you get rid of a multiplication but add some additions/subtractions instead, you won't get more speed. multiplications that are independent and issued one after another are faster

"take a look around" - limp bizkit
www.google.com
If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

My Page davepermen.net | My Music on Bandcamp and on Soundcloud

This topic is closed to new replies.
