
I'll show you mine, if you'll show me yours

Started by July 03, 2000 09:11 AM
13 comments, last by DeltaVee 24 years, 5 months ago
Say I'm an assembly programmer, so I don't know: in C, do those sin and cos functions use the FPU? Or a lookup table? Or what?

Because they might be able to be optimized, though I'm not sure... I may try coding this in asm, because that sounds like a cool idea.

See ya,
Ben
__________________________
Mencken's Law: "For every human problem, there is a neat, simple solution; and it's always wrong."
"Computers in the future may weigh no more than 1.5 tons."- Popular Mechanics, forecasting the relentless march of science in 1949
cyberben, his new algorithm uses a lookup table...

DeltaVee...

try using

*(sintable+nAng)

where possible instead of

sintable[nAng]

since it can be faster... not in this case (I don't think it would change anything, since the two are basically the same), but if you use other variables in the index, or walk a 2D buffer, it can make a noticeable difference...
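In C, a[i] is defined as *(a + i), so for a one-dimensional table like sintable the two spellings compile to the same code. Where pointers do help is walking a 2D buffer. The sketch below (hypothetical blit_indexed / blit_walking helpers, assuming 8-bit pixels and a row pitch in bytes) contrasts recomputing y * pitch + x for every pixel with stepping a pointer along each row. A modern optimizing compiler will often do this strength reduction for you, so it's worth measuring before hand-tuning.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a w x h rectangle of 8-bit pixels between buffers whose rows are
   "pitch" bytes apart (pitch >= w, as with a locked DirectDraw surface). */

/* Indexed: recomputes y * pitch + x for every pixel. */
void blit_indexed(uint8_t *dst, int dstPitch, const uint8_t *src,
                  int srcPitch, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * dstPitch + x] = src[y * srcPitch + x];
}

/* Pointer walking: one increment per pixel, one add per row. */
void blit_walking(uint8_t *dst, int dstPitch, const uint8_t *src,
                  int srcPitch, int w, int h)
{
    while (h--) {
        uint8_t *d = dst;
        const uint8_t *s = src;
        for (int x = 0; x < w; x++)
            *d++ = *s++;
        dst += dstPitch;   /* jump to the start of the next row */
        src += srcPitch;
    }
}
```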

Anyway, great job. Maybe some assembler won't hurt.
Two suggestions:

Doesn't IDirectDrawSurface7::Blt() do rotated blits? I notice there is a dwRotationAngle field in the DDBLTFX structure that you pass to Blt(). The rotation would then be done in hardware, I presume. By the way, please excuse my ignorance if I'm wrong; I've never used DD except as a means to get to D3D.

Also, I've seen a rotation algo before... it went something like this:


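float u, v, uIncX, uIncY, vIncX, vIncY;
float ustart = SOMETHING;      // sorry, I can't remember
float vstart = SOMETHING_ELSE;
for (int z = startz; z < endz; z++) {
    u = ustart;                // source coords at the start of this dest row
    v = vstart;
    for (int x = startx; x < endx; x++) {
        dest[x][z] = src[(int)lroundf(u)][(int)lroundf(v)];  // nearest source pixel
        u += uIncX;            // one dest pixel right = a diagonal step in the source
        v += vIncX;
    }
    ustart += uIncY;           // one dest row down
    vstart += vIncY;
}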


Obviously you're wondering, "what the hell are ustart, vstart, uIncX, etc.?" Well, basically U and V are the "x" and "y" coordinates in your source image. As you go straight "across" your dest image (in your x loop), you are in effect tracing a diagonal line through your source image (its slope depends on the rotation angle), hence you have to update both U and V. Obviously, if you're seriously interested in this, you need to research it. I know this for sure, though: you shouldn't have any multiplications in your inner loop.
Yes, you can rotate and blt at the same time; the only problem is that your hardware needs to support it.

I have a neonstar 128 (or something like that) on my laptop, and believe me, a box of crayons has more drawing capabilities.

To cut a long story short, I am pretty much writing my own software rasterizer (my own blt is faster than my card's!). I need to anyway: I would like to publish this game at some point (don't we all?), so it needs to work on a wide variety of cards.

I am currently revising the code for the next optimization. I can get rid of the inner multiplication, but it needs a little fiddling around.

BTW, on my Celeron 500 MHz with a Voodoo5 5500 AGP 64 MB (dual processor), the routine takes less than 9 ms!

Thanks again



--------------------------
Carpe Diem
D.V.
87ms? I can almost do a JTFA analysis on a 480x480 in that much time!

Get rid of those doubles and make 'em floats (and I mean float, not long float!). Pentium-class machines like DWORDs and 32-bit floats. I really don't think double precision is needed.

Pentiums have built-in support for sin and cos: the x87 FPU has opcodes dedicated to them (FSIN, FCOS, FSINCOS). There are even opcodes to load constants such as pi (FLDPI), log2(e) (FLDL2E) and log2(10) (FLDL2T) nice and fast.

Use sinf() and cosf() for the single-precision (float) versions of the trig functions.

Drop every conversion you can and use only floats and longs. The FPU works more efficiently, with no stalls, if you keep it busy. If you let the FPU do the conversion, it takes a handful of clock ticks in the microcode; if you do the conversion yourself, it takes a jar full of ticks in an opcode or two.
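One common way to drop the per-pixel float-to-int conversion in a loop like the rotation above is 16.16 fixed point. This is a swapped-in technique, not one posted in this thread: the start values and steps are converted to integers once per row, and the loop then steps coordinates with integer adds and a shift. The helper names are hypothetical, and the sketch assumes coordinates stay well inside the 16-bit integer range and that right-shifting a negative int is arithmetic, as it is on the usual x86 compilers.

```c
#include <stdint.h>

/* 16.16 fixed point: upper 16 bits = integer part, lower 16 = fraction. */
typedef int32_t fixed16;

static fixed16 to_fixed(float f) { return (fixed16)(f * 65536.0f); }

/* Fill one dest row of width w from a row-major sw x sh source image.
   The float inputs are converted once; the loop has no float math.
   Adding 0x8000 before the shift rounds to the nearest pixel.
   Out-of-range samples become 0. */
void sample_row_fixed(const uint8_t *src, int sw, int sh,
                      uint8_t *row, int w,
                      float u0, float v0, float du, float dv)
{
    fixed16 u = to_fixed(u0), v = to_fixed(v0);
    fixed16 fdu = to_fixed(du), fdv = to_fixed(dv);
    for (int x = 0; x < w; x++) {
        int iu = (u + 0x8000) >> 16;   /* assumes arithmetic shift */
        int iv = (v + 0x8000) >> 16;
        row[x] = (iu >= 0 && iu < sw && iv >= 0 && iv < sh)
                     ? src[iv * sw + iu] : 0;
        u += fdu;
        v += fdv;
    }
}
```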

I do not believe you can issue FPU and CPU instructions superscalar (in parallel); one pipeline or the other stalls. *edit* I am mistaken: I checked the Intel docs, and they claim that the CPU and FPU _can_ operate in parallel. The doc I have is dated and only contains info about the Pentium Pro, which it says has two integer and two floating-point pipes. Does anyone know how many the PII and PIII have? *edit*

Something's wrong if 24bpp is copying 3 bytes at a time. They should be 32-bit padded values, and should get smashed into place with one MMX instruction.
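A minimal sketch of the padding idea, assuming packed 24-bit pixels stored in B, G, R byte order as in Windows DIBs (expand_24_to_32 is a hypothetical name). Expanding once to 32-bit 0x00RRGGBB values means each pixel afterwards is a single aligned 32-bit move, and a 64-bit MMX move can carry two pixels at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand n packed 24-bit pixels (3 bytes each, assumed B, G, R order)
   into 32-bit 0x00RRGGBB values; the top byte is padding. */
void expand_24_to_32(const uint8_t *src24, uint32_t *dst32, size_t n)
{
    for (size_t i = 0; i < n; i++, src24 += 3)
        dst32[i] = (uint32_t)src24[0]            /* blue  -> bits 0..7   */
                 | ((uint32_t)src24[1] << 8)     /* green -> bits 8..15  */
                 | ((uint32_t)src24[2] << 16);   /* red   -> bits 16..23 */
}
```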

Most importantly:
Insist that textures be an even size, and process two or four points in each iteration of the while (hW--) loop.
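The unrolling can be sketched like this in C, assuming 32-bit pixels and reading hW as a half-width counter (the post doesn't define it, so that reading and the name copy_row_pairs are assumptions):

```c
#include <assert.h>
#include <stdint.h>

/* Copy one row of 32-bit pixels, two per iteration of the while (hW--)
   loop. Requires an even width, as the post insists. */
void copy_row_pairs(uint32_t *dst, const uint32_t *src, int width)
{
    assert(width % 2 == 0);   /* the even-size requirement */
    int hW = width / 2;       /* assumed meaning: number of pixel pairs */
    while (hW--) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst += 2;
        src += 2;
    }
}
```

Four points per iteration works the same way with a width that is a multiple of four; the gain is fewer counter updates and branches per pixel.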

-Magmai Kai Holmlor

I am brazen and fear no flames.

Edited by - Magmai Kai Holmlor on July 7, 2000 2:41:52 PM
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara

This topic is closed to new replies.
