
I'll show you mine, if you'll show me yours

DeltaVee · 2000-07-07T06:02:20

I am writing a 360-degree top-down shoot-em-up (why not?). So basically I have written a routine that rotates a map segment (which is pre-rendered on a surface) and blts it onto another surface. Now the only problem is that it is a bit slow. Anybody have any faster methods? Or suggestions?

On a 433 MHz Celeron the blt takes 87 milliseconds to run.
Destination blt size = 480 x 480, 16 bpp.
Source surface is approx. 700 x 700.
Code compiled with Visual Studio 6, compiler optimized for speed.

Here is the code:

bool MyRotateBlt(CRect cDest, LPDIRECTDRAWSURFACE7 lpDest, CRect cSrce, LPDIRECTDRAWSURFACE7 lpSrce, double dAng)
{
    // lock the destination surface and calculate metrics
    DDSURFACEDESC2 DestDesc;
    ZeroMemory(&DestDesc, sizeof(DestDesc));
    DestDesc.dwSize = sizeof(DestDesc);
    HRESULT result = lpDest->Lock(NULL, &DestDesc, DDLOCK_WRITEONLY, NULL);
    if (result != DD_OK)
        return false;
    int nBytesPerPlane = DestDesc.ddpfPixelFormat.dwRGBBitCount / 8;
    LONG lDestPels = DestDesc.lPitch / nBytesPerPlane;
    LONG lDestPelsLeft = lDestPels - cDest.Width();
    LPWORD lpDestMem = (LPWORD)DestDesc.lpSurface;

    // lock the source surface and calculate metrics
    DDSURFACEDESC2 SrceDesc;
    ZeroMemory(&SrceDesc, sizeof(SrceDesc));
    SrceDesc.dwSize = sizeof(SrceDesc);
    result = lpSrce->Lock(NULL, &SrceDesc, DDLOCK_READONLY, NULL);
    if (result != DD_OK)
    {
        lpDest->Unlock(NULL);   // release the destination before bailing out
        return false;
    }
    LONG lSrcePels = SrceDesc.lPitch / nBytesPerPlane;
    LPWORD lpSrceMem = (LPWORD)SrceDesc.lpSurface;

    // pre-calculate various metrics
    long double sinAng = sin(DEGTORAD(dAng));
    long double cosAng = cos(DEGTORAD(dAng));
    int nSrceCX = cSrce.left + cSrce.Width() / 2;
    int nSrceCY = cSrce.top + cSrce.Height() / 2;
    int nCenterMemLoc = nSrceCX + nSrceCY * lSrcePels;
    long double nDestLeft = - cDest.Width() / 2;
    long double nDestRight = nDestLeft + cDest.Width();
    long double nDestTop = - cDest.Height() / 2;
    long double nDestBottom = nDestTop + cDest.Height();
    int nXDestOff = cDest.left;
    int nYDestOff = cDest.top * lDestPels + nXDestOff;
    long double sinY;
    long double cosY;

    for (long double nY = nDestTop; nY < nDestBottom; nY++, nYDestOff += lDestPelsLeft)
    {
        sinY = nY * sinAng;
        cosY = nY * cosAng;
        for (long double nX = nDestLeft; nX < nDestRight; nX++)
        {
            // perform transformation here
            int nSX = (int)(nX * cosAng + sinY);
            int nSY = (int)(nX * sinAng - cosY);
            lpDestMem[nYDestOff++] = lpSrceMem[nCenterMemLoc + nSY * lSrcePels + nSX];
        }
    }

    lpSrce->Unlock(NULL);
    lpDest->Unlock(NULL);
    return true;
}

1. The error trapping sux, please no comments on that.
2. I am using long doubles because that is the size of the reals on the FPU (80 bits), therefore there are no conversions needed to load them into the FPU registers. (Mixing real and integer arithmetic allows the compiler to create code that runs on the FPU and CPU at the same time, effectively parallel processing.)
3. In 8 bpp there is absolutely no increase in performance, as it takes just as long to move a byte as it does to move a word.
4. In 24 bpp there is a 30 percent performance hit, as a byte and a word need to be moved (no assembler to move 24 bits at a time).
5. Both surfaces are in system memory.
6. There are no comments because my code is beautiful and is self-documenting.
7. If you are a newbie, you may wet yourself in excitement. Look, free code that does stuff.

--------------------------
Carpe Diem

General and Gameplay Programming Programming

Started by DeltaVee July 03, 2000 09:11 AM

13 comments, last by DeltaVee 24 years, 5 months ago

cyberben

122

July 06, 2000 06:54 PM

Say, I'm an assembly programmer so I don't know: in C, are those sin and cos functions using the FPU? Or a lookup table? Or what?

Cause they might be able to be optimized, not sure though... I may try coding this in asm. Cause that sounds like a cool idea..

See ya,
Ben

__________________________Mencken's Law:"For every human problem, there is a neat, simple solution; and it's always wrong."
"Computers in the future may weigh no more than 1.5 tons."- Popular Mechanics, forecasting the relentless march of science in 1949

Gladiator

127

July 06, 2000 07:05 PM

cyberben, his new algorithm uses a lookup table...

DeltaVee...

try using

*(sintable+nAng)

where possible instead of

sintable[nAng]

since it's faster... not in this case (I don't think it would change anything since it's basically the same), but if you use some other variables in there, and if you use a 2D buffer, then it's gonna make a noticeable difference...

anyways.. great job.. maybe some assembler wont hurt
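
For readers who have not seen the lookup-table approach mentioned in this reply, here is a minimal sketch of one way it could look. It is not DeltaVee's actual table: the names `kTableSize`, `BuildSinTable`, `WrapDegrees`, `TableSin` and `TableCos` are hypothetical, and whole-degree resolution is an assumption. Note that in C and C++ `sintable[nAng]` is defined as `*(sintable + nAng)`, so the sketch just uses the subscript form.

```cpp
#include <array>
#include <cmath>

// Hypothetical table size: one entry per whole degree (an assumption).
constexpr int kTableSize = 360;

// Fill the table once at startup so the blit never calls sin() per frame.
std::array<float, kTableSize> BuildSinTable() {
    std::array<float, kTableSize> table{};
    const float kDegToRad = 3.14159265358979f / 180.0f;
    for (int deg = 0; deg < kTableSize; ++deg) {
        table[deg] = std::sin(deg * kDegToRad);
    }
    return table;
}

// Wrap any integer angle, including negative ones, into [0, 360).
inline int WrapDegrees(int deg) {
    const int r = deg % kTableSize;
    return r < 0 ? r + kTableSize : r;
}

inline float TableSin(const std::array<float, kTableSize>& table, int deg) {
    return table[WrapDegrees(deg)];
}

// cos(x) = sin(x + 90 degrees), so one table serves both functions.
inline float TableCos(const std::array<float, kTableSize>& table, int deg) {
    return table[WrapDegrees(deg + 90)];
}
```

In the routine from the first post the sine and cosine are only evaluated once per blit, so a table like this would matter mostly where trig is evaluated many times per frame.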

Eric

138

July 06, 2000 09:15 PM

Two suggestions:

Doesn't IDirectDrawSurface7::Blt() do rotated blits? I notice there is a dwRotationAngle field in the DDBLTFX structure that you pass to Blt(). The rotation would then be done in hardware, I presume. By the way, please excuse my ignorance if I'm wrong -- I've never used DD except as a means to get to D3D.

Also, I've seen a rotation algo before... it went something like this:

float u, v, uIncX, uIncY, vIncX, vIncY;
FLOAT ustart = SOMETHING; // sorry, I can't remember
FLOAT vstart = SOMETHING_ELSE;
for (int z = startz; z < endz; z++) {
    u = ustart;
    v = vstart;
    for (int x = startx; x < endx; x++) {
        dest[x][z] = src[round(u)][round(v)];
        u += uIncX;
        v += vIncX;
    }
    ustart += uIncY;
    vstart += vIncY;
}

Obviously you're wondering, "what the hell are ustart, vstart, uIncX, etc.?". Well basically, U and V are the "x" and "y" coordinates of your source image -- as you go straight "across" your dest image (in your x loop), you can imagine that you are kinda taking a diagonal line through your source image (depending on rotation angle), hence you have to update both U and V. Obviously if you're seriously interested in this you need to research it. I know this for sure, though: you shouldn't have any multiplications in your inner loop.
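
The incremental idea described above can be sketched as follows. This is one possible way to fill in the values the post could not recall, not the original algorithm: the `Image16` type, the `RotateIncremental` name, the centre-of-image pivot, the `fill` colour and the rotation sign convention are all assumptions made for illustration, and the sign convention differs from the routine in the first post. The inner loop has no floating-point multiplications; the only multiply left is the integer row offset.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical image type for illustration: 16 bpp pixels, row-major.
struct Image16 {
    int width;
    int height;
    std::vector<std::uint16_t> pixels;   // width * height entries
};

// Rotate src about its centre into dst (about dst's centre).
// dst.pixels must already hold dst.width * dst.height entries.
// Destination pixels that map outside src are written as `fill`.
void RotateIncremental(const Image16& src, Image16& dst,
                       float angleDegrees, std::uint16_t fill = 0) {
    const float rad = angleDegrees * 3.14159265358979f / 180.0f;
    const float c = std::cos(rad);
    const float s = std::sin(rad);

    // How far (u, v) moves in the source per one-pixel step in the dest.
    const float uIncX = c, vIncX = -s;   // one step right
    const float uIncY = s, vIncY = c;    // one step down

    // Source coordinate corresponding to dest pixel (0, 0).
    const float dx0 = -0.5f * (dst.width - 1);
    const float dy0 = -0.5f * (dst.height - 1);
    float uStart = 0.5f * (src.width - 1) + dx0 * uIncX + dy0 * uIncY;
    float vStart = 0.5f * (src.height - 1) + dx0 * vIncX + dy0 * vIncY;

    std::uint16_t* out = dst.pixels.data();
    for (int y = 0; y < dst.height; ++y) {
        float u = uStart;
        float v = vStart;
        for (int x = 0; x < dst.width; ++x) {
            // Round to the nearest source pixel.
            const int su = static_cast<int>(std::floor(u + 0.5f));
            const int sv = static_cast<int>(std::floor(v + 0.5f));
            const bool inside = su >= 0 && su < src.width &&
                                sv >= 0 && sv < src.height;
            *out++ = inside ? src.pixels[sv * src.width + su] : fill;
            u += uIncX;   // additions only: walk the diagonal line
            v += vIncX;
        }
        uStart += uIncY;  // move to the start of the next dest row
        vStart += vIncY;
    }
}
```

The bounds check keeps the sketch safe for any angle; a real blit would more likely size the source so that the check can be dropped from the inner loop.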

DeltaVee

Author

138

July 07, 2000 05:34 AM

Yes, you can rotate and blt at the same time, only problem is that your hardware needs to support it.

I have a neonstar 128 (or something like that) on my laptop, and believe me, a box of crayons has more drawing capabilities.

So to cut a long story short, I am pretty much writing my own software rasterizer. (My own blt is faster than my card's!) I need to anyway: I would like to publish (don't we all) this game at some point, so it needs to work on a wide variety of cards.

I am currently revising the code for the next optimization. I can get rid of the inner multiplication, but it needs a little fiddling around.

BTW. On my Celeron 500MHz Voodoo5 550 Agp 64MB (dual processor) the routine takes less than 9 ms!

thanx again

--------------------------
Carpe Diem


Shannon Barber

1,684

July 07, 2000 06:02 AM

87ms? I can almost do a JTFA analysis on a 480x480 in that much time!

Get rid of those doubles, make 'em floats (and I mean floats, not long floats!). Pentium-class machines like DWORDs & 32-bit floats. I really don't think long double precision is needed.

Pentiums have built-in lookup tables for sines & cosines... there is a set of op codes dedicated to them. You can even call an opcode to load PI, log2(e) & log2(10) nice & fast.

Use sinf() & cosf() for floating point versions of trig functions

Drop all the conversions possible & only use floats & longs. If you keep the FPU busy, it works better (more efficiently, with no stalls). If you let the FPU do the conversion it will take a handful of clock ticks in the microcode; if you do the conversion it will take a jar full of ticks in an op code or two.

I do not believe you can superscale FPU & CPU instructions. One pipeline or the other stalls. *edit* I am mistaken: I checked the Intel docs and they claim that the CPU & FPU can operate in parallel. The doc I have is dated and only contains info about the Pentium Pro; it has two integer and two floating point pipes. Anyone know how many the PII & PIII have? *edit*

Something's wrong if 24bpp is copying 3 bytes. They should be 32-bit padded values, and should get smashed into place with one MMX call.
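
A rough sketch of the padding idea, under stated assumptions: the helper names `PackXRGB` and `ExpandBGR24ToXRGB32` are hypothetical, the blue-green-red byte order of the packed 24-bit source is an assumption, and plain C++ is used instead of MMX. Once pixels are stored as padded 32-bit values, each one moves with a single aligned 32-bit load and store rather than a byte plus a word.

```cpp
#include <cstddef>
#include <cstdint>

// Pack 8-bit channels into one 32-bit value; the top byte is padding.
inline std::uint32_t PackXRGB(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return (static_cast<std::uint32_t>(r) << 16) |
           (static_cast<std::uint32_t>(g) << 8) |
            static_cast<std::uint32_t>(b);
}

// Expand `count` packed 3-byte pixels into padded 32-bit pixels.
// Assumes the source bytes are ordered blue, green, red (an assumption).
void ExpandBGR24ToXRGB32(std::uint32_t* dst, const std::uint8_t* src,
                         std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint8_t b = src[3 * i + 0];
        const std::uint8_t g = src[3 * i + 1];
        const std::uint8_t r = src[3 * i + 2];
        dst[i] = PackXRGB(r, g, b);
    }
}
```

The conversion would be done once when the map is loaded, so the per-frame rotation only ever touches the 32-bit form.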

Most importantly:
Insist that textures be an even size & process two or four points inside each while (hW--) loop.
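
One reading of this advice, sketched on a plain row copy rather than the rotation itself: `CopyRowPairs` is a hypothetical name, and taking `hW` to mean half the width is an assumption about what the post intends. With an even width the loop needs no leftover-pixel handling and pays the loop overhead once per two pixels.

```cpp
#include <cstddef>
#include <cstdint>

// Copy one row of 16 bpp pixels, two pixels per loop iteration.
// Assumes `width` is even, as the post suggests insisting on.
void CopyRowPairs(std::uint16_t* dst, const std::uint16_t* src,
                  std::size_t width) {
    std::size_t halfWidth = width / 2;   // assumed meaning of hW in the post
    while (halfWidth--) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst += 2;
        src += 2;
    }
}
```

The same unrolling could be applied to the inner loop of a rotation routine by computing two source coordinates per iteration.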

-Magmai Kai Holmlor

I am brazen and fear no flames.

Edited by - Magmai Kai Holmlor on July 7, 2000 2:41:52 PM

- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara

This topic is closed to new replies.
