I said:
"this would fill the array with zeros, but I would recommend using p[i], or even better, memcpy() [or CopyMemory, which is the same thing]"
the p is supposed to be p[i], I think
I absolutely can't stand how this board screws up my text
>:-[
there should be a switch for choosing plaintext or HTML
Incrementing Pointers
gaaahhh! it did it again!
it's supposed to be p bracket i bracket
there
adamm@san.rr.com
Hi again. I think I understand it now. All I wanted to do was make my alpha-blending code more efficient, so I'll post the main loop before I changed it:
for (index_y = 0; index_y < height; index_y++)
{
    for (index_x = 0; index_x < width; index_x++)
    {
        if((svidbuffer[index_x + sx + slPitch16 * (index_y + sy)] != colorkey))
        {
            dindex = index_x + dx + dlPitch16 * (index_y + dy);
            sindex = index_x + sx + slPitch16 * (index_y + sy);
            sred   = (((svidbuffer[sindex] >> 11) & 0x1F));
            sgreen = (((svidbuffer[sindex] >> 5) & 0x3F));
            sblue  = ((svidbuffer[sindex]) & 0x1F);
            dred   = (((dvidbuffer[dindex] >> 11) & 0x1F));
            dgreen = (((dvidbuffer[dindex] >> 5) & 0x3F));
            dblue  = ((dvidbuffer[dindex]) & 0x1F);
            fred   = ((l_Alpha[alpha][sred] - l_Alpha[alpha][dred])) + dred; // same as (alpha * sred - alpha * dred) + dred
            fgreen = ((l_Alpha[alpha][sgreen] - l_Alpha[alpha][dgreen])) + dgreen;
            fblue  = ((l_Alpha[alpha][sblue] - l_Alpha[alpha][dblue])) + dblue;
            dvidbuffer[dindex] = RGB16(fred << 3,fgreen << 2,fblue << 3); // scale up the values
        }
    }
}
and the new version incrementing pointers:
for (index_y = 0; index_y < height; index_y++)
{
    for (index_x = 0; index_x < width; index_x++)
    {
        dindex = index_x + dx + dlPitch16 * (index_y + dy);
        sindex = index_x + sx + slPitch16 * (index_y + sy);
        if((*(USHORT*)svidbuffer != colorkey))
        {
            sred   = (((*(USHORT*)svidbuffer >> 11) & 0x1F));
            sgreen = (((*(USHORT*)svidbuffer >> 5) & 0x3F));
            sblue  = ((*(USHORT*)svidbuffer) & 0x1F);
            dred   = (((*(USHORT*)dvidbuffer >> 11) & 0x1F));
            dgreen = (((*(USHORT*)dvidbuffer >> 5) & 0x3F));
            dblue  = ((*(USHORT*)dvidbuffer) & 0x1F);
            fred   = ((l_Alpha[alpha][sred] - l_Alpha[alpha][dred])) + dred; // same as (alpha * sred - alpha * dred) + dred
            fgreen = ((l_Alpha[alpha][sgreen] - l_Alpha[alpha][dgreen])) + dgreen;
            fblue  = ((l_Alpha[alpha][sblue] - l_Alpha[alpha][dblue])) + dblue;
            *(USHORT*)dvidbuffer = RGB16(fred << 3,fgreen << 2,fblue << 3); // scale up the values
        }
        svidbuffer += sindex;
        dvidbuffer += dindex;
    }
}
If this is right, one more question: are while loops faster than for loops? Thanks in advance.
Visit our web site:
Asylum Entertainment
My Geekcode: "GCS d s: a14 C++$ P+(++) L+ E-- W+++$ K- w++(+++) O---- M-- Y-- PGP- t XR- tv+ b++ DI+(+++) D- G e* h!"
That must not be right, because it just crashes when I try to run the program. Is it the typecast I'm doing? I have to go now; I'll look into it more later.
Hehe.. yeah I do spend a bit too long typing my posts
While loops aren't intrinsically faster than for loops.
for(i=0;i<100;i++)
{
code;
}
is equivalent to this:
i=0;
while(i<100)
{
code;
i++;
}
-except- that if you declare a variable in the init part of the for loop, it goes out of scope after the loop:
for(int i=0;i<100;i++)
{
code;
}
is like this:
{
int i=0;
while(i<100)
{
code;
i++;
}
} // i goes out of scope here
unfortunately, VC6 doesn't work this way: it uses the old, pre-standard for scoping, where i stays in scope after the loop..
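The for-to-while rewrite above is mechanical except for one case: continue. In a for loop, continue jumps to the increment (i++), but in the while version it jumps straight back to the test, so an i++ written at the bottom of the body gets skipped and the loop stops advancing. A small C sketch of the difference (the function names are made up for illustration):

```c
/* Sum of 0..n-1, skipping multiples of 3, written as a for loop.
   continue still runs the i++ in the for header. */
int sum_for(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        if (i % 3 == 0)
            continue;
        total += i;
    }
    return total;
}

/* The same loop as a while loop. The increment has to happen before the
   continue; with i++ at the bottom of the body, the loop would get stuck
   at i == 0. */
int sum_while(int n)
{
    int total = 0;
    int i = 0;
    while (i < n) {
        int cur = i++;      /* advance first, so continue can't skip it */
        if (cur % 3 == 0)
            continue;
        total += cur;
    }
    return total;
}
```

For n = 10 both skip 0, 3, 6 and 9 and return 1+2+4+5+7+8 = 27.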
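BUT.. if you are counting down to zero in a for loop, it can be a little faster than counting up to something, because on x86 the decrement already sets the zero flag, so no separate compare against the limit is needed.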
For example:
for(i=100;i;i--);
is probably faster than
for(i=0;i<100;i++);
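Note that for(i=100;i;i--) runs i from 100 down to 1, not 99 down to 0. If the body indexes an array, one common way to count down and still visit the same indices is the i-- > 0 idiom, sketched here in C with hypothetical function names. Whether it actually beats the count-up loop depends on the compiler and CPU (optimizing compilers often rewrite the loop themselves), so it is worth measuring before relying on it.

```c
#include <stddef.h>

/* Counting up: the loop test compares i against n on every pass. */
long sum_up(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Counting down: for an unsigned i, "i-- > 0" is the same as "i-- != 0",
   a test against zero. It checks the old value and then decrements, so
   the body sees n-1 down to 0: the same indices as above, reversed. */
long sum_down(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = n; i-- > 0; )
        total += a[i];
    return total;
}
```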
here's the main part of my alpha blender (the 16-bit part, anyway)
it's fairly fast.. as long as you're not using it for large parts of the screen on every frame, it's perfectly fine..
It works great for alpha blending particles or special effects.. or for making your health bar transparent. For one-time, full-screen alpha blends it's alright too.
the lookup table is nice and saves all the multiplies and divides
but what it really needs is a good assembly rewrite... that would probably double the speed..
unfortunately, I have about 30k of alpha blending code :/
(all different kinds of alpha blends that handle all different bit depths.. etc..) and I have more important things to do!
I'm embarrassed by this code; it's the worst-looking code in the whole project!
It bears a comment warning all who might gaze upon it and be corrupted..
for(y=0;y<hei;y++)
{
    for(x=0;x<wid;x++)
    {
        mul = amem[x] * 512 + 256; // *512 is done with a shift
        swpixel = *((WORD*)smem+x);
        sr = (swpixel >> 11), sg = (swpixel >> 5) & 0x3F, sb = swpixel & 0x1F;
        dwpixel = *((WORD*)dmem+x);
        dr = (dwpixel >> 11), dg = (dwpixel >> 5) & 0x3F, db = dwpixel & 0x1F;
        r = dr+MulTable[mul+(sr-dr)];
        g = dg+MulTable[mul+(sg-dg)];
        b = db+MulTable[mul+(sb-db)];
        *((WORD *)dmem+x) = (r<<11) | (g<<5) | b; // repack as 5:6:5
    }
    dmem += dpitch;
    smem += spitch;
    amem += apitch;
}
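The loop above doesn't show how MulTable is laid out. Judging from the indexing (mul = alpha*512 + 256, then + (sr - dr)), each alpha level seems to own a 512-entry slice centred at +256, so the signed channel difference can be used directly as an offset. Here is a self-contained C sketch under that assumption, treating amem as 8-bit alpha (0 to 255) and filling each entry with d*a/255; build_mul_table and blend565 are illustrative names, not from the original code.

```c
#include <stdint.h>

/* Assumed layout: for each alpha a (0..255), a 512-entry slice where
   MulTable[a*512 + 256 + d] = d*a/255 for d in [-256, 255]. */
static int MulTable[256 * 512];

void build_mul_table(void)
{
    for (int a = 0; a < 256; a++)
        for (int d = -256; d < 256; d++)
            MulTable[a * 512 + 256 + d] = d * a / 255; /* C division truncates toward zero */
}

/* Blend one RGB565 source pixel over a destination pixel with alpha a
   (0 keeps the destination, 255 takes the source), using the same
   per-channel formula as the loop: r = dr + MulTable[mul + (sr - dr)]. */
uint16_t blend565(uint16_t s, uint16_t d, uint8_t a)
{
    int mul = a * 512 + 256;
    int sr = s >> 11, sg = (s >> 5) & 0x3F, sb = s & 0x1F;
    int dr = d >> 11, dg = (d >> 5) & 0x3F, db = d & 0x1F;
    int r = dr + MulTable[mul + (sr - dr)];
    int g = dg + MulTable[mul + (sg - dg)];
    int b = db + MulTable[mul + (sb - db)];
    return (uint16_t)((r << 11) | (g << 5) | b); /* OR the fields back together */
}
```

Because |d*a/255| never exceeds |d|, each result stays between the source and destination channel values, so it can't overflow its 5- or 6-bit field. Blending white over black at a = 128 gives 0x7BEF (15, 31, 15).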
but yours wouldn't work, for a couple of reasons..
mainly, svidbuffer and dvidbuffer aren't stepped correctly. You only ever read and write the pixel they currently point at, and after every pixel you add the whole of sindex and dindex to them. Those indices already include the full row and column offset, so the pointers jump further and further on each pixel and run off the end of the surface, which would explain the crash.
next.. I warned you about calculating y*pitch+x for every pixel, which is the reason people told you to use a pointer in the first place, and that's exactly what you're doing:
dindex = index_x + dx + dlPitch16 * (index_y + dy);
sindex = index_x + sx + slPitch16 * (index_y + sy);
that''s inside the main loop!
so it is recalculating sindex and dindex, a multiply and a few adds each, for every pixel.. that will KILL the speed!
I have to go, but try something like this instead:
BYTE *srcPtr = (BYTE *)src.lpSurface + srcY*srcPitch + srcX*2; // *2: two bytes per 16-bit pixel
BYTE *destPtr = (BYTE *)dest.lpSurface + destY*destPitch + destX*2;
for(y=0;y<hei;y++)
{
    for(x=0;x<wid;x++)
    {
        WORD spixel = *((WORD *)srcPtr + x);
        WORD dpixel = *((WORD *)destPtr + x);
        // do the alpha blend on spixel and dpixel, then write the result to *((WORD *)destPtr + x)
    }
    srcPtr += srcPitch;
    destPtr += destPitch;
}
notice how I don't read the pixel over and over: I read it once and store it in spixel. It's faster to read spixel than to keep re-evaluating *((WORD *)srcPtr + x),
plus, the compiler can probably keep spixel in a register.
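Filling in that outline, here is one way the row-pointer loop could look for a 16-bit surface in ordinary memory. It's a sketch under stated assumptions: pitches are in bytes, pixels are 2-byte aligned (even pitch), and the blend is a plain per-channel d + (s - d)*a/255 instead of a lookup table, just to keep the example self-contained. blend_rect16 and mix565 are hypothetical names.

```c
#include <stdint.h>

/* Stand-in blend (no lookup table): each RGB565 channel becomes
   d + (s - d) * a / 255, so a = 255 gives the source, a = 0 the dest. */
static uint16_t mix565(uint16_t s, uint16_t d, int a)
{
    int sr = s >> 11, sg = (s >> 5) & 0x3F, sb = s & 0x1F;
    int dr = d >> 11, dg = (d >> 5) & 0x3F, db = d & 0x1F;
    int r = dr + (sr - dr) * a / 255;
    int g = dg + (sg - dg) * a / 255;
    int b = db + (sb - db) * a / 255;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Blend a w x h rectangle of a 16-bit source onto a 16-bit destination.
   Assumes pitches in bytes and 2-byte-aligned pixels. */
void blend_rect16(const uint8_t *src, int srcPitch, int srcX, int srcY,
                  uint8_t *dest, int destPitch, int destX, int destY,
                  int w, int h, int alpha)
{
    /* Starting address computed once: y * pitch bytes, plus x * 2
       because each pixel is two bytes. */
    const uint8_t *srcPtr = src + srcY * srcPitch + srcX * 2;
    uint8_t *destPtr = dest + destY * destPitch + destX * 2;

    for (int y = 0; y < h; y++) {
        const uint16_t *s = (const uint16_t *)srcPtr;
        uint16_t *d = (uint16_t *)destPtr;
        for (int x = 0; x < w; x++) {
            uint16_t spixel = s[x];   /* each pixel is read once */
            uint16_t dpixel = d[x];
            d[x] = mix565(spixel, dpixel, alpha);
        }
        /* Next row: one add per row, no y * pitch multiply per pixel. */
        srcPtr += srcPitch;
        destPtr += destPitch;
    }
}
```

The only per-pixel address arithmetic left is x * 2, which the compiler can fold into the indexed load and store.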
d'oh!
stupid board!
all those things like this:
for(y=0;y{
for(x=0;x{
were supposed to be like:
for(y=0;y<_hei;y++)
{
for(x=0;x<_wid;x++)
except without the _
[and it'd better work this time!]
Ok, I think I get it now. I rewrote the function, and it no longer crashes, but it doesn't work properly. It looks like the pitch is off or something, so here's the whole function:
void Blend16(int sx,int sy, int width, int height, int dx, int dy, int alpha, int slPitch16, int dlPitch16, USHORT colorkey, USHORT* svidbuffer, USHORT* dvidbuffer, int pixel_format)
{
    // To Do: Rewrite this function to work with 5.5.5 cards
    int index_x, index_y;
    int dindex, sindex;
    UCHAR sred,sgreen,sblue,dred,dgreen,dblue,fred,fgreen,fblue;

    dindex = dx + dlPitch16 * dy;
    sindex = sx + slPitch16 * sy;
    svidbuffer += sindex;
    dvidbuffer += dindex;

    for (index_y = 0; index_y < height; index_y++)
    {
        for (index_x = 0; index_x < width; index_x++)
        {
            if((*(dvidbuffer + index_x + dx) != colorkey))
            {
                sred   = (*(svidbuffer + index_x + dx) >> 11) & 0x1F;
                sgreen = (*(svidbuffer + index_x + dx) >> 5) & 0x3F;
                sblue  = *(svidbuffer + index_x + dx) & 0x1F;
                dred   = (*(dvidbuffer + index_x + dx) >> 11) & 0x1F;
                dgreen = (*(dvidbuffer + index_x + dx) >> 5) & 0x3F;
                dblue  = *(dvidbuffer + index_x + dx) & 0x1F;
                fred   = ((l_Alpha[alpha][sred] - l_Alpha[alpha][dred])) + dred; // same as (alpha * sred - alpha * dred) + dred
                fgreen = ((l_Alpha[alpha][sgreen] - l_Alpha[alpha][dgreen])) + dgreen;
                fblue  = ((l_Alpha[alpha][sblue] - l_Alpha[alpha][dblue])) + dblue;
                *(dvidbuffer + index_x + dx) = RGB16(fred << 3,fgreen << 2,fblue << 3); // scale up the values
            }
        }
        svidbuffer += slPitch16;
        dvidbuffer += dlPitch16;
    }
}
and here's the call to it:
// lock dest surface (lpddswork) and source surface (lpddslogo)
dvidbuffer = Lock_Surface16(lpddswork, &dpitch);
svidbuffer = Lock_Surface16(lpddslogo, &spitch);

// draw logo with alpha transparency
Blend16(0,0,600,200,0,0,logofade, spitch >> 1, dpitch >> 1, RGB16(0,0,0), svidbuffer, dvidbuffer, g_pixformat);

Unlock_Surface(lpddslogo);
Unlock_Surface(lpddswork);
I must be doing something wrong, but it seems to me like I've done what you suggested (but maybe I still don't get it, I don't know). Thanks for all your help.
Visit our web site:
Asylum Entertainment
I don't know if these are all the problems, but...
if((*(dvidbuffer + index_x + dx) != colorkey))
you already added dx to the vid buffer, now you're adding it again!
[you do it on all the other lines as well]
(in the call)
spitch >> 1, dpitch >> 1
you can't safely assume that the pitch will be divisible by two.. but in practice it almost always is (unless the surface width is odd and it's created in system memory)
and this isn't a bug, but more of a problem with accessing surfaces directly.. directly writing and especially reading surfaces in video memory is EXTREMELY slow on almost every graphics card out there.. so you would definitely only want to do alpha blending if both surfaces are in system memory.. unless you are only doing the alpha blend once or something
plus it would be faster if you only read from the surfaces once, and stored the pixel in variables, like spixel and dpixel
fred = ((l_Alpha[alpha][sred] - l_Alpha[alpha][dred])) + dred;
also, it looks like you have a 2-dimensional lookup table.. and this means it takes an extra multiply and an add per lookup
so you're adding 6 multiplies and 6 adds per pixel there, unless the compiler is smart enough to hoist l_Alpha[alpha] out of the loop
(or 6 shifts and adds per pixel, depending on the size of the table..)
you should use a 1-dimensional lookup table and do the indexing yourself
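Putting the suggestions in this reply together, a reworked Blend16 could look roughly like the C sketch below. It rests on assumptions: l_Alpha is replaced by a hypothetical flat table, alphaTable, with 32 alpha levels of 64 entries each; pitches are passed in bytes, so there is no pitch >> 1; pixels are assumed 2-byte aligned; and the function is renamed Blend16Sketch because its signature differs from the original (no pixel_format). It also tests the colour key on the source pixel, since the posted version compares the destination pixel, which looks unintended.

```c
#include <stdint.h>

/* Hypothetical flat alpha table: ALPHA_LEVELS rows of 64 entries (enough
   for the 6-bit green channel). alphaTable[a*64 + c] = c*a/(ALPHA_LEVELS-1). */
#define ALPHA_LEVELS 32
static uint8_t alphaTable[ALPHA_LEVELS * 64];

void build_alpha_table(void)
{
    for (int a = 0; a < ALPHA_LEVELS; a++)
        for (int c = 0; c < 64; c++)
            alphaTable[a * 64 + c] = (uint8_t)(c * a / (ALPHA_LEVELS - 1));
}

/* alpha must be in 0..ALPHA_LEVELS-1. Pitches are in bytes. */
void Blend16Sketch(int sx, int sy, int width, int height, int dx, int dy,
                   int alpha, int sPitchBytes, int dPitchBytes,
                   uint16_t colorkey, const uint8_t *svid, uint8_t *dvid)
{
    const uint8_t *tab = alphaTable + alpha * 64;   /* row looked up once */
    /* Start offsets applied once here, never again per pixel. */
    const uint8_t *srow = svid + sy * sPitchBytes + sx * 2;
    uint8_t *drow = dvid + dy * dPitchBytes + dx * 2;

    for (int y = 0; y < height; y++) {
        const uint16_t *s = (const uint16_t *)srow;
        uint16_t *d = (uint16_t *)drow;
        for (int x = 0; x < width; x++) {
            uint16_t spixel = s[x];          /* read once */
            if (spixel == colorkey)
                continue;                    /* transparent source pixel */
            uint16_t dpixel = d[x];          /* read once */
            int sr = spixel >> 11, sg = (spixel >> 5) & 0x3F, sb = spixel & 0x1F;
            int dr = dpixel >> 11, dg = (dpixel >> 5) & 0x3F, db = dpixel & 0x1F;
            /* Same formula as the original: (a*s - a*d) + d */
            int r = tab[sr] - tab[dr] + dr;
            int g = tab[sg] - tab[dg] + dg;
            int b = tab[sb] - tab[db] + db;
            d[x] = (uint16_t)((r << 11) | (g << 5) | b);
        }
        srow += sPitchBytes;
        drow += dPitchBytes;
    }
}
```

Because the table is monotonic and alpha never exceeds the top level, each result stays between the source and destination channel values, so the fields can't overflow when they are packed back together.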
quote: Original post by adammil
Hehe.. yeah I do spend a bit too long typing my posts
While loops arent intrinsically faster than for loops.
You may find that a do...while loop can be minimally faster, though, since the check is at the end of the loop rather than the start, saving a jump on each pass. Obviously not all loops are appropriate to use do...while with, and the gain will be pretty negligible. You'd only really need to consider this in tight inner loops with 1 or 2 instructions inside them.
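The catch with do...while is that the body runs once before the first test, so it only suits loops known to run at least once. A small C sketch of the same fill written both ways (hypothetical names). Optimizing compilers commonly do this "loop inversion" themselves, turning a while loop into an if plus a do...while, so writing it by hand may mostly matter with older or non-optimizing compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Top-tested: the condition is checked before every pass, including the
   first, so n == 0 is handled naturally. */
void fill_while(unsigned short *p, size_t n, unsigned short v)
{
    while (n > 0) {
        *p++ = v;
        n--;
    }
}

/* Bottom-tested: one conditional branch per pass, but the body runs
   before the first test, so the caller must guarantee n >= 1. */
void fill_do(unsigned short *p, size_t n, unsigned short v)
{
    assert(n > 0);
    do {
        *p++ = v;
    } while (--n);
}
```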