But the software blit is faster on small images due to almost zero overhead...
-Dimonic-
Optimization Question
Well to that I''d reply - if you''re doing lots of small blits, you''re probably doin something wrong ;-)
#pragma DWIM // Do What I Mean!
~ Mad Keith ~
**I use Software Mode**
#pragma DWIM // Do What I Mean!
~ Mad Keith ~
**I use Software Mode**
It's only funny 'till someone gets hurt.And then it's just hilarious.Unless it's you.
Hmm, particle system where particles are very small bitmaps,
is an example. Any hardware blitter will die doing this blits.
is an example. Any hardware blitter will die doing this blits.
-dimonic-
> That algorithm is O(n^2), because you have a loop nested > within another loop. Very slow.
>
> Usually you''d do something like this:
[a severely misguided piece of code removed]
Uh, what exactly do you think memcpy() does? Last time I checked, it copied memory in a loop.
> If by the best way, you mean the fastest, you are wrong.
> Compiled sprites still whoop the crap out of blits.
> Basically, what you do is write a program that takes your
> sprites and turns them into code. Then you just call that
> function to display them. Technically, it runs in O(1)
> time.
I really hope that one day, the overwhelming confusion that seems to reign in the game development field regarding the Big-O notation will dissipate and people will stop throwing around sentences like "Technically, it runs in O(insert your favorite low order term here) time" without having absolutely any idea what they''re saying.
>Hmm, particle system where particles are very small bitmaps,
>is an example. Any hardware blitter will die doing this blits.
Have you actually tried comparing 16x16 alpha hardware blit to software compiled sprite blit or are you just talking out of your ass?
Sorry, but the amount of blatant misinformation distributed on the gamedev.net forums is really starting to get to me.
-goltrpoat
--
Float like a butterfly, bite like a crocodile.
>
> Usually you''d do something like this:
[a severely misguided piece of code removed]
Uh, what exactly do you think memcpy() does? Last time I checked, it copied memory in a loop.
> If by the best way, you mean the fastest, you are wrong.
> Compiled sprites still whoop the crap out of blits.
> Basically, what you do is write a program that takes your
> sprites and turns them into code. Then you just call that
> function to display them. Technically, it runs in O(1)
> time.
I really hope that one day, the overwhelming confusion that seems to reign in the game development field regarding the Big-O notation will dissipate and people will stop throwing around sentences like "Technically, it runs in O(insert your favorite low order term here) time" without having absolutely any idea what they''re saying.
>Hmm, particle system where particles are very small bitmaps,
>is an example. Any hardware blitter will die doing this blits.
Have you actually tried comparing 16x16 alpha hardware blit to software compiled sprite blit or are you just talking out of your ass?
Sorry, but the amount of blatant misinformation distributed on the gamedev.net forums is really starting to get to me.
-goltrpoat
--
Float like a butterfly, bite like a crocodile.
--Float like a butterfly, bite like a crocodile.
April 25, 2000 01:47 PM
First, the bottom line: To really do a lot better (in software) you will need to use MMX assembler.
A clarification about big-O notation and blitting.
ALL blitting -- ALL OF IT (hardware, software, compiled sprite -- whatever) is O(n), where n is the number of destination pixels.
Remember -- O(3n) IS O(n) -- they are the same.
I wouldn''t lie to you.
Compiled sprites are not O(constant) -- they are O(n). Bigger sprites mean more instructions and bigger loops, proportional to n.
But the point isn''t algorithmic complexity. The point is -- what optimizations can we make to the O(n) algorithm?
The fellows who mentioned the hardware acceleration are correct. You cannot beat a hardware accelerated asynchronous blit. It cannot be done. It cannot. The difference is a only a few CPU instructions (to get the GPU in motion) versus several hundred (for the CPU to micromanage it). You must also pay the price of sending the final product across the bus to the graphics card, if indeed you do not have to pay it twice -- first to get it, then push it back.
If you are dealing with an environment when you MUST write your own blitter and are not swayed by all the reasons above. Here are some changes you might make to your code:
In for loops, always count down to 0. You save some instructions this way.
You are strangling your CPU when you grab 16-bits at a time. The fellow who posted the memcpy was right about it being faster (although the algorithmic complexity is still the same). A smart compiler will make 32-bit moves at the very least -- 64-bit moves (MMX) if your computer and compiler can swing it.
To implement the suggestion of the fellow who mentioned the if business, I suggest MMX if you are developing on an Intel platform. www.intel.com (look through the developer stuff) has a very nice document on MMX optimization that I think it fairly readable if you know a decent amount of assembler.
Wayne Miller
CS Graduate Student
University of Kansas
A clarification about big-O notation and blitting.
ALL blitting -- ALL OF IT (hardware, software, compiled sprite -- whatever) is O(n), where n is the number of destination pixels.
Remember -- O(3n) IS O(n) -- they are the same.
I wouldn''t lie to you.
Compiled sprites are not O(constant) -- they are O(n). Bigger sprites mean more instructions and bigger loops, proportional to n.
But the point isn''t algorithmic complexity. The point is -- what optimizations can we make to the O(n) algorithm?
The fellows who mentioned the hardware acceleration are correct. You cannot beat a hardware accelerated asynchronous blit. It cannot be done. It cannot. The difference is a only a few CPU instructions (to get the GPU in motion) versus several hundred (for the CPU to micromanage it). You must also pay the price of sending the final product across the bus to the graphics card, if indeed you do not have to pay it twice -- first to get it, then push it back.
If you are dealing with an environment when you MUST write your own blitter and are not swayed by all the reasons above. Here are some changes you might make to your code:
In for loops, always count down to 0. You save some instructions this way.
You are strangling your CPU when you grab 16-bits at a time. The fellow who posted the memcpy was right about it being faster (although the algorithmic complexity is still the same). A smart compiler will make 32-bit moves at the very least -- 64-bit moves (MMX) if your computer and compiler can swing it.
To implement the suggestion of the fellow who mentioned the if business, I suggest MMX if you are developing on an Intel platform. www.intel.com (look through the developer stuff) has a very nice document on MMX optimization that I think it fairly readable if you know a decent amount of assembler.
Wayne Miller
CS Graduate Student
University of Kansas
wayne,
good to finally see someone with a clue here.
-goltrpoat
--
Float like a butterfly, bite like a crocodile.
good to finally see someone with a clue here.
-goltrpoat
--
Float like a butterfly, bite like a crocodile.
--Float like a butterfly, bite like a crocodile.
oh, also, depending on your compiler's optimization abilities, it is usually better to increment a pointer instead of an index to the array on x86 processors, i.e.
instead of
for (x = 0; x < n; x++) { data[x] = blah; }
it's faster to do
pdata = data; for (x = 0; x < n; x++, pdata++) { *pdata = blah; }
-goltrpoat
--
Float like a butterfly, bite like a crocodile.
Edited by - goltrpoat on 4/25/00 2:09:27 PM
instead of
for (x = 0; x < n; x++) { data[x] = blah; }
it's faster to do
pdata = data; for (x = 0; x < n; x++, pdata++) { *pdata = blah; }
-goltrpoat
--
Float like a butterfly, bite like a crocodile.
Edited by - goltrpoat on 4/25/00 2:09:27 PM
--Float like a butterfly, bite like a crocodile.
April 25, 2000 03:03 PM
Faster still :
pdata = data; for (x = n-1; x >= 0; x--, pdata++) { *pdata = blah; }
Wayne Miller
pdata = data; for (x = n-1; x >= 0; x--, pdata++) { *pdata = blah; }
Wayne Miller
goltrpoat - It''s ironic how it is more worth your time to bitch about ''blatent misinformation'' than actually correct anything someone may have said wrong. At least AP Wayne was constructive. I was under the impression (and I may be wrong) that you need a loop for 0(n) time; compiled sprites (at least the way I would do them) would have no loops. Please correct me if I am wrong, not go into some hissy-fit about blatent misinformation.
Mike
Mike
"Unintentional death of one civilian by the US is a tragedy; intentional slaughter of a million by Saddam - a statistic." - Unknown
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement