
How to hide a lens flare behind mountains?

Started by February 11, 2002 03:32 AM
41 comments, last by Bestel
Many people seem to be under the (false) assumption that depth reads are slow. They are *NOT*. Even on old cards. OK, you shouldn't read back your entire screen, as that would be (very) slow. But we're talking about a *single* pixel! And even if you read back an area to fade flares out, take a 4*4 block: that's 16 pixels, 64 *bytes* of data! That's *nothing*! Using a line-of-sight tracer to determine flare visibility is a total waste of CPU cycles that could be better spent on more important stuff.
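
For the record, here is a minimal sketch of that 4*4 fade idea. Everything in it is an assumption on my part: sx/sy are the flare's window coordinates and flare_depth its depth in the 0..1 range, both presumed to come from projecting the flare position beforehand, and presumed to lie safely inside the viewport.

#include &lt;GL/gl.h&gt;

/* Read a 4x4 depth block around the flare and return the fraction of
 * samples the flare is in front of (0.0 = hidden, 1.0 = fully visible).
 * Note: glReadPixels window coordinates have their origin bottom-left. */
float flare_visibility(int sx, int sy, float flare_depth)
{
    GLfloat depth[16];   /* 4*4 samples = 16 floats = 64 bytes */
    int i, visible = 0;

    glReadPixels(sx - 2, sy - 2, 4, 4, GL_DEPTH_COMPONENT, GL_FLOAT, depth);

    for (i = 0; i < 16; i++)
        if (flare_depth <= depth[i])   /* nothing drawn in front of the flare here */
            visible++;

    return visible / 16.0f;
}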

> and on very old video cards depth reading is VERY slow.

OK, we agree on the fact that the Voodoo2 is old. If I read back the entire screen depth buffer on it, that's 800*600 = 480,000 pixels (960,000 bytes of 16-bit depth data) *per frame*, and I get around 12 fps just from reading. That means I can read about 5.75 million pixels (11.5 MB) per second. Now you are reading a *single* one: that will take roughly 1/5,750,000 of a second! (Plus some function overhead.) Where is the problem? And that's on PCI. Multiply this by 50 on a GF3.

> the AGP bus is designed for sending data, not receiving it.

How do video capture cards work, then?
Thanks for all these answers.

It was very helpful.
I've solved my problem with the depth test, so now everything works perfectly.

Maybe when I implement flare fading, I'll give you a link to download it, if you're interested in seeing the result.


But nobody has answered whether it's better to use gluProject or to compute my own projection.
Just use gluProject, it's easier. You could probably get a little more speed by doing the transformation/projection yourself, but that's not really going to be a bottleneck anyway. And if it ever becomes one, you can still write your own function later on.
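
For what it's worth, a minimal gluProject sketch (the function name project_flare and its arguments are my own, not anything from this thread): it projects the flare's world position into window coordinates, and winz is the depth the flare *would* have, ready to compare against whatever glReadPixels returns at (winx, winy).

#include &lt;GL/gl.h&gt;
#include &lt;GL/glu.h&gt;

int project_flare(double wx, double wy, double wz,
                  double *winx, double *winy, double *winz)
{
    GLdouble model[16], proj[16];
    GLint view[4];

    /* fetch the current transformation state */
    glGetDoublev(GL_MODELVIEW_MATRIX, model);
    glGetDoublev(GL_PROJECTION_MATRIX, proj);
    glGetIntegerv(GL_VIEWPORT, view);

    /* returns GL_TRUE on success, filling in window coords + depth */
    return gluProject(wx, wy, wz, model, proj, view, winx, winy, winz);
}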
>>OK, we agree on the fact that the Voodoo2 is old. If I read back the entire screen depth buffer on it, that's 800*600 = 480,000 pixels (960,000 bytes of 16-bit depth data) *per frame*, and I get around 12 fps just from reading. That means I can read about 5.75 million pixels (11.5 MB) per second. Now you are reading a *single* one: that will take roughly 1/5,750,000 of a second! (Plus some function overhead.) Where is the problem? And that's on PCI. Multiply this by 50 on a GF3.

> the AGP bus is designed for sending data, not receiving it.

How do video capture cards work, then?<<

Actually, PCI can be quicker than AGP with ReadPixels; AGP is a step backwards in this regard.
*50 for a GF3? I'm guessing 3x if you're lucky.
>>Now you are reading a *single* one: that will take roughly 1/5,750,000 of a second! (Plus some function overhead.)<<

wtf

http://uk.geocities.com/sloppyturds/gotterdammerung.html
gluProject/gluUnProject are a bit slow, but for testing a single pixel they're good enough.
And considering it's the easiest way, go for it!
If you later feel it really is too slow, you can reconsider then.
But for now there's no need to waste more time on it.
Just add it to your TO-DO list of optimizations.

About the off-screen collision detection, I do agree. But who cares about a sun that's outside the screen? Unless you're developing a highly realistic renderer, you don't need that information.
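
To make that concrete, here's a tiny sketch of the early-out I'd do (my own assumed helper, building on a gluProject result like the one earlier in the thread): skip the depth read altogether when the projected flare lands outside the viewport.

#include &lt;GL/gl.h&gt;

/* view[] is the GL_VIEWPORT rectangle {x, y, width, height};
 * winx/winy/winz are the projected flare position and depth. */
int flare_on_screen(const GLint view[4], double winx, double winy, double winz)
{
    return winz < 1.0 &&                                 /* in front of the far plane */
           winx >= view[0] && winx < view[0] + view[2] &&
           winy >= view[1] && winy < view[1] + view[3];
}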
> >>Now you are reading a *single* one: that will take roughly 1/5,750,000 of a second! (Plus some function overhead.)<<
> wtf

wtf what? Oh, you are right, it is even faster.

I just did some precise RDTSC timings on an old Voodoo2. This board is notorious for being incredibly slow when it comes to readbacks. And _not_ because of bus bottlenecks, but because of its onboard memory access design.

OK, reading back a single depth value anywhere on the screen takes 103 cycles. On a 1.4 GHz Athlon, that is approx. 1/13,592,233 of a second. And that *includes* function overhead. Hey, that's even faster than I thought.

I'll do similar timings on my GF3 as soon as I'm home.

Now try to write a line-of-sight tracer that calculates the flare visibility in 103 cycles... Have fun.
You'd better check your code.

Here are results from my GF2 MX200 AGP:
http://uk.geocities.com/sloppyturds/readpixels_RGBA_32.png
Each horizontal bar is 1 million pix/sec.
Each vertical line is an increase of 50 pixels, e.g. first a ReadPixels of 1x1, then 2x2, then 3x3 -> 1100x1100 (ignore results after ±800x800, where it starts going up linearly).

Looking there you will see that reading a single pixel gives you ±0.01 million pixels a second, compared to a peak of about 11.5 million pixels a second at ±128x128.

BTW, these are 32-bit RGBA values, not depth values. I can run through depth values if you want, but it's not going to make much difference.


http://uk.geocities.com/sloppyturds/gotterdammerung.html
Well, what can I say, that's the code I use (BTW, I also tried reading RGBA values, but depth values give the same result):

cl = rdtsc();                                                /* timestamp before     */
glReadPixels(xp, yp, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, data); /* read back one pixel  */
cl = rdtsc() - cl;                                           /* elapsed clock cycles */

cl is 103 cycles on a Voodoo2. I re-timed the whole thing in my engine, where I use it for flare/glow visibility. With 17 glows in view (17 one-pixel tests), those take 2071 cycles total. That's roughly 122 cycles per flare. A bit slower, but that might be an alignment problem with my stack variables.
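
(The rdtsc() helper above isn't shown in the thread; a common x86 implementation, assuming GCC-style inline assembly, would look like this.)

static unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    /* the RDTSC instruction returns the CPU's cycle counter in EDX:EAX */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}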

I haven't tested it on my GF3 yet, I'll do that this evening.
Perhaps you should review your code.
OK...

I've run some 'real world' tests on this to show how careful you need to be with glReadPixels, and why, in the long run, a line-of-sight test is always going to be better.


The main reason I've been stressing that glReadPixels is a bad idea is the way modern renderers work. I had decided not to explain my reasoning until I had tested it out.

I have now.

OK, one very common misconception about rendering is that when you call glDrawElements, or whatnot, the drawing is done by the time the call returns. This is not the case. Thanks to the highly advanced programming that goes into video card drivers, the video card keeps working at 100% while the CPU is left to continue with other matters, even while the card is still drawing.
This means that on a good T&L chipset, if all you do is DrawElements calls, the CPU will remain mostly idle while the GPU does all the work.

How does this relate?

Well, because of this it becomes EXTREMELY important WHERE you put your glReadPixels call, if you put it in at all. The reason is that for glReadPixels to work, the image must be fully drawn, and if the video card is still drawing, the CPU MUST wait for it to finish.
So if you put the read call in right after you have finished drawing, any further CPU work will likely take a direct hit on the frame rate, because all that time the card spent rendering, the CPU was sitting idle.
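
In other words (my sketch, with hypothetical function names, not code from this thread): reorder the frame so the CPU's own work fills the time the GPU spends drawing, and only then read back.

/* hypothetical engine hooks, assumed to exist elsewhere */
void render_scene(void);
void run_physics(void);
void read_flare_depths(void);
void swap_buffers(void);

void frame(void)
{
    render_scene();       /* queue all draw calls; the GPU starts drawing */

    run_physics();        /* do the frame's CPU work here, overlapping    */
                          /* the GPU instead of stalling after it         */

    read_flare_depths();  /* the glReadPixels stall is now much shorter:  */
                          /* most of the frame has already been drawn     */
    swap_buffers();
}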

I decided to test this in my project (the one with the vehicle).

To do this, I ramped the physics frequency up as high as I could get it without a major effect (less than 5 fps) on the frame rate.
I stopped at 4000 Hz, where the frame rate went down from 85 fps to 80 at 1152x864x32, showing that the CPU and the video card were both being worked at around 100%.

Next, I put in a single glReadPixels call, reading a single pixel, right after the object manager was called to render the scene.

The results were amazing.

The new frame rate ranged between 13 and 24 fps.

Quite a hit.

This is why you really need to be careful with glReadPixels, and why a line-of-sight test is infinitely better.

Optimizing this in a high-level engine is basically impossible without getting into per-object rendering code, which is exactly what a high-level engine is designed not to need.
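
For comparison, a minimal CPU-side line-of-sight sketch, under big assumptions of my own: the scene is a heightfield and terrain_height(x, z) is a hypothetical lookup. It marches from the camera toward the sun and reports the flare blocked if the ray dips below the terrain; no GPU involvement, so no pipeline stall.

float terrain_height(float x, float z);   /* hypothetical terrain lookup */

int sun_visible(const float cam[3], const float sun_dir[3])
{
    float p[3] = { cam[0], cam[1], cam[2] };
    int i;

    for (i = 0; i < 256; i++) {            /* fixed-step ray march */
        p[0] += sun_dir[0] * 10.0f;
        p[1] += sun_dir[1] * 10.0f;
        p[2] += sun_dir[2] * 10.0f;
        if (p[1] < terrain_height(p[0], p[2]))
            return 0;                      /* ray went under the terrain */
    }
    return 1;                              /* reached the sky unobstructed */
}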
>>cl = rdtsc();
glReadPixels(xp, yp, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, data);
cl = rdtsc() - cl;
cl is 103 cycles on a Voodoo2<<

I tried the same code and got ±40,000 cycles, but that's beside the point.

By your timing, A and B below would take about the same amount of time, which clearly isn't right:

A/
for (i = 0; i < 10000; i++)
    glReadPixels(1, 1)       /* 10,000 separate 1x1 reads */

B/
glReadPixels(100, 100)       /* one 100x100 read = 10,000 pixels */

BTW, RipTorn made a very valid point.

http://uk.geocities.com/sloppyturds/gotterdammerung.html

