
How to hide a lens flare behind mountains?

Started by February 11, 2002 03:32 AM
41 comments, last by Bestel 23 years ago
I don't want to disrupt your debate, but I've tried to use glReadPixels because it looked easier to implement.

It has not had a big effect on my framerate. But if I want to use a line of sight in the future... I understand how to do that with a terrain, but if there are a lot of objects (buildings, vehicles, etc.), it will be difficult to implement, no?
Right, there is no easier way than glReadPixels.

I agree with RipTorn's warning, although I disagree on its importance.
It's a very special case. Your CPU is never used at 100%, simply because it would be (almost) impossible to maintain a framerate in such a critical case.
quote:

Following your code timing, claiming A + B would give about the same results speed-wise clearly ain't right.

A/
for (i = 0; i < 10000; i++)
    glReadPixels(1, 1);

B/
glReadPixels(100, 100);
This is a very naive assumption. It wouldn't. First off, B will always be faster, since you don't have the function call overhead of 10,000 separate calls. Second, you have a framebuffer read cache on modern 3D cards (GeForce-like), so if you read a single texel, you will fill up the cache and get the next few texel reads for free because of cache granularity.

B is a lot faster on a GeForce compared to a Voodoo. A fullscreen readback of the depth buffer (using the special NV_packed_depth_stencil 24-8 mode) is blazing fast. It is not on the Voodoo, because it will lock up the output queue of the GPU.

On both cards, read access has to share RAM bandwidth with the GPU and the DAC. A single texel read can easily be inserted into the read stream; there is practically *no* overhead. The chance that the GPU or the DAC will access that particular RAM address at exactly the time you read it back is very small. A block read, on the other hand, will almost certainly conflict with other read accesses and will produce stalls. That's why it is slower.

RipTorn has a valid point if we're talking about simple Quake-like scenes, 5-10k tris. But try to do a fast realtime line-of-sight tracer with 500k-tri scenes. And a ReadPixels call will not always flush the GPU FIFO. Modern cards (GF3+) are very good at tracking which screen areas have been modified. And even if your card has to flush the pipe, if you do your readback near the end of the frame (near a buffer swap), there won't be much harm done. And if you are using VAR/fence to synchronize CPU and GPU in parallel while rendering your scene, then your problems are solved, since you can be pretty sure the FIFO will be almost empty when you come to your readback.

I would definitely go for glReadPixels. Line of sight is very useful when it comes to AI and such. But it shouldn't be used in the renderer.

> i tried the same code and got ±40000 cycles but that's beside the point.

I tried it on a GF3: 177 cycles for a single packed depth/stencil value. And as I said, I use that extensively in my engine; it has been profiled from top to bottom hundreds of times. It was never an issue.

> It will be difficult to implement that, no?

The principle is the same, although there are some optimizations you can do on a heightmap-based terrain. No, it's not going to be easy to get it fast; you'll need some kind of subdivision structure, e.g. an octree. It's the principle of a software-only raytracer. But it all depends on the general complexity of your scene. If it's only 5k faces or so, it's trivial. But if you have more, and the faces are very dense, intersecting and close to each other, then this can be a very hard task. I did an LOS algorithm a few months ago to calculate soft shadows by multisampling area lights. It had to operate on a 2M-face scene. It was unbearable; even with complex acceleration structures it was just plain slow. I ended up using the pixel readback, although in a different way (hemicube).
I would say what I said is fairly important. Keeping both the processor and the video card as busy as possible is the only way to get the highest possible frame rate... if the CPU isn't very busy, then maybe it could be used to do some occlusion culling, or if it is too busy, maybe the simulation is too accurate.

And when you factor physics, AI, scene management, etc. into a game (all things the video card can't help with), the CPU will often be the bottleneck. This isn't the case with demos, because they are usually designed to do one thing, and one thing only.

4000 Hz may sound very excessive, and for one car it is... as I usually run it at 60... but what if I were to add 30 cars? And all the other objects needed in a game?
In a situation like this, a single glReadPixels call could be the difference between 80 fps and 20.
quote:

I would say what I said is fairly important. Keeping both the processor and the video card as busy as possible is the only way to get the highest possible frame rate... if the CPU isn't very busy, then maybe it could be used to do some occlusion culling, or if it is too busy, maybe the simulation is too accurate.

And when you factor physics, AI, scene management, etc. into a game (all things the video card can't help with), the CPU will often be the bottleneck. This isn't the case with demos, because they are usually designed to do one thing, and one thing only.

4000 Hz may sound very excessive, and for one car it is... as I usually run it at 60... but what if I were to add 30 cars? And all the other objects needed in a game?



I fully agree on all of that.

quote:

In a situation like this, a single glReadPixels call could be the difference between 80 fps and 20.
Nope. As I stated in my post above, if you position it right in your code, it won't do any harm at all.


Don't get me wrong, I'm a fan of ReadPixels (see my post in c.g.a.o from a couple of weeks ago complaining about the GF2 MX being slower at them than my Vanta). One huge benefit of ReadPixels is when you use alpha polygons etc., plus the simplicity factor.

assuming that AP(AH) is the same person throughout

>> A/ for (i=0; i<10000; i++) glReadPixels(1,1)
B/
glReadPixels(100,100)
This is a very naive assumption. It wouldn't. First off, B will always be faster <<

yet

>> OK, we agree on the fact that the Voodoo2 is old. If I read back the entire screen depthbuffer on it, that's 800*600*2 = 960000 pixel *per frame*, I get around 12 fps only reading. Means I can read 11.5 MPixel per second. Now you are reading a *single* one: this will take 1/11,500,000 of a second! (Plus some function overhead.) Where is the problem? And that's on PCI. Take this number * 50 on a GF3. <<

>> OK, reading back a single depth value anywhere on the screen takes 103 cycles. On a 1.4 GHz Athlon, that is approx. 1/13,592,233 seconds. And that *includes* function overhead. Hey, that's even faster than I thought. <<

800x600 = 11.5 MPixel/s
1x1 = 13.5 MPixel/s

according to your tests, 1x1 is actually quicker than 800x600. mate, something's screwed up.

Here are the results I got from SPECglperf (THE OpenGL benchmarking program, very nice):

readpixels(100,100) = 10,500,000 pixels a second
readpixels(1,1) = 18,300 pixels a second (feels a bit low, I believe I get around 100,000... hmmm)

SPECglperf™ is a tool developed to measure performance of OpenGL on a given system. It enables you to explicitly set the state of OpenGL and the kind of data you will be sending down. My foremost goal was to provide flexibility in measuring performance over a wide range of states and scenarios. You should use this tool as you would a magnifying glass -- to closely inspect performance in specific sections of OpenGL.

18300 ReadPixel images per second
Test Type ReadPixelsTest
GLperf Version 3.1.2
OpenGL Renderer GeForce2 MX/AGP
OpenGL Client Vendor NVIDIA CORPORATION
OpenGL Client Version 1.3.0
Image Format GL_RGBA
Image Type GL_UNSIGNED_BYTE
Image Width 1
Image Height 1
Width of ReadPixels 1
Height of ReadPixels 1

1050 ReadPixel images per second
Test Type ReadPixelsTest
GLperf Version 3.1.2
OpenGL Renderer GeForce2 MX/AGP
OpenGL Client Vendor NVIDIA CORPORATION
OpenGL Client Version 1.3.0
Image Format GL_RGBA
Image Type GL_UNSIGNED_BYTE
Image Width 100
Image Height 100
Width of ReadPixels 100
Height of ReadPixels 100


do you give up, or will I be forced to say ni again?

http://uk.geocities.com/sloppyturds/gotterdammerung.html
> assuming that AP(AH) is the same person throughout

Absolutely.

> according to your tests 1x1 is actually quicker than 800x600. mate, something's screwed up.

Everything is OK. Read my post above. The Voodoo2 is very slow when it comes to pixel reads, which makes it a good benchmarking candidate for this kind of test. 1x1 pixel reads *are* faster, simply because they interfere minimally with the RAM access of the GPU and the DAC. So consecutive single-pixel reads, measured over a high number of frames, will be faster than a block read on those old cards. Things are a bit different on modern cards, since they have optimized paths for block transfers to/from the framebuffer.

> readpixels(100,100) = 10,500,000 pixels a second

Seems OK.

> readpixels(1,1) = 18,300 pixels a second (feels a bit low i believe i get around 100,000 hmmm)

Definitely not OK. There's a problem with your config; that's *absolutely* not normal. I will try SPECglperf on my GF3 and post the results.
> do u give up or will i be forced to say ni again

Damn, I hate that movie.

But seriously: no. I know my engine, I know my hardware, I know my bottlenecks. ReadPixels is not one of them.

quote:
Original post by Anonymous Poster

Nope. As I stated in my post above, if you position it right in your code, it won't do any harm at all.
I mentioned this.
But I also mentioned that this is utterly impractical in a well-designed OOP-based engine, as you basically have to break all the design rules of the engine.

And you cannot redesign an engine so that halfway through rendering an object (a lens flare), it will stop to let all the physics and gameplay CPU work go ahead.

The only thing you could do is put the rendering on a separate thread, but that is not a good thing, as you may well get overlap of updated and non-updated data, which would lead to a lot of potential crash situations when objects are being destroyed.

As I said, it's OK for demos, but not in larger projects.
quote:
But I also mentioned that this is utterly impractical in a well designed OOP based engine. As you basically have to break all the design rules of the engine.

I agree that it's not easy to implement this feature in an engine that is already defined.
But assuming you _specify_ this feature early enough, you'll base your engine around it (not only around it, but it will be one of the features you want your engine to support) and everything will be OK.

How do you define a "well designed OOP based engine"?
As far as I know, there's no generic way of designing an engine. There are no rules other than the ones you define yourself for the applications you aim for.

This topic is closed to new replies.
