
Bump mapping & modern games.

Started July 02, 2002 11:18 PM
15 comments, last by Sergey Eliseev 22 years, 7 months ago
quote:
That's fake EMBM; it uses a simple x/y per-pixel offset into the envmap. This can be done on < GF3, but as you noticed, it's very slow, since it involves a framebuffer readback. It only works on planar surfaces, but looks pretty good if you respect the limitations.

Real EMBM (much better quality, and robust on arbitrary geometry) needs a per-pixel matrix multiply and a reflection vector computation. That's GF3+ only. IIRC, there is a demo somewhere on nVidia's site that shows the difference between the two methods.


It just sounds like:
"easy, it's just a fake EMBM, so it's easy to write a software fallback on GF2, but you know, the REAL EMBM does so much more, a fallback for that is impossible, that's GF3+ only, but the simple x/y per-pixel offset, that's easy.."

Oh, and btw, this demo runs smoothly on my GF2 MX..

What I wanted to say: whether it is a fake 2D EMBM or a "real" EMBM with a matrix transform and reflection is not really much of a topic, as it's software anyway. And yes, the register combiners are capable of calculating a reflection vector..

As I said, I'm nitpicking at the moment, but clarifying stuff is sometimes important.

Why it looks fake:
Low color resolution. Think about it: the whole math is done with 8 bits per vector component.. that means the reflected vector can only hit exactly one texel on the cube map, nothing in between.. GL_LINEAR cannot be used. That's why nVidia invented the HILO texture format for normals, to calculate the reflection at higher precision. Otherwise, you can rotate the whole cube environment map and you see that it gets reflected, and well.. what more can you want? You always need a vector for the lookup into a cube map, so it needs to be the correctly reflected vector..
The nVidia demos on GF3 HW don't look any different, just more accurate.. that's all..
It's not software rendering; they don't use software at all, except that they plot individual pixels with glBegin(GL_POINTS), if you call that software..
It's not rendering triangles anymore, but it's HW.. and the only way for a GF2 MX to emulate texture shader effects in HW. It's btw MUCH faster than nVidia's software emulator for the texture shaders. THAT one you can feel is software.. 10 seconds per frame at that size, while this demo runs at about 20-30 fps.. It uses as much HW as possible (not entirely true: if GL didn't hide the data from us, it could be done much faster..)
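To make the "plotting" concrete, here is a rough sketch of the idea (not the demo's actual source; the 320x240 size, the biased-vector encoding and the ortho projection are my assumptions):

static GLubyte refl[240][320][4];               /* assumed 320x240 window            */
glReadPixels(0, 0, 320, 240, GL_RGBA, GL_UNSIGNED_BYTE, refl);

glEnable(GL_TEXTURE_CUBE_MAP_ARB);              /* environment cube map already bound */
glBegin(GL_POINTS);
for (int y = 0; y < 240; ++y)
{
    for (int x = 0; x < 320; ++x)
    {
        /* expand the 0..255 biased reflection vector back to -1..1 */
        float rx = refl[y][x][0] / 127.5f - 1.0f;
        float ry = refl[y][x][1] / 127.5f - 1.0f;
        float rz = refl[y][x][2] / 127.5f - 1.0f;
        glTexCoord3f(rx, ry, rz);
        glVertex2i(x, y);                       /* ortho projection, 1 unit = 1 pixel */
    }
}
glEnd();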

How could it be done much faster?
Render the reflected vectors (which are used directly for the lookup) into a buffer, then lock that buffer to get a pointer to the data. The data in there is, for example, RGBA with unsigned chars as components. Bind it as your texcoord array (glTexCoordPointer), next to a vertex array (glVertexPointer) that stores the screen position for each vertex (a 256x256 array for a 256x256 screen); the texcoord array then simply holds the lookup vector for the cube map. Draw the arrays as GL_POINTS.
It could be done quite fast, without any bus transfers at all, fully in HW. I bet we could have 50+ fps then.. (Sure, it depends on the resolution, but watching GF3 or GF4 demos run smoothly on a GF2 at 320x240 would be cool nonetheless (playing Doom3, yeah!))
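A sketch of that array variant, for illustration only (today's GL gives no way to bind the framebuffer contents directly, so it still needs a readback; and since glTexCoordPointer does not accept unsigned bytes, the vectors are read back as floats here and the texture matrix expands them from 0..1 to -1..1; the names and sizes are just for the example):

enum { W = 256, H = 256 };
static GLshort screenPos[W * H * 2];    /* built once: (x, y) for every pixel         */
static GLfloat reflVec[W * H * 3];      /* biased reflection vectors, 0..1            */

glReadPixels(0, 0, W, H, GL_RGB, GL_FLOAT, reflVec);    /* the transfer we want gone  */

glMatrixMode(GL_TEXTURE);               /* expand 0..1 -> -1..1 in the texture matrix */
glLoadIdentity();
glTranslatef(-1.0f, -1.0f, -1.0f);
glScalef(2.0f, 2.0f, 2.0f);             /* scale applies first, then the translate    */
glMatrixMode(GL_MODELVIEW);

glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glVertexPointer(2, GL_SHORT, 0, screenPos);
glTexCoordPointer(3, GL_FLOAT, 0, reflVec);             /* cube map lookup vectors    */
glDrawArrays(GL_POINTS, 0, W * H);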

"take a look around" - limp bizkit
www.google.com
If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

My Page davepermen.net | My Music on Bandcamp and on Soundcloud

To make it short: it is software rendering. As soon as a rendering pass requires the CPU to touch individual pixels, that pass is SW rendering. The fact that the HW performs a cube map lookup afterwards is irrelevant. The whole idea of HW rendering is that the host CPU never sees fragments, only geometric specifications. That's not the case here. It's just like, e.g., applying a filtering pass to the final image using the CPU (if your 3D card doesn't support the imaging subset): then that particular pass is SW.

Any HW/SW combination on the final framebuffer is evil and should be avoided at all costs, especially if large numbers of pixels are concerned. It requires a framebuffer readback, which stalls the command FIFO, floods the AGP bus (in the 'wrong' direction), and creates a 'bubble' in the execution pipe.

quote:

Draw the arrays as GL_POINTS.
It could be done quite fast, without any bus transfers at all, fully in HW. I bet we could have 50+ fps then.. (Sure, it depends on the resolution, but watching GF3 or GF4 demos run smoothly on a GF2 at 320x240 would be cool nonetheless (playing Doom3, yeah!))


That's what texture shaders are there for on GF3+. Don't forget that GL_POINTS will stress the geometry pipeline. Many cards will internally process GL_POINTS as minimal-area polygons, since they don't have dedicated HW for individual points. E.g. the GF2 treats a point just like a triangle (although I think the surface interpolators are switched off, but I'm not sure, that information is nVidia's trade secret). For a 1024x768 screen, that would be 786,432 faces per frame, in addition to your normal scene. Not good.
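For comparison, on GF3 the whole effect collapses into a texture shader setup roughly along these lines (from memory and heavily abbreviated, the dot-product stages are omitted; check the NV_texture_shader spec for the exact per-stage configuration):

glEnable(GL_TEXTURE_SHADER_NV);

glActiveTextureARB(GL_TEXTURE0_ARB);    /* stage 0: fetch the normal map */
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_TEXTURE_2D);

/* stages 1 and 2 would do the per-pixel dot products with the interpolated
   basis (omitted here); stage 3 reflects the eye vector and samples the cube map */
glActiveTextureARB(GL_TEXTURE3_ARB);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV,
          GL_DOT_PRODUCT_REFLECT_CUBE_MAP_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE0_ARB);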

/ Yann

[edited by - Yann L on July 6, 2002 8:25:54 AM]
I don't call it software rendering, as there is no software rasterizer involved, but that's just a matter of taste. You read the pixels back and send them straight up again; nothing is done in software except that. Drivers often read data back or send data up.. that does not affect the HW.
You can do it without a stall with your own rendering thread, which can wait for the readback (it's often misunderstood that sending data back over AGP is slow, or that it stops the GPU from working, or whatever.. it's just that the CPU is idle while the GPU is not yet finished drawing.. so run a second drawing thread of your own..).
Yes, GF3+ can do it with texture shaders, but a) only this particular example and some others, not all the possible things we could do otherwise (see the Radeon 8500 for real pixel shaders), and b) I'm talking about pre-GF3 HW here, which could do it at twice the current speed.. It would have been fun promotion for the GF3 to see its demos running with such an extension.. I'm still waiting anyway for an extension that allows binding pixel data as vertex data, as it would be very helpful for using vertex programs and pixel programs for real programs, not just plotting pixels, but updating physics and more.. but nVidia doesn't think it's important; I talked about it several times with Cass and others.. implementing it is not a big problem, but they don't care about me

That's your statement, and I can't prove whether it's true, but I don't believe points go through the rasterizer if they have glPointSize(1) and no antialiasing on points.. otherwise, yes, but not in that case.. that would just be stupid. Skipping the rasterizer would make much more sense, as triangle setup, even while quite fast, is a possible bottleneck (and when you draw tons of points it can be quite a big bottleneck..)

Anyway, I don't care about all that.. the demo runs smoothly here on a GF2 MX, and it does not use any software calculations, so it's hardware for me. It is not a hardware-drawn triangle or quad, but it is a hardware post-image-processing effect, which does a per-pixel texture lookup into a cube map. You can do this on GF3+ with a render texture and a fullscreen quad drawn afterwards, but it's essentially the same, except we draw each pixel instead of a quad.. And drawing that many pixels is not necessarily slow, as a) this demo is not slow for me, and b) I've benchmarked what I get with VAR on AGP memory, and if I could only bind the output directly as a vertex array and use that, it would be much faster, I can say for sure.. (Putting the points in AGP memory doubles the speed of rendering points on my GF2 MX, and that is with full z-test, blending and alpha test; it could be much faster without, as it also eats fillrate (smooth points..).) The extension would give the power of the full vertex program per pixel for a final image post-process.. would be cool..
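The AGP memory part looks roughly like this with NV_vertex_array_range on Windows (a sketch; the allocation hints are just typical values for getting AGP memory, and the point grid and lookup vectors are the same arrays as in the earlier sketch):

/* put the point grid into AGP memory so the GPU pulls the arrays itself
   instead of the driver copying them every frame */
GLsizei size = W * H * (2 * sizeof(GLshort) + 3 * sizeof(GLfloat));
void *agpMem = wglAllocateMemoryNV(size, 0.0f, 0.0f, 0.5f);   /* ~0.5 priority -> AGP memory */
if (agpMem)
{
    glVertexArrayRangeNV(size, agpMem);
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
    /* copy screenPos / reflVec into agpMem, point glVertexPointer and
       glTexCoordPointer at it, then glDrawArrays(GL_POINTS, ...) as before */
}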

"take a look around" - limp bizkit
www.google.com
If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

My Page davepermen.net | My Music on Bandcamp and on Soundcloud

quote:
Original post by davepermen
I don't call it software rendering, as there is no software rasterizer involved, but that's just a matter of taste.



Well, 'software rendering' is perhaps the wrong word; 'software pass' fits better: the CPU touches pixel data, so it's not HW anymore.

quote:

You read the pixels back and send them straight up again; nothing is done in software except that. Drivers often read data back or send data up.. that does not affect the HW.


They do it very rarely, and never read back framebuffer data on their own (except for software-emulated features, such as the accumulation buffer, which makes them really slow).

quote:

That's your statement, and I can't prove whether it's true, but I don't believe points go through the rasterizer if they have glPointSize(1) and no antialiasing on points..


As I said, I am not 100% sure whether it will invoke the interpolators (i.e. the rasterizer) or not. But that's not the point. It will invoke the full geometry transformation pipeline (since a GL point is treated like a normal 3D vertex), and that can make an application geometry limited in no time. And think about the incredible AGP bus bandwidth it will take every frame.

quote:

You can do it without a stall with your own rendering thread, which can wait for the readback (it's often misunderstood that sending data back over AGP is slow, or that it stops the GPU from working, or whatever.. it's just that the CPU is idle while the GPU is not yet finished drawing.. so run a second drawing thread of your own..).


There is nothing misunderstood. The AGP bus is like a one-way street: it is 100% optimized for data transfer towards the graphics card, and not the other way round. Readbacks will be much slower. Also, the GPU will get stalled when issuing a readback, and that has nothing to do with the CPU being idle. A readback will: a) initiate a full FIFO flush, b) flush an active VAR (which is very bad), and finally c) lock the command pipeline for the time the readback is performed and all framebuffer data has been transferred. No other commands will be accepted by the GPU in that time. This very undesirable effect is commonly called a 'command stream bubble'.

/ Yann

[edited by - Yann L on July 7, 2002 9:23:54 PM]
A readback on GeForce HW with the current drivers:
The GPU just draws the stuff it has to; once finished, it sends back the data it has to, and when that is done, it waits for further commands. No real problem for the GPU. The CPU, on the other hand, has to wait until the GPU is finished before it gets the data back. That you can simply solve with a waiting thread.
That statement is from Cass at nVidia, a driver developer..
And sure, data is slow through AGP in the wrong direction, but it's acceptable at low resolutions (no one is talking about 1024x768).
And the T&L unit should be capable of processing that many points anyway.. they announce them all to be that fast.. and it's the fastest T&L setting possible: one texture, 5 coordinates, what else do you want? (Sure, it's 8 coordinates in the end anyway, but well.)
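Rough numbers, just to put the two resolutions side by side (assuming 4 bytes per pixel read back and 30 fps, which is about what the demo does):

320x240 = 76,800 points per frame, i.e. ~2.3M points/s through T&L, and ~300 KB of readback per frame, ~9 MB/s back over AGP.
1024x768 = 786,432 points per frame, i.e. ~23.6M points/s, and ~3 MB of readback per frame, ~94 MB/s back over AGP.

So at low res the point count is trivial for the T&L unit and the readback stays in the single-digit MB/s range; at 1024x768 both numbers blow up.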

And what I said is: if you could "switch" pixel buffers to vertex buffers, you would not need the whole AGP transfer anymore.. then it could be as fast as the HW allows.. It would even be handy for GF3+, and would give a lot of power in the pixel shader (as you effectively get the vertex shader as a pixel shader). And yes, it's not for high-res images. But not everyone wants to develop yet another 1024x768x32 @ 100 fps game.. there are people who want new effects, even if they run at 320x240; with speed doubling every half year, that's no problem.. today 320x240, next year 640x480, a year after that 1280x960. Actually, a normal game has at minimum 2 years of development time, so you _could_ develop at 320x240 at the beginning, if and only if today's HW would at least let users use more than just the fast features.. Say the Radeon 8500 supported more than two passes in the pixel shader. It wouldn't matter if it then had to reload the stuff per pixel and such; it would be slow, yes, but faster than software.. it could already be used in 3ds max, for example..

"take a look around" - limp bizkit
www.google.com
If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

My Page davepermen.net | My Music on Bandcamp and on Soundcloud

quote:

The GPU just draws the stuff it has to; once finished, it sends back the data it has to, and when that is done, it waits for further commands. No real problem for the GPU. The CPU, on the other hand, has to wait until the GPU is finished before it gets the data back.



The GPU doesn't send the data. The CPU pulls it. AGP DMA only works in one direction: towards the graphics card. The other way round, it's the CPU pulling it (another reason for its slowness). The GPU does virtually nothing in the meantime (perhaps serving pages, I don't know if the memory access mechanisms are implemented in the GPU core).

quote:

that you can simply solve with a waiting thread.


No. See above. And don't forget that OpenGL is not thread-safe.

quote:

That statement is from Cass at nVidia, a driver developer..
And sure, data is slow through AGP in the wrong direction, but it's acceptable at low resolutions (no one is talking about 1024x768).


I am talking about 1024x768. That's the target resolution of any modern game. And if you shade fullscreen geometry, you'll end up at that resolution.

quote:

And what I said is: if you could "switch" pixel buffers to vertex buffers, you would not need the whole AGP transfer anymore..


As said, that's why nVidia implemented texture shaders. It could be an interesting feature (and would actually be rather simple to implement in HW), but for the moment I don't really see what it would be useful for. OK, you could send per-pixel data through the vertex pipeline, but it would be far better to work on more powerful pixel shaders instead.

And the bottom line: we don't have that feature. But we do have texture shaders on GF3+. So I use them on GF3, and disable that specific feature on GF2 and below. A game running smoothly only at 320x240 is just not acceptable.

/ Yann

[edited by - Yann L on July 8, 2002 10:01:28 AM]
"320x240 not playable"

That's just an annoying statement, after playing for years at such resolutions.. And I prefer playing at those resolutions with good FSAA over 1024x768 without, as the final image doesn't look as clean and geometric, but more natural. Outcast I had to play at 320x240 for the whole game, and it's my most loved game.. The highest res I use is 800x600; I don't really need more.. and never will.. (A TV doesn't have that much, so what? I played on consoles for a long time.) Do you really think you can play Doom3 at 1024x768 with your GF3? Do you?
And I am happy with simple support for features. Even if they are not that useful for everyone, they are useful for others.. As we don't have powerful pixel shaders at all today, the vertex-shader-as-pixel-shader emulation would help to start planning development for the future.. And the mapping between vertex and image buffers should have been there a long time ago anyway... there would be plenty of uses.. Manipulating a heightmap in the pixel shader for a water surface already works quite well; you could then bind that directly as your water surface and render it in 3D, for example.. or animate your meshes fully on the GPU, the whole physics and all.. I really hope future HW supports this; we will have floating-point color math soon anyway, so there will be no physical difference between vertex and pixel arrays anymore..

"take a look around" - limp bizkit
www.google.com
If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

My Page davepermen.net | My Music on Bandcamp and on Soundcloud

