
HLSL: Clip alternative for Vertex Shader?

Started by dopplex April 16, 2008 08:15 PM
4 comments, last by elliotwoods 11 years, 5 months ago
I'm looking for a way to efficiently discard a vertex in a vertex shader. I found a brief reply about potentially causing the vertex to form a degenerate triangle in order to discard it, but the idea wasn't delved into. (http://www.gamedev.net/community/forums/topic.asp?topic_id=421211)

I was thinking that I could artificially replicate a clipped vertex by simply passing a scalar value to the pixel shader - 0 for vertices I want to keep, -1 for vertices to chop. "clip(myClipScalar);", as the first statement in the pixel shader, should then discard any pixels that are in a triangle with a clipped vertex.

Questions:
1. Would this work in practice? I've never really used clip(), and the little I've read of it indicates that it can create some quirkiness.
2. If so, would it increase efficiency? (I'd think it would improve fill rate, at least...)
3. How would this approach compare to the one mentioned in the other thread of creating a degenerate triangle in order to get rid of the vertex?

Thanks!
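Here's roughly what I have in mind, as a minimal sketch (WorldViewProj and the keep/discard test are just placeholders, not real code from my engine):

float4x4 WorldViewProj;

struct VS_OUTPUT
{
    float4 Position : POSITION;
    float  ClipFlag : TEXCOORD0;   // 0 = keep, -1 = chop
};

VS_OUTPUT ClipFlagVS(float4 pos : POSITION)
{
    VS_OUTPUT o;
    o.Position = mul(pos, WorldViewProj);

    bool keep = true;              // placeholder for the real per-vertex test
    o.ClipFlag = keep ? 0.0f : -1.0f;
    return o;
}

float4 ClipFlagPS(VS_OUTPUT i) : COLOR0
{
    clip(i.ClipFlag);              // kills any pixel whose interpolated flag is negative
    // ... rest of the pixel shader ...
    return float4(1, 1, 1, 1);
}

(Note the flag gets interpolated, so a -1 at one vertex pulls the interpolated value negative across essentially the whole triangle.)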
In most cases using clip in the pixel shader won't save you anything, since the pixel shader still needs to execute. If you make use of dynamic branching you could possibly increase performance if the shader is long and the branching is coherent, but this can be difficult.

I'm guessing creating a degenerate triangle would be quicker, since doing so would allow you to skip the pixel shader entirely.
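Something along these lines is what I mean by using dynamic branching (a ps_3_0 sketch, names made up): wrap the expensive work in a branch, and keep clip() so the flagged pixels still don't get written, but coherent groups of them can skip the heavy part.

float4 BranchingPS(float clipFlag : TEXCOORD0) : COLOR0
{
    float4 color = 0;

    [branch]
    if (clipFlag >= 0.0f)
    {
        // ... the long, expensive part of the shader ...
        color = float4(1, 1, 1, 1);
    }

    clip(clipFlag);   // discard the flagged pixels regardless
    return color;
}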
Quote:
Original post by MJP
In most cases using clip in the pixel shader won't save you anything, since the pixel shader still needs to execute. If you make use of dynamic branching you could possibly increase performance if the shader is long and the branching is coherent, but this can be difficult.

I'm guessing creating a degenerate triangle would be quicker, since doing so would allow you to skip the pixel shader entirely.



That makes sense. I was thinking that the easy way to make the triangle degenerate would be to just output the average of the positions of the two adjacent vertices, as in the sketch below. (I'm planning on using a custom vertex declaration that has the positions of the neighboring vertices in the triangle. All of the vertices that the processing will be run on will only be part of one triangle.)
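As a rough sketch, the vertex shader side of that would look something like this (NeighborA/NeighborB stand in for the custom declaration I mentioned, and the keep test is a placeholder):

float4x4 WorldViewProj;

struct VS_INPUT
{
    float4 Position  : POSITION;
    float3 NeighborA : TEXCOORD0;   // positions of the other two vertices
    float3 NeighborB : TEXCOORD1;   // of this vertex's (only) triangle
};

float4 DegenerateVS(VS_INPUT input) : POSITION
{
    float3 pos = input.Position.xyz;

    bool keep = true;               // placeholder for the real per-vertex test
    if (!keep)
    {
        // Collapse onto the segment between the neighbours - the triangle
        // ends up with zero area and gets culled before rasterization.
        pos = 0.5f * (input.NeighborA + input.NeighborB);
    }

    return mul(float4(pos, 1.0f), WorldViewProj);
}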

I'm a bit confused as to why clip would result in no performance improvement though. Let's say that the very first instruction in a long pixel shader program is the clip. In the event that the evaluation of clip told the shader to discard the pixel, is the GPU going to keep evaluating the rest of the shader? In a way, I thought that clip *was* a kind of dynamic branching - am I misunderstanding?

Is it that this doesn't happen, or is it that the actual running of the shader isn't the primary bottleneck for the pixel shader pipeline and that getting the triangle culled due to being degenerate is going to avoid some more expensive operations?

(I'm going to test this for myself, but at the moment my engine is totally nonfunctional as I'm midway through pulling out the guts of its content importing processes.)

(BTW: Overall what I'm trying to do is throw a ton of extra triangle data (an additional vertex buffer for each mesh of roughly the same size as the mesh's original vertex buffer) at the GPU and let it do silhouette detection for me instead of the CPU. Since the vast majority of those triangles will not in fact be on silhouettes - and the vertex shader will be able to detect that quite quickly (a couple of transforms, a couple of dot products) - I'm looking for a way to minimize the load that these discarded triangles cause for the pixel shader)
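(Something like this is the kind of per-vertex test I mean - the normals of the two faces sharing the edge come in with the vertex, and it's a silhouette edge when one face points toward the eye and the other away. All of these names are placeholders, not actual code from my project.)

float4x4 World;
float3   EyePosition;

struct SIL_INPUT
{
    float4 Position : POSITION;
    float3 NormalA  : NORMAL0;      // normal of the first face sharing the edge
    float3 NormalB  : NORMAL1;      // normal of the second face
};

bool IsSilhouetteEdge(SIL_INPUT v)
{
    float3 worldPos = mul(v.Position, World).xyz;
    float3 toEye    = EyePosition - worldPos;

    float a = dot(mul(v.NormalA, (float3x3)World), toEye);
    float b = dot(mul(v.NormalB, (float3x3)World), toEye);

    // One face front-facing and the other back-facing => silhouette edge
    return (a * b) < 0.0f;
}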

[Edited by - dopplex on April 17, 2008 9:58:05 AM]
Quote:
Original post by dopplex

I'm a bit confused as to why clip would result in no performance improvement though. Let's say that the very first instruction in a long pixel shader program is the clip. In the event that the evaluation of clip told the shader to discard the pixel, is the GPU going to keep evaluating the rest of the shader? In a way, I thought that clip *was* a kind of dynamic branching - am I misunderstanding?



No, there's no dynamic branching with a texkill instruction (this is the ASM instruction that cancels pixel rendering). The entire shader will evaluate, and afterwards the result will be kept or discarded based on the parameters sent to texkill. I checked into it and it looks like this holds even if your pixel shader is using ps_3_0 dynamic branching. ATI also used to say that texkill really messes up their hierarchical Z-culling, which culls pixels before they're rendered. I'd imagine this is true for Nvidia's GPUs as well.
Well, that's quite annoying then, I suppose. Making the triangle degenerate seems to accomplish the goal though, so not much lost.

Thanks!

A (moderately hacky) solution is to set the output position to have w == 0.0f

That way the vertex becomes invalidated, and is not utilised in the rasterisation stage (i.e. it never gets to the pixel shader).
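Something like this, as a minimal sketch (names and the keep test are placeholders):

float4x4 WorldViewProj;

float4 DiscardByWVS(float4 pos : POSITION) : POSITION
{
    float4 clipPos = mul(pos, WorldViewProj);

    bool keep = true;          // placeholder for the real per-vertex test
    if (!keep)
        clipPos.w = 0.0f;      // w == 0 -> vertex never makes it past rasterisation

    return clipPos;
}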

