Compute Shader projects

Started by Infinisearch, October 28, 2017 06:15 AM
3 comments, last by JoeJ 7 years, 3 months ago

After making a fool of myself in another thread, I realize it's about time I really commit to learning compute shaders. What are some good projects that would give me "the complete experience" when it comes to compute shaders? I've been putting this off for a long time and would really like the most concise way to go about this.

Thanks in advance.

edit - my interests and goals are GPU-driven pipelines, GPU occlusion culling, and similar topics.

-potential energy is easily made kinetic-

I always recommend the compute shader chapter from the OpenGL SuperBible, even if you're DX only. It has everything you need to know and well-chosen examples.

Some ideas to play around with:

Any kind of image processing (e.g. a simple blur or SSAO) where you benefit from caching part of the image in LDS, so that the compute shader reads only one pixel per thread from memory, whereas a pixel shader would read multiple pixels.

Cloth simulation.

Ray tracing: e.g. bin all rays to coarse BVH nodes, then for each coarse node load its detailed subtree into LDS and intersect all binned rays with it. (This eliminates the data divergence problem of a pixel shader ray tracer, where every thread traverses the tree differently.)

Or more practical things like GPU culling (but I don't think there is much to learn here).

 

There are many more ideas, but it's always about using the ability of threads to communicate with each other and/or to share common data. And even if you need none of this, a compute shader at least saves you from drawing a triangle just to get pixel shader threads running.

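
For the image-processing idea above, here is a minimal CPU-side sketch in C++ of what "cache part of the image in LDS" means, reduced to a 1D box blur. It is only an emulation: the outer loop stands in for workgroups, the small `tile` array stands in for LDS, and the names `boxBlurTiled`, `kGroupSize` and `kRadius` are made up for this example. A real shader would split the load phase across threads and put a barrier between the two phases.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sizes, chosen only for this sketch.
constexpr int kGroupSize = 8;  // "threads" per workgroup
constexpr int kRadius = 2;     // blur radius in pixels

// CPU emulation of a workgroup-cached 1D box blur. Each group copies its tile
// plus a halo of kRadius pixels per side into a small local array (the LDS
// stand-in), then every "thread" reads its neighbours from that array only.
std::vector<float> boxBlurTiled(const std::vector<float>& src) {
    const int n = static_cast<int>(src.size());
    std::vector<float> dst(src.size());
    for (int groupStart = 0; groupStart < n; groupStart += kGroupSize) {
        float tile[kGroupSize + 2 * kRadius];
        // Load phase: on a GPU the threads would share this work and then hit
        // a barrier. Out-of-range pixels are clamped to the image edge.
        for (int i = 0; i < kGroupSize + 2 * kRadius; ++i) {
            const int srcIndex = std::max(0, std::min(n - 1, groupStart + i - kRadius));
            tile[i] = src[srcIndex];
        }
        // Compute phase: one output pixel per "thread", reading only the tile.
        for (int t = 0; t < kGroupSize && groupStart + t < n; ++t) {
            float sum = 0.0f;
            for (int k = -kRadius; k <= kRadius; ++k) {
                sum += tile[t + kRadius + k];
            }
            dst[groupStart + t] = sum / (2 * kRadius + 1);
        }
    }
    return dst;
}
```

With these sizes each group fetches 12 pixels from the source for 8 outputs, where an uncached version would fetch 5 per output (40 per group).
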
50 minutes ago, JoeJ said:

I always recommend the compute shader chapter from the OpenGL SuperBible, even if you're DX only. It has everything you need to know and well-chosen examples.

Any particular edition? 5th, 6th, or 7th?

51 minutes ago, JoeJ said:

Cloth simulation

Is there any particular method or technique I should look up?

52 minutes ago, JoeJ said:

Or more practical things like GPU culling (but I don't think there is much to learn here).

I was thinking of implementing frustum culling as a simple example. Is an append buffer the best solution for outputting the list of visible entities?

-potential energy is easily made kinetic-

My edition is the 6th. (It has a bug with wrongly ordered execution and memory barriers; memoryBarrierShared(); barrier(); is the correct order.)

For cloth I was thinking of a very simple setting, like a grid where every vertex has valence 4 so neighbours can be easily indexed, and simple Verlet integration. As with blurring an image or doing a fluid / smoke simulation on a grid, the thing to learn is how to cache things in LDS. You should probably only implement things whose workings you already know. (Otherwise, personally, I still implement them on the CPU first.)
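
A rough CPU-side sketch in C++ of that cloth setup, in the "CPU first" spirit: a regular grid, position Verlet, and one Jacobi-style pass that pulls each vertex toward rest distance from its grid neighbours. Everything here (`Cloth`, `makeCloth`, `integrate`, `relax`, the pinned top row, the 0.5 correction factor) is an assumption made for illustration and not something from the book. On the GPU, the snapshot that `relax` reads from is the part you would cache in LDS per tile.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

struct Cloth {
    int width, height;       // grid resolution in vertices
    float restLength;        // rest distance between grid neighbours
    std::vector<Vec3> pos;   // current positions, row-major
    std::vector<Vec3> prev;  // positions from the previous step
    int index(int col, int row) const { return row * width + col; }
};

// Flat cloth in the XZ plane, initially at rest.
Cloth makeCloth(int width, int height, float restLength) {
    Cloth c{width, height, restLength, {}, {}};
    for (int row = 0; row < height; ++row)
        for (int col = 0; col < width; ++col)
            c.pos.push_back({col * restLength, 0.0f, row * restLength});
    c.prev = c.pos;
    return c;
}

// Position Verlet: x_new = 2x - x_prev + a * dt^2. Row 0 is pinned.
void integrate(Cloth& c, Vec3 accel, float dt) {
    for (int row = 1; row < c.height; ++row)
        for (int col = 0; col < c.width; ++col) {
            const int i = c.index(col, row);
            const Vec3 p = c.pos[i];
            const Vec3 q = c.prev[i];
            c.prev[i] = p;
            c.pos[i] = {2.0f * p.x - q.x + accel.x * dt * dt,
                        2.0f * p.y - q.y + accel.y * dt * dt,
                        2.0f * p.z - q.z + accel.z * dt * dt};
        }
}

// One Jacobi pass: every vertex reads the same old state (the snapshot) and
// averages the corrections toward rest length from its up-to-4 neighbours.
void relax(Cloth& c) {
    const std::vector<Vec3> snapshot = c.pos;
    const int dc[4] = {1, -1, 0, 0};
    const int dr[4] = {0, 0, 1, -1};
    for (int row = 1; row < c.height; ++row)
        for (int col = 0; col < c.width; ++col) {
            const Vec3 p = snapshot[c.index(col, row)];
            Vec3 corr{0.0f, 0.0f, 0.0f};
            int count = 0;
            for (int k = 0; k < 4; ++k) {
                const int nc = col + dc[k];
                const int nr = row + dr[k];
                if (nc < 0 || nc >= c.width || nr < 0 || nr >= c.height) continue;
                const Vec3 q = snapshot[c.index(nc, nr)];
                const Vec3 d{q.x - p.x, q.y - p.y, q.z - p.z};
                const float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
                if (len > 0.0f) {
                    const float s = 0.5f * (len - c.restLength) / len;
                    corr.x += s * d.x;
                    corr.y += s * d.y;
                    corr.z += s * d.z;
                }
                ++count;
            }
            if (count > 0) {
                c.pos[c.index(col, row)] = {p.x + corr.x / count,
                                            p.y + corr.y / count,
                                            p.z + corr.z / count};
            }
        }
}
```

A frame would call `integrate` once and `relax` a few times; how many passes are needed depends on how stiff the cloth should look.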

16 minutes ago, Infinisearch said:
1 hour ago, JoeJ said:

Or more practical things like GPU culling (but I don't think there is much to learn here).

I was thinking of implementing frustum culling as a simple example. Is an append buffer the best solution for outputting the list of visible entities?

Probably yes. (I'd fill a buffer feeding an indirect draw.)

Khronos APIs don't have append buffers; there you increment a counter with atomicAdd to get a destination index, and write your data to a regular buffer at that index. (Neither the counter nor the buffer is special, which is one reason why I think MS tends to overspecify.)
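
As a minimal sketch of that counter-plus-regular-buffer append, here is a CPU emulation in C++ applied to sphere-vs-frustum culling. `std::atomic::fetch_add` plays the role of atomicAdd, and each loop iteration stands in for one GPU thread. The types and names (`Sphere`, `Plane`, `sphereVisible`, `cullAppend`) and the plane convention are assumptions made for this example.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

struct Sphere { float x, y, z, radius; };
// Assumed convention: unit-length normal, a point is inside when n.p + d >= 0.
struct Plane { float nx, ny, nz, d; };

// A sphere is culled as soon as it lies completely behind any one plane.
bool sphereVisible(const Sphere& s, const std::vector<Plane>& frustum) {
    for (const Plane& p : frustum) {
        if (p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d < -s.radius) return false;
    }
    return true;
}

// Writes the indices of visible spheres to `visible` and returns their count.
std::uint32_t cullAppend(const std::vector<Sphere>& spheres,
                         const std::vector<Plane>& frustum,
                         std::vector<std::uint32_t>& visible) {
    std::atomic<std::uint32_t> counter{0};  // the "global counter"
    visible.assign(spheres.size(), 0);      // a regular buffer, sized for the worst case
    for (std::uint32_t i = 0; i < spheres.size(); ++i) {
        if (sphereVisible(spheres[i], frustum)) {
            // fetch_add returns the old value, which becomes this thread's slot.
            const std::uint32_t slot = counter.fetch_add(1);
            visible[slot] = i;
        }
    }
    return counter.load();
}
```

On a real GPU the order of entries in the output is not deterministic; the sequential loop here hides that. The final counter value is what would feed the instance count of an indirect draw.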

But this is a good example of a common optimization to avoid expensive atomics on global memory:

Instead of appending to the global buffer by incrementing a global counter within EACH thread, we first write the list to LDS, incrementing a local (group shared) counter that also lives in LDS. After the workgroup is done with the list (or the list becomes too full), only ONE thread does the atomic add to global memory with the list size, and then we copy the list from LDS to the global memory buffer. That's usually faster (but not always).
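
The two-level scheme described above can be sketched on the CPU like this. Again it is only an emulation in C++ under stated assumptions: the outer loop stands in for workgroups, `localList` and `localCount` stand in for LDS, and `kGroupSize`, `AppendResult` and `groupedAppend` are hypothetical names. The "list becomes too full" case cannot occur here because the local list holds one slot per thread.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kGroupSize = 64;  // hypothetical workgroup size

struct AppendResult {
    std::uint32_t count;          // total number of appended items
    std::uint32_t globalAtomics;  // how many global atomic adds were issued
};

// flags[i] != 0 means item i should be appended to `out`.
AppendResult groupedAppend(const std::vector<int>& flags,
                           std::vector<std::uint32_t>& out) {
    std::atomic<std::uint32_t> globalCounter{0};
    std::uint32_t globalAtomics = 0;
    const std::uint32_t n = static_cast<std::uint32_t>(flags.size());
    out.assign(flags.size(), 0);
    for (std::uint32_t groupStart = 0; groupStart < n; groupStart += kGroupSize) {
        std::uint32_t localList[kGroupSize];  // LDS stand-in
        std::uint32_t localCount = 0;         // group-shared counter; a shared-memory atomic on a GPU
        // Phase 1: every "thread" appends to the local list only.
        for (std::uint32_t t = 0; t < kGroupSize && groupStart + t < n; ++t) {
            if (flags[groupStart + t] != 0) {
                localList[localCount++] = groupStart + t;
            }
        }
        // (A workgroup barrier would go here on a GPU.)
        // Phase 2: ONE global atomic add reserves a range for the whole list.
        const std::uint32_t base = globalCounter.fetch_add(localCount);
        ++globalAtomics;
        // Phase 3: copy the local list into the reserved range.
        for (std::uint32_t i = 0; i < localCount; ++i) {
            out[base + i] = localList[i];
        }
    }
    return {globalCounter.load(), globalAtomics};
}
```

For 200 flagged items this issues 4 global atomic adds, versus 200 with the per-thread version. Whether that wins in practice depends on the hardware and on how much LDS the local list costs, which is why it is worth measuring both.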

I don't know if MS could decide to implement this by itself under the hood with append buffers, but I doubt it, because we would lose control over exact LDS usage, which affects occupancy.