
Skinning with compute shader => pipeline integration

Started by evelyn4you, January 18, 2018 02:20 PM
4 comments, last by evelyn4you 7 years ago

hi,

Until now I have used the typical vertex shader approach for skinning, with a constant buffer containing the bone transform matrices and a vertex buffer containing bone indices and bone weights.

Now I have implemented realtime environment probe cubemapping, so I have to render my scene from many points of view, and skinning takes too long because it is recalculated for every face of the cubemap.

For info: I am working on Windows 7 and therefore use Shader Model 5.0, not the 5.x versions that have more options. Or is there a way to use 5.x on Windows 7?

My graphics card is a DirectX 12 compatible NVIDIA GTX 960.

The member turanszkij has posted a compute shader that was easy for me to understand. (For info: in his engine he uses an optimized version of it.)

https://turanszkij.wordpress.com/2017/09/09/skinning-in-compute-shader/

Now my questions:

Is it possible to feed the compute shader with my original vertex buffer, or do I have to copy it into several ByteAddressBuffers as implemented in the following code?

The same question applies to the constant buffer of the matrices.

My more urgent question is: how do I feed my normal pipeline with the result of the compute shader, which is two RWByteAddressBuffers containing position and normal?

For example, I could use two vertex buffer bindings:

1. containing only the uv coordinates

2. containing position and normal

How do I copy from the RWByteAddressBuffers to the vertex buffer?

 

(Code from turanszkij)

Here is my shader implementation for skinning a mesh in a compute shader:

struct Bone
{
    float4x4 pose;
};
StructuredBuffer<Bone> boneBuffer;

ByteAddressBuffer vertexBuffer_POS; // T-Pose pos
ByteAddressBuffer vertexBuffer_NOR; // T-Pose normal
ByteAddressBuffer vertexBuffer_WEI; // bone weights
ByteAddressBuffer vertexBuffer_BON; // bone indices

RWByteAddressBuffer streamoutBuffer_POS; // skinned pos
RWByteAddressBuffer streamoutBuffer_NOR; // skinned normal
RWByteAddressBuffer streamoutBuffer_PRE; // previous frame skinned pos

inline void Skinning(inout float4 pos, inout float4 nor, in float4 inBon, in float4 inWei)
{
    float4 p = 0, pp = 0;
    float3 n = 0;
    float4x4 m;
    float3x3 m3;
    float weisum = 0;

    // force loop to reduce register pressure
    // though this way we can not interleave TEX - ALU operations
    [loop]
    for (uint i = 0; ((i < 4) && (weisum < 1.0f)); ++i)
    {
        m = boneBuffer[(uint)inBon[i]].pose;
        m3 = (float3x3)m;

        p += mul(float4(pos.xyz, 1), m) * inWei[i];
        n += mul(nor.xyz, m3) * inWei[i];

        weisum += inWei[i];
    }

    bool w = any(inWei);
    pos.xyz = w ? p.xyz : pos.xyz;
    nor.xyz = w ? n : nor.xyz;
}

[numthreads(1024, 1, 1)]
void main(uint3 DTid : SV_DispatchThreadID)
{
    const uint fetchAddress = DTid.x * 16; // stride is 16 bytes for each vertex buffer now...

    uint4 pos_u = vertexBuffer_POS.Load4(fetchAddress);
    uint4 nor_u = vertexBuffer_NOR.Load4(fetchAddress);
    uint4 wei_u = vertexBuffer_WEI.Load4(fetchAddress);
    uint4 bon_u = vertexBuffer_BON.Load4(fetchAddress);

    float4 pos = asfloat(pos_u);
    float4 nor = asfloat(nor_u);
    float4 wei = asfloat(wei_u);
    float4 bon = asfloat(bon_u);

    Skinning(pos, nor, bon, wei);

    pos_u = asuint(pos);
    nor_u = asuint(nor);

    // copy prev frame current pos to current frame prev pos
    streamoutBuffer_PRE.Store4(fetchAddress, streamoutBuffer_POS.Load4(fetchAddress));
    // write out skinned props:
    streamoutBuffer_POS.Store4(fetchAddress, pos_u);
    streamoutBuffer_NOR.Store4(fetchAddress, nor_u);
}
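To sanity-check results read back from the GPU, the weighted blend in Skinning() can be mirrored on the CPU. The following is only an illustrative sketch (types and names are mine, not from the engine), using the same row-vector-times-matrix convention as HLSL mul(float4, float4x4):

```cpp
// Illustrative CPU reference of the weighted blend in Skinning() above.
// Row-major matrices, row vector * matrix, matching HLSL mul(float4, float4x4).
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };

static Vec4 mulRowVec(const Vec4& v, const Mat4& b)
{
    Vec4 r;
    r.x = v.x * b.m[0][0] + v.y * b.m[1][0] + v.z * b.m[2][0] + v.w * b.m[3][0];
    r.y = v.x * b.m[0][1] + v.y * b.m[1][1] + v.z * b.m[2][1] + v.w * b.m[3][1];
    r.z = v.x * b.m[0][2] + v.y * b.m[1][2] + v.z * b.m[2][2] + v.w * b.m[3][2];
    r.w = v.x * b.m[0][3] + v.y * b.m[1][3] + v.z * b.m[2][3] + v.w * b.m[3][3];
    return r;
}

// Blend up to 4 bone poses; stop early once the weights sum to 1,
// like the (weisum < 1.0f) loop condition in the shader.
Vec4 skinPosition(Vec4 pos, const Mat4 bones[4], const float wei[4])
{
    Vec4 p = {0, 0, 0, 0};
    float weisum = 0;
    for (int i = 0; i < 4 && weisum < 1.0f; ++i)
    {
        Vec4 t = mulRowVec({pos.x, pos.y, pos.z, 1.0f}, bones[i]);
        p.x += t.x * wei[i];
        p.y += t.y * wei[i];
        p.z += t.z * wei[i];
        p.w += t.w * wei[i];
        weisum += wei[i];
    }
    return p;
}
```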

 

Hi,

1) You don't have to copy; you can create a vertex buffer which can also be used as a ByteAddressBuffer. You can even create it as a typed buffer, so that you don't have to do the type conversion in the shader code by hand.

To create it as a ByteAddressBuffer, you should specify the D3D11_BIND_SHADER_RESOURCE bind flag when creating the resource, and use the misc flag D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS. When creating a shader resource view for a raw buffer like this, you should use the DXGI_FORMAT_R32_TYPELESS format in the shader resource view desc, declare the view dimension as D3D11_SRV_DIMENSION_BUFFEREX, and set srv_desc.BufferEx.Flags to D3D11_BUFFEREX_SRV_FLAG_RAW. Also, srv_desc.BufferEx.NumElements should be calculated as BufferByteWidth / 4.
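Put together, the flags described above might look roughly like this (a sketch only; vertexCount, initData, buffer and srv are placeholder variables, and error handling is omitted):

```cpp
// Sketch: a buffer usable both as a vertex buffer and as a raw SRV.
D3D11_BUFFER_DESC desc = {};
desc.ByteWidth = vertexCount * 16; // 16-byte stride, as in the shader above
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_VERTEX_BUFFER | D3D11_BIND_SHADER_RESOURCE;
desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;
device->CreateBuffer(&desc, &initData, &buffer);

// Raw (ByteAddressBuffer) view: R32_TYPELESS, BUFFEREX dimension, RAW flag.
D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format = DXGI_FORMAT_R32_TYPELESS;
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
srvDesc.BufferEx.FirstElement = 0;
srvDesc.BufferEx.NumElements = desc.ByteWidth / 4; // number of 32-bit words
srvDesc.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;
device->CreateShaderResourceView(buffer, &srvDesc, &srv);
```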

Unfortunately you can't create a raw buffer view for a constant buffer, because a constant buffer can have an entirely different behaviour on the GPU implementation side for accessing data. If you are exceeding constant buffer limits, you can create your matrix array as a raw buffer, structured buffer or typed buffer, but performance might vary, so keep that in mind.

2) As in the previous paragraph, you should not copy from the vertex buffer to your compute shader RW buffer and vice versa; you can just create an unordered access view for the buffer which the compute shader will use, and you can also bind the same resource as a regular vertex buffer. But be sure that you unbind the unordered access view before binding it as a vertex buffer, or the D3D11 runtime will prevent you from binding it.
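The dispatch-then-draw sequence could look roughly like this (illustrative names; skinnedUAV and skinnedVB are two views of the same resource):

```cpp
// Run the skinning compute shader, writing through the UAV.
context->CSSetShader(skinningCS, nullptr, 0);
context->CSSetUnorderedAccessViews(0, 1, &skinnedUAV, nullptr);
context->Dispatch((vertexCount + 1023) / 1024, 1, 1); // matches [numthreads(1024,1,1)]

// Unbind the UAV first, or the runtime will refuse the vertex buffer bind.
ID3D11UnorderedAccessView* nullUAV = nullptr;
context->CSSetUnorderedAccessViews(0, 1, &nullUAV, nullptr);

// Now the same resource can feed the input assembler as a vertex buffer.
UINT stride = 16, offset = 0;
context->IASetVertexBuffers(0, 1, &skinnedVB, &stride, &offset);
```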

As a side note, you probably don't need the uv coordinates in your skinning shader, so it would be a good idea to keep them in a separate buffer.

Good luck! :)


 

hello turanszkij,

Many, many thanks for your kind answer.

Yesterday evening I set up a test framework just for testing compute shader skinning.

Your explanations were very helpful, especially the hint about the correct flags to set.

Without this it was impossible to create a buffer that can be used both as a vertex buffer and as a UAV.

In my test framework I:

1. create a structured buffer which can be used for a UAV and as a vertex buffer

2. create a staging buffer with CPU access and fill it with a test data array

3. copy the values from the staging buffer to the structured buffer

4. compile and run a compute shader that changes the data of the structured buffer

5. create another staging buffer and copy the values from the structured buffer back

6. access the staging buffer and compare the values. This way I check whether the compute shader works correctly.
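Steps 2-6 of such a validation loop might be sketched like this (placeholder names, no error handling):

```cpp
// Create a CPU-readable staging buffer and copy the GPU results into it.
D3D11_BUFFER_DESC sd = {};
sd.ByteWidth = byteWidth;
sd.Usage = D3D11_USAGE_STAGING;
sd.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
device->CreateBuffer(&sd, nullptr, &staging);
context->CopyResource(staging, structuredBuffer); // GPU -> staging

// Map and compare against the expected test data on the CPU.
D3D11_MAPPED_SUBRESOURCE mapped = {};
context->Map(staging, 0, D3D11_MAP_READ, 0, &mapped);
const float* results = static_cast<const float*>(mapped.pData);
// ... compare results[i] with the expected values ...
context->Unmap(staging, 0);
```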

Did I understand things right?

a. When using a buffer (e.g. a structured buffer) in a shader (e.g. a vertex shader), I always have to use a shader resource view.

b. When using a buffer (e.g. an RWStructuredBuffer) in a compute shader, I always have to use a UAV?

c. A constant buffer is a special, very fast buffer and can NEVER be used in a compute shader?

 

The next step would be to feed my compute shader with:

first, a "typical" structured buffer containing

- position, normal, bone_index, weight_index

second, a "typical" structured buffer containing the bone transform matrices

- matrix

third, the "special" buffer mentioned above: a raw read/write ByteAddressBuffer that can be used as a UAV and as a vertex buffer, containing

- position, normal, uv

With this buffer (containing the bone-transformed vertices) I feed my "old pipeline" as before, so the old shader does not even know that this is a bone-transformed object; this way no shader permutations are necessary.

Is this the right way, or do I have a wrong concept?
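As a CPU-side illustration of the third buffer's layout, the interleaved output vertex could be described with a struct, with byte offsets for raw addressing derived from it (the names and exact packing here are assumptions, not from the thread):

```cpp
#include <cstddef>

// Hypothetical interleaved layout for the skinned output buffer
// (position, normal, uv) described in the post above.
struct SkinnedVertex
{
    float position[4]; // 16 bytes
    float normal[4];   // 16 bytes
    float uv[2];       //  8 bytes
};

constexpr std::size_t vertexStride = sizeof(SkinnedVertex); // 40 bytes

// Byte address of vertex i's normal inside a raw (byte address) buffer.
constexpr std::size_t normalOffset(std::size_t i)
{
    return i * vertexStride + offsetof(SkinnedVertex, normal);
}
```

Note that keeping uv in a separate buffer, as suggested earlier in the thread, would shrink the stride the compute shader has to write.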

 

 

 

 

 

On 19/01/2018 at 11:48 AM, evelyn4you said:

In my test framework

1. create a structured buffer which can be used for a UAV and as a vertex buffer

Unfortunately you can't create a buffer that can be bound both as a vertex buffer and as a structured buffer. You have to use either a ByteAddressBuffer (raw buffer) or a plain Buffer (typed buffer) to access a vertex buffer from a compute shader. The ByteAddressBuffer is good because it doesn't do any magic behind the scenes, it just loads memory, but you have to address it with byte offsets. The typed buffer (declared as Buffer<float4>, for example, in the shader) works like a texture, so automatic type conversions are handled for you, and you can address by element.

On 19/01/2018 at 11:48 AM, evelyn4you said:

 

Did I understand things right?

a. When using a buffer (e.g. a structured buffer) in a shader (e.g. a vertex shader), I always have to use a shader resource view.

b. When using a buffer (e.g. an RWStructuredBuffer) in a compute shader, I always have to use a UAV?

c. A constant buffer is a special, very fast buffer and can NEVER be used in a compute shader?

a.) When it is a read-only resource in your shader, you use a shader resource view; when you also want to write it, you use an unordered access view. You can create both a shader resource view and an unordered access view for a resource, but you can only bind one of them to a shader at any time.

b.) Resources with the RW prefix always correspond to an unordered access view.

c.) Constant buffers can be used in any shader, including compute shaders, but they are strictly read-only. You can read more about constant buffer vs. structured buffer performance here: https://developer.nvidia.com/content/understanding-structured-buffer-performance

On 19/01/2018 at 11:48 AM, evelyn4you said:

 

The next step would be to feed my compute shader with:

first, a "typical" structured buffer containing

- position, normal, bone_index, weight_index

second, a "typical" structured buffer containing the bone transform matrices

- matrix

third, the "special" buffer mentioned above: a raw read/write ByteAddressBuffer that can be used as a UAV and as a vertex buffer, containing

- position, normal, uv

With this buffer (containing the bone-transformed vertices) I feed my "old pipeline" as before, so the old shader does not even know that this is a bone-transformed object; this way no shader permutations are necessary.

Is this the right way, or do I have a wrong concept?

You're right: you have a read-only vertex buffer containing the unanimated "T-pose" geometry. This is accessible as a ByteAddressBuffer or plain Buffer in a compute shader; you can also use a StructuredBuffer if you don't have a vertex buffer binding for it. You have a matrix array which contains the bone transforms, preferably in a structured buffer. Lastly, you have the RWByteAddressBuffer or RWBuffer for the animated geometry. A StructuredBuffer is no good here, because you will also bind it as a vertex buffer when rendering (unless you don't use vertex buffers and load vertices manually in the vertex shader).

Good luck!

PS. next time Quote me or tag me, because then I will receive a notification. Otherwise I might forget this thread and not see your question. :) 


 

hello turanszkij,

again, many thanks for your answer.

Now things have hopefully become clear to me.

I will try it and report back. I have also opened a second thread for a comparable implementation using the geometry shader stream-out method.

If you could spend a little time to answer there as well, that would be very kind of you.

 

 

 

This topic is closed to new replies.
