23 minutes ago, chlerub said:
"Hmyeah i guess so. But when the API calls alone make such a huge difference... oh well"
It does not make sense to measure DirectX CPU cost that way either. The nVidia driver in particular uses heavy multithreading and has a notoriously fat, do-it-all Present call that is a black box. What you may have observed is some cost migrating between a deferred path and an immediate one. And in every real scenario I have had to observe, AMD drivers always perform worse than nVidia's with regard to their CPU usage across a frame, trust me!
Also, the speed of light of a single draw call, while interesting, usually does not matter: drivers are not optimized to send one draw call, but thousands of them with complex state changes. The difference you see could absolutely vanish in a real-case scenario because the driver can work in parallel, reorder, and so on.
As for the GPU, a cube is again one of the worst unit tests you can have. It does not provide a proper amount of work to the GPU per instance, and you can hit hidden bottlenecks such as partially empty wavefronts and bad vertex cache usage.
I recommend you focus on GPU performance. It is always possible to some extent to improve the CPU side by getting rid of redundant states and reordering draws, but your GPU frame usually quickly becomes a sum of little things that are each as fast as possible yet have to be there! And to measure the GPU, you need to use timestamp queries first. Then make sure that your driver is not throttling frequency/voltage when you run!
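If you have never set them up, the timestamp query dance in D3D11 looks roughly like this (a minimal sketch: device, context and the commented-out DrawScene() are placeholders for your own objects, and a real renderer would poll results from a few frames back instead of spinning):

```cpp
#include <d3d11.h>
#include <cstdio>

void TimeGpuWork(ID3D11Device* device, ID3D11DeviceContext* context)
{
    // The disjoint query gives the GPU clock frequency and tells us if
    // the measurement was invalidated (e.g. by a clock change mid-frame).
    D3D11_QUERY_DESC desc = {};
    ID3D11Query *disjoint = nullptr, *tsBegin = nullptr, *tsEnd = nullptr;
    desc.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
    device->CreateQuery(&desc, &disjoint);
    desc.Query = D3D11_QUERY_TIMESTAMP;
    device->CreateQuery(&desc, &tsBegin);
    device->CreateQuery(&desc, &tsEnd);

    context->Begin(disjoint);
    context->End(tsBegin);            // timestamp before the work
    // DrawScene(context);            // the GPU work you want to measure
    context->End(tsEnd);              // timestamp after the work
    context->End(disjoint);

    // Spinning here stalls the pipeline; fine for a test, not for production.
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj;
    while (context->GetData(disjoint, &dj, sizeof(dj), 0) == S_FALSE) {}
    UINT64 begin = 0, end = 0;
    while (context->GetData(tsBegin, &begin, sizeof(begin), 0) == S_FALSE) {}
    while (context->GetData(tsEnd, &end, sizeof(end), 0) == S_FALSE) {}

    if (!dj.Disjoint)  // throw away the sample if the GPU clock changed
        printf("GPU time: %.3f ms\n",
               double(end - begin) * 1000.0 / double(dj.Frequency));

    disjoint->Release(); tsBegin->Release(); tsEnd->Release();
}
```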
And for your sanity, forget about vertex buffers; they are far too rigid, even when per-instance data is not involved. Even if it costs you a little more CPU, the gain in flexibility, control and maintenance is priceless, and you should decide to afford it!
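Concretely, the usual alternative I mean is pulling vertices yourself from a StructuredBuffer indexed with SV_VertexID, with nothing bound to the input assembler. A rough sketch of the C++ side, assuming a hypothetical Vertex struct and a vertex shader written to fetch from register t0:

```cpp
#include <d3d11.h>

struct Vertex { float pos[3]; float uv[2]; };  // placeholder layout

void DrawWithoutVertexBuffers(ID3D11Device* dev, ID3D11DeviceContext* ctx,
                              const Vertex* vertices, UINT count)
{
    // A structured buffer bound as a shader resource instead of to the IA.
    D3D11_BUFFER_DESC bd = {};
    bd.ByteWidth = sizeof(Vertex) * count;
    bd.Usage = D3D11_USAGE_IMMUTABLE;
    bd.BindFlags = D3D11_BIND_SHADER_RESOURCE;
    bd.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    bd.StructureByteStride = sizeof(Vertex);
    D3D11_SUBRESOURCE_DATA init = { vertices, 0, 0 };
    ID3D11Buffer* buf = nullptr;
    dev->CreateBuffer(&bd, &init, &buf);

    ID3D11ShaderResourceView* srv = nullptr;
    dev->CreateShaderResourceView(buf, nullptr, &srv);  // view of the whole buffer

    // No input layout, no IASetVertexBuffers: the shader pulls vertices
    // itself, e.g. in HLSL:
    //   StructuredBuffer<Vertex> gVertices : register(t0);
    //   ... gVertices[vertexID] ... with SV_VertexID as input.
    ctx->IASetInputLayout(nullptr);
    ctx->VSSetShaderResources(0, 1, &srv);
    ctx->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    ctx->Draw(count, 0);

    srv->Release();
    buf->Release();
}
```

The nice part: changing the vertex layout becomes a shader-only edit, with no input-layout objects to keep in sync.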