This entirely depends on the hardware, not the API. AMD has information in its OpenCL optimization guides: https://developer.amd.com/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/
Personally I never tested those effects for shared (LDS) memory, but I know that access patterns to global (main) memory matter a lot (I've often achieved 2x speedups just by changing them).
In addition to what those guides say, e.g. that large power-of-two strides between parallel threads are bad, I've noticed the same is true for serial access within a single thread (a sketch of the documented parallel-stride case follows below).
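To illustrate that documented case, here is a minimal sketch, assuming a simple 1D kernel (the kernel and buffer names are just placeholders):

// Each work item reads one element; the stride between neighbouring work items
// decides how the accesses spread over memory channels. A large power-of-two
// stride (256, 512, ...) tends to hit the same channels on GCN, while
// stride 1 gives fully coalesced access.
__kernel void strided_read(__global const float* src, __global float* dst, int stride)
{
    int gid = get_global_id(0);
    dst[gid] = src[gid * stride];
}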
As an example of the serial case, suppose each thread does this:
for (int i=0; i<n; i++) globalMemory[threadIndex*256+i] = x; // slow
... this has very bad performance on GCN. To fix it, change the stride to a non-power-of-two:
for (int i=0; i<n; i++) globalMemory[threadIndex*257+i] = x; // fast
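Put together as a complete kernel it looks roughly like this; the names and the 256-element block size are just placeholders from my tests, not anything official:

// Each work item fills its own contiguous block of n elements.
// With STRIDE = 256 (power of two) the per-thread blocks start at offsets that
// map to the same channels/banks on GCN; STRIDE = 257 staggers them.
// The buffer must hold at least globalWorkSize * STRIDE floats.
#define STRIDE 257 // use 256 to reproduce the slow case
__kernel void fill_blocks(__global float* globalMemory, float x, int n)
{
    int threadIndex = get_global_id(0);
    for (int i = 0; i < n; i++)
        globalMemory[threadIndex * STRIDE + i] = x;
}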
This seems to be undocumented, so I mention it here. On Nvidia both versions performed equally fast for me.
Here is one more resource about the effect with LDS memory: http://diaryofagraphicsprogrammer.blogspot.co.at/2015/01/reloaded-compute-shader-optimizations.html
It seems to be more of an issue on older hardware; I'm not sure about NV.
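If you do run into it with LDS, the usual workaround is the same padding trick. A minimal sketch of my own (not taken from the linked post), assuming 16x16 work groups and matrix dimensions that are multiples of 16:

// Matrix transpose through LDS. Padding each row of the LDS tile to 17 floats
// instead of 16 shifts consecutive rows onto different banks, so the
// column-wise reads below don't all hit the same bank.
__kernel void transpose16(__global const float* src, __global float* dst, int width, int height)
{
    __local float tile[16][17]; // 16x16 data + 1 padding column per row
    int lx = get_local_id(0), ly = get_local_id(1);
    int gx = get_group_id(0) * 16 + lx;
    int gy = get_group_id(1) * 16 + ly;
    tile[ly][lx] = src[gy * width + gx];  // coalesced row-major load
    barrier(CLK_LOCAL_MEM_FENCE);
    int ox = get_group_id(1) * 16 + lx;   // swap group indices for the output tile
    int oy = get_group_id(0) * 16 + ly;
    dst[oy * height + ox] = tile[lx][ly]; // column-wise LDS read, conflict-free thanks to padding
}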