
DirectCompute: sync within warp

Started by October 26, 2017 03:31 PM
10 comments, last by JoeJ 7 years, 3 months ago
1 hour ago, maxest said:

Performance differs in both listings. Second one is around 15% faster.

What a shame. :(

You could use the preprocessor for separate code paths, like AMD_GCN, NV_KEPLER, NV_PASCAL, NV_SAVE, etc.

If a future chip is not known to your app, you can fall back to NV_SAVE with all the barriers. But there's still a small risk that a driver update would break NV_PASCAL.
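As a rough sketch of the host side of this idea: the app picks one define per known architecture and falls back to the all-barriers path for anything it does not recognize. The `GpuArch` enum and `shaderPathDefine` function are made-up names for illustration, and how the architecture is actually detected is left out. The returned string would then be handed to the shader compiler as a preprocessor define, so the shader can branch on it with `#if defined(...)`.

```cpp
#include <string>

// Hypothetical architecture tags (not from any real API).
// How the app detects the architecture is out of scope for this sketch.
enum class GpuArch { AmdGcn, NvKepler, NvPascal, Unknown };

// Returns the preprocessor define the compute shader should be compiled with.
// Any chip the app does not know gets the conservative path with all barriers.
std::string shaderPathDefine(GpuArch arch) {
    switch (arch) {
        case GpuArch::AmdGcn:   return "AMD_GCN";
        case GpuArch::NvKepler: return "NV_KEPLER";
        case GpuArch::NvPascal: return "NV_PASCAL";
        case GpuArch::Unknown:  break;
    }
    return "NV_SAVE";  // fallback: keep every barrier
}
```

This only covers the unknown-chip case. It does nothing about the driver-update risk, since a known architecture would still get its barrier-free path.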

