Is [numthreads(1, GROUP_SIZE, GROUP_SIZE)] as efficient as [numthreads(GROUP_SIZE, GROUP_SIZE, 1)]?
CUDA confused me by limiting its z dimension.
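For reference, here is a minimal HLSL sketch of the two layouts being compared. GROUP_SIZE, the entry point names, and the output texture are placeholder assumptions; only the thread-ID mapping differs between the two variants.

#define GROUP_SIZE 8

RWTexture2D<float4> Output : register(u0);

// Variant A: the usual 2D layout, x/y used, z fixed to 1.
[numthreads(GROUP_SIZE, GROUP_SIZE, 1)]
void CSMainXY(uint3 dtid : SV_DispatchThreadID)
{
    Output[dtid.xy] = float4(1, 0, 0, 1);
}

// Variant B: same thread count per group, but spread over y/z instead of x/y.
[numthreads(1, GROUP_SIZE, GROUP_SIZE)]
void CSMainYZ(uint3 dtid : SV_DispatchThreadID)
{
    Output[dtid.yz] = float4(0, 1, 0, 1);
}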
Personally, I assume there is no dedicated hardware for that kind of dimensional thread partitioning at all, and that it's just a convenience for us.
I haven't looked at any ISA output to prove it, but I do know that caching the thread ID in a register is faster than repeatedly reading it from the built-in API variable (Vulkan and AMD), so I doubt there are three hardware registers permanently holding three indices for nothing.
Does anyone know?
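To illustrate that point, here is a hedged sketch of doing the 2D partitioning by hand instead: a flat 1D group whose 2D coordinate is rebuilt with a modulo/divide. GROUP_SIZE, the entry point name, and the output texture are again just assumptions; if the multi-dimensional layout really is only a software convenience, the compiler presumably generates something close to this for you anyway.

#define GROUP_SIZE 8

RWTexture2D<float4> Output : register(u0);

// One flat x dimension; the 2D coordinate inside the group is rebuilt by hand.
// Dispatched exactly like the 2D version: Dispatch(W / GROUP_SIZE, H / GROUP_SIZE, 1).
[numthreads(GROUP_SIZE * GROUP_SIZE, 1, 1)]
void CSMainFlat(uint3 groupId : SV_GroupID, uint flatIndex : SV_GroupIndex)
{
    uint2 local = uint2(flatIndex % GROUP_SIZE, flatIndex / GROUP_SIZE);
    uint2 pixel = groupId.xy * GROUP_SIZE + local;
    Output[pixel] = float4(0, 0, 1, 1);
}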
Some GCN hardware has a halved wave spawn rate if you use the Z dimension; I'm not sure whether that is still true. On GCN again, there is an input VGPR per dimension and no combined one, at least on PS4 (taken from a compute ISA dump: s14 = s_tgid_x, s15 = s_tgid_y, v0 = v_thread_id_x, v1 = v_thread_id_y).
You could look at the ISA in PIX for AMD to confirm all that on PC.
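For context on how those per-dimension inputs map back to HLSL system values: the D3D compute model defines SV_DispatchThreadID as SV_GroupID * numthreads + SV_GroupThreadID. The sketch below just restates that relationship in shader code (GROUP_SIZE, the entry point name, and the texture are assumptions; this is not ISA output).

#define GROUP_SIZE 8

RWTexture2D<float4> Output : register(u0);

[numthreads(GROUP_SIZE, GROUP_SIZE, 1)]
void CSMainDerived(uint3 groupId : SV_GroupID, uint3 groupThreadId : SV_GroupThreadID)
{
    // The spec-defined relationship; the compiler builds this from the
    // per-dimension group/thread inputs whenever SV_DispatchThreadID is requested.
    uint3 dispatchThreadId = groupId * uint3(GROUP_SIZE, GROUP_SIZE, 1) + groupThreadId;
    Output[dispatchThreadId.xy] = float4(1, 1, 0, 1);
}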