
DirectCompute thread groups

Started by November 02, 2017 05:48 PM
2 comments, last by galop1n 7 years, 3 months ago

Is [numthreads(1, GROUP_SIZE, GROUP_SIZE)] as efficient as [numthreads(GROUP_SIZE, GROUP_SIZE, 1)]?

CUDA confused me by disabling their z dimension.

🧙

Personally, I assume there is no hardware for this kind of dimensional thread partitioning at all, and that it's just something meant to make things easier for us.

I did not look at any ISA output to prove that, but I know that putting the thread ID into a register is faster than constantly reading it from the built-in API variable (Vulkan and AMD), so I doubt there are 3 hardware registers holding 3 indices for nothing all the time.

Does anyone know?
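To make the question concrete, here is a small CPU-side sketch in Python (not shader code) of how the two layouts flatten. It uses the formula HLSL documents for SV_GroupIndex, z * dimX * dimY + y * dimX + x. The helper names `group_index` and `waves` are made up for illustration, and packing threads into 64-lane waves in flat-index order is an assumption about GCN-style hardware, not something confirmed in this thread.

```python
from itertools import product

WAVE_SIZE = 64   # assumption: GCN-style 64-lane wavefront
GROUP_SIZE = 8   # example value; the thread leaves GROUP_SIZE unspecified


def group_index(tid, dims):
    """Flatten (x, y, z) using the formula documented for HLSL SV_GroupIndex."""
    x, y, z = tid
    dim_x, dim_y, _ = dims
    return z * dim_x * dim_y + y * dim_x + x


def waves(dims):
    """Hypothetical helper: chunk a group's thread IDs into waves by flat index."""
    dim_x, dim_y, dim_z = dims
    tids = [(x, y, z)
            for z, y, x in product(range(dim_z), range(dim_y), range(dim_x))]
    tids.sort(key=lambda t: group_index(t, dims))
    return [tids[i:i + WAVE_SIZE] for i in range(0, len(tids), WAVE_SIZE)]


xy = waves((GROUP_SIZE, GROUP_SIZE, 1))  # [numthreads(GROUP_SIZE, GROUP_SIZE, 1)]
yz = waves((1, GROUP_SIZE, GROUP_SIZE))  # [numthreads(1, GROUP_SIZE, GROUP_SIZE)]

print(len(xy), len(yz))      # 1 1
print(xy[0][9], yz[0][9])    # (1, 1, 0) (0, 1, 1)
```

Under that assumption both layouts fill exactly one wave, and lane 9 just carries its coordinates on different axes. This says nothing about how the hardware actually delivers the IDs, which is the part that could still differ.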

 


Some GCN hardware has a halved wave spawn rate if you use the Z dimension; I'm not sure whether that is still true. Also on GCN, there is an input VGPR per dimension and no combined one, at least on PS4 (taken from a compute ISA: s14 = s_tgid_x, s15 = s_tgid_y, v0 = v_thread_id_x, v1 = v_thread_id_y).

 

You could look at the ISA in Pix for AMD to confirm all that on PC.
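For reference, a rough Python sketch of how per-dimension inputs like these combine into the global thread ID. It assumes s_tgid_* corresponds to SV_GroupID and v_thread_id_* to SV_GroupThreadID, which is my reading of the register names rather than something stated in the dump. The formula itself is the documented HLSL relation SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID, and the function name is hypothetical.

```python
def dispatch_thread_id(group_id, group_thread_id, numthreads):
    """Per-dimension global ID: group_id * numthreads + group_thread_id."""
    return tuple(g * n + t
                 for g, t, n in zip(group_id, group_thread_id, numthreads))


# Example with [numthreads(8, 8, 1)]: group (2, 3, 0), thread (5, 1, 0) inside it.
print(dispatch_thread_id((2, 3, 0), (5, 1, 0), (8, 8, 1)))  # (21, 25, 0)
```

Each dimension is computed independently, which fits with the hardware handing the shader one input register per dimension instead of a single combined index.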

