Hi,
I am trying to brute-force a closest-point-to-closed-triangle-mesh algorithm on the GPU by creating a thread for each point-primitive pair and keeping only the nearest result for each point. However, the code fails: multiple writes are made by threads that computed different distances.
To keep only the closest value, I use InterlockedMin to maintain a per-point distance mask, then, after a memory barrier, a conditional that writes only if the current thread's distance equals the value in the mask.
I have included the function below.
As can be seen, for debugging I have modified it to write to a different location every time the conditional succeeds. Multiple writes are expected, for example where the closest point is a vertex shared by multiple triangles, but when I read back closestPoints and calculate the distances, they differ, which should not be possible.
The differences are large (~0.3+), so I do not think this is a rounding error. The CPU equivalent works fine for a single particle. After the kernel executes, distanceMask does hold the smallest value, which suggests the problem lies with the barrier or the conditional.
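For reference, the masking step assumes that for non-negative floats, comparing the raw bit patterns as unsigned integers gives the same ordering as comparing the floats themselves; that is what lets InterlockedMin operate on asuint(length(...)). Below is a minimal CPU-side C++ sketch of that assumption only, not of the kernel. The names as_uint and min_mask are hypothetical, and the initial mask value of 0xFFFFFFFF is an assumption, since the initialisation of distanceMask is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical CPU stand-in for HLSL's asuint: reinterpret the float's bits.
static std::uint32_t as_uint(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Keeps the smallest bit pattern seen, mimicking what a chain of
// InterlockedMin calls on a single uint mask entry would leave behind.
static std::uint32_t min_mask(const std::vector<float>& distances) {
    std::uint32_t mask = 0xFFFFFFFFu;  // assumed initial value of the mask
    for (float d : distances) {
        // The ordering trick only holds when the sign bit is clear.
        assert(!std::signbit(d));
        mask = std::min(mask, as_uint(d));
    }
    return mask;
}
```

Calling min_mask on a list of distances should return the bit pattern of the smallest one, so a later equality test against as_uint(distance) can only succeed for entries that share that exact value.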
Can anyone say what is wrong with the function?
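// Per-particle minimum distance, stored as the bit pattern of a float.
RWStructuredBuffer<uint> distanceMask : register(u4);
// Per-particle count of threads that passed the conditional (debugging only).
RWStructuredBuffer<uint> distanceWriteCounts : register(u0);
// Candidate closest points, binsize slots per particle.
RWStructuredBuffer<float3> closestPoints : register(u5);

// triangles, Q1, GetVertex1, ClosestPointPointTriangle and binsize are declared elsewhere.
[numthreads(64,1,1)]
void BruteForceClosestPointOnMesh(uint3 id : SV_DispatchThreadID)
{
    // One thread per (particle, triangle) pair: id.x selects the particle, id.y the triangle.
    int particleid = id.x;
    int triangleid = id.y;

    Triangle t = triangles[triangleid];
    float3 v0 = GetVertex1(t.i0);
    float3 v1 = GetVertex1(t.i1);
    float3 v2 = GetVertex1(t.i2);
    float3 q1 = Q1[particleid];

    // Closest point on this triangle, reconstructed from barycentric weights.
    ClosestPointPointTriangleResult result = ClosestPointPointTriangle(q1, v0, v1, v2);
    float3 p = v0 * result.uvw.x + v1 * result.uvw.y + v2 * result.uvw.z;

    // length() is non-negative, so its bit pattern orders like the float itself.
    uint distance = asuint(length(p - q1));
    InterlockedMin(distanceMask[particleid], distance);

    AllMemoryBarrierWithGroupSync();

    // Write only if this thread's distance matches the mask.
    if(distance == distanceMask[particleid])
    {
        uint bin = 0;
        InterlockedAdd(distanceWriteCounts[particleid], 1, bin);
        closestPoints[particleid * binsize + bin] = p;
    }
}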