
Anyone else having problems with D3D12 compute shaders on NVIDIA hardware?

Started by July 25, 2017 10:15 PM
12 comments, last by Ben Jamin 7 years, 3 months ago

I'm having an odd problem with D3D12 compute shaders. I have a very simple compute shader that does nothing but write the global ID of the current thread out to a buffer:


RWStructuredBuffer<uint> g_pathVisibility : register(u0, space1);

cbuffer cbPushConstants : register(b0)
{
	uint g_count;
};

[numthreads(32, 1, 1)]
void main(uint3 DTid : SV_DispatchThreadID)
{
	if(DTid.x < g_count)
	{
		g_pathVisibility[DTid.x] = DTid.x + 1;
	}
}

I'm allocating 2 buffers with space for 128 integers. One buffer is the output buffer for the shader above and the other is a copy destination buffer for CPU readback. If I set numthreads() to any power of two (for example, it's set to 32 above), I get a device reset error on NVIDIA hardware only. If I set numthreads() to any non-power-of-2 value, the shader works as expected. The exceptionally odd thing is that all of the compute shaders in the D3D12 samples work fine with numthreads() containing powers of 2. It doesn't matter if I execute the compute shader on a graphics queue or a compute queue - it's the same result either way. I've tested this on a GTX 1080 and a GTX 1070 with identical results. AMD cards seem to work as expected. Anyone have any idea what the hell could be going on? I tried asking NVIDIA on their boards but, per usual, they never responded. I'm using their latest drivers. I've attached my sample application if anyone is interested; it's a UWP app, since Visual Studio provides a nice D3D12 app template that I use to play around with simple ideas. The shader in question in the project is TestCompute.hlsl, and the function where the magic happens is Sample3DSceneRenderer::TestCompute() at line 1006 of Sample3DSceneRenderer.cpp.

PathTransform_2.zip
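
For context, here's a rough sketch (not the attached project's actual code) of the host-side setup being described: the default-heap output buffer with a UAV, the readback copy destination, the dispatch, and the copy back. It assumes d3d12.h, d3dx12.h, and wrl/client.h, plus an already-created device, command list, compute root signature (root constant at parameter 0, UAV descriptor table at parameter 1), pipeline state, and descriptor heap, so all the variable names below are placeholders.

// Sketch only - device, commandList, computePipeline, computeRootSignature and
// uavHeap are assumed to exist already.
using Microsoft::WRL::ComPtr;

const UINT   elementCount = 128;
const UINT64 bufferSize   = elementCount * sizeof(UINT);

// Output buffer the compute shader writes to (default heap, UAV-capable).
ComPtr<ID3D12Resource> outputBuffer;
CD3DX12_HEAP_PROPERTIES defaultHeap(D3D12_HEAP_TYPE_DEFAULT);
CD3DX12_RESOURCE_DESC outputDesc =
    CD3DX12_RESOURCE_DESC::Buffer(bufferSize, D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS);
device->CreateCommittedResource(&defaultHeap, D3D12_HEAP_FLAG_NONE, &outputDesc,
    D3D12_RESOURCE_STATE_UNORDERED_ACCESS, nullptr, IID_PPV_ARGS(&outputBuffer));

// Readback buffer used as the CPU-visible copy destination.
ComPtr<ID3D12Resource> readbackBuffer;
CD3DX12_HEAP_PROPERTIES readbackHeap(D3D12_HEAP_TYPE_READBACK);
CD3DX12_RESOURCE_DESC readbackDesc = CD3DX12_RESOURCE_DESC::Buffer(bufferSize);
device->CreateCommittedResource(&readbackHeap, D3D12_HEAP_FLAG_NONE, &readbackDesc,
    D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&readbackBuffer));

// Record the dispatch: 128 elements / 32 threads per group = 4 groups.
commandList->SetPipelineState(computePipeline.Get());
commandList->SetComputeRootSignature(computeRootSignature.Get());
ID3D12DescriptorHeap* heaps[] = { uavHeap.Get() };
commandList->SetDescriptorHeaps(1, heaps);
commandList->SetComputeRoot32BitConstant(0, elementCount, 0);   // g_count
commandList->SetComputeRootDescriptorTable(1, uavHeap->GetGPUDescriptorHandleForHeapStart());
commandList->Dispatch((elementCount + 31) / 32, 1, 1);

// Transition the output buffer and copy it into the readback buffer.
CD3DX12_RESOURCE_BARRIER toCopySource = CD3DX12_RESOURCE_BARRIER::Transition(
    outputBuffer.Get(),
    D3D12_RESOURCE_STATE_UNORDERED_ACCESS,
    D3D12_RESOURCE_STATE_COPY_SOURCE);
commandList->ResourceBarrier(1, &toCopySource);
commandList->CopyResource(readbackBuffer.Get(), outputBuffer.Get());
commandList->Close();

// ...then execute the list, wait on a fence, and Map() readbackBuffer to read the 128 values.

With numthreads(32, 1, 1) and 128 elements that dispatch is 4 thread groups; with 31 it becomes 5 groups (155 threads), and the if (DTid.x < g_count) guard in the shader masks off the extra threads either way.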

I've definitely run into a few Nvidia DX12 driver bugs (especially when DX12 was new), but I haven't personally seen anything with compute shaders. The driver and/or shader JIT is probably just trying to do something clever, and ends up doing something bad. 


Thanks, I figured it was likely a driver issue but wanted to make sure I wasn't crazy. I guess I'll continue waiting for the next major driver release.

Perhaps you should file a report with NVIDIA?

-potential energy is easily made kinetic-

I get no GPU hang here on a 980 Ti but I do get a GPU Based Validation error that you seem to have introduced:

D3D12 ERROR: GPU-BASED VALIDATION: Dispatch, Descriptor heap index out of bounds: Heap Index To DescriptorTableStart: [0], Heap Index From HeapStart: [0], Heap Type: D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV, Num Descriptor Entries: 0, Index of Descriptor Range: 0, Shader Stage: COMPUTE, Root Parameter Index: [1], Dispatch Index: [0], Shader Code: TestCompute.hlsl(13,3-40), Asm Instruction Range: [0xbc-0xdf], Asm Operand Index: [0], Command List: 0x000001F3C5E38C20:'m_testComputeList', SRV/UAV/CBV Descriptor Heap: 0x000001F3C5C824B0:'m_testComputeCBVHeap', Sampler Descriptor Heap: <not set>, Pipeline State: 0x000001F3C5973380:'m_testComputePipeline',  [ EXECUTION ERROR #936: GPU_BASED_VALIDATION_DESCRIPTOR_HEAP_INDEX_OUT_OF_BOUNDS]
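
That error appears to be saying that the descriptor table bound at compute root parameter 1 points into a CBV/SRV/UAV heap range with zero usable entries. For reference, a minimal sketch of the kind of heap setup GPU-Based Validation expects to find there (variable names like uavHeap and outputBuffer are placeholders, not the project's code):

// Sketch only: a shader-visible CBV/SRV/UAV heap with room for the one UAV the
// shader reads through root parameter 1.
D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
heapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
heapDesc.NumDescriptors = 1;                                   // at least one entry
heapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;    // required for SetDescriptorHeaps
device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&uavHeap));

// Put the UAV for the 128-uint output buffer in slot 0 of that heap.
D3D12_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
uavDesc.Format = DXGI_FORMAT_UNKNOWN;                          // structured buffer
uavDesc.ViewDimension = D3D12_UAV_DIMENSION_BUFFER;
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.NumElements = 128;
uavDesc.Buffer.StructureByteStride = sizeof(UINT);
device->CreateUnorderedAccessView(outputBuffer.Get(), nullptr, &uavDesc,
    uavHeap->GetCPUDescriptorHandleForHeapStart());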

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

Turns out the hang wasn't 100%. It 'succeeded' and rendered the cube after the test the first few times, but did hang on a later run. The GPU-Based Validation error is still there though.

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group


@ajmiles Interesting, I don't get any GPU validation errors. Did you change anything in the code, or perhaps global D3D12 or driver settings? I've tried removing the root constants, setting the UAV register space to 0, and hardcoding g_count to 128 in the shader so that there's only the UAV, but that had no effect. I also tried switching it from a RWStructuredBuffer to just a RWBuffer, but that also had no effect. No matter what I do, numthreads() with 32 (or any power of 2) fails and numthreads() with 31 (or any non-power of 2) succeeds. I don't suppose there's any other insight you can provide on your end, given that I'm not getting the validation errors? Presumably, if the descriptor heap and root descriptor settings were actually invalid, it wouldn't be able to successfully write with a non-power-of-2 dispatch?
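
For what it's worth, GPU-Based Validation is opt-in and has to be enabled before the device is created, otherwise errors like the one quoted above are never reported. A minimal sketch, assuming the standard debug-layer interfaces and a ComPtr alias:

// Sketch only: enable the debug layer plus GPU-Based Validation before creating the device.
ComPtr<ID3D12Debug> debugController;
if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugController))))
{
    debugController->EnableDebugLayer();

    ComPtr<ID3D12Debug1> debugController1;
    if (SUCCEEDED(debugController.As(&debugController1)))
    {
        debugController1->SetEnableGPUBasedValidation(TRUE);   // GBV is off by default
    }
}
// ...then call D3D12CreateDevice as usual.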

It's possible that the version I'm on (16251) has newer GPU Validation bits than what you're running.

What version of Windows 10 are you running? Run 'winver' at a command prompt and there should be an OS Build number in parentheses.

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

That could be it. I'm on build 15063.483 (Creators Update). It looks like you're using a July 26 Windows Insider Preview build. That still doesn't explain why it can successfully write with a non-power-of-2 thread count but not with a power of 2 if the descriptor heap were actually invalid. Do you see anything I'm doing wrong with my descriptor heap?

I am having the same problem as described. Any suggestions?

I do not encounter the same problem with the same compute shader on:

1. RX 480

2. A WARP device (see the sketch below)

Only on NVIDIA hardware.
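
For anyone who wants to repeat the WARP comparison, here's a minimal sketch of creating a WARP device (assuming d3d12.h, dxgi1_4.h, wrl/client.h and a ComPtr alias); pointing the existing setup code at it instead of the hardware adapter is a quick way to rule the NVIDIA driver in or out:

// Sketch only: create the software WARP adapter and a D3D12 device on it.
ComPtr<IDXGIFactory4> dxgiFactory;
CreateDXGIFactory1(IID_PPV_ARGS(&dxgiFactory));

ComPtr<IDXGIAdapter> warpAdapter;
dxgiFactory->EnumWarpAdapter(IID_PPV_ARGS(&warpAdapter));

ComPtr<ID3D12Device> warpDevice;
D3D12CreateDevice(warpAdapter.Get(), D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&warpDevice));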

This topic is closed to new replies.
