Hi everyone : )
I'm trying to implement SSAO with D3D12 (using the implementation found on learnopengl.com https://learnopengl.com/Advanced-Lighting/SSAO), but I seem to have a performance problem...
Here is part of the SSAO pixel shader:
Texture2D PositionMap : register(t0);
Texture2D NormalMap : register(t1);
Texture2D NoiseMap : register(t2);
SamplerState s1 : register(s0);

// I hard coded the variables just for the test
const static int kernel_size = 64;
const static float2 noise_scale = float2(632.0 / 4.0, 449.0 / 4.0);
const static float radius = 0.5;
const static float bias = 0.025;

cbuffer ssao_cbuf : register(b0)
{
    float4x4 gProjectionMatrix;
    float3 SSAO_SampleKernel[64];
}

float main(VS_OUTPUT input) : SV_TARGET
{
    [....]
    float occlusion = 0.0;
    for (int i = 0; i < kernel_size; i++)
    {
        // Move the kernel sample from tangent space to the space of pos
        float3 ksample = mul(TBN, SSAO_SampleKernel[i]);
        ksample = pos + ksample * radius;

        // Project the sample to find its texture coordinates
        float4 offset = float4(ksample, 1.0);
        offset = mul(gProjectionMatrix, offset);
        offset.xyz /= offset.w;              // perspective divide
        offset.xyz = offset.xyz * 0.5 + 0.5; // remap from [-1, 1] to [0, 1]

        // Compare the stored depth at that location with the sample depth
        float sampleDepth = PositionMap.Sample(s1, offset.xy).z;
        float rangeCheck = smoothstep(0.0, 1.0, radius / abs(pos.z - sampleDepth));
        occlusion += (sampleDepth >= ksample.z + bias ? 1.0 : 0.0) * rangeCheck;
    }
    [....]
}
The problem is this for loop. With it, the frame (a simple torus knot...) takes around 140 ms to draw on a GTX 770. Without the loop, it takes 5 ms. With the loop but without the PositionMap sampling and the matrix multiplication, it takes around 25 ms. I understand that matrix multiplication and sampling are "expensive", but I don't think that is enough to justify such a slow frame.
I assume the shader code from the tutorial works, so unless I've done something terribly stupid that I can't see, my problem probably comes from something I did wrong with D3D12 that I'm not aware of (I just started learning D3D12).
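To put the numbers in perspective, here is a rough count of the work the loop does per frame. It is only a sketch: the 632 x 449 resolution is an assumption read off noise_scale, and the function name is made up for illustration.

```cpp
#include <cstdint>

// Assumed render-target size, inferred from noise_scale = (632 / 4, 449 / 4).
constexpr std::uint64_t kWidth = 632;
constexpr std::uint64_t kHeight = 449;
// Matches kernel_size in the shader.
constexpr std::uint64_t kKernelSize = 64;

// Each pixel runs the loop body once per kernel sample, and each iteration
// does one PositionMap fetch and one projection-matrix multiplication.
constexpr std::uint64_t samples_per_frame(std::uint64_t width,
                                          std::uint64_t height,
                                          std::uint64_t kernel) {
    return width * height * kernel;
}

static_assert(samples_per_frame(kWidth, kHeight, kKernelSize) == 18161152ULL,
              "632 * 449 * 64 loop iterations per frame");
```

So the pass performs roughly 18 million texture fetches and matrix multiplications per frame, if every pixel of the target runs the full loop.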
Both PositionMap and NormalMap are render targets from the G-buffer. For each one I created two descriptor heaps, one of type D3D12_DESCRIPTOR_HEAP_TYPE_RTV and one of type D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV, and called both CreateRenderTargetView and CreateShaderResourceView.
The NoiseMap only has one descriptor heap of type D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV.
Before calling DrawIndexedInstanced for the SSAO pass, I copy the relevant descriptors to a descriptor heap that I then bind, like so:
CD3DX12_CPU_DESCRIPTOR_HANDLE ssao_heap_hdl(_pSSAOPassDesciptorHeap->GetCPUDescriptorHandleForHeapStart());
device->CopyDescriptorsSimple(1, ssao_heap_hdl, _gBuffer.PositionMap().GetDescriptorHeap()->GetCPUDescriptorHandleForHeapStart(),
    D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
ssao_heap_hdl.Offset(CBV_descriptor_inc_size);
device->CopyDescriptorsSimple(1, ssao_heap_hdl, _gBuffer.NormalMap().GetDescriptorHeap()->GetCPUDescriptorHandleForHeapStart(),
    D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
ssao_heap_hdl.Offset(CBV_descriptor_inc_size);
device->CopyDescriptorsSimple(1, ssao_heap_hdl, _ssaoPass.GetNoiseTexture().GetDescriptorHeap()->GetCPUDescriptorHandleForHeapStart(),
    D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

ID3D12DescriptorHeap* descriptor_heaps[] = { _pSSAOPassDesciptorHeap };
pCommandList->SetDescriptorHeaps(1, descriptor_heaps);
pCommandList->SetGraphicsRootDescriptorTable(0, _pSSAOPassDesciptorHeap->GetGPUDescriptorHandleForHeapStart());
pCommandList->SetGraphicsRootConstantBufferView(1, _cBuffSamplesKernel[0].GetVirtualAddress());
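For reference, this is roughly the CPU-side layout that ssao_cbuf implies under the usual HLSL constant-buffer packing rules, where every element of a float3 array takes a full 16-byte register. It is a sketch, not the actual code behind _cBuffSamplesKernel: the struct and function names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical CPU-side mirror of ssao_cbuf; none of these names come from
// the real project.
struct PaddedFloat3 {
    float x, y, z;
    float pad; // unused, fills the 16-byte register of one array element
};

struct SsaoConstants {
    float        projection[16];   // float4x4 gProjectionMatrix (64 bytes)
    PaddedFloat3 sampleKernel[64]; // float3 SSAO_SampleKernel[64]
};

// Rounds a byte size up to the 256-byte granularity D3D12 uses for
// constant buffer views.
constexpr std::size_t align_to_256(std::size_t size) {
    return (size + 255) & ~static_cast<std::size_t>(255);
}

static_assert(sizeof(PaddedFloat3) == 16, "one register per array element");
static_assert(sizeof(SsaoConstants) == 64 + 64 * 16, "1088 bytes of constants");
static_assert(align_to_256(sizeof(SsaoConstants)) == 1280, "rounded up for the CBV");
```

A tightly packed array of 12-byte vectors on the CPU side would not line up with what the shader reads, so the padding float matters.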
Debug and Release builds give me the same results, and so do the shader compilation flags with or without optimisation.
So does anyone see something weird in my code that would cause the slowness?
By the way, when I run the pixel shader in the graphics debugger, this line:
offset.xyz /= offset.w;
does not seem to produce the expected results. The two rows in the following table are the values shown in the debugger before and after that line executes:
Name | Value | Type
---|---|---
offset (before) | x = -1.631761000, y = 1.522913000, z = 2.634875000, w = 2.634875000 | float4
offset (after) | x = -0.619293700, y = 0.577983000, z = 2.634875000, w = 2.634875000 | float4
So x and y are okay, but z is not: it is unchanged, although z and w are equal before the division.
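Here is the same division done on the CPU with the "before" values from the table, to show what I expected to see. This is a minimal sketch; Vec4 and perspective_divide are names invented for the example.

```cpp
#include <cmath>

// Hypothetical stand-in for an HLSL float4.
struct Vec4 {
    float x, y, z, w;
};

// Mirrors `offset.xyz /= offset.w;`: x, y and z are divided by w,
// and w itself is left untouched.
inline Vec4 perspective_divide(Vec4 v) {
    return Vec4{v.x / v.w, v.y / v.w, v.z / v.w, v.w};
}

// Returns true when a and b differ by less than eps.
inline bool nearly_equal(float a, float b, float eps = 1e-5f) {
    return std::fabs(a - b) < eps;
}
```

With the input (-1.631761, 1.522913, 2.634875, 2.634875), x and y come out as about -0.619294 and 0.577983, which matches the debugger, and z comes out as exactly 1.0 because z and w are equal.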
Please tell me if you need more info/code.
Thank you for your help !