DX12 Occlusion Queries

pcmaster

December 01, 2017 12:48 PM

Hi!

I wonder if I can achieve the same (not quite optimal) CPU readback of occlusion queries as with DX11.


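u64 result = 0; // an occlusion query returns a UINT64 sample count
HRESULT hr = deviceCtx11->GetData(id3d11Query, &result, sizeof(result), D3D11_ASYNC_GETDATA_DONOTFLUSH);
if (S_OK == hr) return "ready";   // result now holds the sample count
else return "not ready";          // S_FALSE: the GPU hasn't finished the query yet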

This happens on the CPU: I can see whether the result is ready and do other work if it isn't.

In DX12, ResolveQueryData obviously happens on the GPU. If I put a fence after ResolveQueryData, I can be sure it has copied the results into my buffer. However, I wonder if there's any other way than inserting a fence after each EndQuery to see whether the individual queries have already finished. That sounds bad, and I guess the fence might do some flushing.
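A minimal sketch of the DX12 counterpart of the DX11 readiness check, assuming the command list containing EndQuery and ResolveQueryData has been executed and followed by ID3D12CommandQueue::Signal, and that the resolve destination is a mapped buffer in a READBACK heap. Each D3D12_QUERY_TYPE_OCCLUSION result is resolved as one UINT64 sample count. `ResolvedOcclusionBatch` and `TryGetOcclusionResult` are hypothetical names, and the mapped memory is simulated with a plain array so the logic runs without a device; the completed value would come from ID3D12Fence::GetCompletedValue(), which returns immediately instead of waiting.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical CPU-side view of one resolved batch of occlusion queries.
struct ResolvedOcclusionBatch {
    const std::uint8_t* mapped = nullptr;  // mapped READBACK memory written by ResolveQueryData
    std::uint32_t queryCount = 0;          // number of queries resolved into 'mapped'
    std::uint64_t resolveFenceValue = 0;   // value signaled after the resolve was submitted
};

// Non-blocking analogue of GetData(..., D3D11_ASYNC_GETDATA_DONOTFLUSH) == S_OK.
// Returns the sample count if ready, or std::nullopt ("not ready" / bad index).
std::optional<std::uint64_t> TryGetOcclusionResult(const ResolvedOcclusionBatch& batch,
                                                   std::uint32_t queryIndex,
                                                   std::uint64_t completedFenceValue) {
    if (queryIndex >= batch.queryCount) return std::nullopt;
    if (completedFenceValue < batch.resolveFenceValue) return std::nullopt; // GPU not there yet
    assert(batch.mapped != nullptr);
    // One UINT64 per occlusion query, laid out contiguously.
    std::uint64_t samples = 0;
    std::memcpy(&samples, batch.mapped + queryIndex * sizeof(std::uint64_t), sizeof(samples));
    return samples;
}
```

One Signal after the resolve covers every query in that batch, so the readiness test is a single integer comparison per batch rather than a fence per EndQuery.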

I first want to implement what the other platforms in our engine do, before changing all of them to a more sensible batched occlusion-query model.

Thanks for any remarks.

Hodgman

December 01, 2017 01:15 PM

15 minutes ago, pcmaster said:
It sounds bad and I guess the fence might do some flushing.

The flushing in the D3D11 flag refers to submitting previously recorded draw calls to the GPU (the equivalent of closing the immediate context's command list and calling ID3D12CommandQueue::ExecuteCommandLists). The no-flush flag means "don't call ExecuteCommandLists" before checking the query results.

That said, I wouldn't be surprised if fences caused some kind of GPU cache flush, but that is generally required anyway for the GPU to be completely sure the data has reached RAM before it tells the CPU the data is ready.

. 22 Racing Series .

pcmaster

December 01, 2017 02:16 PM

So the expected CPU-readback approach on PC is to insert a fence after ResolveQueryData and wait on it on the CPU.

Btw, Hodgman, just out of curiosity: do you happen to know whether, on GCN, the query results for each of the 4/8 DBs are written from their counters straight into the backing memory at bottom-of-pipe, or whether some caches (DB?) are involved?

Infinisearch

December 01, 2017 02:25 PM

Out of curiosity, have you considered conditional rendering (predication)?

https://msdn.microsoft.com/en-us/library/windows/desktop/dn903927(v=vs.85).aspx
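For reference, predication in D3D12 consumes the resolved query value on the GPU: the occlusion result is resolved into a buffer that is then transitioned to D3D12_RESOURCE_STATE_PREDICATION and bound with ID3D12GraphicsCommandList::SetPredication. With D3D12_PREDICATION_OP_EQUAL_ZERO, subsequent draws are skipped when all 64 bits of the value are zero, i.e. when the query saw no samples. The sketch below only models that decision rule on the CPU; `PredicationOp` and `IsDrawSkipped` are hypothetical stand-ins, not the D3D12 API itself.

```cpp
#include <cstdint>

// Hypothetical mirror of D3D12_PREDICATION_OP, for illustration only.
enum class PredicationOp { EqualZero, NotEqualZero };

// Models the documented rule: if the 64-bit predicate value matches the
// operation, the predicated commands are not performed.
bool IsDrawSkipped(std::uint64_t predicateValue, PredicationOp op) {
    const bool isZero = (predicateValue == 0);
    return op == PredicationOp::EqualZero ? isZero : !isZero;
}
```

Note that the CPU still records every predicated draw; only its execution on the GPU is skipped.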

-potential energy is easily made kinetic-

pcmaster

December 01, 2017 02:36 PM

Sure, but the time budget doesn't allow it right now.

Hodgman

December 01, 2017 07:47 PM

5 hours ago, pcmaster said:
So the expected CPU-readback approach on PC should be inserting a fence after ResolveQueryData and waiting on it on CPU.
Btw, Hodgman, just out of curiosity, do you know by any chance on GCN, if already at the bottom-of-pipe it writes the query results for each of the 4/8 DBs, based on counters, into the backing memory? Or are some caches (DB?) involved?

Yeah. Or you could fence just N times per frame and check the fence that follows the query you're checking. You could even fence only once per frame and accept a full frame of query latency.
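Fencing only N times per frame can be sketched as a small lookup: remember, for each Signal, which range of query indices it covers, and treat a query as ready once the first fence signaled after its resolve has completed. `QueryFenceTimeline` and its methods are hypothetical; the sketch assumes query indices and fence values both increase monotonically within the tracked range, and that each covered query was resolved before the corresponding Signal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping for "fence N times per frame".
class QueryFenceTimeline {
public:
    // Call right after ID3D12CommandQueue::Signal(fence, fenceValue).
    // 'endQueryIndex' is one past the last query whose EndQuery and
    // ResolveQueryData were submitted before that signal.
    void RecordSignal(std::uint32_t endQueryIndex, std::uint64_t fenceValue) {
        assert(entries_.empty() ||
               (endQueryIndex > entries_.back().endQueryIndex &&
                fenceValue > entries_.back().fenceValue));
        entries_.push_back({endQueryIndex, fenceValue});
    }

    // A query is ready once the first fence signaled after it has completed.
    // 'completedFenceValue' would come from ID3D12Fence::GetCompletedValue().
    bool IsReady(std::uint32_t queryIndex, std::uint64_t completedFenceValue) const {
        // First recorded signal whose range [.., endQueryIndex) contains queryIndex.
        auto it = std::upper_bound(
            entries_.begin(), entries_.end(), queryIndex,
            [](std::uint32_t q, const Entry& e) { return q < e.endQueryIndex; });
        if (it == entries_.end()) return false; // no signal follows this query yet
        return completedFenceValue >= it->fenceValue;
    }

    // E.g. when the query heap range is recycled for a new frame.
    void Reset() { entries_.clear(); }

private:
    struct Entry {
        std::uint32_t endQueryIndex;
        std::uint64_t fenceValue;
    };
    std::vector<Entry> entries_; // both fields strictly increasing
};
```

With a single RecordSignal per frame this becomes the "one frame of latency" variant: every query in the frame waits on the same fence value.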

Sorry, I'm not very experienced with queries, so I don't know any low-level details; in my book they're a horrible hack for visibility culling (getting the answer to a problem long after you needed it has always rubbed me the wrong way).

pcmaster

December 05, 2017 11:15 AM

I agree it's a horrible solution.

pcmaster

December 08, 2017 11:40 AM

One last thought. By reading the query results back on the CPU, I can decide on the CPU not to issue the occluded draws at all. That saves the CPU time needed to prepare constant buffers and descriptor tables, set other state, etc. With GPU predication, I'd still have to prepare each draw, possibly in vain.

This is all only valid for a "traditional" renderer without fancy on-GPU command list building.
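A minimal sketch of that CPU-side decision, assuming each object keeps the most recent occlusion result that has already been read back (possibly a frame or more old). Objects are skipped only when a result is known and reports zero samples, so a draw is never dropped just because its result is late. `OcclusionCandidate` and `SelectDrawsToPrepare` are hypothetical names. A common precaution, not shown here, is to keep issuing a cheap query (for example against a bounding box) for skipped objects; otherwise they never get a new result and can never become visible again.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-object record for CPU-side occlusion culling.
struct OcclusionCandidate {
    std::uint32_t objectId;
    std::optional<std::uint64_t> lastSamples; // empty until a result has been read back
};

// Returns the objects whose draws should be prepared and issued this frame.
std::vector<std::uint32_t> SelectDrawsToPrepare(const std::vector<OcclusionCandidate>& candidates) {
    std::vector<std::uint32_t> toDraw;
    toDraw.reserve(candidates.size());
    for (const auto& c : candidates) {
        // Only a known zero-sample result counts as occluded; unknown is drawn.
        const bool knownOccluded = c.lastSamples.has_value() && *c.lastSamples == 0;
        if (!knownOccluded) toDraw.push_back(c.objectId);
        // Skipped objects cost no constant-buffer, descriptor-table or state setup.
    }
    return toDraw;
}
```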

Infinisearch

December 08, 2017 03:15 PM

I recently came across an article on retrofitting a DX11 renderer with GPU-based occlusion culling. Maybe you'll find it useful.

https://interplayoflight.wordpress.com/2017/11/15/experiments-in-gpu-based-occlusion-culling/amp/

pcmaster

December 08, 2017 03:39 PM

Thank you for the article. It's very interesting; however, in the engine I'm implementing DX12 into (and the types of games it's used for), we don't use instancing that much, and that approach doesn't lower the CPU cost: the higher level still has to prepare the data for each draw, which isn't negligible. But the approach sounds very good for many applications.

This topic is closed to new replies.
