
DX12 Occlusion Queries

Started by December 01, 2017 12:48 PM
11 comments, last by Infinisearch 7 years, 1 month ago

Hi!

I wonder if I can achieve the same (not quite optimal) CPU readback of occlusion queries as with DX11.


u64 result = 0;
// GetData expects a pointer to the destination; S_FALSE means the result is not available yet.
HRESULT hr = deviceCtx11->GetData(id3d11Query, &result, sizeof(u64), D3D11_ASYNC_GETDATA_DONOTFLUSH);
return (S_OK == hr) ? "ready" : "not ready";

This happens on the CPU. I'm able to see if it's ready or not and do other stuff if it isn't.

In DX12, ResolveQueryData obviously happens on the GPU. If I put a fence after ResolveQueryData, I can be sure it copied the results into my buffer. However, I wonder if there's any other way than inserting fences after each EndQuery to see if the individual queries have already finished. It sounds bad and I guess the fence might do some flushing.

I first want to implement what other platforms in our engine do, before changing all of them to some more sensible batched occlusion query querying model.

Thanks for any remarks.

15 minutes ago, pcmaster said:

It sounds bad and I guess the fence might do some flushing.

The flushing in the D3D11 flag refers to submitting previously made draw calls to the GPU (the equivalent of finishing the immediate context and calling ID3D12CommandQueue::ExecuteCommandLists). The no-flush flag means "don't call ExecuteCommandLists before checking the query results".

Though, yes, I wouldn't be surprised if fences caused some kind of GPU cache flushing... But this would generally be a requirement for the GPU to be completely sure that data has reached RAM before it tells the CPU that the data is ready. 


So the expected CPU-readback approach on PC should be inserting a fence after ResolveQueryData and waiting on it on CPU.

Btw, Hodgman, just out of curiosity, do you know by any chance on GCN, if already at the bottom-of-pipe it writes the query results for each of the 4/8 DBs, based on counters, into the backing memory? Or are some caches (DB?) involved?

Out of curiosity have you considered conditional rendering? (predication)

https://msdn.microsoft.com/en-us/library/windows/desktop/dn903927(v=vs.85).aspx

 

-potential energy is easily made kinetic-

Sure but the time budget doesn't allow right now :(

5 hours ago, pcmaster said:

So the expected CPU-readback approach on PC should be inserting a fence after ResolveQueryData and waiting on it on CPU.

Btw, Hodgman, just out of curiosity, do you know by any chance on GCN, if already at the bottom-of-pipe it writes the query results for each of the 4/8 DBs, based on counters, into the backing memory? Or are some caches (DB?) involved?

Yeah. Or you could just fence N times per frame, and check the fence that follows the query that you're checking. Could even just fence once per frame and accept a full frame of query latency.
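To make the "fence N times per frame" idea concrete, here is a minimal bookkeeping sketch, not an actual D3D12 implementation. It assumes each batch of queries is resolved with one ResolveQueryData call followed by one fence signal, and it replaces the GPU with a hand-advanced counter so the logic can be followed in isolation. The names MockFence, QueryBatch and QueryTracker are hypothetical; in real code the signal would go through ID3D12CommandQueue::Signal and the poll through ID3D12Fence::GetCompletedValue.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for ID3D12Fence (assumption for illustration). In real code the
// value would come from ID3D12Fence::GetCompletedValue(); here the "GPU"
// is advanced by hand.
struct MockFence {
    std::uint64_t completed = 0;
    std::uint64_t completedValue() const { return completed; }
};

// One batch = all queries covered by a single ResolveQueryData call,
// followed by one fence signal on the queue.
struct QueryBatch {
    std::uint64_t fenceValue;   // value signaled after the resolve
    std::uint32_t firstQuery;   // first query index covered by the batch
    std::uint32_t queryCount;   // number of queries in the batch
};

class QueryTracker {
public:
    // Call after recording the resolve and the queue signal for a batch.
    // Returns the fence value the queue is expected to signal.
    std::uint64_t submitBatch(std::uint32_t firstQuery, std::uint32_t count) {
        const std::uint64_t value = ++nextFenceValue_;
        batches_.push_back({value, firstQuery, count});
        return value;
    }

    // Non-blocking check, playing the role of the S_OK / S_FALSE result
    // of the D3D11 GetData call with the DONOTFLUSH flag.
    bool isReady(std::uint32_t queryIndex, const MockFence& fence) const {
        for (const QueryBatch& b : batches_) {
            if (queryIndex >= b.firstQuery &&
                queryIndex < b.firstQuery + b.queryCount) {
                return fence.completedValue() >= b.fenceValue;
            }
        }
        return false;  // query not part of any submitted batch
    }

private:
    std::uint64_t nextFenceValue_ = 0;
    std::vector<QueryBatch> batches_;
};
```

With one batch per frame this degenerates to the "fence once per frame" variant. A real version would also need to recycle query indices and retire old batches, which this sketch leaves out.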

Sorry, I'm not too experienced with queries so I don't know any low-level details, because in my book they're a horrible hack for visibility culling (getting results to a problem long after you were required to have answers always rubbed me the wrong way).


I agree it's a horrible solution.

One last thought. By reading back the query results on the CPU, I can decide on the CPU not to issue the draws at all. Therefore I save the CPU time needed to prepare the constant buffers, descriptor tables, other state, etc. With GPU predication, I'd still have to prepare each draw, possibly in vain.
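A rough sketch of that CPU-side decision, under stated assumptions: the Drawable record and both helper functions are hypothetical and not taken from any engine. The results are treated as possibly a frame or more old, and an object with no completed query yet is drawn anyway to stay conservative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-object record (assumption for illustration).
struct Drawable {
    std::uint64_t lastSamplesPassed = 0;  // last occlusion result read back
    bool hasResult = false;               // false until a query has completed
};

// Conservative policy: draw when no result is known yet, skip only when
// the most recent completed query reported zero samples.
inline bool shouldDraw(const Drawable& d) {
    return !d.hasResult || d.lastSamplesPassed > 0;
}

// Returns the indices of objects whose draw setup (constant buffers,
// descriptor tables, state) is worth doing on the CPU this frame.
inline std::vector<std::size_t> buildDrawList(const std::vector<Drawable>& objects) {
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        if (shouldDraw(objects[i])) {
            visible.push_back(i);
        }
    }
    return visible;
}
```

Skipped objects would still need some cheap proxy query (a bounding box, say) issued each frame, otherwise they could never be reported visible again.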

This is all only valid for a "traditional" renderer without fancy on-GPU command list building.

I had recently come across an article on retrofitting a dx11 renderer with GPU based occlusion culling.  Maybe you'll find it useful.

https://interplayoflight.wordpress.com/2017/11/15/experiments-in-gpu-based-occlusion-culling/amp/

-potential energy is easily made kinetic-

Thank you for the article. It's very interesting; however, in the engine (or rather, the types of games) I'm implementing DX12 for, we don't use instancing very much, and that approach doesn't lower the CPU cost: the higher level still has to prepare the data for each draw, which isn't negligible. But the approach sounds very good for many applications.

This topic is closed to new replies.
