Async Compute Structure

Started August 12, 2017 08:39 AM
1 comment, last by JoeJ 7 years, 6 months ago

I have been reading about async compute in the new APIs, and it all sounds pretty interesting.

Here is my basic understanding of the implementation of async compute in a simple application like computing the Mandelbrot fractal:

In this case, the compute queue generates a texture of the fractal and the graphics queue presents it.

Program structure:


// Create 3 UAV textures for triple buffering

// Create 3 fences for compute queue

beginCmd(computeCmd);
cmdDispatch(computeCmd);
endCmd(computeCmd);
queueSubmit(computeQueue, fence[frameIdx]);

if (!getFenceReady(fence[frameIdx - 1]))
    waitForFences(fence[frameIdx - 1]);

beginCmd(graphicsCmd);
cmdDraw(uavTexture[frameIdx - 1]);
endCmd(graphicsCmd);
queueSubmit(graphicsQueue);

I am not sure about one thing in this structure:

  • All the examples I have seen use vkWaitForFences, but I thought fences are for the CPU waiting on the GPU to complete. Should I use semaphores instead, so the graphics queue waits on the GPU for the compute queue to finish in case graphics is running faster than compute?
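For comparison, here is what the semaphore variant might look like, written as a sketch in the same invented pseudocode style as the structure above (the signalSemaphore/waitSemaphore parameters are assumptions for illustration, not a real API):

// Create 3 semaphores, one per frame in flight

beginCmd(computeCmd);
cmdDispatch(computeCmd);
endCmd(computeCmd);
// Compute queue signals the semaphore on the GPU when the dispatch finishes
queueSubmit(computeQueue, signalSemaphore = sem[frameIdx]);

beginCmd(graphicsCmd);
cmdDraw(uavTexture[frameIdx - 1]);
endCmd(graphicsCmd);
// Graphics queue waits on the GPU for last frame's compute work;
// the CPU never blocks here
queueSubmit(graphicsQueue, waitSemaphore = sem[frameIdx - 1]);

With this scheme a fence would still be useful on the CPU side, but only to stop the CPU from recording more frames than there are buffered textures.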

Any advice on this will really help to make efficient use of async compute.

Agree, probably semaphores are better. I've used them to sync between multiple compute queues and assume this works between graphics and compute queue as well. But each semaphore is still expensive and inserts a bubble.

This thread has some related details: https://www.gamedev.net/forums/topic/690700-question-concerning-internal-queue-organisation/

In short:

Async compute happens automatically within a single queue as long as there are no pipeline barriers.

A second compute queue keeps working while the first one is stalled due to a pipeline barrier.

(Still not 100% sure about this, and I also haven't tested anything on Nvidia yet...)
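A rough pseudocode picture of that second point, in the same invented style as the sketch above (queue and command-buffer names are made up for illustration):

// Queue A: dispatchB reads dispatchA's output, so a barrier is needed
beginCmd(cmdA);
cmdDispatch(cmdA, dispatchA);
cmdPipelineBarrier(cmdA);   // queue A stalls here until dispatchA's writes are visible
cmdDispatch(cmdA, dispatchB);
endCmd(cmdA);

// Queue B: independent work with no barrier
beginCmd(cmdB);
cmdDispatch(cmdB, independentWork);
endCmd(cmdB);

queueSubmit(computeQueueA, cmdA);
queueSubmit(computeQueueB, cmdB);  // can overlap with queue A's barrier bubble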

This topic is closed to new replies.
