
Timestamp, counters in tile based rendering

Started by November 03, 2015 06:59 AM
1 comment, last by Pedambe 9 years, 1 month ago

Hi guys,

I was wondering how timestamping works in case of GPUs with tile based architecture, especially the draw call timestamps.

Something like

    TimestampedDraw()
    {
        t1 = GetTimestamp()
        Draw()
        t2 = GetTimestamp()
        return (t2 - t1)
    }

Two issues that I see:

  1. The draw call might not be complete (deferred rendering) when the query is issued. How does the driver return a timestamp in this case? Will it result in a flush?
  2. Is the time the GPU spends binning/tiling the primitives included in the timestamp returned for a draw?

Regards,

Pedambe

That pseudocode won't work on any GPU, because all commands go into a buffer that the GPU executes later.
Generally, the process for timing something looks like:
      t1_handle = SubmitTimestamp()
      Draw()
      t2_handle = SubmitTimestamp()
      e = SubmitEvent()

      ... at least a frame later ...
      WaitForEvent(e) // this will stall the CPU until the GPU has finished those commands
      // that's why we do it (AT LEAST) a frame later, to minimize the chance of actually stalling

      t1 = ReadbackTimestamp(t1_handle)
      t2 = ReadbackTimestamp(t2_handle)
      t = t2 - t1
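To make the pattern above concrete, here is a minimal sketch of why the readback has to happen later: the "GPU" is just a queue of recorded commands, and timestamp values only exist once that queue has been executed. All names here (MockGpu, submit_timestamp, etc.) are invented for illustration; real APIs look like D3D timestamp queries or OpenGL's glQueryCounter(GL_TIMESTAMP).

```python
# Toy model of a GPU command buffer: timestamps are only written when
# the "GPU" executes the buffer, so the CPU must read them back later.
class MockGpu:
    def __init__(self):
        self.commands = []      # recorded now, executed later
        self.clock = 0          # fake GPU clock, in "ticks"
        self.timestamps = {}    # handle -> value, filled in on execute

    def submit_timestamp(self):
        handle = len(self.timestamps)
        self.timestamps[handle] = None          # not available yet
        self.commands.append(("timestamp", handle))
        return handle

    def submit_draw(self, cost):
        self.commands.append(("draw", cost))

    def execute_all(self):                      # stand-in for WaitForEvent()
        for kind, arg in self.commands:
            if kind == "timestamp":
                self.timestamps[arg] = self.clock
            else:                               # a draw advances the clock
                self.clock += arg
        self.commands.clear()

    def readback_timestamp(self, handle):
        return self.timestamps[handle]

gpu = MockGpu()
t1 = gpu.submit_timestamp()
gpu.submit_draw(100)            # pretend this draw takes 100 ticks
t2 = gpu.submit_timestamp()
# Reading back here would yield None: nothing has executed yet.
gpu.execute_all()               # "at least a frame later"
elapsed = gpu.readback_timestamp(t2) - gpu.readback_timestamp(t1)
print(elapsed)                  # 100
```

Note that between submit and execute, the handles point at nothing; that gap is exactly the frame (or more) of latency the answer describes.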
GPUs have different event types internally. The first main type is triggered as soon as the GPU front-end / command-processor receives the event command; a timestamp event of this type will read the clock immediately. In the above code, that would result in an incorrect timing of the draw call, as the three commands (timestamp, draw, timestamp) pass through the command-processor in quick succession - while the draw command is still being processed in the background.
The second type of event is an end-of-pipe event. Timestamps of this type will only read the clock after all preceding workloads have finished completely. This can be used to accurately measure the timing of the draw command, but it may interfere with the performance of the GPU. Normally the GPU may be able to have two different draw commands running in parallel, slightly overlapping in time. To correctly time one of them, the GPU must insert a small stall between the two draw commands and break any parallelism. Therefore you should try to avoid measuring individual draw calls and instead measure larger groups of them.
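The difference between the two event types can be sketched with a toy model: the command processor consumes each command almost instantly (1 tick here), while the draw's actual work takes much longer and finishes in the background. The constants and function names are invented for illustration, not any real API.

```python
# Toy illustration of "top of pipe" vs "end of pipe" timestamps.
CP_TICK = 1         # time for the command processor to consume one command
DRAW_COST = 100     # time for the draw's actual work to finish

def measure(mode):
    cp_clock = 0    # when the command processor sees each command
    work_done = 0   # when all previously submitted work has finished

    def timestamp():
        nonlocal cp_clock
        cp_clock += CP_TICK
        # Top-of-pipe reads the clock as soon as the CP sees the command;
        # end-of-pipe waits until all preceding work has drained.
        return cp_clock if mode == "top" else max(cp_clock, work_done)

    def draw():
        nonlocal cp_clock, work_done
        cp_clock += CP_TICK                 # CP consumes the draw quickly...
        work_done = cp_clock + DRAW_COST    # ...but the work finishes later

    t1 = timestamp()
    draw()
    t2 = timestamp()
    return t2 - t1

print(measure("top"))   # 2   -- only the CP's pass-through time
print(measure("end"))   # 101 -- includes the draw's actual work
```

The top-of-pipe measurement sees only how fast the commands flowed through the front-end, which is why it mis-times the draw; the end-of-pipe one includes the draw's real duration, at the cost of draining the pipe.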

APIs often don't tell you what kind of event they will be using under the hood (start of pipe or end of pipe). You'll have to carefully read the spec/documentation of your timing API to find out... and if it doesn't say, then take your results with a large grain of salt.

As for tiled/deferred architectures, the situation is the same. All GPUs are parallel, latent, deferred in some way.

Thanks Hodgman!

This topic is closed to new replies.
