FPS meter, Moving buffers to the GPU, and Using the stencil part of the depth-stencil

posted in Lyost's Journal
Published July 04, 2017

Update: source now available at https://github.com/lyost/d3d12_framework

While trying to build a couch and dealing with a broken pipe below the concrete floor of the basement, I've also continued playing with Direct3D12.  Since the last blog entry, I have implemented an FPS meter that uses a basic texture atlas for its display, added classes for vertex and index buffers that reside in GPU memory without direct CPU access, and added a depth-fail shadow volume test case that brings use of the stencil part of the depth-stencil to the framework.

FPS Meter

[Image: fps_monitor.png]

So far in the framework, the Game base class passed the value of the fixed timestep to the update and draw functions as the elapsed time.  To compute the actual number of frames per second, the actual elapsed time between frames is needed instead, so both values are now provided as arguments to the update and draw functions.  Each game can then choose which value to use, or use both.  This of course required a minor update to all the existing test programs to add the additional argument, even though they still use the fixed timestep value.

The FPS meter itself is a library in the project named "fps_monitor" so it can be easily re-used for projects as needed.  The library is the FPSMonitor class and the shaders needed for rendering it.  The FPSMonitor calculates and displays the minimum, maximum, and average FPS over a configurable number of frames.  It has its own graphics pipeline for rendering.  So that it doesn't get bloated with code for loading different image formats or texture atlas data formats, the already loaded data is taken as arguments to the constructor.

The vertices sent to the vertex shader use projection space x and y coordinates that maintain the width and height of the character as provided to the FPSMonitor constructor (which means this works best with monospaced fonts), uv coordinates for the texture going from 0-1 in both dimensions, and the key into the texture atlas lookup table (initialized to 0, but the Update function fills in the desired value for that frame).


m_vertex_buffer_data[i * VERTS_PER_CHAR    ] = { XMFLOAT2(-1 + x,                y),                 XMFLOAT2(0.0f, 0.0f), 0 };
m_vertex_buffer_data[i * VERTS_PER_CHAR + 1] = { XMFLOAT2(-1 + x,                y - m_char_height), XMFLOAT2(0.0f, 1.0f), 0 };
m_vertex_buffer_data[i * VERTS_PER_CHAR + 2] = { XMFLOAT2(-1 + x + m_char_width, y - m_char_height), XMFLOAT2(1.0f, 1.0f), 0 };
m_vertex_buffer_data[i * VERTS_PER_CHAR + 3] = { XMFLOAT2(-1 + x + m_char_width, y),                 XMFLOAT2(1.0f, 0.0f), 0 };

The texture atlas lookup table is provided to the vertex shader through a constant buffer holding an array in which each entry gives the uv coordinates bounding that entry's rectangle in the atlas.


struct LookupTableEntry
{
  float left;
  float right;
  float top;
  float bottom;
};

cbuffer LOOKUP_TABLE : register(b0)
{
  LookupTableEntry lookup_table[24];
}

The combination of the 0-1 uv coordinates on each vertex and the lookup table index allows the vertex shader to easily compute the uv coordinates for the particular character in the texture atlas.


output.uv.x = (1 - input.uv.x) * lookup_table[input.lookup_index].left + input.uv.x * lookup_table[input.lookup_index].right;
output.uv.y = (1 - input.uv.y) * lookup_table[input.lookup_index].top  + input.uv.y * lookup_table[input.lookup_index].bottom;

An alternative approach would be to skip the index field in the vertex data and update the uv coordinates on the host so that the vertex shader becomes more of a pass through.
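A minimal sketch of what that host-side alternative could look like follows.  The UV struct and the ComputeCharUVs function are hypothetical names used only for illustration, and LookupTableEntry is redeclared here as a C++ mirror of the shader struct.  The corner order follows the vertex data shown earlier (top-left, bottom-left, bottom-right, top-right).

```cpp
#include <array>

// C++ mirror of the shader-side lookup table entry
struct LookupTableEntry
{
  float left;
  float right;
  float top;
  float bottom;
};

// Hypothetical uv pair; the framework itself uses XMFLOAT2 in its vertex data
struct UV
{
  float u;
  float v;
};

// Returns the atlas uv coordinates for the 4 vertices of one character quad.
// This is the same result the vertex shader's interpolation gives when the
// per-vertex uv values are exactly 0 or 1.
inline std::array<UV, 4> ComputeCharUVs(const LookupTableEntry& entry)
{
  return {{
    { entry.left,  entry.top    },  // vertex 0: uv (0, 0)
    { entry.left,  entry.bottom },  // vertex 1: uv (0, 1)
    { entry.right, entry.bottom },  // vertex 2: uv (1, 1)
    { entry.right, entry.top    }   // vertex 3: uv (1, 0)
  }};
}
```

The trade-off is that the host writes four uv pairs per character each frame instead of one index, in exchange for a simpler shader and no constant buffer for the lookup table.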

In order to test that the FPS values are being computed correctly, the test program needs the frame rate to vary.  Conceptually there are 2 ways to accomplish this within a program.  One is to switch between two sets of content: one that doesn't stress the system's rendering capabilities and one that does.  Another way, and the way taken in the test program, is to change the fixed timestep duration.  By pressing and releasing numpad 1, 2, or 3 the test program will switch to 60, 30, or 24 FPS respectively.  While changing the frame rate up or down instantly changes the min or max FPS, the average FPS takes a little while, depending on the number of samples, to reach a steady value.  Assuming a system can handle the requested frame rate, once enough samples at the new frame rate have occurred to fill all of the sample slots in the FPSMonitor class, all 3 values should be the same.

GPU Vertex and Index Buffers

The vertex and index buffers in the framework thus far have used D3D12_HEAP_TYPE_UPLOAD so that their memory can be mapped when their data needs to be updated.  While the FPS meter discussed in the previous section needs to update a vertex buffer every frame, this is a rare case.  Taking the common example of loading a model, its vertex and index buffers normally wouldn't change after loading, so there is no need for CPU access from then on.  To cover this, there are additional vertex and index buffer classes, named VertexBufferGPU_* and IndexBufferGPU16, that use D3D12_HEAP_TYPE_DEFAULT.  To populate or update the data in these GPU-only buffers, the existing vertex and index buffer classes provide a PrepUpload function for the corresponding GPU-only type.  This adds commands to a command list for copying the data between the two buffers; the actual copying is done when the command list is executed.  Beyond the lack of CPU access, they function the same as the previously existing vertex and index buffers, so there's not too much to say about these.

Stencil Part of the Depth-Stencil Buffer

[Image: stencil.png]

Up until now, the depth-stencil buffer has been used for just depth data.  Exercising the stencil portion of this buffer required framework updates to create a depth-stencil with an appropriate format (previously the depth-stencils were all DXGI_FORMAT_D32_FLOAT), adding the ability to configure the stencil when creating a pipeline, and an algorithm to use for a test case.

For the format, the DepthStencil class has an optional argument of "bool with_stencil" that if true will create the depth stencil with a format of DXGI_FORMAT_D32_FLOAT_S8X24_UINT.  If it is false (the default), the format will be DXGI_FORMAT_D32_FLOAT.

For configuring the stencil, the static CreateD3D12 functions in the Pipeline class had their "DepthFuncs depth_func" argument changed to "const DepthStencilConfig* depth_stencil_config".  If that argument is NULL, both the depth and stencil tests are disabled.  If it points to an instance of the DepthStencilConfig struct, then the depth and stencil tests can be enabled or disabled individually, along with specifying the other configuration data.


/// <summary>
/// Enum of the various stencil operations
/// </summary>
/// <remarks>
/// Values must match D3D12_STENCIL_OP
/// </remarks>
enum StencilOp
{
  SOP_KEEP = 1,
  SOP_ZERO,
  SOP_REPLACE,
  SOP_INCREMENT_CLAMP,
  SOP_DECREMENT_CLAMP,
  SOP_INVERT,
  SOP_INCREMENT_ROLLOVER,
  SOP_DECREMENT_ROLLOVER
};

/// <summary>
/// Configuration for processing pixels
/// </summary>
struct StencilOpConfig
{
  /// <summary>
  /// Stencil operation to perform when stencil testing fails
  /// </summary>
  StencilOp stencil_fail;

  /// <summary>
  /// Stencil operation to perform when stencil testing passes, but depth testing fails
  /// </summary>
  StencilOp depth_fail;

  /// <summary>
  /// Stencil operation to perform when both stencil and depth testing pass
  /// </summary>
  StencilOp pass;

  /// <summary>
  /// Comparison function to use to compare stencil data against existing stencil data
  /// </summary>
  CompareFuncs comparison;
};

/// <summary>
/// Configuration for the depth stencil
/// </summary>
struct DepthStencilConfig
{
  /// <summary>
  /// true if depth testing is enabled.  false otherwise
  /// </summary>
  bool depth_enable;

  /// <summary>
  /// true if stencil testing is enabled.  false otherwise
  /// </summary>
  bool stencil_enable;

  /// <summary>
  /// Format of the depth stencil view.  Must be correctly set if either depth_enable or stencil_enable is set to true.
  /// </summary>
  GraphicsDataFormat dsv_format;

  /// <summary>
  /// true if writing to the depth portion of the depth stencil is allowed.  false otherwise.
  /// </summary>
  bool depth_write_enabled;

  /// <summary>
  /// Comparison function to use to compare depth data against existing depth data
  /// </summary>
  CompareFuncs depth_comparison;

  /// <summary>
  /// Bitmask for identifying which portion of the depth stencil should be used for reading stencil data
  /// </summary>
  UINT8 stencil_read_mask;

  /// <summary>
  /// Bitmask for identifying which portion of the depth stencil should be used for writing stencil data
  /// </summary>
  UINT8 stencil_write_mask;

  /// <summary>
  /// Configuration for processing pixels with a surface normal towards the camera
  /// </summary>
  StencilOpConfig stencil_front_face;

  /// <summary>
  /// Configuration for processing pixels with a surface normal away from the camera
  /// </summary>
  StencilOpConfig stencil_back_face;
};

After those changes it was on to an algorithm to use as a test case.  While over the years I've read up on different algorithms that use the stencil, I haven't implemented one before.  I ended up picking depth-fail shadow volume using both the Wikipedia article and http://joshbeam.com/articles/stenciled_shadow_volumes_in_opengl/ for reference (I don't plan for this entry to be a tutorial on depth-fail, so I'd recommend those links if you want to read up on the algorithm).  The scene is a simple one consisting of an omnidirectional light source at (8, 0, 0), an occluder at (1, 0, 0), and a textured cube that is initially positioned at (-7, 0, 0) and can be moved in y and z with the arrow keys.  The textured cube is initially in shadow, so the up, down, and left arrows allow it to be moved so that it is partially or completely out of shadow, or back into shadow.  For the right arrow key, there was an issue where the framework was always assuming D3D12_CULL_MODE_BACK, which prevented the stencil buffer from being correct.  Since the stencil configuration in D3D12 allows different stencil operations for front faces and back faces, only 1 pass is needed for setting the stencil buffer when the cull mode is set to none.  With that change, the model was correctly lit when moving out of the shadow volume with the right arrow key as well.
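To make the single-pass idea a bit more concrete, here is a small CPU-side sketch of what happens to the stencil value of one pixel under depth-fail.  It is only a simulation written for illustration, not code from the framework: VolumeFragment and DepthFailStencil are hypothetical names, and it assumes a "less" depth comparison, a stencil test that always passes, and wrapping increment/decrement on depth fail (which would correspond to SOP_INCREMENT_ROLLOVER for back faces and SOP_DECREMENT_ROLLOVER for front faces in the structs above).

```cpp
#include <cstdint>
#include <vector>

// One shadow-volume fragment covering the pixel being examined (hypothetical type)
struct VolumeFragment
{
  float depth;         // depth of the shadow-volume face at this pixel
  bool  front_facing;  // true if the face's surface normal is towards the camera
};

// Simulates the stencil value of one pixel after all shadow-volume fragments
// are drawn in a single pass with culling disabled.
// scene_depth is the depth already stored for the pixel by the scene geometry.
inline std::uint8_t DepthFailStencil(float scene_depth, const std::vector<VolumeFragment>& fragments)
{
  std::uint8_t stencil = 0;
  for (const VolumeFragment& fragment : fragments)
  {
    if (fragment.depth < scene_depth)
    {
      continue;   // depth test passed: keep the stencil value
    }
    if (fragment.front_facing)
    {
      --stencil;  // front face failed the depth test: decrement, wrapping 0 -> 255
    }
    else
    {
      ++stencil;  // back face failed the depth test: increment, wrapping 255 -> 0
    }
  }
  return stencil; // non-zero means the pixel is inside a shadow volume
}
```

Because the increment and decrement wrap rather than clamp, the result does not depend on the order in which front and back faces are drawn, which is what makes a single pass with both face types workable.  For a pixel in front of the whole volume both faces fail the depth test and cancel out, while for a pixel inside the volume only the back face fails, leaving a non-zero value.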
