
Binding buffers and then updating them

Started by October 16, 2017 05:27 AM
15 comments, last by noodleBowl 7 years, 3 months ago
1 hour ago, Hodgman said:

You can have a single vertex shader that takes positions + colours as inputs, but create three different input layouts so that you can use that one shader with three different storage formats, e.g.

When it comes to your structs like Layout1Stream0 and my VertexTypeA, I think we are talking about the same thing. E.g.:


//The vertex types (struct, so the members are public by default)
struct VertexTypeA
{
	Vector3 pos; //Holds 3 floats: x, y, z
	Color color; //Holds 4 floats: r, g, b, a
};

struct VertexTypeB
{
	Vector3 pos; //Holds 3 floats: x, y, z
};

struct VertexTypeC
{
	Color color; //Holds 4 floats: r, g, b, a
};

//======== Create and use the first input layout
D3D11_INPUT_ELEMENT_DESC layout1[] = {
	{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
	{ "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
d3dDevice->CreateInputLayout(layout1, 2, shaderCode->GetBufferPointer(), shaderCode->GetBufferSize(), &inputLayout1);

D3D11_BUFFER_DESC bufferDescription;
ZeroMemory(&bufferDescription, sizeof(D3D11_BUFFER_DESC));
bufferDescription.Usage = D3D11_USAGE_DYNAMIC;
bufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bufferDescription.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
bufferDescription.ByteWidth = sizeof(VertexTypeA) * 3; //Enough for a triangle
graphicsDevice->device->CreateBuffer(&bufferDescription, NULL, &bufferVertexTypeA);

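//Use the buffer/layout for input layout 1
UINT stride = sizeof(VertexTypeA); //Must match the vertex type stored in the buffer
UINT offset = 0;
graphicsDevice->deviceContext->IASetInputLayout(inputLayout1);
graphicsDevice->deviceContext->IASetVertexBuffers(0, 1, &bufferVertexTypeA, &stride, &offset);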


//======== Create and use the second input layout
D3D11_INPUT_ELEMENT_DESC layout2[] = {
	{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
	{ "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
d3dDevice->CreateInputLayout(layout2, 2, shaderCode->GetBufferPointer(), shaderCode->GetBufferSize(), &inputLayout2);

//Position buffer (reuse bufferDescription from above; redeclaring it in the same scope won't compile)
ZeroMemory(&bufferDescription, sizeof(D3D11_BUFFER_DESC));
bufferDescription.Usage = D3D11_USAGE_DYNAMIC;
bufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bufferDescription.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
bufferDescription.ByteWidth = sizeof(VertexTypeB) * 3; //Enough for a triangle
graphicsDevice->device->CreateBuffer(&bufferDescription, NULL, &bufferVertexTypeB);

//Color buffer (reuse bufferDescription again)
ZeroMemory(&bufferDescription, sizeof(D3D11_BUFFER_DESC));
bufferDescription.Usage = D3D11_USAGE_DYNAMIC;
bufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bufferDescription.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
bufferDescription.ByteWidth = sizeof(VertexTypeC) * 3; //Enough for a triangle
graphicsDevice->device->CreateBuffer(&bufferDescription, NULL, &bufferVertexTypeC);

//Use the buffers/layout for input layout 2
UINT strides[2];
strides[0] = sizeof(VertexTypeB); 
strides[1] = sizeof(VertexTypeC); 

UINT offsets[2];
offsets[0] = 0;
offsets[1] = 0;

ID3D11Buffer* buffers[2];
buffers[0] = bufferVertexTypeB;	
buffers[1] = bufferVertexTypeC;

deviceContext->IASetInputLayout(inputLayout2);
deviceContext->IASetVertexBuffers(0, 2, buffers, strides, offsets);
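As a side note on the offsets in the layouts above: the following plain C++ sketch (no D3D11 calls; `Element`, `SlotLayout` and `resolveLayout` are hypothetical names) models how `D3D11_APPEND_ALIGNED_ELEMENT` places each element directly after the previous element in the same input slot. That is why layout1 ends up with one 28-byte stream, while layout2 gets a 12-byte stream and a 16-byte stream. It assumes every element uses a 4-byte-per-component format like the ones above, so no padding is inserted.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified stand-in for D3D11_INPUT_ELEMENT_DESC.
struct Element {
    const char* semantic;      // e.g. "POSITION"
    std::uint32_t inputSlot;   // which vertex buffer the element is read from
    std::uint32_t sizeBytes;   // 12 for R32G32B32_FLOAT, 16 for R32G32B32A32_FLOAT
};

// Resolved offsets for one input slot plus the stride they add up to.
struct SlotLayout {
    std::vector<std::uint32_t> offsets;  // offset of each element in this slot, in declaration order
    std::uint32_t stride = 0;            // bytes between consecutive vertices in this slot
};

// Models D3D11_APPEND_ALIGNED_ELEMENT: each element starts where the previous
// element in the same slot ended. Assumes 4-byte component formats (no padding).
std::map<std::uint32_t, SlotLayout> resolveLayout(const std::vector<Element>& elements) {
    std::map<std::uint32_t, SlotLayout> slots;
    for (const Element& e : elements) {
        SlotLayout& slot = slots[e.inputSlot];
        slot.offsets.push_back(slot.stride);  // append right after the previous element
        slot.stride += e.sizeBytes;
    }
    return slots;
}
```

The stride computed here only equals the struct size because there is no trailing padding; in real code, `sizeof` of the vertex struct (as used in `IASetVertexBuffers` above) is the safer source for the stride.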

But yeah, it makes sense that you could have just one vertex shader handle all of the different formats, assuming they are more or less the same.

I feel like it might be better to just create a new layout, but could you leave a property unused? E.g. you set up a layout with 3 properties (Position, Color, Normal), but one of the things you want to draw only uses Position and Normal, while potentially everything else uses all 3 properties.

It's totally fine to have a "fat" vertex with many attributes and only use a subset in various vertex shaders. For example, the shadow map VS will only need the position and maybe uv0 (for alpha testing), while the gbuffer VS will use all the attributes.

You don't need to prepare two vertex buffers of the same mesh; just reuse the same one in all the necessary passes. It doesn't have to be multiple "streams", the data can be interleaved. The two options have slightly different performance characteristics, but I wouldn't worry about that at all in the beginning.

A layout is a "view" of the vertex data, which is just a bunch of bytes in memory. The layout tells the input assembler (and therefore the VS) at which offset each "variable" sits and which buffer to fetch it from.
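A tiny plain C++ model of that "view" idea might look like this (hypothetical `FatVertex`, `AttributeView` and `fetch` names; this is not the D3D11 API, only the byte arithmetic). The vertex data is a flat byte array, and each attribute of a layout is just an offset plus a stride into it. A shadow-pass-style read that only needs the position works on the same fat buffer; it simply never touches the other offsets.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A "fat" interleaved vertex: position, normal and uv0.
struct FatVertex {
    float pos[3];
    float normal[3];
    float uv0[2];
};

// Hypothetical view of one attribute inside the interleaved data.
struct AttributeView {
    std::size_t offset;  // byte offset inside one vertex
    std::size_t stride;  // byte distance between two consecutive vertices
};

// Copies `count` floats of the attribute described by `view` for vertex `index`
// out of the raw bytes, roughly what the input assembler does per layout element.
void fetch(const std::vector<std::uint8_t>& bytes, AttributeView view,
           std::size_t index, float* out, std::size_t count) {
    std::memcpy(out, bytes.data() + index * view.stride + view.offset,
                count * sizeof(float));
}

// Two different "views" over the same bytes: a shadow pass might only use these.
const AttributeView kPositionView{offsetof(FatVertex, pos), sizeof(FatVertex)};
const AttributeView kUv0View{offsetof(FatVertex, uv0), sizeof(FatVertex)};
```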

Switching a vertex shader has a cost. Switching layouts also has a cost. Switching buffers is cheap by comparison.

On 10/16/2017 at 2:27 AM, noodleBowl said:

I have a quick question about buffers in DirectX 11. If I bind a buffer using a command like:



IASetVertexBuffers
IASetIndexBuffer
VSSetConstantBuffers
PSSetConstantBuffers

and then later on update that bound buffer's data using Map/Unmap or any of the other update commands.

You can do that. What you cannot do is to issue Draw commands (or compute dispatches) and update the buffers later, which is something you can do in D3D12 as long as the command list hasn't been submitted.

As for performance, if you use D3D11_MAP_WRITE_NO_OVERWRITE and then issue one D3D11_MAP_WRITE_DISCARD when bigBufferIsNotFull is false (do not forget to reset this bool! the pseudo code you posted doesn't reset it!) you'll be fine.
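The bookkeeping behind that pattern can be sketched in plain C++ (hypothetical `DynamicRingBuffer`, `Allocation` and `MapMode` names; only the decision logic is modeled, not the D3D11 calls). Each allocation that still fits is appended using the equivalent of `D3D11_MAP_WRITE_NO_OVERWRITE`; when it no longer fits, one `D3D11_MAP_WRITE_DISCARD` is issued and the write cursor is reset, which is the "reset the bool" part.

```cpp
#include <cstddef>

// Mirrors the two map types used by the pattern; in real code these would be
// D3D11_MAP_WRITE_NO_OVERWRITE and D3D11_MAP_WRITE_DISCARD.
enum class MapMode { NoOverwrite, Discard };

struct Allocation {
    MapMode mode;        // how the buffer would be mapped for this write
    std::size_t offset;  // byte offset where the caller writes its vertices
};

// Hypothetical bookkeeping for one big dynamic vertex buffer.
class DynamicRingBuffer {
public:
    explicit DynamicRingBuffer(std::size_t capacityBytes) : capacity_(capacityBytes) {}

    // Assumes bytes <= capacity. The very first use also discards, which is a
    // simple way to start from a fresh buffer.
    Allocation allocate(std::size_t bytes) {
        if (!started_ || cursor_ + bytes > capacity_) {
            // Buffer full (or first use): one DISCARD, then start over from 0.
            started_ = true;
            cursor_ = bytes;  // reset the cursor and reserve this allocation
            return {MapMode::Discard, 0};
        }
        // Still room: append after data the GPU may still be reading.
        Allocation a{MapMode::NoOverwrite, cursor_};
        cursor_ += bytes;
        return a;
    }

private:
    std::size_t capacity_;
    std::size_t cursor_ = 0;
    bool started_ = false;
};
```

A real version would also round `offset` up to a multiple of the vertex stride so that `offset / stride` can be passed as the start vertex of the draw.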

Also, allocating everything dynamic in one big pool is fine. Just a few caveats to be aware of:

  • For texture buffers, you cannot use D3D11_MAP_WRITE_NO_OVERWRITE unless you're on D3D11.1 (Windows 8 or higher); otherwise you always have to issue D3D11_MAP_WRITE_DISCARD.
  • Discarding more than 4MB per frame overall will cause stalls on AMD drivers. NVIDIA drivers can handle more than 4MB, but it will likely break in really bad ways (I've seen HW bugs pop up).
  • In Ogre3D 2.1 we do the following on D3D11 systems (i.e. not D3D11.1 and Win 8):
    • Dynamic vertex & index buffers in one big dynamic pool with the no_overwrite / then discard pattern.
    • Dynamic const buffers separately; one API const buffer per "buffer" as in our representations. That said, the ideal with D3D11 is to reuse the same const buffer over and over again using MAP DISCARD; we just do not use many const buffers.
    • Dynamic texture buffers also separately, one API tex buffer per "buffer" as in our representations.

 

On 10/19/2017 at 12:55 PM, Matias Goldberg said:

What you cannot do is to issue Draw commands (or compute dispatches) and update the buffers later

Not sure what you mean here? Wouldn't this be issuing draw commands and then updating the buffers?


//Draw a Cube
UpdateBufferWithCubeData();
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);

//Draw a Pyramid
UpdateBufferWithPyramidData();
graphicsDevice->deviceContext->Draw(pyramid.vertexCount, 0);

//Present everything
graphicsDevice->swapChain->Present(0, 0);

 

On 10/19/2017 at 12:55 PM, Matias Goldberg said:

one API const buffer per "buffer" as in our representations

one API tex buffer per "buffer" as in our representations

Not entirely sure what you mean here either. Do you mean that you have one const/tex buffer per set of APIs, as in a const buffer for the camera, a const buffer for setting colors on primitives, etc., instead of a single const/tex buffer that could handle all of that?

On 10/19/2017 at 12:55 PM, Matias Goldberg said:

As for performance, if you use D3D11_MAP_WRITE_NO_OVERWRITE and then issue one D3D11_MAP_WRITE_DISCARD when bigBufferIsNotFull is false (do not forget to reset this bool! the pseudo code you posted doesn't reset it!) you'll be fine.

With the D3D11_MAP_WRITE_NO_OVERWRITE/D3D11_MAP_WRITE_DISCARD pattern, or even in general, is mesh/vertex data traditionally held in an intermediate place and then copied into the buffer?

I've seen a lot of tutorials, like this one (Lesson 5: Drawing a Triangle), where they just place the data into an array and copy it into the buffer. I wasn't sure if this is just because it's a beginner's tutorial showing the basics, or if there is a better way to do it.

2 hours ago, noodleBowl said:

Not sure what you mean here? Wouldn't this be issuing draw commands and then updating the buffers?



//Draw a Cube
UpdateBufferWithCubeData();
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);

//Draw a Pyramid
UpdateBufferWithPyramidData();
graphicsDevice->deviceContext->Draw(pyramid.vertexCount, 0);

//Present everything
graphicsDevice->swapChain->Present(0, 0);

 

The example you posted is fine. What I meant is that you cannot do the following:


//Draw a Cube
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);
UpdateBufferWithCubeData(); //Update the cube that will be used in the draw above^

This is not valid in D3D11, but it is possible (with certain care taken) in D3D12 and Vulkan.
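One way to picture why the late update never reaches the earlier draw is a toy model of D3D11's in-order semantics (plain C++; `ToyContext` and its methods are hypothetical, nothing here is the real API). A Draw captures the buffer contents as they are when it is issued, and a later DISCARD-style update only gives new contents to later draws, much like a driver renaming the buffer behind the scenes.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy immediate context: each draw remembers the buffer version
// that was current when it was issued.
class ToyContext {
public:
    // DISCARD-style update: old contents stay alive for draws already issued.
    void updateDiscard(std::vector<float> newData) {
        current_ = std::make_shared<const std::vector<float>>(std::move(newData));
    }

    void draw() { issuedDraws_.push_back(current_); }

    // What the n-th issued draw reads when the GPU eventually executes it.
    const std::vector<float>& dataSeenByDraw(std::size_t n) const { return *issuedDraws_.at(n); }

private:
    std::shared_ptr<const std::vector<float>> current_ =
        std::make_shared<const std::vector<float>>();
    std::vector<std::shared_ptr<const std::vector<float>>> issuedDraws_;
};
```

In this model, calling `updateDiscard` right after `draw()` (the invalid cube example) only changes what the *next* draw sees; the cube draw keeps the old data.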

 

2 hours ago, noodleBowl said:

Not entirely sure what you mean here either. Do you mean that you have one const/tex buffer per set of APIs, as in a const buffer for the camera, a const buffer for setting colors on primitives, etc., instead of a single const/tex buffer that could handle all of that?

No, I meant what is explained here and here. Basically the following is preferred:


//Draw a Cube
void *data = constBuffer->Map( DISCARD );
memcpy( data, ... );
constBuffer->Unmap();
bindConstBuffer( constBuffer ); //e.g. VSSetConstantBuffers
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);

//Draw a Sphere
data = constBuffer->Map( DISCARD ); //Same buffer and still bound, so no rebind needed
memcpy( data, ... );
constBuffer->Unmap();
graphicsDevice->deviceContext->Draw(sphere.vertexCount, 0);

over the following:


//Draw a Cube
void *data = constBuffer0->Map( DISCARD );
memcpy( data, ... );
constBuffer0->Unmap();
bindConstBuffer( constBuffer0 );
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);

//Draw a Sphere
data = constBuffer1->Map( DISCARD ); //Notice it's constBuffer1, not constBuffer0
memcpy( data, ... );
constBuffer1->Unmap();
bindConstBuffer( constBuffer1 );
graphicsDevice->deviceContext->Draw(sphere.vertexCount, 0);

This difference matters when we're talking about lots of const buffer DISCARDs per frame (e.g. 20k const buffer discards per frame). It doesn't make a difference if you have something like 20 const buffer discards per frame.

By the way, I personally never have 20k const buffer discards, as I prefer to keep large data (such as world matrices) in texture buffers.
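The CPU-side packing for that approach could look roughly like this plain C++ sketch, assuming one row-major 4x4 float matrix per object (`Matrix4` and `MatrixPool` are hypothetical names). All world matrices go into one flat float array, which is what you would copy into the mapped texture buffer in one go, and object `i` finds its matrix starting at element `i * 16`; how the shader gets `i` (e.g. an instance or object index) is left open here.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Matrix4 = std::array<float, 16>;  // assumed row-major: m[row * 4 + col]

// Hypothetical CPU-side staging of per-object world matrices for one texture buffer.
class MatrixPool {
public:
    // Appends a matrix and returns the index the shader would use to find it.
    std::size_t add(const Matrix4& world) {
        floats_.insert(floats_.end(), world.begin(), world.end());
        return floats_.size() / 16 - 1;
    }

    // Element (row, col) of object `index`; the shader would read from index * 16.
    float at(std::size_t index, std::size_t row, std::size_t col) const {
        return floats_[index * 16 + row * 4 + col];
    }

    // Contiguous data to memcpy into the mapped buffer.
    const float* data() const { return floats_.data(); }
    std::size_t sizeBytes() const { return floats_.size() * sizeof(float); }

private:
    std::vector<float> floats_;
};
```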

 

2 hours ago, noodleBowl said:

With the D3D11_MAP_WRITE_NO_OVERWRITE/D3D11_MAP_WRITE_DISCARD pattern, or even in general, is mesh/vertex data traditionally held in an intermediate place and then copied into the buffer?

I've seen a lot of tutorials, like this one (Lesson 5: Drawing a Triangle), where they just place the data into an array and copy it into the buffer. I wasn't sure if this is just because it's a beginner's tutorial showing the basics, or if there is a better way to do it.

This pattern is used with D3D11_USAGE_DYNAMIC buffers. These buffers are visible to both the CPU and the GPU. This means that the memory either lives in GPU RAM and your CPU writes go directly through the PCIE bus, or the buffer lives in CPU RAM and GPU reads fetch it directly through the PCIE bus. Whether it is one or the other is decided by the driver, though D3D11_CPU_ACCESS_READ and D3D11_CPU_ACCESS_WRITE probably provide good hints (a buffer that needs read access will likely end up CPU side, a buffer that has no read access will likely end up GPU side, but this is not a guarantee!).

What you're describing as an intermediate place must be done by hand via staging buffers. Create the buffer with D3D11_USAGE_STAGING instead of DYNAMIC. Staging buffers are visible to both CPU and GPU, but the GPU can only use them in copy operations.

The idea is that you copy to the staging area from CPU, and then you copy from staging area to the final GPU RAM that is only visible to the GPU (i.e. the final buffer was created with D3D11_USAGE_DEFAULT). Or vice versa as well (copy from GPU to staging area, then read from CPU).

There's a gotcha: with staging buffers you can use neither D3D11_MAP_WRITE_NO_OVERWRITE nor D3D11_MAP_WRITE_DISCARD. But you do have the D3D11_MAP_FLAG_DO_NOT_WAIT flag. If Map returns DXGI_ERROR_WAS_STILL_DRAWING when you try to map the staging buffer with this flag, the GPU is not done yet copying from/to that staging buffer and you must use another one (i.e. create a new one, or reuse an old one from a pool).
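The "try it, and if it's still busy take another one" logic can be sketched in plain C++. `StagingPool`, `MockStagingBuffer` and the frame counters are hypothetical stand-ins: `tryMap` returning false plays the role of getting `DXGI_ERROR_WAS_STILL_DRAWING` back from a Map with `D3D11_MAP_FLAG_DO_NOT_WAIT`, and "the GPU has finished frame N" is modeled with a simple counter instead of a real query.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a D3D11_USAGE_STAGING buffer; assumed idle once the GPU has
// completed the frame stored in busyUntilFrame.
struct MockStagingBuffer {
    std::size_t sizeBytes;
    std::uint64_t busyUntilFrame = 0;
};

class StagingPool {
public:
    // Models Map(..., D3D11_MAP_FLAG_DO_NOT_WAIT): succeeds only if the GPU is done.
    static bool tryMap(const MockStagingBuffer& b, std::uint64_t completedFrame) {
        return completedFrame >= b.busyUntilFrame;  // false ~ DXGI_ERROR_WAS_STILL_DRAWING
    }

    // Returns the index of a buffer that can be written right now: an idle one
    // from the pool if possible, otherwise a newly created one. The chosen buffer
    // is marked busy until `currentFrame` completes.
    std::size_t acquire(std::size_t sizeBytes, std::uint64_t currentFrame,
                        std::uint64_t completedFrame) {
        for (std::size_t i = 0; i < buffers_.size(); ++i) {
            if (buffers_[i].sizeBytes >= sizeBytes && tryMap(buffers_[i], completedFrame)) {
                buffers_[i].busyUntilFrame = currentFrame;
                return i;
            }
        }
        buffers_.push_back({sizeBytes, currentFrame});  // nothing idle: create a new one
        return buffers_.size() - 1;
    }

    std::size_t count() const { return buffers_.size(); }

private:
    std::vector<MockStagingBuffer> buffers_;
};
```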

What's the difference between the STAGING and DYNAMIC approaches? PCIE has lower bandwidth than the GPU's dedicated memory (and probably higher latency). If you write from the CPU once and the GPU reads that data once, use DYNAMIC.

But if the data will be read by the GPU over and over again, you may end up fetching it from CPU RAM through the PCIE multiple times. In that case use the STAGING approach, so the transfer through the PCIE happens once and the data is then kept in the fastest RAM available.

This advice holds for dedicated GPUs. On integrated GPUs, using staging aggressively may hurt: there is no PCIE, so you'll just be burning CPU RAM bandwidth on useless copies.

And for reading GPU -> CPU, you have no choice but to use staging.

So it's a good idea to write a system that can switch between strategies depending on what's faster on each system.
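A strategy switch along those lines might start from a function like this plain C++ sketch. The `UploadInfo` inputs and the `UploadStrategy` enum are hypothetical; the rules only encode the rules of thumb from the post, and on a real system the choice should be validated by measuring.

```cpp
// Hypothetical summary of what is known about a resource and the machine.
struct UploadInfo {
    bool integratedGpu;      // no PCIE hop: CPU and GPU share RAM
    bool gpuReadsManyTimes;  // e.g. a mesh drawn every frame
    bool cpuReadsBack;       // GPU -> CPU transfer
};

enum class UploadStrategy { Dynamic, StagingThenDefault, StagingReadback };

// Rules of thumb from the post; a starting point, not a law.
UploadStrategy chooseStrategy(const UploadInfo& info) {
    if (info.cpuReadsBack) return UploadStrategy::StagingReadback;  // the only option for GPU -> CPU
    if (info.integratedGpu) return UploadStrategy::Dynamic;         // extra copies just burn RAM bandwidth
    if (info.gpuReadsManyTimes) return UploadStrategy::StagingThenDefault;  // cross the PCIE once
    return UploadStrategy::Dynamic;                                 // written once, read once
}
```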

On 10/23/2017 at 12:30 AM, Matias Goldberg said:

What you're describing as an intermediate place must be done by hand via staging buffers. Create the buffer with D3D11_USAGE_STAGING instead of DYNAMIC. Staging buffers are visible to both CPU and GPU, but the GPU can only use them in copy operations.

The idea is that you copy to the staging area from CPU, and then you copy from staging area to the final GPU RAM that is only visible to the GPU (i.e. the final buffer was created with D3D11_USAGE_DEFAULT). Or vice versa as well (copy from GPU to staging area, then read from CPU).

Based on the description of CPU to staging area to GPU, I'm not sure the intermediate place you are describing and the one I'm thinking of are the same. I was thinking of something like this:


//Declared in the renderItem's class: the intermediate place that holds the renderItem's vertex data
Vertex vertices[6];

//Init vertex data for the renderItem in its constructor
vertices[0] = Vertex(Vector3(0.0f, 0.0f,     -1.0f), Color(1.0f, 0.0f, 0.0f, 1.0f));
vertices[1] = Vertex(Vector3(0.0f, 100.0f,   -1.0f), Color(1.0f, 0.0f, 0.0f, 1.0f));
vertices[2] = Vertex(Vector3(100.0f, 100.0f, -1.0f), Color(1.0f, 0.0f, 0.0f, 1.0f));
vertices[3] = Vertex(Vector3(0.0f, 0.0f,     -1.0f), Color(1.0f, 0.0f, 0.0f, 1.0f));
vertices[4] = Vertex(Vector3(100.0f, 100.0f, -1.0f), Color(1.0f, 0.0f, 0.0f, 1.0f));
vertices[5] = Vertex(Vector3(100.0f, 0.0f,   -1.0f), Color(1.0f, 0.0f, 0.0f, 1.0f));


//Somewhere else in the application outside of the renderItem class
//Create the vertex buffer
VertexBuffer *vertexBuffer = createVertexBuffer(D3D11_USAGE_DYNAMIC, D3D11_CPU_ACCESS_WRITE, sizeof(Vertex) * 6);

//Place the vertex data from the renderItem ( data in vertex array [the intermediate place] ) in the vertex buffer
D3D11_MAPPED_SUBRESOURCE resource = vertexBuffer->map(D3D11_MAP_WRITE_DISCARD);
Vertex *data = (Vertex*)resource.pData;
data[0] = renderItem.vertices[0];
data[1] = renderItem.vertices[1];
data[2] = renderItem.vertices[2];
data[3] = renderItem.vertices[3];
data[4] = renderItem.vertices[4];
data[5] = renderItem.vertices[5];
vertexBuffer->unmap();

 

On 10/23/2017 at 12:30 AM, Matias Goldberg said:

(such as world matrices)

Wait! Speaking of the model matrix (another name for the world matrix, right?): when using this big buffer / D3D11_MAP_WRITE_NO_OVERWRITE and D3D11_MAP_WRITE_DISCARD pattern, I need to pretransform everything before I put it in the big buffer, don't I? That would be the only way to achieve 1 draw call per full buffer, excluding any state changes (shader change, texture change, etc.). Otherwise I need to issue multiple draw calls, because the model matrix can differ per renderable. Right?

I'm guessing this D3D11_MAP_WRITE_NO_OVERWRITE and D3D11_MAP_WRITE_DISCARD pattern is only good for certain situations, such as rendering sprites and particles, whereas things like meshes/models should be rendered with some other technique.
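For reference, "pretransforming" in the sense of the question means applying each renderable's model (world) matrix to its vertices on the CPU before writing them into the big buffer, so the vertex shader no longer needs a per-object matrix. A minimal plain C++ sketch, assuming the row-vector convention (`v * M`, translation in the last row, as DirectXMath does), affine matrices and positions with w = 1; `Float3`, `Mat4`, `transformPoint` and `pretransform` are hypothetical names:

```cpp
#include <array>
#include <cstddef>

struct Float3 { float x, y, z; };
using Mat4 = std::array<float, 16>;  // row-major storage: m[row * 4 + col]

// Transforms a position (w = 1) by an affine matrix using the row-vector
// convention v * M, so the translation comes from elements 12, 13 and 14.
Float3 transformPoint(const Float3& v, const Mat4& m) {
    return {
        v.x * m[0] + v.y * m[4] + v.z * m[8]  + m[12],
        v.x * m[1] + v.y * m[5] + v.z * m[9]  + m[13],
        v.x * m[2] + v.y * m[6] + v.z * m[10] + m[14],
    };
}

// Writes transformed copies of `count` positions into `dst`
// (e.g. the pointer obtained from mapping the big dynamic buffer).
void pretransform(const Float3* src, std::size_t count, const Mat4& world, Float3* dst) {
    for (std::size_t i = 0; i < count; ++i) dst[i] = transformPoint(src[i], world);
}
```

The trade-off is CPU work per vertex per frame in exchange for fewer draw calls, which is why it tends to pay off for small, frequently changing geometry like sprites.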

