I'm making my first attempt at D3D11's instanced drawing (`DrawIndexedInstanced()`), and after a couple of days I'm genuinely stumped. My setup is comparatively simple: it's only meant to draw zillions of quads into a space using an orthographic projection. Instead of drawing the whole batch, though, it only draws the first quad.
First, here's the data structure that becomes the vertex shader's `cbuffer`:
```cpp
struct PerObjectData {
    glm::mat4 world_matrix;
    glm::vec4 color;
    uint32_t texture_index;
    float tiling_factor;
    char padding[8];
};
```
And here's how it looks on the shader side:
```hlsl
cbuffer PerObjectData {
    float4x4 world_matrix;
    float4 color;
    uint texture_index;
    float tiling_factor;
};
```
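To rule out a layout mismatch between the two sides, here's a compile-time check of the C++ mirror against the HLSL packing rules as I understand them: members pack into 16-byte registers, so `world_matrix` occupies bytes 0-63, `color` 64-79, `texture_index` 80-83 and `tiling_factor` 84-87, and the cbuffer's size rounds up to 96 bytes. `Mat4` and `Vec4` are hypothetical stand-ins for `glm::mat4`/`glm::vec4` (64 and 16 bytes of floats) so this compiles without GLM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for glm::vec4 and glm::mat4 (plain float storage).
struct Vec4 { float v[4]; };
struct Mat4 { Vec4 col[4]; };

// Same fields and padding as the question's PerObjectData.
struct PerObjectData {
    Mat4 world_matrix;            // HLSL float4x4: four registers, bytes 0-63
    Vec4 color;                   // HLSL float4: one register, bytes 64-79
    std::uint32_t texture_index;  // HLSL uint: bytes 80-83
    float tiling_factor;          // HLSL float: bytes 84-87
    char padding[8];              // fills out the last 16-byte register
};

// Compile-time checks that the CPU layout matches the expected HLSL offsets.
static_assert(offsetof(PerObjectData, world_matrix) == 0, "world_matrix offset");
static_assert(offsetof(PerObjectData, color) == 64, "color offset");
static_assert(offsetof(PerObjectData, texture_index) == 80, "texture_index offset");
static_assert(offsetof(PerObjectData, tiling_factor) == 84, "tiling_factor offset");
static_assert(sizeof(PerObjectData) == 96, "cbuffer size rounds up to 96 bytes");
static_assert(sizeof(PerObjectData) % 16 == 0, "size is a whole number of registers");

constexpr std::size_t per_object_size() { return sizeof(PerObjectData); }
```

If the struct ever drifts from the shader's layout, the build fails instead of the GPU silently reading shifted data.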
During initialization, I set up what's necessary, including the above:
```cpp
//
// Per Object Buffer
//
UINT perobject_buffer_size = sizeof(PerObjectData) * _batch.max_quads;
D3D11_BUFFER_DESC perobject_buffer_desc { };
perobject_buffer_desc.Usage = D3D11_USAGE_DYNAMIC;              // rewritten by the CPU every flush
perobject_buffer_desc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
perobject_buffer_desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;  // needed for Map(WRITE_DISCARD)
perobject_buffer_desc.ByteWidth = perobject_buffer_size;
hr = device->CreateBuffer(&perobject_buffer_desc, nullptr, &perobject_buffer);
if(FAILED(hr)) {
    PDR_ENGINE_ERROR("Failed to create world matrix buffer: ({}) {}",
                     hr, DX11Window::get_last_error_as_string());
    assert(false);
    return;
}
DX11Context::device_context()->
    VSSetConstantBuffers(1, 1, &perobject_buffer);
```
There's a per-frame buffer bound at `StartSlot` 0, which is why this one goes in slot 1.
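For reference, here's the arithmetic behind that buffer's `ByteWidth`, checked against the limits as I read them in the `D3D11_BUFFER_DESC` and Direct3D 11.1 documentation: a constant buffer's `ByteWidth` must be a multiple of 16, and a shader can address at most 4096 16-byte constants (64 KB) of a bound constant buffer. The constant and function names below are my own, not D3D's; `kMaxQuads` mirrors `_batch.max_quads`.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kPerObjectSize = 96;  // sizeof(PerObjectData) from the question
constexpr std::size_t kMaxQuads = 1000;     // mirrors _batch.max_quads
// 4096 constants of 16 bytes each (the value of D3D11_REQ_CONSTANT_BUFFER_ELEMENT_COUNT).
constexpr std::size_t kMaxShaderVisibleBytes = 4096 * 16;

// Bytes needed to hold 'quads' PerObjectData entries.
constexpr std::size_t byte_width(std::size_t quads) { return kPerObjectSize * quads; }

// A constant buffer's ByteWidth must be a multiple of 16.
constexpr bool valid_cbuffer_width(std::size_t bytes) { return bytes % 16 == 0; }

// Whole PerObjectData entries that fit in the window a shader can address.
constexpr std::size_t shader_visible_elements() {
    return kMaxShaderVisibleBytes / kPerObjectSize;
}

static_assert(byte_width(kMaxQuads) == 96000, "matches the 96,000 bytes RenderDoc reports");
static_assert(valid_cbuffer_width(byte_width(kMaxQuads)), "ByteWidth is 16-byte aligned");
```

If that reading of the limits is right, 96,000 bytes is a legal multiple of 16, but it's larger than the 65,536 bytes one binding exposes to a shader, which holds at most 682 of these structs.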
The user/game code calls `BeginScene()`, which starts the batch. The data looks correct while it lives in system memory. When `EndScene()` is called, I "flush" the batch and issue the draw. For testing, the batch only contains two quads.
```cpp
void Renderer2D::_flush() {
    HRESULT hr = S_OK;
    ID3D11DeviceContext *device_context = DX11Context::device_context();

    // Update per object data/constant buffer
    D3D11_MAPPED_SUBRESOURCE perobject_buffer_map;
    hr = device_context->Map(perobject_buffer, 0,
                             D3D11_MAP_WRITE_DISCARD, 0,
                             &perobject_buffer_map);
    if(FAILED(hr)) {
        PDR_ENGINE_ERROR("Could not map instance buffer: ({}) {}",
                         hr, DX11Window::get_last_error_as_string());
        assert(false);
        return;
    }
    size_t ob_size = sizeof(PerObjectData) * quad_count();
    memcpy(perobject_buffer_map.pData, _batch.instances, ob_size);
    device_context->Unmap(perobject_buffer, 0);

    _scene->texture_shader->bind();
    for(uint32_t tex = 0; tex < _batch.texture_count; tex++) {
        _batch.texture_slots[tex]->bind(tex);
    }

    UINT indices = sizeof(_indices) / sizeof(_indices[0]);
    device_context->DrawIndexedInstanced(indices, quad_count(), 0, 0, 0);
    _stats.draw_calls++;
}
```
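To convince myself the copy itself is sound, here's a CPU-only model of the Map/`memcpy`/Unmap step, with a `std::vector` standing in for the mapped `pData`. Everything here is a hypothetical stand-in (`upload_instances` and `read_back` are made-up names) and no D3D calls are involved; the struct just reproduces the same 96-byte layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in with the same 96-byte layout as the question's PerObjectData.
struct PerObjectData {
    float world_matrix[16];
    float color[4];
    std::uint32_t texture_index;
    float tiling_factor;
    char padding[8];
};

// Models the WRITE_DISCARD path: 'mapped' plays the role of pData for a buffer
// sized for max_quads entries; only the first 'count' entries are written.
std::size_t upload_instances(std::vector<unsigned char>& mapped,
                             const PerObjectData* instances, std::size_t count) {
    const std::size_t bytes = sizeof(PerObjectData) * count;
    assert(bytes <= mapped.size());
    std::memcpy(mapped.data(), instances, bytes);
    return bytes;
}

// Reads entry i back out of the mapped bytes.
PerObjectData read_back(const std::vector<unsigned char>& mapped, std::size_t i) {
    PerObjectData out;
    std::memcpy(&out, mapped.data() + i * sizeof(PerObjectData), sizeof(PerObjectData));
    return out;
}
```

For a batch of two, `upload_instances` copies 192 bytes and each entry reads back at its own 96-byte offset, which suggests the CPU side of the copy isn't where the second instance goes missing.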
Now... what's weird is that if I do the copy like this:
```cpp
auto data = reinterpret_cast<PerObjectData *>(perobject_buffer_map.pData);
for(uint32_t instance = 0; instance < quad_count(); instance++) {
    data->world_matrix = _batch.instances[instance].world_matrix;
    data->color = _batch.instances[instance].color;
    data->texture_index = _batch.instances[instance].texture_index;
    data->tiling_factor = _batch.instances[instance].tiling_factor;
    data++;
}
```
it produces exactly the same result. But if I omit the `data++` (which I originally forgot), it only draws the _second_ quad. With the increment included, the `memcpy()` and the `for` loop behave identically: both draw only the first quad.
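Modeling that loop on the CPU makes the no-increment case explicit: without `data++`, every iteration writes to element 0, so element 0 ends up holding the last instance. That would be consistent with the shader only ever reading the first element of the buffer. `copy_instances` and its `advance` flag are hypothetical names for this illustration, and the struct is a stand-in for the GLM-based one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in with the same fields as the question's PerObjectData.
struct PerObjectData {
    float world_matrix[16];
    float color[4];
    std::uint32_t texture_index;
    float tiling_factor;
    char padding[8];
};

// Field-by-field copy like the loop above. With advance == false the
// destination pointer never moves, so each iteration overwrites element 0
// and only the last source instance survives there.
void copy_instances(PerObjectData* dst, const PerObjectData* src,
                    std::size_t count, bool advance) {
    PerObjectData* data = dst;
    for (std::size_t i = 0; i < count; ++i) {
        for (int m = 0; m < 16; ++m) data->world_matrix[m] = src[i].world_matrix[m];
        for (int c = 0; c < 4; ++c) data->color[c] = src[i].color[c];
        data->texture_index = src[i].texture_index;
        data->tiling_factor = src[i].tiling_factor;
        if (advance) ++data;  // the data++ from the loop above
    }
}
```

With `advance == true`, both entries land in order; with `advance == false`, element 0 holds the second quad's data and element 1 is untouched.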
I've checked with RenderDoc, and only one `PerObjectData` appears to be copied over. The `struct` is 96 bytes, and the buffer on the GPU is 96,000 bytes (a maximum of 1,000 quads for now). So did I somehow mark the rest of the buffer as read-only? `quad_count()` returns the correct value and `ob_size` is 192 bytes, so the `memcpy()` should be fine. Even if it weren't, the `for` loop is explicit enough to be idiot-proof. And yet... =)
Any pointers or hints would be most welcome. Thanks in advance!