How to make 2D rendering on DirectX 11 faster

Started by January 13, 2022 09:05 PM
2 comments, last by Geri 2 years, 11 months ago

I am making an editor for my own game engine. I used Direct2D for 2D rendering, but I wanted to render a DirectX texture in an editor window and did not find a convenient way to convert a DirectX texture to a Direct2D bitmap. So I decided to write my own 2D renderer on DirectX 11. It works, but performance is poor. I expected my implementation to be slower than Direct2D, but not 4 times slower. How can I improve performance? I did some simple profiling, and most of the time is spent in the DrawIndexed call from DirectX 11. I don't understand why drawing is so slow. Which optimizations can I make?

My 2D render pipeline is simple. When draw_rect is called, it generates the vertices and indices for a rect. A struct holding those vertices and indices is then added to an array. draw_primitives takes the data from that array and renders it, and at the end of the frame the array is reset.

void Render_2D::draw_primitives()
{
	if (total_vertex_count == 0) {
		return;
	}

	static u32 privious_total_vertex_count;
	static u32 privious_total_index_count;

	if (!vertex_buffer || (privious_total_vertex_count != total_vertex_count)) {
		privious_total_vertex_count = total_vertex_count;
		
		free_com_object(vertex_buffer);
		
		vertex_buffer = make_vertex_buffer(sizeof(Vertex_XC), total_vertex_count, NULL, D3D11_USAGE_DYNAMIC, D3D11_CPU_ACCESS_WRITE);
	}

	if (!index_buffer || (privious_total_index_count != total_index_count)) {
		privious_total_index_count = total_index_count;

		free_com_object(index_buffer);
		
		index_buffer = make_index_buffer(total_index_count, NULL, D3D11_USAGE_DYNAMIC, D3D11_CPU_ACCESS_WRITE);
	}

	D3D11_MAPPED_SUBRESOURCE buffer;
	ZeroMemory(&buffer, sizeof(D3D11_MAPPED_SUBRESOURCE));
		
	HR(directx11.device_context->Map(vertex_buffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &buffer));

	D3D11_MAPPED_SUBRESOURCE i_buffer;
	ZeroMemory(&i_buffer, sizeof(D3D11_MAPPED_SUBRESOURCE));

	HR(directx11.device_context->Map(index_buffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &i_buffer));

	Vertex_XC *p1 = (Vertex_XC *)buffer.pData;
	u32 *p2 = (u32 *)i_buffer.pData;
	
	Primitive_2D *primitive = NULL;
	For(primitives, primitive) {

		memcpy((void *)p1, primitive->vertices.items, primitive->vertices.count * sizeof(Vertex_XC));
		memcpy((void *)p2, primitive->indices.items, primitive->indices.count * sizeof(u32));
		p1 += primitive->vertex_offset;
		p2 += primitive->index_offset;
	}

	directx11.device_context->Unmap(index_buffer, 0);
	directx11.device_context->Unmap(vertex_buffer, 0);

	Fx_Shader *shader = fx_shader_manager.get_shader("color");
	shader->bind("world_view_projection", &render_sys.view_info->orthogonal_matrix);
	shader->attach("draw_vertex_on_screen");


	D3D11_DEPTH_STENCIL_DESC depth_stencil;
	ZeroMemory(&depth_stencil, sizeof(D3D11_DEPTH_STENCIL_DESC));
	depth_stencil.DepthEnable = false;
	depth_stencil.DepthFunc = D3D11_COMPARISON_ALWAYS;
	depth_stencil.DepthWriteMask = D3D11_DEPTH_WRITE_MASK_ALL;
	depth_stencil.StencilEnable = false;

	ID3D11DepthStencilState *state = NULL;
	directx11.device->CreateDepthStencilState(&depth_stencil, &state);

	directx11.device_context->OMSetDepthStencilState(state, 0);

	primitive = NULL;
	For(primitives, primitive) {
		draw_indexed_traingles(vertex_buffer, sizeof(Vertex_XC), "vertex_color", index_buffer, primitive->indices.count, primitive->vertex_offset, primitive->index_offset);
	}

	directx11.device_context->RSSetState(0);
}

How often are you going into that branch where you re-create the vertex and index buffers? If you're constantly hitting that you could easily destroy your performance. Gathering some stats or running a sampling profiler should help you find out pretty quickly. Either way, you really shouldn't need to re-create the buffers just because the number of primitives changed. You only need to do that if the number of primitives is larger than what you can handle. You might want to try pre-creating the buffers at some size you think is “big enough” and then only re-creating them if you go over that, and in that case growing the buffer by some large amount (perhaps doubling it, or making it 1.5x as big).
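A minimal sketch of that growth policy, kept free of Direct3D calls so it can be read and tested on its own. The helper name grown_capacity and the starting size of 4096 elements are assumptions for illustration and are not part of the original renderer. The idea is to track the capacity of the buffer separately from the number of elements used in the current frame, and to re-create the buffer only when the returned capacity differs from the current one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post): returns the element
// capacity a dynamic buffer should have in order to hold `required` elements.
inline std::size_t grown_capacity(std::size_t capacity, std::size_t required)
{
    // Assumed starting size; tune it to the typical element count per frame.
    constexpr std::size_t kInitialCapacity = 4096;

    if (required <= capacity) {
        return capacity; // Still fits: keep the existing buffer.
    }

    // Grow geometrically (2x here) so that a slowly increasing count does
    // not force a re-creation on every frame.
    const std::size_t doubled = capacity * 2;
    const std::size_t result = std::max({required, doubled, kInitialCapacity});

    assert(result >= required); // The new buffer must hold this frame's data.
    return result;
}
```

In draw_primitives this would replace the "count changed" test: compute the new capacity from the stored capacity and total_vertex_count, and call make_vertex_buffer only when the two capacities differ. A frame that draws fewer primitives than the previous one then reuses the existing buffer.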

Beyond that, you generally don't want to do a single draw per quad if you can help it. You'll get a lot of CPU overhead, and the GPU performance also won't be optimal. It looks like you're already putting all of your quads into a big combined vertex + index buffer, why not just do one draw? If you're changing textures or other states between each primitive then you may want to consider sorting your primitives by state and then drawing everything you can in a single batch.
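One way to get down to a single draw, sketched under assumptions: the Vertex and Batch types and the add_rect function below are hypothetical stand-ins, not the poster's Vertex_XC or Primitive_2D. The point it illustrates is that each rect's indices are rebased by the number of vertices already in the batch, so no per-primitive base vertex offset is needed and the whole batch can be submitted with one DrawIndexed call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the renderer's vertex and primitive types.
struct Vertex {
    float x, y;
    std::uint32_t color;
};

struct Batch {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Appends one rectangle to the batch. Indices are offset by the current
// vertex count, so they address the combined vertex array directly.
inline void add_rect(Batch &batch, float x, float y, float w, float h, std::uint32_t color)
{
    const std::uint32_t base = static_cast<std::uint32_t>(batch.vertices.size());

    batch.vertices.push_back({x, y, color});         // corner 0
    batch.vertices.push_back({x + w, y, color});     // corner 1
    batch.vertices.push_back({x + w, y + h, color}); // corner 2
    batch.vertices.push_back({x, y + h, color});     // corner 3

    // Two triangles per rect. The winding order is an assumption and must
    // match the rasterizer state and coordinate system actually in use.
    const std::uint32_t local[6] = {0, 1, 2, 0, 2, 3};
    for (std::uint32_t i : local) {
        batch.indices.push_back(base + i);
    }

    assert(batch.indices.back() < batch.vertices.size());
}
```

After all rects for the frame are added, the two vectors can be copied into the mapped vertex and index buffers with one memcpy each, followed by a single DrawIndexed over batch.indices.size() indices with a start index and base vertex of 0. If primitives use different textures or states, one such batch per state would keep the draw count equal to the number of distinct states rather than the number of rects.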

You seem to be hammering the API and the memory system for no reason.

Change
if (!vertex_buffer || (privious_total_vertex_count != total_vertex_count)) {
to:
if ((!vertex_buffer) || (privious_total_vertex_count < total_vertex_count)) {

and
if (!index_buffer || (privious_total_index_count != total_index_count)) {
to:
if ((!index_buffer) || (privious_total_index_count < total_index_count)) {

Are the two ZeroMemory calls on the buffers even required? You seem to overwrite the memory anyway.
