
State vs Stateless Designing a modern GPU Interface

Started by June 08, 2018 04:12 PM
14 comments, last by Jman2 6 years, 7 months ago

I've also been thinking about whether there are better ways to pick the rendering order than a simple radix sort, which can give abysmal results in some cases (e.g. 0111111 -> 1000000 -> 1111111 -> 2000000 etc). It is essentially a traveling salesman problem, which has plenty of decent approximate solutions; the question is how much time we are prepared to dedicate to sorting.

I'm guessing the most reasonable way would be to always pick the draw item closest to the current state, using some LSH or tree-based structure to preprocess the draw list. It also raises a question of whether we need "don't care" values, because they can significantly reduce the cost of switching states.
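The "always pick the closest draw item" idea can be sketched as a greedy nearest-neighbour pass. This is only a brute-force O(n^2) illustration, not the LSH or tree-based preprocessing suggested above. The three-field state layout, the `kDontCare` wildcard and the per-field costs are all assumptions made up for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical state layout: field 0 = shader, 1 = texture, 2 = constant buffer.
constexpr std::size_t kFieldCount = 3;
// Hypothetical wildcard: the draw item does not use this slot at all.
constexpr std::uint16_t kDontCare = 0xFFFF;

using DrawState = std::array<std::uint16_t, kFieldCount>;

// Made-up relative costs of changing each field; real values would come from profiling.
constexpr std::array<int, kFieldCount> kSwitchCost = {8, 4, 1};

// Cost of moving from the currently bound state to `next`.
// "Don't care" fields in `next` never cost anything.
int SwitchCost(const DrawState& current, const DrawState& next)
{
    int cost = 0;
    for (std::size_t f = 0; f < kFieldCount; ++f)
        if (next[f] != kDontCare && next[f] != current[f])
            cost += kSwitchCost[f];
    return cost;
}

// Greedy ordering: repeatedly pick the unused item that is cheapest to switch to.
// Returns indices into `items` in submission order.
std::vector<std::size_t> GreedyOrder(const std::vector<DrawState>& items)
{
    std::vector<std::size_t> order;
    order.reserve(items.size());
    std::vector<bool> used(items.size(), false);

    DrawState current;
    current.fill(kDontCare); // nothing is bound before the first draw

    for (std::size_t step = 0; step < items.size(); ++step) {
        std::size_t best = 0;
        int bestCost = std::numeric_limits<int>::max();
        for (std::size_t i = 0; i < items.size(); ++i) {
            if (used[i]) continue;
            const int cost = SwitchCost(current, items[i]);
            if (cost < bestCost) { bestCost = cost; best = i; }
        }
        used[best] = true;
        order.push_back(best);
        // Fields the item does not care about keep whatever was bound before.
        for (std::size_t f = 0; f < kFieldCount; ++f)
            if (items[best][f] != kDontCare) current[f] = items[best][f];
    }
    return order;
}
```

Greedy selection is a common heuristic for traveling-salesman-like problems, but it gives no optimality guarantee, and the quadratic inner loop is exactly the cost that a spatial structure would be meant to remove.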

21 hours ago, d07RiV said:

Another thing - when you put all passes in the same shader file, do you run a lexer on them, or do you just feed everything to the shader compiler and let it figure out what to optimize away? The former option would let us know which options affect which passes, so we don't have to make redundant copies (instead of having to manually specify them for every pass).

edit: I guess this is partially answered by bonus slides.

Having a custom shader language / a full lexer would be great, but I did not spend the effort in this area.

Instead, I use HLSL and only parse the outputs from the HLSL compiler. This allows you to discover things like the resource bindings that are actually used by the optimized code, but does not let you discover things such as which options actually had an effect on the code generation (unless you want to brute force it by repeatedly compiling with different options enabled and comparing the compiler outputs for differences...). To declare extra shader meta-data (such as passes, options, resource-lists, etc), I embed Lua code within the shader source files that does this.

14 hours ago, d07RiV said:

I've also been thinking about whether there are better ways to pick the rendering order than a simple radix sort, which can give abysmal results in some cases (e.g. 0111111 -> 1000000 -> 1111111 -> 2000000 etc). It is essentially a traveling salesman problem, which has plenty of decent approximate solutions; the question is how much time we are prepared to dedicate to sorting.

I've never really thought about putting that much work into sorting, but yeah, I guess you could do quite a bit of analysis there :) My simple advice is to use a radix sort (or quick sort, etc.) on integer state keys, where more significant bits represent more costly state changes (like render-targets or shaders) and less significant bits represent cheaper state changes (like constant/uniform values).
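A minimal sketch of such a key, assuming a made-up 64-bit layout with four 16-bit fields (the field names and widths are illustrative, not the layout used in the engine discussed here). It uses `std::stable_sort` for brevity; a radix sort over the same keys would produce the same order.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical draw item holding IDs for each piece of state.
struct SortableDraw
{
    std::uint16_t renderTarget;   // most expensive to change
    std::uint16_t shader;
    std::uint16_t texture;
    std::uint16_t constantBuffer; // cheapest to change
};

// Packs the IDs so that costlier state lives in more significant bits:
// [63..48] render target | [47..32] shader | [31..16] texture | [15..0] constant buffer
inline std::uint64_t MakeSortKey(const SortableDraw& d)
{
    return (std::uint64_t{d.renderTarget} << 48) |
           (std::uint64_t{d.shader} << 32) |
           (std::uint64_t{d.texture} << 16) |
            std::uint64_t{d.constantBuffer};
}

// Sorts draws so that items sharing expensive state end up adjacent.
void SortByState(std::vector<SortableDraw>& draws)
{
    std::stable_sort(draws.begin(), draws.end(),
        [](const SortableDraw& a, const SortableDraw& b) {
            return MakeSortKey(a) < MakeSortKey(b);
        });
}
```

Because the render target occupies the top bits, any change in it dominates the comparison, so all draws for one render target are grouped before shader, texture or constant buffer differences are considered.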

It highly depends on your content too -- in some games, you might often be changing just one texture per draw, but keeping 10 other texture bindings the same... But in other games, you might use 11 unique texture bindings per draw.

There are plenty of other optimization techniques that a rendering engine has to look into supporting too, which reduce the total number of draw-calls: instanced draw-calls, dynamic instancing (appending multiple different meshes into a single contiguous set so they can be drawn at one time), CPU-side vertex transformations, GPU-side skinning / transformation arrays, texture arrays, texture atlases, material arrays in cbuffers instead of individual material constants/uniforms, indirect draw-calls, compute-shader draw culling, etc, etc... If you implemented all of these, hopefully you'd have a much smaller number of draws with much more unique state per draw.
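As one small illustration of how the draw count shrinks, the sketch below merges consecutive draws that share the same state key into a single instanced batch. The `Draw` and `InstancedDraw` types and the idea of a single 32-bit `stateKey` are assumptions for the example; it covers only the simplest form of instancing from the list above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical input: one entry per object, already sorted by state key.
struct Draw
{
    std::uint32_t stateKey;    // identifies mesh + material + shader
    std::uint16_t transformId; // per-object data, e.g. an index into a transform array
};

// Hypothetical output: one entry per actual draw-call.
struct InstancedDraw
{
    std::uint32_t stateKey;
    std::vector<std::uint16_t> transformIds; // one element per instance
};

// Collapses runs of identical state into one instanced draw each.
// Only adjacent draws are merged, so the input should be sorted first.
std::vector<InstancedDraw> BatchInstances(const std::vector<Draw>& sortedDraws)
{
    std::vector<InstancedDraw> batches;
    for (const Draw& d : sortedDraws) {
        if (batches.empty() || batches.back().stateKey != d.stateKey)
            batches.push_back(InstancedDraw{d.stateKey, {}});
        batches.back().transformIds.push_back(d.transformId);
    }
    return batches;
}
```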


Hm thanks, I'll try to play around with sorting, because quicksort in JS isn't all that fast anyway (since it runs a callback for every comparison).

Got a couple more questions if you don't mind.

1. How do you deal with non-discrete data like model matrices? You can't encode them in a 128-bit draw call, unless you put them in a big list or something.

2. Are draw calls supposed to be "compiled" on every frame, or are they cached inside objects?

1. I use the UBO/CBV abstraction, even on APIs that don't natively support them. Matrices are placed in a constant buffer, which is then represented with a 16-bit CB ID.

2. Ideally I compile draws once and then reuse them many times. My generic model renderer does this, and it saves a lot of work per frame. Some other more dynamic systems generate new, temporary draw items each frame, which are thrown away after submission. 
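The two answers above can be sketched together: the draw item stores only a 16-bit ID, and the matrix lives in a buffer that can be updated without rebuilding the draw item. The `ConstantBufferPool` below is a hypothetical CPU-side stand-in; a real implementation would wrap UBOs/CBVs or emulate them, and all names here are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using Matrix4   = std::array<float, 16>;
using CBufferId = std::uint16_t;

// Hypothetical stand-in for GPU constant buffers, each holding one matrix.
class ConstantBufferPool
{
public:
    CBufferId Create(const Matrix4& initial)
    {
        assert(m_buffers.size() < 0xFFFF && "16-bit ID space exhausted");
        m_buffers.push_back(initial);
        return static_cast<CBufferId>(m_buffers.size() - 1);
    }

    // Changes the buffer contents; every draw item referring to `id` sees the new data.
    void Update(CBufferId id, const Matrix4& value) { m_buffers.at(id) = value; }

    const Matrix4& Get(CBufferId id) const { return m_buffers.at(id); }

private:
    std::vector<Matrix4> m_buffers;
};

// Hypothetical compact draw item: it refers to the matrix, it does not contain it.
struct CachedDrawItem
{
    std::uint16_t shader;
    std::uint16_t mesh;
    CBufferId     transform;
};
```

With this split, a model renderer could build its `CachedDrawItem` once and call `Update` on the pool each frame when the object moves, while more dynamic systems would build temporary items per frame and discard them after submission.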

There is another thing: if everything is stored as an ID, then don't you lose some of the details? For example, in an OOP environment we could do:


//just some random example
Rectangle CalculateBounds()
{
  return Rectangle(m_pos.x, m_pos.y, myTextureObject.Width, myTextureObject.Height);
}

But if your sprite (or any other object with a texture) just stores a handle to a texture, then you can't easily grab the data; you would always need a reference to the GraphicsDevice hanging around in order to do a long fetch from the pool.


//Just some random example
Rectangle CalculateBounds(GraphicsDevice* graphicsDevice)
{
  Texture* myTextureObject = graphicsDevice->GetTexture(m_texID); // look the texture up through the device passed in
  return Rectangle(m_pos.x, m_pos.y, myTextureObject->Width, myTextureObject->Height);
}

So pools are nice and cache-friendly, but everything else becomes less efficient because of them. Is the solution some hybrid approach, such as:


struct Texture2D
{
	unsigned int  m_width;
	unsigned int  m_height;
	unsigned int  m_levelCount;
	SurfaceFormat m_surfaceFormat;
	TextureHandle m_textureID; // handle into the GraphicsDevice texture pool
};

class GraphicsDevice
{
	//...Other stuff
	TexturePool m_texturePool; //Stores 4096 texture ptr? with a Index and Generation?
	void CreateTextureFromImage(ImageFile file, Texture2D* pResult) { /* creates the texture, sets the data in pResult, returns */ }
};

//Back in some other User Class
Rectangle CalculateBounds()
{
	return Rectangle(m_pos.x, m_pos.y, m_texture.m_width, m_texture.m_height);
}

Designing the GraphicsDevice in a stateless way changes the entire low-level framework, so it's important to make sure it's efficient but also easy to use.
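The "Index and Generation" idea from the pool comment above can be sketched as a generational handle pool. This is a hedged illustration only: `TextureDesc`, `TryGet` and the 16-bit field widths are assumptions, the 4096-slot limit is not enforced, and the actual GPU resource is left out.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical handle: a slot index plus the generation it was created in.
struct TextureHandle
{
    std::uint16_t index;
    std::uint16_t generation;
};

// Hypothetical per-texture data kept in the pool.
struct TextureDesc
{
    unsigned int width;
    unsigned int height;
};

class TexturePool
{
public:
    TextureHandle Create(const TextureDesc& desc)
    {
        std::uint16_t index;
        if (!m_free.empty()) {            // reuse a freed slot if possible
            index = m_free.back();
            m_free.pop_back();
        } else {                          // otherwise grow the pool
            index = static_cast<std::uint16_t>(m_slots.size());
            m_slots.push_back(Slot{});
        }
        m_slots[index].desc  = desc;
        m_slots[index].alive = true;
        return TextureHandle{index, m_slots[index].generation};
    }

    void Destroy(TextureHandle h)
    {
        if (!IsValid(h)) return;
        m_slots[h.index].alive = false;
        ++m_slots[h.index].generation;    // stale handles to this slot stop matching
        m_free.push_back(h.index);
    }

    // Returns nullptr for destroyed or stale handles instead of dangling data.
    const TextureDesc* TryGet(TextureHandle h) const
    {
        return IsValid(h) ? &m_slots[h.index].desc : nullptr;
    }

private:
    struct Slot
    {
        TextureDesc   desc{};
        std::uint16_t generation = 0;
        bool          alive = false;
    };

    bool IsValid(TextureHandle h) const
    {
        return h.index < m_slots.size() &&
               m_slots[h.index].alive &&
               m_slots[h.index].generation == h.generation;
    }

    std::vector<Slot>          m_slots;
    std::vector<std::uint16_t> m_free;
};
```

The generation counter is what lets a reused slot reject old handles. A 16-bit generation wraps around after 65536 reuses of one slot, so a stale handle could in principle match again; whether that matters depends on the application. Caching width and height next to the handle, as in the hybrid `Texture2D` above, avoids the pool lookup for common queries at the cost of duplicating that data.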

