State vs Stateless Designing a modern GPU Interface

James e · 2018-06-20T14:26:00

Hello,

State-based render architecture has many problems, such as state leakage and naively setting states on every draw call. A lot of different sources recommend a stateless rendering architecture instead, which makes sense for DX12, since it binds a single object: the PSO. Take a look at the following:

Designing a Modern GPU Interface
stateless-layered-multi-threaded-rendering-part-2-stateless-api-design
Firaxis Lore System: CIV V

Isn't this causing the same problem, though? You are passing all the state commands within a DrawCommand object so that they can be set during the draw call. Yes, you are hiding the state machine by not exposing the functions directly, but you are just deferring the state changes to the command queue.

You can sort by index using this method: Real Time Collision: Draw Call Key

But that means each DrawCommand passes in the entire PSO structure (i.e. the states you want) and stores it, just so you can sort by the key and select the first object to bind its PSO for the rest of the group to use. It seems like a lot of wasted memory to pass in every PSO only to use one, although it does prevent any slowdown from swapping the PSO for every single object.

How are you handling state changes? Am I missing some critical piece of information about stateless rendering? (Note: I am aiming towards the stateless method for DX12; I just want some opinions on it :))

Thanks.
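A minimal C++ sketch of the draw-call-key idea, assuming each command carries a 64-bit sort key plus a small handle into a PSO table owned elsewhere. `DrawCommand`, `MakeSortKey`, `SortAndCountPsoBinds` and the bit layout are all hypothetical. After sorting, a backend only needs to rebind when the handle differs from the previous command, so each command stores a two-byte PSO reference rather than a copy of the whole pipeline description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw command: a sort key plus a small handle into a PSO table
// owned elsewhere, instead of a full copy of the pipeline state description.
struct DrawCommand
{
    uint64_t sortKey;    // expensive state in the high bits (see MakeSortKey)
    uint16_t psoHandle;  // index into a hypothetical PSO table
    uint32_t meshId;     // hypothetical mesh identifier
};

// Hypothetical layout: PSO handle in bits 24..39, quantized depth in bits 0..23.
uint64_t MakeSortKey(uint16_t psoHandle, uint32_t depth)
{
    assert(depth < (1u << 24));
    return (uint64_t(psoHandle) << 24) | depth;
}

// Sorts by key, then walks the list binding a PSO only when the handle changes.
// Because the PSO handle sits above depth in the key, each PSO ends up in one
// contiguous group, so the returned bind count equals the number of distinct PSOs.
std::size_t SortAndCountPsoBinds(std::vector<DrawCommand>& commands)
{
    std::sort(commands.begin(), commands.end(),
              [](const DrawCommand& a, const DrawCommand& b) { return a.sortKey < b.sortKey; });

    std::size_t binds = 0;
    bool haveBound = false;
    uint16_t bound = 0;
    for (const DrawCommand& cmd : commands)
    {
        if (!haveBound || cmd.psoHandle != bound)
        {
            bound = cmd.psoHandle;  // a real backend would bind the PSO here
            haveBound = true;
            ++binds;
        }
        // ...a real backend would issue the draw for cmd.meshId here...
    }
    return binds;
}
```

The PSO objects themselves would be created once and shared; only their handles travel through the command stream.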


Started by Jman2 June 08, 2018 04:12 PM

14 comments, last by Jman2 6 years, 7 months ago

d07RiV


June 15, 2018 10:26 PM

I've also been thinking about whether there are better alternatives to picking the rendering order than a simple radix sort, which can have abysmal results in some cases (e.g. 0111111 -> 1000000 -> 1111111 -> 2000000 etc). It is essentially a traveling salesman problem, which has plenty of decent approximate solutions; the question is how much time we are prepared to dedicate to sorting.
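To make the worst case concrete, here is a small C++ sketch that counts how many state fields change between consecutive keys, under the simplifying assumption that each digit of the key is an independent state field (`FieldChanges` and `OrderCost` are illustrative names). For the four keys in that example, sorted order costs 7 + 6 + 7 = 20 field changes, while the order 0111111 -> 1111111 -> 1000000 -> 2000000 costs 1 + 6 + 1 = 8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts how many state fields differ between two keys of equal length,
// treating each character as one independent state field (an assumption
// made purely for illustration; real keys pack fields into bit ranges).
std::size_t FieldChanges(const std::string& a, const std::string& b)
{
    assert(a.size() == b.size());
    std::size_t changes = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++changes;
    return changes;
}

// Total number of field changes when drawing the keys in the given order.
std::size_t OrderCost(const std::vector<std::string>& order)
{
    std::size_t total = 0;
    for (std::size_t i = 1; i < order.size(); ++i)
        total += FieldChanges(order[i - 1], order[i]);
    return total;
}
```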

I'm guessing the most reasonable way would be to always pick the draw item closest to the current state, using some LSH or tree-based structure to preprocess the draw list. It also raises a question of whether we need "don't care" values, because they can significantly reduce the cost of switching states.
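A brute-force O(n^2) version of that greedy idea, without the LSH or tree acceleration, might look like the C++ below. It assumes each draw's state is a vector of integer fields and uses a hypothetical `kDontCare` marker (-1) for fields a draw does not depend on: those fields cost nothing to "switch" and leave the currently bound value in place. `SwitchCost` and `GreedyOrder` are illustrative names, and this is plain nearest-neighbour search, which is a heuristic, not an optimal TSP solution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kDontCare = -1;  // hypothetical marker: the draw doesn't depend on this field

using StateVector = std::vector<int>;

// Number of fields that must change to go from `current` to `item`.
// Fields the item doesn't care about are free.
std::size_t SwitchCost(const StateVector& current, const StateVector& item)
{
    assert(current.size() == item.size());
    std::size_t cost = 0;
    for (std::size_t i = 0; i < item.size(); ++i)
        if (item[i] != kDontCare && item[i] != current[i]) ++cost;
    return cost;
}

// Greedy nearest-neighbour ordering: repeatedly pick the unvisited draw whose
// state is cheapest to switch to from the current state (ties go to the lowest index).
std::vector<std::size_t> GreedyOrder(const std::vector<StateVector>& items)
{
    std::vector<std::size_t> order;
    if (items.empty()) return order;

    std::vector<bool> used(items.size(), false);
    StateVector current(items[0].size(), kDontCare);  // nothing bound yet
    for (std::size_t step = 0; step < items.size(); ++step)
    {
        std::size_t best = items.size();
        std::size_t bestCost = 0;
        for (std::size_t i = 0; i < items.size(); ++i)
        {
            if (used[i]) continue;
            std::size_t cost = SwitchCost(current, items[i]);
            if (best == items.size() || cost < bestCost) { best = i; bestCost = cost; }
        }
        used[best] = true;
        order.push_back(best);
        // Only fields the chosen draw cares about are actually (re)bound.
        for (std::size_t f = 0; f < current.size(); ++f)
            if (items[best][f] != kDontCare) current[f] = items[best][f];
    }
    return order;
}
```

Fed the four example keys as digit vectors, it visits them as 0111111, 1111111, 1000000, 2000000 instead of in sorted order.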

Hodgman


June 16, 2018 01:20 PM

21 hours ago, d07RiV said:
Another thing - when you put all passes in the same shader file, do you run a lexer on them, or do you just feed everything to the shader compiler and let it figure out what to optimize away? The former option would allow us to know which options affect which passes, so we don't have to make redundant copies (instead of having to manually specify them for every pass).
edit: I guess this is partially answered by bonus slides.

Having a custom shader language / a full lexer would be great, but I did not spend the effort in this area.

Instead, I use HLSL and only parse the outputs from the HLSL compiler. This allows you to discover things like the resource bindings that are actually used by the optimized code, but does not let you discover things such as which options actually had an effect on the code generation (unless you want to brute force it by repeatedly compiling with different options enabled and comparing the compiler outputs for differences...). To declare extra shader meta-data (such as passes, options, resource-lists, etc), I embed Lua code within the shader source files that does this.

14 hours ago, d07RiV said:
I've also been thinking if there are better alternatives to picking the rendering order than simple radix sort, which can have abysmal results in some cases (e.g. 0111111 -> 1000000 -> 1111111 -> 2000000 etc). It is essentially a traveling salesman problem, which has plenty of decent approximate solutions; the question is how much time we are prepared to dedicate to sorting.

I've never really thought about putting that much work into sorting, but yeah, I guess you could do quite a bit of analysis there. My simple advice is to use a radix sort (or quick sort, etc) on integer state keys, where more significant bits represent more costly state changes (like render-targets or shaders) and less significant bits represent cheaper state changes (like constant/uniform values).
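A sketch of that advice in C++, assuming a hypothetical 64-bit key layout with the render target in the top 8 bits, then shader, texture set and constants (`MakeStateKey` and the field widths are illustrative, not taken from any particular engine). `RadixSortKeys` is a standard least-significant-digit radix sort with eight 8-bit passes; because each pass is stable, the final order groups by render target first, then shader, and so on.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 64-bit key layout, most expensive state in the most significant bits:
//   [63..56] render target  [55..40] shader  [39..24] texture set  [23..0] constants
uint64_t MakeStateKey(uint32_t renderTarget, uint32_t shader, uint32_t textureSet, uint32_t constants)
{
    assert(renderTarget < (1u << 8) && shader < (1u << 16));
    assert(textureSet < (1u << 16) && constants < (1u << 24));
    return (uint64_t(renderTarget) << 56) | (uint64_t(shader) << 40) |
           (uint64_t(textureSet) << 24) | uint64_t(constants);
}

// LSD radix sort over 8-bit digits: eight stable counting-sort passes.
void RadixSortKeys(std::vector<uint64_t>& keys)
{
    std::vector<uint64_t> scratch(keys.size());
    for (int shift = 0; shift < 64; shift += 8)
    {
        // offsets[d + 1] counts keys whose digit is d; after the prefix sum,
        // offsets[d] is the output position of the first key with digit d.
        std::array<std::size_t, 257> offsets{};
        for (uint64_t k : keys)
            ++offsets[((k >> shift) & 0xFF) + 1];
        for (std::size_t d = 1; d < offsets.size(); ++d)
            offsets[d] += offsets[d - 1];
        for (uint64_t k : keys)
            scratch[offsets[(k >> shift) & 0xFF]++] = k;
        keys.swap(scratch);  // this pass's output becomes the next pass's input
    }
}
```

In practice you would sort (key, draw index) pairs, or pack a draw index into the low bits, so the draws can be recovered after sorting.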

It highly depends on your content too -- in some games, you might often be changing just one texture per draw, but keeping 10 other texture bindings the same... But in other games, you might use 11 unique texture bindings per draw.

There are plenty of other optimization techniques that a rendering engine has to look into supporting too, which reduce the total number of draw-calls: instanced draw-calls, dynamic instancing (appending multiple different meshes into a single contiguous set so they can be drawn at one time), CPU-side vertex transformations, GPU-side skinning / transformation arrays, texture arrays, texture atlases, material arrays in cbuffers instead of individual material constants/uniforms, indirect draw-calls, compute-shader draw culling, etc, etc... If you implemented all of these, hopefully you'd have a much smaller number of draws with much more unique state per draw.
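As one illustration of the first item, a C++ sketch that collapses runs of already-sorted draws sharing the same mesh and material into single instanced draws. It assumes per-instance data (such as transforms) lives in a buffer laid out in the same order as the requests, so `firstInstance` is simply the index of the run's first request; `DrawRequest`, `InstancedDraw` and `BatchInstances` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical inputs/outputs for the batching step.
struct DrawRequest { uint32_t meshId; uint32_t materialId; };
struct InstancedDraw { uint32_t meshId; uint32_t materialId; uint32_t firstInstance; uint32_t instanceCount; };

// Collapses runs of consecutive requests that share mesh and material into one
// instanced draw. Assumes the requests are already sorted so equal pairs are adjacent.
std::vector<InstancedDraw> BatchInstances(const std::vector<DrawRequest>& requests)
{
    assert(requests.size() <= UINT32_MAX);  // instance offsets are stored as 32-bit
    std::vector<InstancedDraw> draws;
    for (std::size_t i = 0; i < requests.size(); ++i)
    {
        const DrawRequest& r = requests[i];
        if (!draws.empty() && draws.back().meshId == r.meshId && draws.back().materialId == r.materialId)
        {
            ++draws.back().instanceCount;  // extend the current batch
        }
        else
        {
            draws.push_back({r.meshId, r.materialId, static_cast<uint32_t>(i), 1});  // start a new batch
        }
    }
    return draws;
}
```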

. 22 Racing Series .

d07RiV


June 18, 2018 12:42 PM

Hm thanks, I'll try to play around with sorting, because quicksort in JS isn't all that fast anyway (since it runs a callback for every comparison).

Got a couple more questions if you don't mind.

1. How do you deal with non-discrete data like model matrices? You can't encode them in a 128-bit draw call, unless you put them in a big list or something.

2. Are draw calls supposed to be "compiled" on every frame, or are they cached inside objects?

Hodgman


June 18, 2018 10:07 PM

1. I use the UBO/CBV abstraction, even on APIs that don't natively support them. Matrices are placed in a constant buffer, which is then represented with a 16bit CB ID.

2. Ideally I compile draws once and then reuse them many times. My generic model renderer does this, and it saves a lot of work per frame. Some other more dynamic systems generate new, temporary draw items each frame, which are thrown away after submission.
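A minimal C++ sketch of both answers, assuming a CPU-side stand-in for constant buffers: model matrices live in a pool addressed by 16-bit IDs, and the draw item stores only IDs, so it can be built once and resubmitted every frame while only the buffer contents change. `ConstantBufferPool`, `DrawItem`, `Matrix4` and the four-ID layout are hypothetical, not the engine described above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using Matrix4 = std::array<float, 16>;  // hypothetical 4x4 matrix type

// Hypothetical CPU-side pool of constant buffers; a 16-bit ID names a buffer.
class ConstantBufferPool
{
public:
    uint16_t Create(const Matrix4& initial)
    {
        assert(m_buffers.size() < 0xFFFF);  // IDs must fit in 16 bits
        m_buffers.push_back(initial);
        return static_cast<uint16_t>(m_buffers.size() - 1);
    }
    void Update(uint16_t id, const Matrix4& value) { m_buffers.at(id) = value; }
    const Matrix4& Get(uint16_t id) const { return m_buffers.at(id); }

private:
    std::vector<Matrix4> m_buffers;
};

// Hypothetical compact draw item: only IDs, no matrices, so it can be
// "compiled" once and reused every frame.
struct DrawItem
{
    uint16_t pipelineId;
    uint16_t meshId;
    uint16_t textureSetId;
    uint16_t transformCbId;  // constant buffer holding the model matrix
};
```

Moving an object then means updating the contents of its transform buffer, not rebuilding its draw item.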


Jman2

Author


June 20, 2018 02:26 PM

There is another thing: if everything is stored as an ID, then don't you lose some of the details? For example, in an OOP environment we could do:


//just some random example
Rectangle CalculateBounds()
{
  return Rectangle(m_pos.x, m_pos.y, myTextureObject.Width, myTextureObject.Height);
}

But if your sprite (or "insert object with texture") just stores a handle to a texture, then you can't easily grab the data; you would always have to have a reference to the GraphicsDevice hanging around in order to do a long fetch from the pool.


//Just some random example
Rectangle CalculateBounds(GraphicsDevice* graphicsDevice)
{
  Texture* myTextureObject = graphicsDevice->GetTexture(m_texID);
  return Rectangle(m_pos.x, m_pos.y, myTextureObject->Width, myTextureObject->Height);
}

So pools are nice and cache-friendly, but everywhere else becomes less efficient because of them. Is the solution some hybrid approach, such as:


struct Texture2D
{
	unsigned int  m_width;
	unsigned int  m_height;
	unsigned int  m_levelCount;
	SurfaceFormat m_surfaceFormat;
	TextureHandle m_textureID;
};

class GraphicsDevice
{
public:
	//...Other stuff
	TexturePool m_texturePool; //Stores 4096 texture ptrs? With an index and generation?
	void CreateTextureFromImage(ImageFile file, Texture2D* pResult)
	{
		//Creates the texture, sets its data, fills in *pResult
	}
};

//Back in some other User Class
Rectangle CalculateBounds()
{
	return Rectangle(m_pos.x, m_pos.y, m_texture.m_width, m_texture.m_height);
}

Designing the GraphicsDevice in a stateless way changes the entire low-level framework, so it's important to make sure it's efficient but also easy to use.
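The "index and generation" idea from the TexturePool comment could be sketched as follows in C++. This assumes a 32-bit handle split into a 16-bit slot index and a 16-bit generation counter; `TexturePool`, `TextureRecord` and that split are hypothetical. Destroying a texture bumps the slot's generation, so stale handles fail `IsValid` instead of silently aliasing a newer texture that reuses the slot.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 32-bit handle: low 16 bits slot index, high 16 bits generation.
struct TextureHandle { uint32_t value; };

struct TextureRecord  // hypothetical per-texture data owned by the pool
{
    uint32_t width = 0;
    uint32_t height = 0;
    uint16_t generation = 0;
    bool alive = false;
    // a real backend would also hold the API resource pointer here
};

class TexturePool
{
public:
    TextureHandle Create(uint32_t width, uint32_t height)
    {
        uint32_t index;
        if (!m_free.empty()) { index = m_free.back(); m_free.pop_back(); }  // reuse a freed slot
        else
        {
            assert(m_records.size() < 0x10000);  // index must fit in 16 bits
            index = static_cast<uint32_t>(m_records.size());
            m_records.emplace_back();
        }
        TextureRecord& r = m_records[index];
        r.width = width;
        r.height = height;
        r.alive = true;
        return TextureHandle{(uint32_t(r.generation) << 16) | index};
    }

    void Destroy(TextureHandle h)
    {
        if (!IsValid(h)) return;  // ignore stale or already-destroyed handles
        TextureRecord& r = m_records[h.value & 0xFFFF];
        r.alive = false;
        ++r.generation;  // invalidates every outstanding handle to this slot
        m_free.push_back(h.value & 0xFFFF);
    }

    bool IsValid(TextureHandle h) const
    {
        uint32_t index = h.value & 0xFFFF;
        return index < m_records.size() && m_records[index].alive &&
               m_records[index].generation == (h.value >> 16);
    }

    const TextureRecord* Get(TextureHandle h) const
    {
        return IsValid(h) ? &m_records[h.value & 0xFFFF] : nullptr;
    }

private:
    std::vector<TextureRecord> m_records;
    std::vector<uint32_t> m_free;  // slots available for reuse
};
```

With a pool like this, the hybrid Texture2D in the post can keep its own copy of width and height for gameplay code such as CalculateBounds, while only the renderer resolves the handle through the pool.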

This topic is closed to new replies.
