
Stateless Rendering XOR and other "misunderstandings"

Started by January 13, 2020 04:38 PM
3 comments, last by Hodgman 5 years ago

Hello,


Hodgman has talked extensively about stateless rendering, and it is definitely the way to go, but I’m still not happy with my implementation and need some pointers. (Note: I am trying to support DX11 and DX12.)

http://www.goatientertainment.com/downloads/Designing%20a%20Modern%20GPU%20Interface%20NOTES.pdf


State Cache and Redundancy:

My current solution is to generate a PSO from various abstract descriptors (basically the DX11/12 BlendDesc etc.). When CreatePiplineState() is called, I hash each descriptor and, if it’s unique, store the resulting state in a hash map; a pointer to it is then stored in the pipeline state object, which lives in a (non-hashed) pool. The function returns a PiplineStateHandle used to access the pipeline object when setting it. When drawing, you fetch the PSO and check each state to see if it’s already set; if not, you set it and record it as the currently set state. Also, I currently cannot delete hash-mapped states.

With Hodgman’s solution, you store the IDs of the states in the draw item, and each one takes up a certain number of bits: BlendState == 8 bits etc. So instead of storing the pointer (ID3D11BlendState* m_pBlendPrevious), you store the previous DrawItem, XOR against it, and check which state fields are non-zero and need to be bound.

So, if you use DrawItems, where do pipeline states come into it? If you store the pipeline state as an ID, you basically collapse all shaders and states into a single value. If you simulate a PSO in DX11, can you store the bit IDs of each state in a PSO object in the graphics device pool and then XOR against that object to resolve redundancies? On DX12, do you just check whether the pipeline object is the bound object and leave it at that?

Can Hash Map be replaced:

Storing all the states in a hash map doesn’t really feel right. I’m thinking of creating a pool that is split into 2 sections: engine and user-generated states.

So, for instance the first 4 BlendStates will be:

1. Opaque

2. AlphaBlend

3. Additive

4. NonPremultiplied.

The remaining 252 (if 8 bit ID) will be user generated states.

But what about a clash? With a map you can hash, find a duplicate, and reuse it; with a pool you would have to linearly scan from 4-256 to find the element with the same hash. You could store a map into the pool, but that means storing a pool and a map for each state.

Would it be better to just hardcode the most common states and leave it at that?

Also, I currently do not delete the states because there’s no way of knowing if someone else is using them. Is it okay to just leave the states in a pool until the engine closes? Chances are they’re going to be reused often when moving between levels and such, and the number of states is relatively low; for instance, the 4 base blend states are probably all you will ever need in most cases.

Sampler State Redundancy and Sharing

From DX11 onwards you are supposed to store a texture and a sampler separately, allowing for efficient reuse of samplers. How would you handle this, though? For instance, in Unity a Texture is a texture-sampler bundle just like in DX9, but this would limit us to 16 textures. If the shader is written to reuse samplers, though, then when you bind textures their sampler binds would be ignored, and they would share whatever was set in the slot they wanted to use. Does anyone have an elegant solution for this?

Architecture Goal:

I’m using ECS for certain reusable aspects, so for rendering I will have a render component that basically generates and caches the DrawItems for its mesh/submeshes and submits them to the renderer, which sorts them into the queues (Opaque, Geometry, Alpha test, Transparent etc.) and deals with the submission. I’m guessing that’s how most people would do it.

Conclusion:

If anyone has insight into anything I mentioned above, it would be great to hear your opinion on the subject, particularly regarding the caching and sampler slot issues, as that’s one of the major refactors I’m trying to do to make it more like Hodgman’s, with the DrawItem and RenderPass submissions.

Thanks!

Jman2 said:
With Hodgman’s solution, you store the IDs of the states in the draw item, and each one takes up a certain number of bits: BlendState == 8 bits etc. So instead of storing the pointer (ID3D11BlendState* m_pBlendPrevious), you store the previous DrawItem, XOR against it, and check which state fields are non-zero and need to be bound. So, if you use DrawItems, where do pipeline states come into it? If you store the pipeline state as an ID, you basically collapse all shaders and states into a single value. If you simulate a PSO in DX11, can you store the bit IDs of each state in a PSO object in the graphics device pool and then XOR against that object to resolve redundancies? On DX12, do you just check whether the pipeline object is the bound object and leave it at that?

Yeah on DX12 you just check if the PSO ID has changed (bind new PSO if so, otherwise don't).
On DX11 I don't use PSO emulation, so the draw item contains blend ID, raster ID, etc… A single XOR is used to check which of those sub-pipeline-states have changed, and the new ones are bound.
If you did use PSO emulation on DX11, the draw item would contain a PSO ID (just like in DX12), but your PSO would be a small structure (or int) containing the blend ID, raster ID, etc… (or the pointers to those state objects).
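For readers following along, here is a minimal sketch of the XOR check described above. The field layout (7-bit blend ID, 6-bit raster and depth-stencil IDs) and all names such as PackStates and DiffStates are assumptions made for illustration, not the layout of any actual engine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packed state word. The widths are illustrative only:
// bits 0-6 blend ID, bits 7-12 raster ID, bits 13-18 depth-stencil ID.
constexpr uint32_t kBlendShift  = 0,  kBlendBits  = 7;
constexpr uint32_t kRasterShift = 7,  kRasterBits = 6;
constexpr uint32_t kDepthShift  = 13, kDepthBits  = 6;

// Mask covering one field of the packed word.
constexpr uint32_t FieldMask(uint32_t shift, uint32_t bits) {
    return ((1u << bits) - 1u) << shift;
}

// Pack the three small state IDs into a single 32-bit word.
constexpr uint32_t PackStates(uint32_t blend, uint32_t raster, uint32_t depth) {
    return ((blend  << kBlendShift)  & FieldMask(kBlendShift,  kBlendBits))  |
           ((raster << kRasterShift) & FieldMask(kRasterShift, kRasterBits)) |
           ((depth  << kDepthShift)  & FieldMask(kDepthShift,  kDepthBits));
}

// One flag per sub-state that would need to be rebound.
constexpr uint32_t kDirtyBlend  = 1u << 0;
constexpr uint32_t kDirtyRaster = 1u << 1;
constexpr uint32_t kDirtyDepth  = 1u << 2;

// XOR the previous and next packed words; any field with a non-zero
// result has changed and must be bound again.
inline uint32_t DiffStates(uint32_t previous, uint32_t next) {
    const uint32_t changed = previous ^ next;
    uint32_t dirty = 0;
    if (changed & FieldMask(kBlendShift,  kBlendBits))  dirty |= kDirtyBlend;
    if (changed & FieldMask(kRasterShift, kRasterBits)) dirty |= kDirtyRaster;
    if (changed & FieldMask(kDepthShift,  kDepthBits))  dirty |= kDirtyDepth;
    return dirty;
}
```

With PSO emulation on DX11, the same packed word could be what a PSO ID resolves to; on DX12 the comparison would presumably reduce to a single equality test on the PSO ID.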

Jman2 said:
The remaining 252 (if 8 bit ID) will be user generated states

FWIW I use 7 bits for blend ID, and even then I feel that 128 potential blend modes is overkill.

Jman2 said:
Would it be better to just hardcode the most common states and leave it at that?

That's an approach that I have seen quite often in engines, yep. It also means you know exactly how many bits you need (in some cases, maybe only as few as 2!)

Jman2 said:
Also, I currently do not delete the states because there’s no way of knowing if someone else is using them. Is it okay to just leave the states in a pool until the engine closes? Chances are they’re going to be reused

I do exactly that in my engine.

Jman2 said:
If the shader is written to reuse samplers, though, then when you bind textures their sampler binds would be ignored

I don't quite understand the issue.
Either you design your engine around the DX9/GL/Unity model, where texture slots are 1:1 matched with sampler slots (and force shader authors to write shaders this way).
Or you design your engine around the DX11 model where texture and sampler slots are independent (and write your shaders appropriately). In this design, the draw item will include texture bindings and sampler bindings. Also, as of DX12 you'll want to use static-samplers that don't need any binding operations performed.


Thanks Hodgman for the clarifications, they really help put the pieces into place.

Hodgman said:
If you did use PSO emulation on DX11, the draw item would contain a PSO ID (just like in DX12), but your PSO would be a small structure (or int) containing the blend ID, raster ID, etc… (or the pointers to those state objects).

Yeah, my PSO Desc is filled with the Descs of the states and the shader blobs required. I pass that to the GraphicsDevice, which creates an emulated (internal) version that contains ID3D11Resource* pointers to the states contained in the hash map.

The problem is that my hash IDs are 32 bits. Are you using the hash map to point into a contiguous array of your states? That is, rather than a cache with a key pair of <HashID, ID3D11BlendState>, do you store <HashID, ArrayOffsetID> with a fixed array of ID3D11BlendStates, allowing you to have 7-bit IDs?

Hodgman said:
I do exactly that in my engine

That’s great! I was a bit worried about that, but checking BGFX, they do the same as well. There’s no need to destroy states during the lifetime of the app if you’re just going to recreate them as the player traverses scenes…

Hodgman said:
I don't quite understand the issue

Basically, in the Unity style, if you have 2 textures which both use LinearWrap, you would have to bind the LinearWrap sampler to slot 0 and slot 1. Does that have any additional overhead?

In the DX11 version, your shader programmer would set shaders up so that textures reference sampler slots. My shader syntax is a bit like Unity's in the sense that you define a Properties block, so I would have to add a tag to reference the correct sampler so that when the material parses it, it can create the right linkage. My shaders just use the fixed types like LinearWrap, LinearClamp etc.

Jman2 said:
Are you using the hash map to point into a contiguous array of your states? That is, rather than a cache with a key pair of <HashID, ID3D11BlendState>, do you store <HashID, ArrayOffsetID> with a fixed array of ID3D11BlendStates, allowing you to have 7-bit IDs?

I use a hash-map with a fixed capacity, so all the values are pre-allocated as a contiguous array that can be indexed, yeah. Also, the key for my map is the descriptor that created the state, not a hash of that descriptor struct – a hash-map implementation should store the full key structure internally, but use the key's hash during lookups.
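As a rough illustration of that idea, the sketch below uses a fixed-capacity, open-addressing table keyed by the full descriptor, where the slot index doubles as the small state ID. BlendDesc here is a simplified stand-in (a real one would mirror D3D11_BLEND_DESC), and BlendStateCache, HashDesc, and GetOrCreate are hypothetical names. It relies on states never being deleted, as discussed earlier in the thread, so no tombstones are needed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical, heavily simplified blend descriptor.
struct BlendDesc {
    uint8_t srcFactor = 0;
    uint8_t dstFactor = 0;
    uint8_t op        = 0;
    bool    enable    = false;

    bool operator==(const BlendDesc& o) const {
        return srcFactor == o.srcFactor && dstFactor == o.dstFactor &&
               op == o.op && enable == o.enable;
    }
};

// The hash is only used to pick a starting slot; the full key is compared.
inline std::size_t HashDesc(const BlendDesc& d) {
    const uint32_t packed = static_cast<uint32_t>(d.srcFactor) |
                            (static_cast<uint32_t>(d.dstFactor) << 8) |
                            (static_cast<uint32_t>(d.op) << 16) |
                            (static_cast<uint32_t>(d.enable) << 24);
    return std::hash<uint32_t>{}(packed);
}

class BlendStateCache {
public:
    static constexpr std::size_t kCapacity = 128; // indices fit in a 7-bit ID

    // Returns the slot index for desc, inserting it if it is new.
    // Returns -1 if the table is full.
    int GetOrCreate(const BlendDesc& desc) {
        const std::size_t start = HashDesc(desc) % kCapacity;
        for (std::size_t probe = 0; probe < kCapacity; ++probe) {
            const std::size_t i = (start + probe) % kCapacity; // linear probing
            if (!used_[i]) {
                used_[i] = true;
                keys_[i] = desc;
                // A real cache would create the API state object for slot i here.
                return static_cast<int>(i);
            }
            if (keys_[i] == desc) {
                return static_cast<int>(i); // duplicate descriptor: reuse the slot
            }
        }
        return -1;
    }

    // Look up the descriptor stored for a previously returned ID.
    const BlendDesc& Get(uint8_t id) const { return keys_[id]; }

private:
    std::array<BlendDesc, kCapacity> keys_{};  // pre-allocated, contiguous storage
    std::array<bool, kCapacity>      used_{};  // occupancy flags
};
```

A parallel array of state object pointers indexed by the same ID would hold the actual ID3D11BlendState objects; it is left out here to keep the sketch free of API calls.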

Jman2 said:
Basically, in the Unity style, if you have 2 textures which both use LinearWrap, you would have to bind the LinearWrap sampler to slot 0 and slot 1. Does that have any additional overhead?

Ah ok, so if you're using the Unity style in your own API, but implementing on top of DX11, then yeah, the easiest option is to pair up your texture and sampler slots 1:1 and do that. Yes, it's a little bit of extra overhead…. but inside the NVidia driver, they're (AFAIK) doing extra work to take the separate sampler/texture bindings and recombine them into a single GL-style texture-sampler binding before they send it to the HW anyway.
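To make the two layouts from this thread concrete, here is a small sketch that only computes slot assignments and makes no API calls. SamplerKind, TextureBinding, BuildPairedSamplers, and BuildSharedSamplers are hypothetical names; the sampler kinds are borrowed from the fixed types mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fixed sampler types, named after those mentioned in the thread.
enum class SamplerKind : uint8_t { LinearWrap, LinearClamp, PointWrap, PointClamp };

struct TextureBinding {
    uint32_t    textureId; // hypothetical texture handle
    SamplerKind sampler;   // sampler the material wants for this texture
};

// Unity/DX9 style: sampler slot i always serves texture slot i, so a
// sampler shared by two textures is simply bound twice.
inline std::vector<SamplerKind> BuildPairedSamplers(const std::vector<TextureBinding>& textures) {
    std::vector<SamplerKind> slots;
    slots.reserve(textures.size());
    for (const TextureBinding& t : textures) {
        slots.push_back(t.sampler);
    }
    return slots;
}

// DX11 style: samplers are deduplicated into their own slots and each
// texture records which sampler slot the shader should use with it.
struct SharedLayout {
    std::vector<SamplerKind> samplerSlots;          // unique samplers, first-use order
    std::vector<uint32_t>    samplerSlotForTexture; // one entry per texture
};

inline SharedLayout BuildSharedSamplers(const std::vector<TextureBinding>& textures) {
    SharedLayout layout;
    for (const TextureBinding& t : textures) {
        // Linear search is fine here: there are only a handful of samplers.
        uint32_t slot = 0;
        while (slot < layout.samplerSlots.size() && layout.samplerSlots[slot] != t.sampler) {
            ++slot;
        }
        if (slot == layout.samplerSlots.size()) {
            layout.samplerSlots.push_back(t.sampler);
        }
        layout.samplerSlotForTexture.push_back(slot);
    }
    return layout;
}
```

In the paired layout, two textures using LinearWrap produce two sampler binds; in the shared layout they produce one, at the cost of the shader (or material parser) needing to know the texture-to-sampler linkage.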

