Is there a hidden cost for creating all PSOs at load time?

Started by February 21, 2018 06:22 AM
2 comments, last by Hodgman 6 years, 11 months ago

I am working on a rendering framework. We have adopted the DX12 style where you create all pipelines for all permutations at load time. I am wondering whether there is a limit to the number of pipelines you can create, or whether you pay some hidden cost for having pipelines lying around until you actually have to use them (for example, choosing the 2x MSAA pipeline after the user picks it from the settings menu).

Or should I only create the pipelines I need at load time and then re-create them whenever necessary?

ID3D12PipelineState::GetCachedBlob might give you an idea of the size of a PSO. The blob is driver-dependent and is NOT ABI-compatible in any way. Creating a PSO isn't cheap: the driver has to patch pixel shaders for the desired render target formats, create fetch shaders (or patch vertex shaders to account for the IA layout), etc. This time is measurable and not negligible.

Unless you are badly short on RAM, I'd keep them around from load time, given that you already have the framework in place.

Yeah, I would assume that wasted RAM is the only downside (and load times).

In my recent test on a game that's already shipped using D3D11 (internal D3D12 port), it took about 5 seconds to create "base" PSOs for each permutation (where "base" means default raster/blend/depth-stencil state, with all other data permuted to cover all possible uses as declared by the game's metadata). That's a long time to block a main thread up front. You'd probably want to create the PSOs for your main menu shaders first so you get into the game quickly, and then do the rest afterwards.

We actually support creating PSOs on demand (a thread-safe hash map is searched for a PSO when creating a drawable, and a failed lookup results in the requested PSO being compiled and added to the map), so all we do is spin up a background thread that starts doing these many seconds' worth of work concurrently. During that time we might get 'cache misses' where a PSO lookup in the hash table fails and one of the main threads has to create it on demand, but after the background thread completes, the cache should be fully warmed up.
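
To make the on-demand cache plus background warm-up concrete, here is a minimal sketch using only the C++ standard library. It is not our actual engine code: `PsoKey`, `Pso`, `PsoCache` and `WarmUpAsync` are hypothetical names, and the creation callback stands in for whatever wraps the real D3D12 PSO creation call. It assumes the callback is safe to call from several threads at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins: PsoKey would be a hash of the full pipeline
// description, and Pso would wrap an ID3D12PipelineState in a real renderer.
using PsoKey = std::uint64_t;
struct Pso { PsoKey key; };

class PsoCache {
public:
    using CreateFn = std::function<std::shared_ptr<Pso>(PsoKey)>;
    explicit PsoCache(CreateFn create) : create_(std::move(create)) {}

    // Returns the cached PSO, or compiles it on the calling thread on a miss.
    std::shared_ptr<Pso> GetOrCreate(PsoKey key) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = map_.find(key);
            if (it != map_.end()) return it->second;
        }
        // Compile outside the lock so other lookups are not blocked.
        std::shared_ptr<Pso> pso = create_(key);
        std::lock_guard<std::mutex> lock(mutex_);
        // If another thread inserted first, keep its PSO and drop ours.
        return map_.emplace(key, std::move(pso)).first->second;
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.size();
    }

private:
    CreateFn create_;
    mutable std::mutex mutex_;
    std::unordered_map<PsoKey, std::shared_ptr<Pso>> map_;
};

// Warms the cache on a background thread; the caller owns the returned
// thread and must join it before the cache is destroyed.
inline std::thread WarmUpAsync(PsoCache& cache, std::vector<PsoKey> keys) {
    return std::thread([&cache, keys = std::move(keys)] {
        for (PsoKey key : keys) cache.GetOrCreate(key);
    });
}
```

The trade-off in this sketch is that two threads can occasionally compile the same PSO during warm-up, with one result thrown away. That wastes a little work but keeps the lock from being held during the slow compile.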

The other cost is simply managerial: how do you know all your PSOs at load time? We force shader authors to declare which render target formats, MSAA modes, primitive topologies, index buffer formats (for the IB strip cut value) and vertex buffer layouts their shaders are compatible with. This is a pain for shader authors, but does give you a lot of knowledge about PSO requirements at build time. We also use this data in the other rendering back-ends in development builds (e.g. D3D11) and assert if a shader is used in a way that wasn't declared by its author.
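
As a rough illustration of how such declarations can drive both the load-time PSO list and the development-build check, here is a small sketch. The enums and the `ShaderDecl` / `PsoPermutation` layout are assumptions made up for this example (a real version would carry DXGI formats, index buffer formats, vertex layouts and so on), not our actual metadata format.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical enums; a real renderer would use DXGI_FORMAT and friends.
enum class RtFormat { RGBA8, RGBA16F };
enum class Topology { Triangle, Line };

// What a shader author declares the shader is compatible with (assumed layout).
struct ShaderDecl {
    std::vector<RtFormat> rtFormats;
    std::vector<int> msaaSamples;
    std::vector<Topology> topologies;
};

struct PsoPermutation {
    RtFormat rtFormat;
    int msaaSamples;
    Topology topology;
    bool operator==(const PsoPermutation& o) const {
        return std::tie(rtFormat, msaaSamples, topology) ==
               std::tie(o.rtFormat, o.msaaSamples, o.topology);
    }
};

// Cartesian product of the declared options: every PSO that may be needed.
inline std::vector<PsoPermutation> EnumeratePermutations(const ShaderDecl& decl) {
    std::vector<PsoPermutation> out;
    out.reserve(decl.rtFormats.size() * decl.msaaSamples.size() *
                decl.topologies.size());
    for (RtFormat f : decl.rtFormats)
        for (int s : decl.msaaSamples)
            for (Topology t : decl.topologies)
                out.push_back({f, s, t});
    return out;
}

// Development-build check: was this usage declared by the shader author?
inline bool IsDeclared(const ShaderDecl& decl, const PsoPermutation& p) {
    for (const PsoPermutation& q : EnumeratePermutations(decl))
        if (q == p) return true;
    return false;
}
```

The list from `EnumeratePermutations` is what you would feed to the load-time (or background) PSO creation, and `IsDeclared` is the kind of check you could assert on in other back-ends when a shader is bound with a given state.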
