So...what's up with these poor pc ports?

Started by whitwhoa, February 26, 2022 10:54 PM
2 comments, last by MJP 2 years, 10 months ago

I was going to download Elden Ring and give it a go, but from what I've gathered the PC community is saying it's next to unplayable because of all the random stuttering. In one video I watched, the author alluded to the cause being due to how the shader programs are compiled at runtime, at the moment they are first required to render specific elements.

Now, I've been working on an OpenGL engine for the past half decade at this point and I can for sure see how this could cause the stuttering. If you've got a large shader program and it takes (just for example's sake) 100ms to send the string to the GPU driver and compile it, you'll block for that amount of time, causing a stutter.
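To put the 100ms figure in perspective, here is a rough back-of-the-envelope sketch in Python. The helper name `refresh_intervals_spanned` and the 60 Hz and 144 Hz targets are assumptions made up for illustration; only the 100ms stall comes from the example above.

```python
import math


def refresh_intervals_spanned(stall_ms: float, refresh_hz: float = 60.0) -> int:
    """Number of display refresh intervals a blocking stall covers (rounded up).

    Hypothetical helper: assumes the render thread does nothing else
    while the shader compile is blocking.
    """
    # One refresh interval lasts 1000 / refresh_hz milliseconds, so
    # stall_ms / (1000 / refresh_hz) == stall_ms * refresh_hz / 1000.
    return math.ceil(stall_ms * refresh_hz / 1000.0)


print(refresh_intervals_spanned(100))       # 60 Hz display
print(refresh_intervals_spanned(100, 144))  # 144 Hz display
```

Under these assumptions a single 100ms compile covers 6 refresh intervals at 60 Hz and 15 at 144 Hz, which is why even one late compile is easy to notice.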

However, and this is probably naivety on my part, but why don't they just preload all of the shaders before the scene begins? This is what I do in my engine (although I only have maybe 10 different shader programs, none of which are massive in size) and it's worked out fine for me up to this point.
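The difference between preloading and compiling on first use can be sketched roughly like this. It is a toy model in Python, not real OpenGL code: `ShaderCache`, `fake_compile` and the shader names are all hypothetical, and `fake_compile` only stands in for the blocking driver compile and link.

```python
from typing import Callable, Dict


class ShaderCache:
    """Compiles each shader program at most once and remembers the result."""

    def __init__(self, sources: Dict[str, str], compile_fn: Callable[[str], object]) -> None:
        self._sources = dict(sources)
        self._compile_fn = compile_fn
        self._programs: Dict[str, object] = {}
        self.lazy_compiles = 0  # compiles that happened on first use (potential hitches)

    def preload_all(self) -> None:
        """Compile every known shader up front, e.g. behind a loading screen."""
        for name, source in self._sources.items():
            if name not in self._programs:
                self._programs[name] = self._compile_fn(source)

    def get(self, name: str) -> object:
        """Return the compiled program, compiling on demand if it was not preloaded."""
        if name not in self._programs:
            self.lazy_compiles += 1  # in a real renderer this would block the frame
            self._programs[name] = self._compile_fn(self._sources[name])
        return self._programs[name]


def fake_compile(source: str) -> tuple:
    """Stand-in (assumption) for the real, blocking driver compile and link."""
    return ("program", len(source))


sources = {"lit": "void main() { /* ... */ }", "shadow": "void main() {}"}

lazy = ShaderCache(sources, fake_compile)
lazy.get("lit")
lazy.get("lit")
print(lazy.lazy_compiles)  # only the first get() had to compile

warm = ShaderCache(sources, fake_compile)
warm.preload_all()
warm.get("lit")
warm.get("shadow")
print(warm.lazy_compiles)  # nothing left to compile during "gameplay"
```

With only a handful of programs the `preload_all` path is cheap, which matches the experience described above; the open question is what happens when the shader count is much larger.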

Is there some specific reason why they choose to compile shaders at runtime?

whitwhoa said:
However, and this is probably naivety on my part, but why don't they just preload all of the shaders before the scene begins?

Many games do this. On first launch they precompile all shaders, so they are cached and ready when needed.
This can easily take 10 minutes, though. I often assume the game has hung and terminate it. On the second launch it starts up quickly, but some compilation work is probably left over and ends up happening later during gameplay.
I'm not even sure the long first launch is related to shader compilation. Either way, games should probably give feedback and show a progress bar to keep users from interrupting it.
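A minimal sketch of that kind of feedback, in Python: the precompile loop reports after every shader, so the UI has something to show. The function names, the shader names and the message format are made up for illustration, and the `lambda` only stands in for a slow compile.

```python
from typing import Callable, Dict, Iterator, Tuple


def precompile_with_progress(
    sources: Dict[str, str], compile_fn: Callable[[str], object]
) -> Iterator[Tuple[int, int, str]]:
    """Compile shaders one by one, yielding (done, total, name) after each."""
    total = len(sources)
    for done, (name, source) in enumerate(sources.items(), start=1):
        compile_fn(source)  # the slow, blocking part in a real game
        yield done, total, name


def progress_line(done: int, total: int, name: str) -> str:
    """Format one line of user-visible progress text."""
    percent = 100 * done // total
    return f"Compiling shaders: {done}/{total} ({percent}%) {name}"


shaders = {"terrain": "...", "water": "...", "sky": "...", "ui": "..."}
for done, total, name in precompile_with_progress(shaders, lambda src: src.upper()):
    print(progress_line(done, total, name))
```

Because the loop is a generator, a loading screen could pull one step per frame and redraw in between, so the window keeps responding instead of looking hung.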

For my own project, compiling shaders took even more than 30 minutes on NVIDIA, which prevented me from finding the algorithm variants that perform best on the specific GPU I wanted to optimize for. It simply took too long to test all combinations. (This was because I use one shader per tree level, so the level can be a compile-time constant. Thus many shaders generate 20 permutations that are otherwise equal. I need to try some alternatives to prevent those permutations…)
Interestingly, on AMD I had no issues at all: recompiling was a matter of seconds.
That was many years ago, though, using Vulkan. I'm not sure what has changed since then, or how it depends on the API, vendor, etc.
(I also have no problems with Elden Ring and wonder why people complain. I've played 1-2 hours and got outside to the open world, but no issues yet.)
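To illustrate how quickly such permutations multiply, here is a small Python sketch. Only the 20 values for the level constant mirror the 20 permutations mentioned above; the names `TREE_LEVEL`, `USE_SHARED_MEMORY` and `WIDE_LOADS` and the two extra on/off switches are hypothetical.

```python
from itertools import product

# Hypothetical option axes. Each axis is a compile-time constant, so every
# combination of values is a separate shader the driver has to compile.
options = {
    "TREE_LEVEL": range(20),
    "USE_SHARED_MEMORY": (0, 1),
    "WIDE_LOADS": (0, 1),
}


def enumerate_permutations(axes):
    """Yield one dict of compile-time constants per shader variant."""
    names = list(axes)
    for values in product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


def as_defines(permutation):
    """Render a permutation as preprocessor lines to inject into the source."""
    return "".join(f"#define {key} {value}\n" for key, value in permutation.items())


variants = list(enumerate_permutations(options))
print(len(variants))
print(as_defines(variants[-1]))
```

In this made-up setup, two extra boolean switches already turn 20 variants into 80, and each additional switch doubles the count again.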

Here is an entire related forum thread about the issue, which I did not follow: https://forum.beyond3d.com/threads/shader-complilation-on-pc-about-to-become-a-bigger-bottleneck.61929/

Personally I think we should treat shaders less like assets that artists can create by the thousands. I'm not sure we really need that many variants.
On the other hand, permutations are usually about optimization, so it's not that easy, of course.
Also, with streaming open worlds, there is no longer the simple solution of doing it all on level load :/

Personally I would not want to speculate about a specific game without doing some actual profiling, and even then I would be very careful since I would not want to inadvertently say something misleading or subtly wrong. Therefore I don't think I will comment about Elden Ring specifically.

Now in terms of a generic D3D12 or Vulkan game, there are a bunch of different factors here that can make shaders and PSOs (pipeline state objects; the two are closely related) rather complicated to deal with. Again, none of this applies to Elden Ring in particular since I know nothing about their tech; it just applies in general to any engine:

  • Both of these APIs push you towards pre-compiling your shaders from high-level source code to an intermediate bytecode format (DXIL for D3D12, SPIR-V for Vulkan). You can compile to bytecode at runtime, but in practice most games won't do this. Instead they will pre-compile as part of their code compilation or asset processing pipelines, and ship bytecode with the game.
  • Intermediate bytecode is just that: it's an intermediate format. It cannot be directly consumed by a GPU shader core: the driver needs to perform another JIT compile step at runtime to convert the bytecode into the native ISA of that shader core. In many cases the driver uses a full optimizing compiler (such as LLVM) to do this step in order to make the final ISA as optimal as possible. Since the generated ISA is specific to the video card and driver, this step can't be done ahead of time.
  • In D3D12 and Vulkan this JIT compile step doesn't happen until you create a full PSO, which provides enough render state for the driver to fully compile the shader down to ISA (since things like blend states or MSAA sample counts can require the driver to compile extra code into the shader to make them work). This can make things tricky if the engine is not set up to know in advance the final render states that will be used for a draw call. In particular it's bad for the old OpenGL/D3D9/10/11 style of setting states one at a time before a draw, like a giant state machine: in that case you would not be able to create the PSO until draw time, which as you pointed out is a pretty bad time to suddenly stall for hundreds of milliseconds.
  • Even if your engine is set up to generate PSOs ahead of time, creating them all can still take a significant amount of time. I know you mentioned you only had 10 shaders in your project, but most game engines will have many, many more than that in typical scenes. This can be a product of needing lots of permutations for performance, of allowing artist-authored shaders (like Unreal shader graphs), or of both. I wrote a whole big blog post on this subject if you're interested: https://therealmjp.github.io/posts/shader-permutations-part1/
  • Even small numbers of PSOs can take a long time if it takes the driver a long time to JIT compile and optimize the bytecode into ISA. It can vary significantly across vendors (as JoeJ pointed out), and it is not something that necessarily scales linearly with source code size or bytecode size (it's very easy to have O(N^2) or worse scaling when it comes to optimization passes).
  • Both APIs have various forms of caching for PSOs and compiled shaders that might involve the driver, the operating system, or even the app itself. However, all of these can only cache things after going through the initial JIT compile step, which has to happen at least once. And even on a cache hit, opening and reading a file may still not be fast enough to avoid a hitch if it happens on the critical path.
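
The points above about needing the full render state up front, and about keeping PSO creation off the critical path, can be sketched as a toy model in Python. Nothing here is real D3D12 or Vulkan code: `PipelineKey`, `PipelineCache` and `fake_create` are hypothetical names, and in an actual engine `create_fn` would wrap the API's pipeline creation call (for example `vkCreateGraphicsPipelines` or `ID3D12Device::CreateGraphicsPipelineState`).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class PipelineKey:
    """Everything the (simulated) driver needs before it can generate ISA."""
    shader: str
    blend_mode: str
    msaa_samples: int


class PipelineCache:
    def __init__(self, create_fn: Callable[[PipelineKey], object]) -> None:
        self._create_fn = create_fn
        self._ready: Dict[PipelineKey, object] = {}

    def warm(self, keys: Iterable[PipelineKey], workers: int = 4) -> None:
        """Create all given pipelines on worker threads; returns when all are done."""
        unique_keys = list(dict.fromkeys(keys))  # drop duplicates, keep order
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for key, pso in zip(unique_keys, pool.map(self._create_fn, unique_keys)):
                self._ready[key] = pso

    def try_get(self, key: PipelineKey) -> Optional[object]:
        """Never compiles: returns None if the pipeline was not warmed."""
        return self._ready.get(key)


def fake_create(key: PipelineKey) -> str:
    """Stand-in (assumption) for the expensive driver-side JIT compile."""
    return f"pso({key.shader}, {key.blend_mode}, {key.msaa_samples}x)"


opaque = PipelineKey("lit", "opaque", 4)
blended = PipelineKey("lit", "alpha", 4)  # same shader, different state: a separate PSO

cache = PipelineCache(fake_create)
cache.warm([opaque, blended, opaque])

print(cache.try_get(opaque))
print(cache.try_get(PipelineKey("lit", "alpha", 1)))  # never warmed
```

The design choice in this sketch is that `try_get` never compiles, so a draw with an unknown state combination gets `None` back instead of stalling the frame. Whether skipping or substituting that draw is acceptable depends on the engine, and it only helps if the engine can enumerate its state combinations before they are needed.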
