21 hours ago, d07RiV said:
Another thing - when you put all passes in the same shader file, do you run a lexer on them, or do you just feed everything to shader compiler and let it figure out what to optimize away? The former option would allow us to know which options affect which passes, so we don't have to make redundant copies (instead of having to manually specify them for every pass).
edit: I guess this is partially answered by bonus slides.
Having a custom shader language / a full lexer would be great, but I did not spend the effort in this area.
Instead, I use HLSL and only parse the outputs from the HLSL compiler. This allows you to discover things like the resource bindings that are actually used by the optimized code, but does not let you discover things such as which options actually had an effect on the code generation (unless you want to brute force it by repeatedly compiling with different options enabled and comparing the compiler outputs for differences...). To declare extra shader metadata (such as passes, options, resource lists, etc.), I embed Lua code within the shader source files.
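For what it's worth, the brute-force approach could look roughly like the sketch below. This is not code from my engine: `CompileFn` is a hypothetical callback standing in for "invoke the HLSL compiler with these option bits and return its output", and `FindEffectiveOptions` is a made-up name. It only toggles one option at a time against a baseline, so it will miss options that only matter in combination with others.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical callback: compiles the shader with the given option bits
// enabled and returns the compiler output (bytecode or disassembly).
using CompileFn = std::function<std::string(std::uint32_t optionMask)>;

// Returns a mask of the options whose toggling changes the compiler output
// relative to `baseMask`. Costs one baseline compile plus one per option.
std::uint32_t FindEffectiveOptions(const CompileFn& compile,
                                   std::uint32_t optionCount,
                                   std::uint32_t baseMask = 0)
{
    assert(optionCount <= 32); // one bit per option in a 32-bit mask

    const std::string baseline = compile(baseMask);
    std::uint32_t effective = 0;
    for (std::uint32_t i = 0; i < optionCount; ++i)
    {
        const std::uint32_t bit = 1u << i;
        // Flip a single option and see whether the output differs.
        if (compile(baseMask ^ bit) != baseline)
            effective |= bit;
    }
    return effective;
}
```

You would run this once per pass, and the resulting mask tells you which options that pass can ignore when building permutations.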
14 hours ago, d07RiV said:
I've also been thinking if there are better alternatives to picking the rendering order than simple radix sort, which can have abysmal results in some cases (i.e. 0111111 -> 1000000 -> 1111111 -> 2000000 etc). It is essentially a traveling salesman problem, which has plenty of decent approximate solutions, the question is, how much time are we prepared to dedicate to sorting.
I've never really thought about putting that much work into sorting, but yeah, I guess you could do quite a bit of analysis there.
My simple advice is to use a radix sort (or quick sort, etc), on integer state keys, where more significant bits represent more costly state changes (like render-targets or shaders) and less significant bits represent cheaper state changes (like constant/uniform values).
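A minimal sketch of that kind of key is below. The bit layout (8 bits of render target, 16 of shader, 20 of texture set, 20 of constants) and the names `MakeSortKey`, `DrawItem` and `SortDraws` are just assumptions for illustration; you'd pick field widths to suit your own content. I use `std::stable_sort` here to keep it short, and a radix sort over the same keys would give the same order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct DrawItem
{
    std::uint64_t key;       // packed state key, see MakeSortKey
    std::uint32_t drawIndex; // index into your own draw-call array
};

// Illustrative layout, most expensive state change in the highest bits:
//   [63..56] render target   (8 bits)
//   [55..40] shader          (16 bits)
//   [39..20] texture set     (20 bits)
//   [19..0]  constants       (20 bits)
inline std::uint64_t MakeSortKey(std::uint32_t renderTarget,
                                 std::uint32_t shader,
                                 std::uint32_t textureSet,
                                 std::uint32_t constants)
{
    assert(renderTarget < (1u << 8));
    assert(shader < (1u << 16));
    assert(textureSet < (1u << 20));
    assert(constants < (1u << 20));
    return (std::uint64_t(renderTarget) << 56) |
           (std::uint64_t(shader) << 40) |
           (std::uint64_t(textureSet) << 20) |
           std::uint64_t(constants);
}

// Sorts draws so that items sharing the costliest state end up adjacent.
inline void SortDraws(std::vector<DrawItem>& items)
{
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b)
                     { return a.key < b.key; });
}
```

After sorting, you walk the list and only issue the state changes whose key fields differ from the previous draw.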
It highly depends on your content too -- in some games, you might often be changing just one texture per draw, but keeping 10 other texture bindings the same... But in other games, you might use 11 unique texture bindings per draw.
There are plenty of other optimization techniques that a rendering engine has to look into supporting too, which reduce the number of draw-calls in total: instanced draw-calls, dynamic instancing (appending multiple different meshes into a single contiguous set so they can be drawn at one time), CPU-side vertex transformations, GPU-side skinning / transformation arrays, texture arrays, texture atlases, material arrays in cbuffers instead of individual material constants/uniforms, indirect draw-calls, compute-shader draw culling, etc, etc... If you implemented all of these, hopefully you'd have a much smaller number of draws with much more unique state per draw.
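As one small example from that list, here is a rough sketch of collapsing adjacent draws into instanced draw-calls. It assumes the draws have already been sorted so that items with the same mesh and material are next to each other, and that per-instance data is uploaded in that same order. The `Draw`, `InstancedBatch` and `BuildBatches` names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Draw
{
    std::uint32_t meshId;
    std::uint32_t materialId;
};

struct InstancedBatch
{
    std::uint32_t meshId;
    std::uint32_t materialId;
    std::uint32_t firstInstance; // offset into the per-instance data buffer
    std::uint32_t instanceCount;
};

// Merges runs of adjacent draws that share mesh and material into one batch.
// Assumes `sortedDraws` is already ordered so identical state is adjacent.
std::vector<InstancedBatch> BuildBatches(const std::vector<Draw>& sortedDraws)
{
    std::vector<InstancedBatch> batches;
    for (std::size_t i = 0; i < sortedDraws.size(); ++i)
    {
        const Draw& d = sortedDraws[i];
        if (!batches.empty() &&
            batches.back().meshId == d.meshId &&
            batches.back().materialId == d.materialId)
        {
            ++batches.back().instanceCount; // extend the current run
        }
        else
        {
            batches.push_back({ d.meshId, d.materialId,
                                static_cast<std::uint32_t>(i), 1u });
        }
    }

    // Sanity check: every draw ended up in exactly one batch.
    std::size_t total = 0;
    for (const InstancedBatch& b : batches)
        total += b.instanceCount;
    assert(total == sortedDraws.size());
    return batches;
}
```

Each batch then maps to a single instanced draw-call, with `firstInstance` and `instanceCount` selecting its slice of the instance buffer.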