
Optimizing POM shader texture fetches.

Started by November 15, 2017 07:46 PM
5 comments, last by knarkowicz 7 years, 2 months ago

So I've recently started learning some GLSL and now I'm toying with a POM shader. I'm trying to optimize it and notice that it starts having issues at high texture sizes, especially with self-shadowing.

Now I know POM is expensive either way, but would pulling the heightmap out of the normalmap alpha channel and into its own 8-bit texture make all those dozens of texture fetches cheaper? Or is everything in the cache aligned to 32 bits anyway? I haven't implemented texture compression yet; I think that would help? But regardless, should there be a performance boost from decoupling the heightmap? I could also keep it at a lower resolution than the normalmap if that would improve performance.

Any help is much appreciated, please keep in mind I'm somewhat of a newbie. Thanks!

Hi, I am not sure which optimizations you have already tried, but here are a few:

  • Use texture compression for the heightmap like you already mentioned: choose an appropriate format, for example DXT1.
  • Sample only a single channel explicitly.
  • Use mipmaps: you will need to calculate the texture UV gradients upfront, before the ray marching loop, and feed them into textureGrad (I think that's what it's called in GLSL?); see the sketch after this list.
  • Use a dynamic loop for the ray marching and terminate early once you have found the intersection (also shown in the sketch below).
  • Use fewer steps in the ray marching.
  • Use a smaller-resolution heightmap.
  • When calculating the UV derivatives, you could try using dFdyCoarse instead of the standard dFdy (or dFdx).
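
Roughly what I mean by the gradient and early-exit points, as a minimal GLSL sketch (heightMap, viewDirTS, heightScale and numSteps are just placeholder names):

// Compute the UV gradients once, before the loop, so mip selection
// stays well-defined while the UVs are offset during the march.
vec2 dx = dFdx(uv);
vec2 dy = dFdy(uv);

// March through the height field in tangent space.
vec2  stepUV   = -viewDirTS.xy / viewDirTS.z * heightScale / float(numSteps);
float stepSize = 1.0 / float(numSteps);

vec2  curUV     = uv;
float rayHeight = 1.0;
float texHeight = textureGrad(heightMap, curUV, dx, dy).r;

// Dynamic loop: stop as soon as the ray drops below the surface.
for (int i = 0; i < numSteps; ++i) {
    if (rayHeight <= texHeight) {
        break;
    }
    curUV     += stepUV;
    rayHeight -= stepSize;
    texHeight  = textureGrad(heightMap, curUV, dx, dy).r;
}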

You can check out my implementation, but it is written in HLSL.

Hope I could help! :)


Thanks a lot for the very complete answer! I've looked at your version and it's very close to my own. I do use textureLod (SampleLevel) inside the loop, since anisotropic filtering goes out of the window anyway [1] and it doesn't look all that different.

[1] http://www.diva-portal.org/smash/get/diva2:831762/FULLTEXT01.pdf

Will look into it some more once I implement texture compression.

You can also introduce an angle-controlled step count; if you are looking straight at a wall, you won't see much parallax to start with.

Same idea with distance: you can transition to a height of zero as the distance grows and, in doing so, skip the stepping entirely (see the sketch below).
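
Something like this, roughly, inside the parallax UV function (viewDirTS is the tangent-space view direction; minSteps, maxSteps, fadeStart, fadeEnd and distToCamera are made-up parameters):

// Looking straight at the surface (tangent-space normal is +Z)
// needs far fewer steps than a grazing angle.
float NdotV = clamp(viewDirTS.z, 0.0, 1.0);
float steps = mix(float(maxSteps), float(minSteps), NdotV);

// Fade the height to zero with distance and skip the stepping
// entirely once it has completely faded out.
float fade   = 1.0 - smoothstep(fadeStart, fadeEnd, distToCamera);
float height = heightScale * fade;
if (height <= 0.0) {
    return uv;   // no visible parallax, keep the original UVs
}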

All the subsequent texture fetches, not only the ones to the depth map, need to use a custom gradient or you are going to get edge artifacts. And this is sad, because SampleGrad is half rate. I personally use the original UV gradients for everything :(
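
For example (a sketch; albedoMap and normalMap are just placeholder samplers, dx/dy the gradients of the original, un-offset UV):

// Sample the material textures at the parallax-offset UV, but with the
// gradients of the original UV, so mip selection doesn't break on the
// discontinuities the offset introduces.
vec3 albedo   = textureGrad(albedoMap, offsetUV, dx, dy).rgb;
vec3 normalTS = textureGrad(normalMap, offsetUV, dx, dy).xyz * 2.0 - 1.0;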

Using BC4_UNORM definitely helps with bandwidth too.


And last, most attempts at doing a quad tree are a false good idea; with the shader logic overhead, it usually gets crushed by the brute-force version… And I am not even talking about silhouette-aware versions… Maximum sadness :(


Yeah, I'm already scaling the sample count by angle and distance (and the offset by distance too; you don't really see it at range, so that's great).

I've added clamping of the detail in the heightmap by massaging the mipmapping, and that's giving me a huge speed boost on large textures, since most of my textures are fairly smooth (like medieval brick and such). I'm doing it like this at the moment, and it works fine, but since I'm a shader noob perhaps there's a better way?


// Two helper functions...

float GetMipLevel(sampler2D tex, vec2 uv) {
	// .y holds the computed LOD the hardware would pick for these UVs
	// (spelled textureQueryLOD in the ARB extension).
	return textureQueryLod(tex, uv).y;
}

float GetMipLimit(sampler2D tex, float limit) {

	// Get texture size in pixels, presume square texture (!).
	float size = float(textureSize(tex, 0).x);
	// log2 of the size gives the index of the smallest (1x1) mip,
	// i.e. how many mip levels sit below the base level.
	size = log2(size);
	// mipmap 0 = nearest and largest sized texture. Return the smallest
	// mip offset that keeps the sampled detail at or below 2^limit
	// texels, or 0 if the texture is already small enough.
	return max(size - limit, 0.0);
}

// Then inside the parallax function, but outside the loop...

// Limit heightmap detail.
float mipLimit = GetMipLimit(tex, 7);
float mipLevel = GetMipLevel(tex, uv);
float mipLod   = max(mipLevel, mipLimit);

// And sample inside the loop...

textureLod(tex, uv, mipLod);


Yeah, the hierarchical traversal doesn't seem to be worth it in practice, shame really. Maybe it's worth it for soft shadows though; the QDM paper seems to have an interesting approximation for shadowing.

Another interesting thing I read was in the Cone Step Mapping paper, where he ditches the normals and instead uses horizontal/vertical derivatives, which lets him trivially scale the normals alongside the height (something like the sketch below). Generating the derivative textures could also be crazy fast, I think... perhaps even worth doing at load time / asynchronously and shipping only a heightmap. Seems kinda neat, but I'm not sure how much you buy with that.
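
If I understood it right, the normal reconstruction would be something like this (my own sketch; derivMap would be a hypothetical two-channel texture holding dH/du and dH/dv):

// Scaling the height scale also scales the derivatives, so the normal
// stays consistent with the displaced surface "for free".
vec2 dH       = textureGrad(derivMap, offsetUV, dx, dy).xy * heightScale;
vec3 normalTS = normalize(vec3(-dH.x, -dH.y, 1.0));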

Thanks for the tips, I'll remember the BC4 unorm thing.

There are two good ways:

1. Brute force - read 4 samples per loop iteration (that's the sweet spot on most GPUs). You may want to finish your iterations with a linear interpolation between the last two samples for better quality (see the sketch after point 2).

2. CSM (Cone Step Mapping) - far fewer samples than brute force, but every sample is slower, as you can do only 1 texture fetch per loop iteration and you are fetching "fatter" texels. It also requires a special pre-computed texture with height and cone angle, which may be an issue for your asset pipeline.
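
The final interpolation looks roughly like this (lastUV/lastRayH/lastTexH and curUV/curRayH/curTexH being the ray and heightmap state just before and after the step that crossed the surface):

// The surface was crossed between the previous and the current sample:
// intersect the two (assumed linear) segments to refine the hit.
float prevDelta = lastRayH - lastTexH;   // still above the surface
float currDelta = curTexH  - curRayH;    // now below the surface
float t         = prevDelta / (prevDelta + currDelta);
vec2  hitUV     = mix(lastUV, curUV, t);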

In any case, first you should calculate the tex LOD, either using the GLSL function or by emulating it inside the shader (faster, but requires passing the tex size to the shader); a sketch of the emulation is below. Then derive the number of steps from the tex LOD. Finally, inside the loop, just use the previously calculated tex LOD level.
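
Emulating the LOD looks roughly like this (texSize would be a uniform with the texture size in pixels; the step-count mapping at the end is just an arbitrary example):

// Emulate the hardware mip selection from the screen-space UV
// derivatives (standard LOD formula, ignoring anisotropy).
vec2  dx       = dFdx(uv) * texSize;
vec2  dy       = dFdy(uv) * texSize;
float maxDelta = max(dot(dx, dx), dot(dy, dy));
float texLOD   = max(0.0, 0.5 * log2(maxDelta));   // log2 of the longer gradient

// Fewer steps for blurrier mips.
float steps = mix(float(maxSteps), float(minSteps), clamp(texLOD / 4.0, 0.0, 1.0));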


