
Optimizing POM shader texture fetches.

Started by November 15, 2017 07:46 PM
5 comments, last by knarkowicz 7 years, 2 months ago

So I've recently started learning some GLSL and now I'm toying with a POM shader. I'm trying to optimize it and notice that it starts having issues at high texture sizes, especially with self-shadowing.

Now I know POM is expensive either way, but would pulling the heightmap out of the normalmap alpha channel and into its own 8-bit texture make all those dozens of texture fetches cheaper? Or is everything in the cache aligned to 32 bits anyway? I haven't implemented texture compression yet; I think that would help? But regardless, should there be a performance boost from decoupling the heightmap? I could also keep it at a lower resolution than the normalmap if that would improve performance.

Any help is much appreciated, please keep in mind I'm somewhat of a newbie. Thanks!

Hi, I am not sure which optimizations you have already tried, but here are a few:

  • Use texture compression for the heightmap like you already mentioned: choose an appropriate format, for example DXT1.
  • Sample only a single channel explicitly.
  • Use mipmaps: you will need to calculate the texture UV gradients upfront, before the ray marching loop, and feed them into textureGrad (I think that's what it's called in GLSL?); see the sketch after this list.
  • Use a dynamic loop for the ray marching and terminate early once you have found the intersection (also shown in the sketch below).
  • Use fewer steps in the ray marching.
  • Use a smaller-resolution heightmap.
  • When calculating the UV derivatives, you could try using dFdyCoarse instead of the standard dFdy (or dFdx).
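
Roughly what I mean by the gradient and early-exit points, as a minimal GLSL sketch (heightMap, viewDirTS, heightScale and numSteps are just placeholder names):

// Compute the UV gradients once, before the loop, so mip selection
// stays well-defined while the UVs are offset during the march.
vec2 dx = dFdx(uv);
vec2 dy = dFdy(uv);

// March through the height field in tangent space.
vec2  stepUV   = -viewDirTS.xy / viewDirTS.z * heightScale / float(numSteps);
float stepSize = 1.0 / float(numSteps);

vec2  curUV     = uv;
float rayHeight = 1.0;
float texHeight = textureGrad(heightMap, curUV, dx, dy).r;

// Dynamic loop: stop as soon as the ray drops below the surface.
for (int i = 0; i < numSteps; ++i) {
    if (rayHeight <= texHeight) {
        break;
    }
    curUV     += stepUV;
    rayHeight -= stepSize;
    texHeight  = textureGrad(heightMap, curUV, dx, dy).r;
}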

You can check out my implementation, but it is written in HLSL.

Hope I could help! :)


Thanks a lot for the very complete answer! I've looked at your version and it's very close to my own. I do use textureLod (SampleLevel) inside the loop, since anisotropic filtering goes out of the window anyway [1] and it doesn't look all that different.

[1] http://www.diva-portal.org/smash/get/diva2:831762/FULLTEXT01.pdf

Will look into it some more once I implement texture compression.

You can also introduce an angle-controlled step count; if you are looking straight at a wall, you won't see much parallax to start with.

Same idea with distance: you can transition to a height of zero as the distance grows and, in doing so, skip the stepping entirely (see the sketch below).
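
Something like this, roughly, inside the parallax UV function (viewDirTS is the tangent-space view direction; minSteps, maxSteps, fadeStart, fadeEnd and distToCamera are made-up parameters):

// Looking straight at the surface (tangent-space normal is +Z)
// needs far fewer steps than a grazing angle.
float NdotV = clamp(viewDirTS.z, 0.0, 1.0);
float steps = mix(float(maxSteps), float(minSteps), NdotV);

// Fade the height to zero with distance and skip the stepping
// entirely once it has completely faded out.
float fade   = 1.0 - smoothstep(fadeStart, fadeEnd, distToCamera);
float height = heightScale * fade;
if (height <= 0.0) {
    return uv;   // no visible parallax, keep the original UVs
}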

All the subsequent texture fetches, not only the ones to the depth map, need to use a custom gradient or you are going to get edge artifacts. And this is sad, because SampleGrad is half rate. I personally use the original UV gradients for everything :(
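
For example (a sketch; albedoMap and normalMap are just placeholder samplers, dx/dy the gradients of the original, un-offset UV):

// Sample the material textures at the parallax-offset UV, but with the
// gradients of the original UV, so mip selection doesn't break on the
// discontinuities the offset introduces.
vec3 albedo   = textureGrad(albedoMap, offsetUV, dx, dy).rgb;
vec3 normalTS = textureGrad(normalMap, offsetUV, dx, dy).xyz * 2.0 - 1.0;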

Using BC4_UNORM definitely helps with bandwidth too.


And last, most attempts at doing a quad tree are a false good idea; with the shader logic overhead, it usually gets crushed by the brute-force version… And I am not even talking about silhouette-aware versions… Maximum sadness :(


Yeah, I'm already scaling the sample count by angle and distance (and the offset by distance too; you don't really see it at range, so that's great).

I've added clamping of the detail in the heightmap by massaging the mipmapping, and that's giving me a huge speed boost on large textures, since most of my textures are fairly smooth (like medieval brick and such). I'm doing it like this at the moment, and it works fine, but since I'm a shader noob perhaps there's a better way?


// Two helper functions...

float GetMipLevel(sampler2D tex, vec2 uv) {
	// .y holds the computed LOD the hardware would pick for these UVs
	// (spelled textureQueryLOD in the ARB extension).
	return textureQueryLod(tex, uv).y;
}

float GetMipLimit(sampler2D tex, float limit) {

	// Get texture size in pixels, presume square texture (!).
	float size = float(textureSize(tex, 0).x);
	// log2 of the size gives the index of the smallest (1x1) mip,
	// i.e. how many mip levels sit below the base level.
	size = log2(size);
	// mipmap 0 = nearest and largest sized texture. Return the smallest
	// mip offset that keeps the sampled detail at or below 2^limit
	// texels, or 0 if the texture is already small enough.
	return max(size - limit, 0.0);
}

// Then inside the parallax function, but outside the loop...

// Limit heightmap detail.
float mipLimit = GetMipLimit(tex, 7);
float mipLevel = GetMipLevel(tex, uv);
float mipLod   = max(mipLevel, mipLimit);

// And sample inside the loop...

textureLod(tex, uv, mipLod);


Yeah, the hierarchical traversal doesn't seem to be worth it in practice, shame really. Maybe it's worth it for soft shadows though; the QDM paper seems to have an interesting approximation for shadowing.

Another interesting thing I read was in the Cone Step Mapping paper, where he ditches the normals and instead uses horizontal/vertical derivatives, which lets him trivially scale the normals alongside the height (something like the sketch below). Generating the derivative textures could also be crazy fast, I think... perhaps even worth doing at load time / asynchronously and shipping only a heightmap. Seems kinda neat, but I'm not sure how much you buy with that.
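
If I understood it right, the normal reconstruction would be something like this (my own sketch; derivMap would be a hypothetical two-channel texture holding dH/du and dH/dv):

// Scaling the height scale also scales the derivatives, so the normal
// stays consistent with the displaced surface "for free".
vec2 dH       = textureGrad(derivMap, offsetUV, dx, dy).xy * heightScale;
vec3 normalTS = normalize(vec3(-dH.x, -dH.y, 1.0));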

Thanks for the tips, I'll remember the BC4 unorm thing.

There are two good ways:

1. Brute force - read 4 samples per loop iteration (that's the sweet spot on most GPUs). You may want to finish your iterations with a linear interpolation between the last two samples for better quality (see the sketch after point 2).

2. CSM (Cone Step Mapping) - far fewer samples than brute force, but every sample is slower, as you can do only 1 texture fetch per loop iteration and you are fetching "fatter" texels. It also requires a special pre-computed texture with height and cone angle, which may be an issue for your asset pipeline.
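
The final interpolation looks roughly like this (lastUV/lastRayH/lastTexH and curUV/curRayH/curTexH being the ray and heightmap state just before and after the step that crossed the surface):

// The surface was crossed between the previous and the current sample:
// intersect the two (assumed linear) segments to refine the hit.
float prevDelta = lastRayH - lastTexH;   // still above the surface
float currDelta = curTexH  - curRayH;    // now below the surface
float t         = prevDelta / (prevDelta + currDelta);
vec2  hitUV     = mix(lastUV, curUV, t);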

In any case, first you should calculate the tex LOD, either using the GLSL function or by emulating it inside the shader (faster, but requires passing the tex size to the shader); a sketch of the emulation is below. Then derive the number of steps from the tex LOD. Finally, inside the loop, just use the previously calculated tex LOD level.
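
Emulating the LOD looks roughly like this (texSize would be a uniform with the texture size in pixels; the step-count mapping at the end is just an arbitrary example):

// Emulate the hardware mip selection from the screen-space UV
// derivatives (standard LOD formula, ignoring anisotropy).
vec2  dx       = dFdx(uv) * texSize;
vec2  dy       = dFdy(uv) * texSize;
float maxDelta = max(dot(dx, dx), dot(dy, dy));
float texLOD   = max(0.0, 0.5 * log2(maxDelta));   // log2 of the longer gradient

// Fewer steps for blurrier mips.
float steps = mix(float(maxSteps), float(minSteps), clamp(texLOD / 4.0, 0.0, 1.0));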


