
low precision formats for vertex data?

Started by June 15, 2013 05:22 PM
11 comments, last by 21st Century Moose 11 years, 7 months ago

Right now I'm working on integrating a skeletal animation system into my game, but I'm starting to realize that the extra data carried per vertex could break the 64-byte alignment of my vertex structure. Then it occurred to me that since normals and tangents are generally guaranteed to contain values between -1.0 and 1.0, I might be able to get away with storing this data in smaller formats and therefore fit everything in 64 bytes. The problem I ran into is that C++ doesn't have a half-float type, and half floats aren't supported below OpenGL 3.0. Theoretically I could use scaled 16-bit integers, but that would add extra computation in the vertex shader to re-scale them. Right now my structure contains:


struct meshVertex
{
	float x, y, z; //12 bytes for position
	float u, v; //8 bytes for texcoords
	float nX, nY, nZ; // 12 bytes for normals
	float tX, tY, tZ, tW; //16 bytes for tangents
	
	unsigned short b1, b2, b3, b4; //8 bytes of bone IDs

	float w0, w1, w2, w3; //16 bytes for bone weights
};

That adds up to 72 bytes, which is 8 bytes too many. Can anybody post what their animated vertex structure looks like? Does anyone have other methods for decreasing the size of the vertex structure?
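As a sanity check, a `static_assert` lets the compiler confirm the layout really is the size you expect (a sketch that repeats the struct above so it compiles on its own; assumes a typical 4-byte-aligned target):

```cpp
#include <cstddef>

struct meshVertex
{
	float x, y, z;                 // 12 bytes for position
	float u, v;                    // 8 bytes for texcoords
	float nX, nY, nZ;              // 12 bytes for normals
	float tX, tY, tZ, tW;          // 16 bytes for tangents
	unsigned short b1, b2, b3, b4; // 8 bytes of bone IDs
	float w0, w1, w2, w3;          // 16 bytes for bone weights
};

// Every member is 2- or 4-byte aligned, so no padding is inserted and the
// struct is exactly the sum of its parts: 72 bytes.
static_assert(sizeof(meshVertex) == 72, "unexpected padding in meshVertex");
```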

Could you combine 2 floats into one?

something like:

float packed = round(a*10000.0) + b; // a and b in range 0-1

The magic number should probably be chosen so that both floats get equal precision; I'm not sure if it's even possible.
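A minimal sketch of that idea (my own constant choice: 4096 rather than 10000, so both halves fit comfortably inside a float's 24-bit mantissa):

```cpp
#include <cassert>
#include <cmath>

// Pack two floats in [0, 1) into one float: 'a' quantized into the integer
// part, 'b' kept in the fractional part. With SCALE = 4096 each value keeps
// roughly 11-12 bits of precision.
const float SCALE = 4096.0f;

float packPair(float a, float b)
{
    return roundf(a * SCALE) + b;
}

void unpackPair(float packed, float* a, float* b)
{
    float intPart = floorf(packed);
    *a = intPart / SCALE;      // quantized to the nearest 1/4096
    *b = packed - intPart;     // fractional part survives to ~11 bits
}
```

Round-tripping a = 0.1234, b = 0.5678 recovers both to about three decimal places. Note that b must stay strictly below 1.0, or it bleeds into the integer part reserved for a.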

o3o


Is there some reason that you care about 64-byte alignment?

The only thing you should need full 32-bit precision for is position, everything else you could compress. For texture coordinates 16-bit should be sufficient, either using an integer or half-precision float depending on whether you need values > 1 or < 0. Normals should be 16-bit integers with a sign bit, since they're always in the [-1, 1] range. Same for tangents. Typically you store bone weights as 4 8-bit integers, since they're in the [0, 1] range.
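For example, a [-1, 1] normal component maps onto a signed normalized 16-bit integer like this (a sketch of the usual SNORM convention; when the vertex attribute is declared normalized, the GPU performs the divide for you, so no shader rescaling is needed):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Encode a [-1, 1] float as a signed normalized 16-bit integer (SNORM).
int16_t packSnorm16(float v)
{
    if (v < -1.0f) v = -1.0f;
    if (v >  1.0f) v =  1.0f;
    return (int16_t)roundf(v * 32767.0f);
}

// Decode; this mirrors what the hardware does for normalized attributes.
float unpackSnorm16(int16_t s)
{
    float v = (float)s / 32767.0f;
    return v < -1.0f ? -1.0f : v;  // -32768 also clamps to -1
}
```

The round-trip error is at most about 1/65534, far below anything lighting calculations can notice.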

EDIT: I forgot to mention you can possibly compress normals and tangents even further by taking advantage of the fact that they are direction vectors, if you're willing to introduce some unpacking code into your vertex shader. Most of the techniques listed here are applicable, or if your tangent frame is orthogonal then you can store the entire thing as a single quaternion.

I'm inclined to agree that the precious 64-byte alignment probably isn't worth as much as you think it'll be. But I'll bite anyway:

Presumably your 4 weights will always add up to exactly 1.0, so one of them doesn't need to be stored. That saves you 4 bytes.

Next, presumably your tW value is just the sign for your binormal, so it really only needs one bit of information. Since you know all your weights are positive, and the first weight is (hopefully) guaranteed to be non-zero, you could use the sign bit of the first weight and eliminate tW entirely.
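Both tricks together, as a sketch (hypothetical helper names; assumes w0 > 0 and that the four weights sum to 1):

```cpp
#include <cassert>
#include <cmath>

// Fold the binormal sign (tW, either +1 or -1) into the sign bit of the
// first bone weight, and drop the fourth weight entirely.
float packW0(float w0, float tW)
{
    return (tW < 0.0f) ? -w0 : w0;
}

// Recover everything on the other side (this mirrors what the vertex
// shader would do with the compressed data).
void unpack(float packedW0, float w1, float w2,
            float* w0, float* w3, float* tW)
{
    *tW = (packedW0 < 0.0f) ? -1.0f : 1.0f;
    *w0 = fabsf(packedW0);
    *w3 = 1.0f - (*w0 + w1 + w2);  // weights always sum to 1
}
```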

Is there some reason that you care about 64-byte alignment?

L. Spiro says that it makes a difference.


Direct3D gains a lot of performance (and is more stable) when the vertices within the vertex buffers are padded to 32 bytes. Even though you are sending more data across the bus, vertices of 54 bytes are slower than the same vertices padded to 64 bytes. In fact, even if you follow my advice above and keep vertices in their own buffer, it is still faster to send 32 bytes per vertex than to send 12.

Or it doesn't?

I can't say that I have ever observed such behavior on any hardware that I've worked on extensively, save for one console GPU that really liked fetching 32-byte vertex chunks. Any modern (DX10+) GPU doesn't even have dedicated vertex fetching hardware anymore, and will read the vertex data the same way it reads any other buffer.


I just tried a quick experiment in DX9 on my macbook (Intel GPU), and there was hardly any difference between processing 65536 12 byte vertices vs 65536 ones that were padded to 32 bytes. (Over a couple of runs, the 32 byte vertex format seemed to be about half a percent faster in its vertex processing - so almost nothing).

I also compared 28 byte vs 32 byte, again barely any difference.

MJP, was the console the Xbox 360? I've noticed pretty significant improvements compressing vertices there. But I don't recall trying to check if it was the alignment along 32 bytes, or simply the reduced bandwidth that caused the improvement.

MJP, was the console the Xbox 360? I've noticed pretty significant improvements compressing vertices there. But I don't recall trying to check if it was the alignment along 32 bytes, or simply the reduced bandwidth that caused the improvement.

Yup.

In my experience:

* Both bone indices and bone weights are usually stored in a 4x8-bit format.
* Integer formats can be either normalized or non-normalized. The hardware expands 0xFF to 1.0f when using a normalized format, or to 255.0f for non-normalized. No shader rescaling is required.
* 16-bit precision is more than enough for normals. A 3x16-bit format often isn't supported, though; the normal and tangent can be packed into one 4x16-bit and one 2x16-bit attribute to save space if required. Alternatively, the tex-coords or other data can be stored in the w component of the normal and tangent.
* For current game texture resolutions, 16-bit tex-coords are usually precise enough.
* For object-space positions, if the model is small enough, 16-bit positions with custom rescaling can be used.
* You can make a half-float data type in C++ to easily convert floats to halves.


struct meshVertex
{
	float x, y, z;            // 12 bytes for position
	uint16_t u, v;            // 4 bytes for half-float texcoords
	uint16_t nX, nY, nZ, pad; // 8 bytes for half-float normals (+ padding)
	uint16_t tX, tY, tZ, tW;  // 8 bytes for half-float tangents
	uint8_t b1, b2, b3, b4;   // 4 bytes of bone IDs
	uint8_t w0, w1, w2, w3;   // 4 bytes for normalized bone weights
};

That adds up to 40 bytes ;)
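A minimal float-to-half converter along those lines (a sketch: truncating rather than round-to-nearest, flushes denormals to zero, and doesn't preserve NaN payloads; production code would typically use a lookup table or F16C intrinsics):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Convert a 32-bit float to IEEE 754 half-precision (binary16) bits.
uint16_t floatToHalf(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFF) - 127 + 15;  // rebias
    uint32_t mant = bits & 0x7FFFFFu;
    if (exp <= 0)  return sign;           // underflow -> signed zero
    if (exp >= 31) return sign | 0x7C00u; // overflow  -> infinity
    return sign | (uint16_t)(exp << 10) | (uint16_t)(mant >> 13);
}

// Convert half-precision bits back to a 32-bit float.
float halfToFloat(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    int32_t  exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0)       bits = sign;               // zero (denormals dropped)
    else if (exp == 31) bits = sign | 0x7F800000u; // infinity
    else bits = sign | ((uint32_t)(exp - 15 + 127) << 23) | (mant << 13);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values exactly representable in half precision (like 0.5) round-trip bit-perfectly; everything else in the normal range comes back with roughly 3 decimal digits of precision, which is plenty for texcoords, normals, and tangents.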


But I don't recall trying to check if it was the alignment along 32 bytes, or simply the reduced bandwidth that caused the improvement.
The 360 has two fetch instructions -- one (named "large") fetches 32 bytes, the other (named "small") then reads a particular attribute out of an already-fetched 32-byte block. Large fetches are expensive, small fetches are cheap. By using correct alignment and small data types, you reduce the number of large fetches.

Bandwidth is also a big deal. All GPUs from around that time (the PS3 included) have much lower bandwidth than current ones, and many of them don't have a unified shading architecture, so there's dedicated VS hardware and PS hardware. This puts a hard limit on the ratio between the amount of data each vertex shader can fetch and the amount of processing it does. Often, fetching data becomes the main bottleneck instead of processing, so reducing data sizes can help a lot.

So I managed to move some things around and got the structure down to 64 bytes. Using the same approach I trimmed non-animated data (scenery etc.) down to 32 bytes. I think that regardless of alignment it's still a good idea to reduce the size of vertex structures. Does anyone know if 32-byte alignment has an effect on older DX9/GL2-era GPUs?

This topic is closed to new replies.
