HLSL Addition of two float4 yields zero

Tim Akgayev · 2019-08-11T20:51:14

I'm running into a somewhat unexpected situation while trying to get my lighting to work in the vertex shader. The whole frame renders black for some reason, so I decided to run it through the graphics debugger. What I found was that when I add two valid float4 values, one being float4(0.57, 0.57, 0.57, 1.0) and the other being float4(0.0, 0.0, 0.0, 0.0), I _sometimes_ get zero as a result. I will show what I mean in the video below. (Read this after you watch the video.) When I run the shader on the individual vertices of a triangle, the result comes back as it should on the first vertex, but on the second and third vertices it comes back as zero. Why is that?

VanillaSnake21

Author

211

August 11, 2019 07:43 PM

26 minutes ago, Magogan said:
You do realize that the dot product can be negative and saturate clamps it to the interval from 0 to 1, so it would be 0?

Oh ok, I didn't realize that if it's negative it would clamp to zero. Then it makes sense.
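To make the clamping concrete, here is a minimal CPU-side sketch of what saturate(dot(N, L)) does. `Vec3`, `dot`, `saturate` and `diffuseFactor` are hypothetical C++ stand-ins written for illustration, not the shader code from this thread, and the normals below are made-up values.

```cpp
#include <algorithm>

// Hypothetical stand-in for an HLSL float3.
struct Vec3 { float x, y, z; };

// Plain dot product; negative when the vectors point away from each other.
float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Stand-in for HLSL's saturate(): clamps a value to the interval [0, 1].
float saturate(float x) {
    return std::min(std::max(x, 0.0f), 1.0f);
}

// Diffuse factor in the common form saturate(dot(N, L)).
float diffuseFactor(const Vec3& normal, const Vec3& toLight) {
    return saturate(dot(normal, toLight));
}
```

With a light direction of {0, 0, 1}, a normal of {0, 0, 1} gives a factor of 1, while a normal of {0, 0, -1} gives a dot product of -1 that is clamped to 0. Multiplying a color such as (0.57, 0.57, 0.57, 1.0) by that factor yields all zeros, which would match the black vertices seen in the debugger.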

So all the buffers need to be multiples of 16 bytes? So the declspec just makes sure that they are aligned to boundaries when allocated but doesn't pad them automatically?

And should I also do it on this struct as well?


__declspec(align(16)) struct CameraPosition
{
	DirectX::XMFLOAT3 EyePosition;
};

Hold on, I just tested it and it appears declspec automatically pads the class as well.

I did sizeof(CameraPosition) without declspec and it was 12, with declspec the size was 16. So why do I need to pad anything?
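The sizeof observation can be reproduced with standard C++. This sketch assumes `alignas(16)` as a portable stand-in for `__declspec(align(16))`, and `Float3` as a hypothetical replacement for DirectX::XMFLOAT3 (three packed floats), so it compiles without the DirectX headers.

```cpp
#include <cstddef>

// Hypothetical stand-in for DirectX::XMFLOAT3: three tightly packed floats.
struct Float3 { float x, y, z; };

// No alignment request: the struct is exactly as large as its one member.
struct CameraPositionPlain {
    Float3 EyePosition;
};

// 16-byte alignment request: sizeof must be a multiple of the alignment,
// so the compiler appends 4 bytes of padding at the END of the struct.
struct alignas(16) CameraPositionAligned {
    Float3 EyePosition;
};

static_assert(sizeof(CameraPositionPlain) == 12, "three floats, no padding");
static_assert(sizeof(CameraPositionAligned) == 16, "rounded up to the alignment");
static_assert(alignof(CameraPositionAligned) == 16, "alignment request honored");
```

The alignment request only rounds the total size up. It does not insert padding between members, which is why a struct with more than one member can still end up with a layout that differs from the shader's.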

You didn't come into this world. You came out of it, like a wave from the ocean. You are not a stranger here. -Alan Watts

Beosar

572

August 11, 2019 08:05 PM

21 minutes ago, VanillaSnake21 said:
So all the buffers need to be multiples of 16 bytes? So the declspec just makes sure that they are aligned to boundaries when allocated but doesn't pad them automatically?

Yes and no. __declspec(align(16)) only works for some stack allocations (normal definitions, but apparently not function arguments), not on the heap. It does add padding at the end to make the size a multiple of 16, though. But the structure itself doesn't even need to be aligned on the CPU side unless you use XMVECTOR or XMMATRIX (which you shouldn't do, just use XMFLOAT(n) or XMFLOAT(n)X(m)). You need to lay out the vectors so that none of them spans two 16-byte chunks. For example, float3 float3 float float wouldn't work, because the first element of the second float3 is in the first 16 bytes and the other two elements are in the second 16-byte chunk. Just add padding so that this doesn't happen (in this case float3 float float3 float) and you'll be fine.

You should group together the data in the constant buffers based on when you update them (once per frame, multiple times per frame, etc.) instead of using a lot of constant buffers.
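The 16-byte chunk rule described above can be sketched as a small offset calculator. `packedOffsets` is a hypothetical helper written for illustration, and it assumes only scalar and vector members of at most 16 bytes; matrices, arrays and nested structs follow additional packing rules that are not modeled here.

```cpp
#include <cstddef>
#include <vector>

// Computes the byte offset of each member under the rule from the post:
// a member that would straddle a 16-byte boundary starts at the next boundary.
// Assumption: every size is at most 16 bytes (float, float2, float3, float4).
std::vector<std::size_t> packedOffsets(const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (std::size_t size : sizes) {
        const std::size_t used = cursor % 16;   // bytes already used in this chunk
        if (used != 0 && size > 16 - used) {
            cursor += 16 - used;                // skip ahead to the next chunk
        }
        offsets.push_back(cursor);
        cursor += size;
    }
    return offsets;
}
```

For float3 float3 float float (sizes 12, 12, 4, 4) this gives offsets 0, 16, 28, 32, whereas a tightly packed C++ struct places the same members at 0, 12, 24, 28. For float3 float float3 float (sizes 12, 4, 12, 4) it gives 0, 12, 16, 28, which is the same as the tightly packed C++ layout, so both sides agree.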

VanillaSnake21

Author

211

August 11, 2019 08:26 PM

2 minutes ago, Magogan said:
Yes and no. __declspec(align(16)) only works for some stack allocations (normal definitions, but apparently not function arguments), not on the heap. It does add padding at the end to make the size a multiple of 16, though. But the structure itself doesn't even need to be aligned on the CPU side unless you use XMVECTOR or XMMATRIX (which you shouldn't do, just use XMFLOAT(n) or XMFLOAT(n)X(m)). You need to lay out the vectors so that none of them spans two 16-byte chunks. For example, float3 float3 float float wouldn't work, because the first element of the second float3 is in the first 16 bytes and the other two elements are in the second 16-byte chunk. Just add padding so that this doesn't happen (in this case float3 float float3 float) and you'll be fine.
You should group together the data in the constant buffers based on when you update them (once per frame, multiple times per frame, etc.) instead of using a lot of constant buffers.

This is just really new stuff to me, so I'm trying to get the hang of it. I currently use XMVECTOR and XMMATRIX liberally in the CPU structures anytime I need computations. You're saying I shouldn't do that for speed reasons? So what can I use then, because XMFLOAT doesn't allow any kind of math operations (you can't add them, multiply them, etc.)?

Also, you're saying that I can have unaligned structures on the CPU but then have them aligned on the GPU. I just want to make sure I understand that correctly. So on the CPU it's sufficient to just use declspec, and I don't necessarily have to add the padding elements to the structure?

This is what I mean:


//CPU
__declspec(align(16)) struct DirectionalLight
{
	DirectX::XMFLOAT3 LightDirection;
	DirectX::XMFLOAT4 LightColor;
};


//SHADER

cbuffer dlight
{
   float3 direction;
   float _padding;
   float4 color;
};

So that should be fine and it's not going to mess with the buffer copying or anything like that (since the cpu structure has one less element)?

Quote
You should group together the data in the constant buffers based on when you update them (once per frame, multiple times per frame, etc.) instead of using a lot of constant buffers.

But what if I want to update the buffer in various places in my code? For example, let's say I have this buffer:


__declspec(align(16)) struct cbuff_perframe
{
	DirectX::XMMATRIX View;
	DirectX::XMMATRIX Projection;

	DirectX::XMMATRIX World;
	DirectX::XMFLOAT3 CameraEyePosition;
};

And I want to update the View and Projection inside the main Update method of the app, but I'd like to update the World matrix and the CameraEyePosition inside the update methods of other objects. How would I do that?

So in other words, how can I update, let's say, just the CameraEyePosition in one place and World in another place if they're in the same buffer?


Beosar

572

August 11, 2019 08:44 PM

You can use XMVECTOR and XMMATRIX but the thing is that those must be aligned to 16 bytes. And if you want to have those on the heap things get really messy as you would need to use aligned_malloc or whatever it was called. So, for computations use XMVECTOR and XMMATRIX, if you need to store data on the heap in general (that includes the buffers but also other classes and structs), use the XMFLOAT* data types.

The dlight structure in the shader is correct, the one on the CPU side needs to match it though. What I was trying to say is the structure itself doesn't need to be aligned (its memory address doesn't need to be a multiple of 16) but the elements inside need to have the same layout as in the shader and they need to be organized into 16 byte chunks.

So both on the CPU and GPU the layout needs to be:


------ start of structure, cpu memory location doesn't need to be multiple of 16 (e.g. 0xCDF12528 would be fine).
float3
float
------ 16 bytes
float4
------ 32 bytes

I don't know of any way to update only parts of the constant buffer, so just update the whole one or use multiple buffers.
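The layout above can be checked at compile time on the CPU side. In this sketch `Float3` and `Float4` are hypothetical stand-ins for DirectX::XMFLOAT3 and DirectX::XMFLOAT4, `alignas(16)` stands in for `__declspec(align(16))`, and `DirectionalLightUnpadded` is an illustrative name for the struct as originally posted.

```cpp
#include <cstddef>

// Hypothetical stand-ins for DirectX::XMFLOAT3 / DirectX::XMFLOAT4.
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// As originally posted: the alignment request pads the END of the struct,
// but LightColor still starts right after LightDirection at byte 12.
struct alignas(16) DirectionalLightUnpadded {
    Float3 LightDirection;
    Float4 LightColor;
};

// Matching the shader's cbuffer: explicit padding pushes LightColor to byte 16.
struct DirectionalLight {
    Float3 LightDirection;
    float  _padding;      // fills the rest of the first 16-byte chunk
    Float4 LightColor;
};

static_assert(offsetof(DirectionalLightUnpadded, LightColor) == 12,
              "color straddles the first 16-byte chunk boundary");
static_assert(offsetof(DirectionalLight, LightColor) == 16,
              "color starts on the second 16-byte chunk");
static_assert(sizeof(DirectionalLight) == 32, "two full 16-byte chunks");
```

Both structs are 32 bytes, so a size check alone would not catch the mismatch. Only the member offsets show that the unpadded version would place the color 4 bytes earlier than the shader reads it.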

VanillaSnake21

Author

211

August 11, 2019 08:51 PM

6 minutes ago, Magogan said:
You can use XMVECTOR and XMMATRIX but the thing is that those must be aligned to 16 bytes. And if you want to have those on the heap things get really messy as you would need to use aligned_malloc or whatever it was called. So, for computations use XMVECTOR and XMMATRIX, if you need to store data in a buffer or a struct, use the XMFLOAT* data types.
The dlight structure in the shader is correct, the one on the CPU side needs to match it though. What I was trying to say is the structure itself doesn't need to be aligned (its memory address doesn't need to be a multiple of 16) but the elements inside need to have the same layout as in the shader and they need to be organized into 16 byte chunks.
So both on the CPU and GPU the layout needs to be:
------ start of structure, cpu memory location doesn't need to be multiple of 16 (e.g. 0xCDF12528 would be fine).
float3
float
------ 16 bytes
float4
------ 32 bytes
I don't know of any way to update only parts of the constant buffer, so just update the whole one or use multiple buffers.

Ok, I'll keep it in mind. Thanks for your help!

