XMVECTOR and XMMATRIX in header?

Started by September 03, 2013 04:21 PM
4 comments, last by backstep 11 years, 5 months ago

Hello! So I've been struggling with getting XMVECTOR and XMMATRIX working when defined in my header file. If I do:


XMMATRIX World;
XMVECTOR camPosition;

in the header file for example, and then do:


camPosition = XMVectorSet( 0.0f, 0.0f, -0.5f, 0.0f );

in my .cpp file, it gives me an "access violation" error. I read that this is caused by some kind of alignment issue. Is there no way to declare an XMVECTOR or XMMATRIX in the header file and then use it in the .cpp file?

If not, what should I do instead? Defining them in the .cpp file works, but that looks bad. Are there any other solutions? What do you guys do?

Hey, I think the fastest way to tell whether your problem is alignment-related is to build your project with the Debug configuration instead of Release, since I believe Visual Studio's Debug configuration disables SSE intrinsics by default.

Assuming that is the problem, and you don't get the access violation in the Debug configuration, it probably is a memory alignment issue. That's because XMMATRIX and XMVECTOR use (I think) the __m128 SSE intrinsic type for storage, and that type needs 16-byte memory alignment. More info here: http://msdn.microsoft.com/en-us/library/ee418725.aspx
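To make "16-byte aligned" concrete, here is a minimal, portable sketch. The names Vec4 and IsAligned are hypothetical and not part of DirectXMath; Vec4 only stands in for an SSE-backed type like XMVECTOR, and standard alignas is used in place of any compiler-specific attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an SSE-backed type such as XMVECTOR:
// four floats that must start on a 16-byte boundary.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// Returns true when p lies on an `alignment`-byte boundary.
// `alignment` must be a non-zero power of two.
inline bool IsAligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

A local Vec4 passes IsAligned(&v, 16), while an address 4 bytes into an aligned buffer does not. An aligned SSE load or store on an address of the second kind is what typically shows up as the access violation.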

There are a few ways to work around that issue. The easiest is to just build for an x64 target platform, since all heap allocations are 16-byte aligned for x64 processes (instead of 8-byte aligned for x86).

The next simplest is probably to store your XMMATRIX as an XMFLOAT4X4 instead (and your XMVECTOR as an XMFLOAT4), then use the XMLoadxxx and XMStorexxx functions with temporary local XMVECTOR/XMMATRIX variables to feed them into functions like XMMatrixTranspose that expect 16-byte-aligned arguments. More info here: http://msdn.microsoft.com/en-us/library/microsoft.directx_sdk.loading.xmloadfloat4.aspx

Finally you can attempt to align the containing class/struct that the XMMATRIX/XMVECTOR is a member of. You essentially declare the class/struct in your header with __declspec(align(16)), and then make sure you declare the member variables with the XMVECTOR/XMMATRIX types first, to ensure the alignment. For example:




__declspec(align(16)) class CMyClass
{
public:
      bool SomeFunc();
      void SomeOtherFunc();
private:
      XMMATRIX m_world;
      XMVECTOR m_camPosition;
      int m_someOtherMember;
      bool m_yetAnotherMember;
};

That should align the entire class/struct to 16-byte boundaries, and since the aligned XM types are declared first, they begin at the requested alignment for the class/struct. I think that can get a bit messy (with virtual functions and inheritance possibly altering the data layout within a class); perhaps someone more experienced could provide better information on that. Personally I use the SSE intrinsic DirectXMath types directly and only compile for the x64 target.
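One way to check a layout like this at compile time is sketched below. It is a portable approximation, not DirectXMath code: Vec4, Mat4, MyClass and AssertMembersAligned are hypothetical names, and standard alignas is used as the counterpart of __declspec(align(16)). Because the stand-in member types carry their own alignment here, the containing struct would inherit it even without the outer alignas.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for XMVECTOR and XMMATRIX.
struct alignas(16) Vec4 { float f[4]; };
struct alignas(16) Mat4 { Vec4 r[4]; };

// Same member order as the class above: aligned members first.
struct alignas(16) MyClass {
    Mat4 world;
    Vec4 camPosition;
    int  someOtherMember;
    bool yetAnotherMember;
};

// Compile-time layout checks.
static_assert(alignof(MyClass) == 16, "class must be 16-byte aligned");
static_assert(offsetof(MyClass, world) == 0, "matrix starts the object");
static_assert(offsetof(MyClass, camPosition) % 16 == 0, "vector sits on a 16-byte offset");
static_assert(sizeof(MyClass) % 16 == 0, "array elements stay aligned");

// Runtime check for one particular instance (stack or heap).
inline void AssertMembersAligned(const MyClass& c) {
    assert(reinterpret_cast<std::uintptr_t>(&c.world) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(&c.camPosition) % 16 == 0);
}
```

The static_asserts only describe the layout inside the object. Whether a given instance actually starts on a 16-byte boundary still depends on how its memory was obtained, which is what the runtime check is for.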

The above solution works for stack-allocated objects, but might not work for heap-allocated objects (i.e. objects allocated via new/delete). That depends on your OS, though: some already allocate memory on 16-byte boundaries, but you should check that this is the case for yours. Overloading new/delete and having them call _mm_malloc/_mm_free might be the solution you are after. It's also worth pointing out that you should be careful with virtual functions when using aligned data types, because they silently add an extra pointer (the vtable pointer) to the structure.

For stack allocation, go to the project options and set Struct Member Alignment to 16 bytes, or pass /Zp16 on the command line.

For dynamic allocation you can use _aligned_malloc with the "placement new" (requires <new> header).




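#include <new>      // placement new
#include <malloc.h> // _aligned_malloc, _aligned_free (Microsoft-specific)
...
// Allocate raw storage on a 16-byte boundary, then construct the object in place.
ptrToAlignedObj = (AlignedObj*)_aligned_malloc(sizeof(AlignedObj), 16);
new (ptrToAlignedObj) AlignedObj();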

To destroy an object allocated with _aligned_malloc, you must call its destructor manually and then release the memory with _aligned_free.




ptrToAlignedObj->~AlignedObj();
_aligned_free(ptrToAlignedObj);

Note that _aligned_malloc and _aligned_free are Microsoft-specific, not standard C++.

A better solution could be to define a custom allocator, or just to overload operators new and delete for the classes that need 16-byte alignment.
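A rough sketch of the operator new/delete route follows. AlignedAlloc and AlignedFree are hypothetical, portable stand-ins written for this example (they over-allocate with malloc and round the address up); with MSVC you would more likely call _aligned_malloc/_aligned_free or _mm_malloc/_mm_free directly. AlignedObj is likewise just an illustrative class, and only the scalar new/delete forms are covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical helper: over-allocate, round the address up to `alignment`
// (a power of two), and stash the original malloc pointer just before the
// aligned block so AlignedFree can recover it.
inline void* AlignedAlloc(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    void* raw = std::malloc(size + alignment + sizeof(void*));
    if (!raw) return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    std::uintptr_t aligned = (start + mask) & ~mask;
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Frees a block returned by AlignedAlloc; a null pointer is ignored.
inline void AlignedFree(void* p) {
    if (p) std::free(static_cast<void**>(p)[-1]);
}

// Illustrative class that must be 16-byte aligned on the heap as well.
struct alignas(16) AlignedObj {
    float data[4];

    // Class-specific overloads: every `new AlignedObj` goes through AlignedAlloc.
    static void* operator new(std::size_t size) {
        if (void* p = AlignedAlloc(size, alignof(AlignedObj))) return p;
        throw std::bad_alloc();
    }
    static void operator delete(void* p) noexcept { AlignedFree(p); }
};
```

With the overloads in the class, callers keep writing plain new AlignedObj() and delete obj, so the placement new and the manual destructor call are no longer needed at each use site.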

This helped me a lot! I bet there is a more correct, or even more efficient, way, but I simply solved it with #define _XM_NO_INTRINSICS_

That seemed a lot easier for me. Is there anything that makes this way of doing things more complicated, or even wrong? It seems to have solved my problem for now, though.

Glad that it helped. Using #define _XM_NO_INTRINSICS_ will just make your Release builds work the same as Debug builds with aligned XM types. It's not wrong per se, but if you're doing a lot of matrix or vector operations, it's going to cost a significant amount of extra CPU time.

If you are doing a fair amount of vector/matrix operations (like 50 or 100+ per frame) and don't want to build for an x64 target, I'd strongly recommend using the XMLoad and XMStore functions with temporary local XMVECTOR/XMMATRIX variables. They're really easy to use, and you'll see a noticeable drop in CPU usage versus _XM_NO_INTRINSICS_, despite the Load/Store overhead.

Aligning the allocations on the stack and heap is quite a bit more involved, and I can understand why you'd like to avoid that for now.

On that note, I just wanted to say thanks to RobTheBloke and Alessio1989, because I wasn't aware of the details of heap alignment myself either. At some point I want to use AVX instructions, and I'll need to align to 32-byte boundaries, so the additional info about _aligned_malloc is appreciated.

quick edit:

Just for example's sake, assuming you store the world matrix for each object/mesh, this is how you'd use XMLoad when transposing a world matrix for setting constant buffers:




// Change your functions to take XMFLOAT4X4 arguments instead of XMMATRIX ones.
void UpdateObjectConstantBuffer(ID3D11DeviceContext* context, const XMFLOAT4X4& worldMatrix)
{
    // XMLoadFloat4x4 takes a pointer; pass its result straight to XMMatrixTranspose.
    XMMATRIX transposedWorld = XMMatrixTranspose(XMLoadFloat4x4(&worldMatrix));

    // Now map the constant buffer, copy over the transposedWorld matrix, unmap the buffer, etc., or however you did it before.
    ...
}

So you'd use XMFLOAT4X4 instead of XMMATRIX as the storage type in your header files (e.g. class/struct members), then use function-local XMMATRIX variables only when you need them for DirectXMath library functions. The local transposedWorld is automatically aligned on the stack, so it's pretty simple, and more importantly it'll let you drop your _XM_NO_INTRINSICS_ define and take advantage of SSE intrinsic performance without worrying about manual alignment.
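For anyone who wants to try the storage-type/working-type split without the DirectX headers, here is a portable sketch of the same pattern. All names in it (Float4x4, Mat4, Load, Store, Transpose, Object) are hypothetical stand-ins that only mimic the roles of XMFLOAT4X4, XMMATRIX, XMLoadFloat4x4, XMStoreFloat4x4 and XMMatrixTranspose; it does not use SSE and is not the DirectXMath implementation.

```cpp
#include <cassert>

// Storage type (role of XMFLOAT4X4): plain floats, no special alignment,
// safe as a class member or inside heap-allocated objects.
struct Float4x4 { float m[4][4]; };

// Working type (role of XMMATRIX): 16-byte aligned, used only for locals.
struct alignas(16) Mat4 { float m[4][4]; };

// Role of XMLoadFloat4x4: copy storage into an aligned working value.
inline Mat4 Load(const Float4x4* src) {
    assert(src != nullptr);
    Mat4 out;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out.m[r][c] = src->m[r][c];
    return out;
}

// Role of XMStoreFloat4x4: copy a working value back into storage.
inline void Store(Float4x4* dst, const Mat4& src) {
    assert(dst != nullptr);
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            dst->m[r][c] = src.m[r][c];
}

// Role of XMMatrixTranspose.
inline Mat4 Transpose(const Mat4& in) {
    Mat4 out;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out.m[c][r] = in.m[r][c];
    return out;
}

// The class holds only the storage type, so it needs no alignment tricks.
class Object {
public:
    void SetWorld(const Float4x4& w) { m_world = w; }

    Float4x4 TransposedWorld() const {
        Mat4 t = Transpose(Load(&m_world)); // aligned local on the stack
        Float4x4 out;
        Store(&out, t);
        return out;
    }

private:
    Float4x4 m_world{};
};
```

The point of the design is that the aligned type never appears as a member, so Object can live on the stack, on the heap, or inside a container without any alignment requirement of its own.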
