Not sure if this is a bug or a feature, but apparently this declaration always produces column-major matrices:
StructuredBuffer<float4x4>
This happens regardless of whether I compile with D3DCOMPILE_PACK_MATRIX_ROW_MAJOR or add #pragma pack_matrix(row_major) to the shader.
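For reference, here is a minimal sketch of the kind of setup I mean. The buffer name, register slot, and entry point are made up for illustration, not taken from my actual shader:

```hlsl
// Hypothetical minimal repro: gTransforms, t0, and main are illustrative names only.
#pragma pack_matrix(row_major)

// The pragma above (or the compile flag) is expected to make this row-major.
StructuredBuffer<float4x4> gTransforms : register(t0);

float4 main(float4 pos : POSITION, uint id : SV_InstanceID) : SV_Position
{
    // As described above, the matrix read here still appears to be
    // interpreted as column-major.
    return mul(pos, gTransforms[id]);
}
```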
Does anyone have an elegant way to fix it? It's a really irritating 'feature'.