Hello!
I recently decided to test the new System.Numerics.Vectors library that comes with .NET 4.6. I downloaded the latest NuGet package (4.1.0) and tested it with Visual Studio 2015. The problem I have run into is that the library seems to perform much slower than I would expect. For instance, here is a sample C# program and a corresponding C++ program for comparison:
// C# version (System.Numerics.Vectors)
using System;
using System.Diagnostics;
using System.Numerics;

static class Program
{
    static void Main()
    {
        var vector = new Vector4();
        var matrix = new Matrix4x4();
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        for (int index = 0; index < 1000000000; index++)
        {
            // Ten dependent transforms per iteration
            vector = Vector4.Transform(vector, matrix);
            vector = Vector4.Transform(vector, matrix);
            vector = Vector4.Transform(vector, matrix);
            vector = Vector4.Transform(vector, matrix);
            vector = Vector4.Transform(vector, matrix);
            vector = Vector4.Transform(vector, matrix);
            vector = Vector4.Transform(vector, matrix);
            vector = Vector4.Transform(vector, matrix);
            vector = Vector4.Transform(vector, matrix);
            vector = Vector4.Transform(vector, matrix);
        }
        stopwatch.Stop();
        Console.WriteLine($"Time: {stopwatch.Elapsed}");
    }
}

// C++ version (DirectXMath)
#include <DirectXMath.h>
#include <ctime>
#include <iostream>

using namespace DirectX;
using namespace std;

int main()
{
    XMVECTOR vector{};
    XMMATRIX matrix{};
    time_t start;
    time_t final;
    time(&start); // time() only has one-second resolution
    for (int index = 0; index < 1000000000; index++)
    {
        // Ten dependent transforms per iteration
        vector = XMVector4Transform(vector, matrix);
        vector = XMVector4Transform(vector, matrix);
        vector = XMVector4Transform(vector, matrix);
        vector = XMVector4Transform(vector, matrix);
        vector = XMVector4Transform(vector, matrix);
        vector = XMVector4Transform(vector, matrix);
        vector = XMVector4Transform(vector, matrix);
        vector = XMVector4Transform(vector, matrix);
        vector = XMVector4Transform(vector, matrix);
        vector = XMVector4Transform(vector, matrix);
    }
    time(&final);
    cout << "Time: " << final - start << endl; // elapsed whole seconds
    return 0;
}

This is by no means an accurate benchmark; I am including the sample code only to give you an idea of what I am talking about. The C# code above takes more than 2 minutes to complete, while the equivalent C++ code takes about 50 seconds. I am testing on a relatively fast i7 CPU, targeting x64 in Release mode so that RyuJIT can do its SIMD magic; at the very least, Vector.IsHardwareAccelerated returns True during my tests.
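Since time() only counts whole seconds, a somewhat finer measurement on the C++ side could look like the sketch below, which uses std::chrono::steady_clock from the standard library. MeasureMilliseconds and PlaceholderWorkload are hypothetical names I made up for illustration; the placeholder loop merely stands in for the XMVector4Transform loop so the sketch compiles without DirectXMath.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of DirectXMath or the standard library):
// runs `work` once and returns the elapsed time in milliseconds.
template <typename Work>
double MeasureMilliseconds(Work&& work)
{
    using Clock = std::chrono::steady_clock; // monotonic clock
    const auto start = Clock::now();
    work();
    const auto stop = Clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Placeholder workload (an assumption for this sketch); a real benchmark
// would run the transform loop here instead. The result is returned so
// the computation stays observable to the caller.
std::uint64_t PlaceholderWorkload(std::uint64_t iterations)
{
    std::uint64_t accumulator = 0;
    for (std::uint64_t index = 0; index < iterations; index++)
    {
        accumulator += index * index;
    }
    return accumulator;
}
```

Usage would be along the lines of passing a lambda that runs the loop, for example MeasureMilliseconds([&] { result = PlaceholderWorkload(1000); }), and printing the returned value. It does not make the benchmark rigorous, but it removes the one-second granularity.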
I expected the C++ code to be faster than the C# code, but 3x faster seems like a lot to me. It is almost as if the C++ compiler emits real SIMD code while the C# JIT does not, even though Vector.IsHardwareAccelerated is True. I guess I am doing something wrong, or perhaps benchmarks like this are simply too inaccurate and give misleading results. I have searched online for more accurate benchmarks that could tell me what performance to expect from System.Numerics.Vectors compared with a pure native C++ implementation, but I have not found anything so far.
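For reference, this is the arithmetic I understand both Vector4.Transform and XMVector4Transform to perform (row vector times 4x4 matrix), written as a plain scalar C++ sketch. Vec4, Mat4 and TransformScalar are hypothetical names for illustration only, not types from either library. Timing a loop over something like this could serve as a non-SIMD baseline to judge whether either version is really being vectorized.

```cpp
#include <array>

// Hypothetical plain types for illustration; these are NOT the
// DirectXMath or System.Numerics types.
struct Vec4
{
    float x, y, z, w;
};

using Mat4 = std::array<std::array<float, 4>, 4>; // indexed as m[row][column]

// Scalar reference: result = v * m, treating v as a row vector.
// Each output component is the dot product of v with one matrix column.
Vec4 TransformScalar(const Vec4& v, const Mat4& m)
{
    return Vec4{
        v.x * m[0][0] + v.y * m[1][0] + v.z * m[2][0] + v.w * m[3][0],
        v.x * m[0][1] + v.y * m[1][1] + v.z * m[2][1] + v.w * m[3][1],
        v.x * m[0][2] + v.y * m[1][2] + v.z * m[2][2] + v.w * m[3][2],
        v.x * m[0][3] + v.y * m[1][3] + v.z * m[2][3] + v.w * m[3][3]
    };
}
```

That is 16 multiplications and 12 additions per transform in scalar form, which a SIMD implementation can in principle process four lanes at a time.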
I need to evaluate whether System.Numerics.Vectors is fast enough for a certain computationally expensive task that consists mostly of vector and matrix math, or whether I would be better off implementing the algorithm in C++ and calling it from my C# code via P/Invoke. Since I am new to this and not very experienced in C++, I would prefer to stick with C# if possible.
Regards.