We've had some reports of weird performance with our vector class (which is basically a thin binding to glm::vec3). See the following example:
vec3 v;
vec3 p(0.1f, 0.1f, 0.1f);
for (int i = 0; i < 100000; i++) {
    // Slowest: (15.0 - 15.4)
    // v += vec3(0.1f, 0.1f, 0.1f);

    // Faster: (7.9 - 8.1)
    // v.Add(0.1f, 0.1f, 0.1f);

    // Faster: (5.7 - 5.9)
    // v += p;

    // Fastest: (4.5 - 4.7)
    v.x += 0.1f;
    v.y += 0.1f;
    v.z += 0.1f;
}
I've ordered the commented-out operations from slowest to fastest. It makes sense for the first one to be the slowest, because it has to construct a vec3 object before passing it to opAddAssign.
For testing, I've added an Add method that just takes 3 floats, which is faster than constructing a temporary vec3.
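A helper like that could look roughly as follows. This is only a sketch under assumptions, not the exact code from our bindings: `Vec3` is a stand-in for glm::vec3 so the snippet compiles on its own, `Vec3Add` is a hypothetical name, and the registration shown in the comment assumes the plain AngelScript `RegisterObjectMethod` call with `asCALL_CDECL_OBJFIRST` rather than our helper class.

```cpp
// Stand-in for glm::vec3 so this sketch compiles without GLM (assumption).
struct Vec3 {
    float x = 0.0f;
    float y = 0.0f;
    float z = 0.0f;
};

// Hypothetical helper: adds three floats component-wise to an existing
// vector, so the script does not need to construct a temporary vec3.
inline void Vec3Add(Vec3& self, float x, float y, float z) {
    self.x += x;
    self.y += y;
    self.z += z;
}

// Registration would look roughly like this with the plain AngelScript API
// (left as a comment because it needs a live engine instance):
//
// engine->RegisterObjectMethod("vec3", "void Add(float, float, float)",
//     asFUNCTION(Vec3Add), asCALL_CDECL_OBJFIRST);
```

With a binding like this the script still pays for one native call per iteration, plus pushing three float arguments.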
Passing an existing vec3 object to opAddAssign is a bit faster than that (my guess is that this is because it only has to push 1 value onto the stack instead of 3).
And then finally, manually inlining the code is the fastest. This is somewhat surprising to me, because I would've thought that 1 call into native code would be faster than 3 lines of script.
Do you have any idea what is going on here? Is there anything that could potentially make this faster in our bindings?
If this is expected and intentional, would it make sense to have some kind of inlining optimization? I found a thread from 2014 about script function inlining ( https://gamedev.net/forums/topic/661308-function-method-inlining/ ), but it sounds like a difficult problem to solve.
For some extra info about our bindings, we use the following flags to register vec3 (I know we should probably be using asOBJ_APP_CLASS_ALLFLOATS, but this doesn't seem to make a difference on Windows):
asOBJ_VALUE | asOBJ_POD | asOBJ_APP_CLASS_CDK | asGetTypeTraits<glm::vec3>()
And the following binding for opAddAssign (this uses a helper class to call asIScriptEngine methods, but there should be nothing surprising in the implementation of the helper class):
regVec3.Method("vec3 &opAddAssign(const vec3 &in)", asMETHODPR(glm::vec3, operator+=, (const glm::vec3&), glm::vec3&), asCALL_THISCALL);