
Only 12 Enemies, And My Fps Drops To 30, Why Is That?

Started by July 31, 2016 10:44 PM
31 comments, last by Heelp 8 years, 3 months ago
Guys, I have 7 animated enemies, and my fps is 62 for now (not capped). But when I add 5 more enemies to make 12, my fps drops to 31. I traced the problem and finally found it: it's my BoneTransform() function, which fills the vector of TransformMatrices that I use in the vertex shader to animate the skeleton. It hammers my CPU: when I comment out the BoneTransform() call, the framerate goes from 30 to 166 (sometimes it jumps between 166 and 200). I took most of the function from a tutorial on skeletal animation, and I'm sure it's pretty optimized, so there must be some other reason.

[attachment=32770:lowfps.gif]

I used some models from World of Warcraft. The interesting thing is that I have the game, and when I play WoW I can have 20 players around me and my fps is great. But when I load the same models in my own game, my fps drops like crazy and it's 10 times slower than the original game. Why? (Bear in mind that I haven't even loaded a map. I just spawn 12 enemies walking on air, and my CPU is already struggling.)

Enemies shouldn't "have" a deltaTime; you should just pass the current deltaTime to their update function.

(right now it looks like you are mixing your updating, player input event processing, AI thinking, and rendering, all in one function)
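
A minimal sketch of that suggestion, assuming a hypothetical Enemy type (the struct, its fields, and updateEnemies() are illustrative names, not taken from the original code). deltaTime is handed to update() once per frame and forwarded to whatever helpers need it, and updating is kept in its own pass, separate from rendering:

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal enemy; names are illustrative, not from the original code.
struct Enemy
{
    float x = 0.0f;      // position along one axis, in world units
    float speed = 2.0f;  // world units per second

    // deltaTime arrives as a parameter instead of being stored in the object.
    void update( float deltaTime )
    {
        assert( deltaTime >= 0.0f );
        move( deltaTime );
    }

private:
    // Helpers receive the same value; passing one float is very cheap.
    void move( float deltaTime )
    {
        x += speed * deltaTime;
    }
};

// Update pass only: input handling, AI and rendering would live in their own functions.
void updateEnemies( std::vector<Enemy>& enemies, float deltaTime )
{
    for ( Enemy& enemy : enemies )
    {
        enemy.update( deltaTime );
    }
}
```

Because nothing about the frame time is stored in the enemy, there is no stale value to forget to refresh at the start of a frame.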

I used some models from World of Warcraft. The interesting thing is that I have the game, and when I play WoW I can have 20 players around me and my fps is great. But when I load the same models in my own game, my fps drops like crazy and it's 10 times slower than the original game. Why?

Because it's not about what data you're loading, it's about how your code uses it. Well, okay, it's about the data and the code working together.

Your code and WoW's code are different, and thus your framerate and WoW's framerate are different.

(Make sure you don't use WoW's models in any copy of your game you distribute publicly, btw - that's copyright infringement)


Thanks for the answer.

But!! :)

The way I see it, there are two ways.

First way: Pass the deltaTime to each function in the Enemy class. The problem with this method is that if I have 100 movement functions, I need to pass deltaTime 100 times every frame.

Second way: Store the deltaTime in the class as a member variable each frame. The advantage is that I set deltaTime only once per frame and can then use it in 100 functions if I want to. The second way sounds better, because you pass the variable only once and use it as much as you want.

And about the fps drop, what would you suggest? What should I do? I'm pretty sure I have loaded the skeletal animation correctly, because most of the code is taken from different tutorials, and I don't know what's wrong.

I think the problem may be that my game uses only 1 core on my laptop, while WoW uses all 4 cores. Can this be the problem?

I have 7 animated enemies, and my fps is 62 for now (not capped). But when I add 5 more enemies to make 12, my fps drops to 31. I traced the problem and finally found it: it's my BoneTransform() function, which fills the vector of TransformMatrices that I use in the vertex shader to animate the skeleton. It hammers my CPU.

First step, use your profiler and make sure you are calling it the right number of times. A common error is to call functions more often than needed.

Second step is to get all the performance numbers you can for those calls using your profiler. Frames per second is a nearly useless number for profiling; you need specific times for specific functions, at nanosecond or microsecond resolution.
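
If no profiler is at hand yet, both numbers (call count and time per call) can be collected by hand. The following is only a rough sketch using std::chrono from the standard library; ScopedTimer, profileTable(), averageNanoseconds() and boneTransformStandIn() are hypothetical names made up for this illustration, and a real profiler will give far more detail:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-function statistics: call count and accumulated time.
struct ProfileEntry
{
    std::uint64_t calls = 0;
    std::chrono::nanoseconds total{ 0 };
};

// Table keyed by a label chosen at the call site (not thread-safe).
inline std::map<std::string, ProfileEntry>& profileTable( )
{
    static std::map<std::string, ProfileEntry> table;
    return table;
}

// RAII timer: records one call and its duration when it goes out of scope.
class ScopedTimer
{
public:
    explicit ScopedTimer( const std::string& label )
        : entry( profileTable( )[label] ), start( std::chrono::steady_clock::now( ) )
    {
    }

    ~ScopedTimer( )
    {
        const auto elapsed = std::chrono::steady_clock::now( ) - start;
        entry.calls += 1;
        entry.total += std::chrono::duration_cast<std::chrono::nanoseconds>( elapsed );
    }

private:
    ProfileEntry& entry;
    std::chrono::steady_clock::time_point start;
};

// Average cost per call, in nanoseconds, for one label.
double averageNanoseconds( const std::string& label )
{
    const ProfileEntry& e = profileTable( ).at( label );
    assert( e.calls > 0 );
    return static_cast<double>( e.total.count( ) ) / static_cast<double>( e.calls );
}

// Stand-in for the function under suspicion; the loop is just dummy work.
void boneTransformStandIn( )
{
    ScopedTimer timer( "BoneTransform" );
    volatile float sum = 0.0f;
    for ( int i = 0; i < 1000; i++ )
    {
        sum = sum + 1.0f;
    }
}
```

Comparing the recorded call count against the number you expect per frame covers the first step; the average time per call covers the second.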

If it is still running that slow and you don't know how to improve any specific performance number, share the performance problem, your profiling numbers and counts, and the source code in the appropriate area of the site (Graphics Programming, OpenGL/Vulkan, or Direct3D) that matches your code. If you've got more than 50 or so lines of code, consider using a paste site rather than doing a source dump in the discussion forums.

Ok, thanks frob, I will definitely search for a profiling tool because I haven't used one so far. Seems to me that this will take time to fix, so I will leave it for some later moment.

Another question came to mind and I don't want to make another post.

I loaded a very simple map for my paintball game. The map has a floor with a couple of walls and that's it. The floor is perfectly flat, and for the walls I copied the floor, rotated it 90 degrees, and built the walls from it.

And I need a very, very simple collision detection for this map. The only two options that come to mind are:

1. Make AABBs for the floor and the walls and do checks every frame.

2. Take a picture of the scene from above, store the depth values in a framebuffer, and somehow use them to decide if the player is going to collide.

Is there something else I can do, and if not, what option should I choose from these two?
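
For reference, option 1 comes down to a very small test. This is only a sketch under assumptions: Vec3, AABB, intersects() and boxAround() are hypothetical names, and the player is treated as a box built from a centre point and half-extents:

```cpp
#include <cassert>

struct Vec3
{
    float x, y, z;
};

// Axis-aligned bounding box described by its minimum and maximum corners.
struct AABB
{
    Vec3 minCorner;
    Vec3 maxCorner;
};

// Two boxes overlap only if their extents overlap on all three axes.
bool intersects( const AABB& a, const AABB& b )
{
    return a.minCorner.x <= b.maxCorner.x && a.maxCorner.x >= b.minCorner.x &&
           a.minCorner.y <= b.maxCorner.y && a.maxCorner.y >= b.minCorner.y &&
           a.minCorner.z <= b.maxCorner.z && a.maxCorner.z >= b.minCorner.z;
}

// Build a box (for example the player's) around a centre point from half-extents.
AABB boxAround( const Vec3& centre, const Vec3& half )
{
    assert( half.x >= 0.0f && half.y >= 0.0f && half.z >= 0.0f );
    return AABB{ { centre.x - half.x, centre.y - half.y, centre.z - half.z },
                 { centre.x + half.x, centre.y + half.y, centre.z + half.z } };
}
```

A typical use would be to build the player's box at the position they are about to move to and reject the move if it intersects any wall box.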

You should start a new topic to ask a different question.

It reads like you hit your geom limit, sometimes known as an object limit.

Luckily it is easy to test: double the poly count of the animated model and note the frame rate, then use a very low-poly (100 polygon) animated model and note the frame rate.

If you still run at the same frame rate with low polygon models and high polygon models, give or take a frame or two, then it's the geom limit.

If you get a low frame rate no matter the poly count, it is often the geom limit or the shaders, in my experience.

PC graphics cards can only render so many objects in real time; this is known as the geom limit or object limit. You can batch models into one model to use fewer objects, or you can use instancing.

For animated objects you want instancing, as dynamic batching can be very hard and unpredictable.

The problem is that this can be many things from draw calls to bad programming, modeling and many other things.

First way: Pass the deltaTime to each function in the Enemy class. The problem with this method is that if I have 100 movement functions, I need to pass deltaTime 100 times every frame.

Passing a single float, even 100 times, is very cheap, if not free. You're pre-optimizing and obfuscating code without any real gain.

ferrous, true story man, I always forget to think before optimizing..

Scouting Ninja, I don't think I've hit any limits, because in the original game my PC can handle 10 times more enemies and everything is OK. But I will give it a try; I just need to change the models' poly count with Blender. Thanks for the idea, by the way.

Guys, I solved it.


const aiNodeAnim* Model::FindNodeAnim( const aiAnimation* pAnimation, const string NodeName )
{
    for ( uint i = 0 ; i < pAnimation->mNumChannels ; i ++ )
    {
        const aiNodeAnim* pNodeAnim = pAnimation->mChannels[i];

        if ( string( pNodeAnim->mNodeName.data ) == NodeName )
        {
            return pNodeAnim;
        }
    }

    return NULL;
}

This function swallows 250 of my fps (from 380 to 30). It is the function that finds the proper animation channel for every node in the model. Basically it counts from 0 to 120 (in my case), and on every iteration it does a string comparison in order to find the proper animation for the node. Can you believe it? I still can't. It is called from a recursive function, ReadNodeHeirarchy(), that reads all the nodes' matrices, calculates the interpolation and, from that, the final transformation matrix.

Here is the function:


void Model::ReadNodeHeirarchy( float AnimationTime, const aiNode* pNode, const Matrix4f& ParentTransform, int currentAnim )
{
    string NodeName( pNode->mName.data );

    const aiAnimation* pAnimation = this->scene->mAnimations[currentAnim];

    Matrix4f NodeTransformation( pNode->mTransformation );

    const aiNodeAnim* pNodeAnim = FindNodeAnim( pAnimation, NodeName ); //Only this function swallows 250 fps, believe it or not.

    if ( pNodeAnim )
    {
        // Interpolate scaling and generate scaling transformation matrix
        aiVector3D Scaling;
        calcInterpolatedScaling( Scaling, AnimationTime, pNodeAnim );
        Matrix4f ScalingM;
        ScalingM.InitScaleTransform( Scaling.x, Scaling.y, Scaling.z );

        // Interpolate rotation and generate rotation transformation matrix
        aiQuaternion RotationQ;
        calcInterpolatedRotation( RotationQ, AnimationTime, pNodeAnim );
        Matrix4f RotationM = Matrix4f( RotationQ.GetMatrix( ) );

        // Interpolate translation and generate translation transformation matrix
        aiVector3D Translation;
        calcInterpolatedPosition( Translation, AnimationTime, pNodeAnim );
        Matrix4f TranslationM;
        TranslationM.InitTranslationTransform( Translation.x, Translation.y, Translation.z );

        // Combine the above transformations
        NodeTransformation = TranslationM * RotationM * ScalingM;
    }

    Matrix4f GlobalTransformation = ParentTransform * NodeTransformation;

    if ( boneMapping.find( NodeName ) != boneMapping.end( ) )
    {
        uint BoneIndex = boneMapping[NodeName];
        boneInformation[BoneIndex].FinalTransformation = GlobalInverseTransform * GlobalTransformation *
                                                    boneInformation[BoneIndex].BoneOffset;
    }

    for ( uint i = 0; i < pNode->mNumChildren; i ++ )
    {
        ReadNodeHeirarchy( AnimationTime, pNode->mChildren[i], GlobalTransformation, currentAnim );
    }
}

This magical statement gulps all my CPU power: const aiNodeAnim* pNodeAnim = FindNodeAnim( pAnimation, NodeName );

This is because the ReadNodeHeirarchy() function is recursive and visits every node in the skeleton. For example, you have one transformation matrix for the fingers, but then you need to multiply it by the arm's transformation matrix because the arm moves the fingers too, and then the body moves the arm, which moves the fingers, and so on. For every node it visits, the FindNodeAnim() function counts from 0 to 120 (mNumChannels) to find the proper animation based on the node's name, constructing and comparing strings along the way. That happens for every node, of every enemy, every frame, which adds up to countless string comparisons per second.

And this was the code from this tutorial: http://ogldev.atspace.co.uk/www/tutorial38/tutorial38.html

It does the job for a tutorial and it is readable, but it is very unoptimized. What I did was cache each animation channel's index against its bone in a map container. Now the same 12 enemies run at 100 fps instead of 30 fps, so 70 fps gained by caching all the animations.

Here is the fps with 24 enemies.
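
For anyone reading later, here is a rough sketch of the caching idea. It is not the exact code from my project: Channel and Animation are stand-ins for Assimp's aiNodeAnim and aiAnimation so the example compiles on its own, ChannelCache, buildChannelCache() and findChannel() are hypothetical names, and it uses std::unordered_map where a std::map would work as well:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-ins for aiNodeAnim / aiAnimation so the sketch compiles without Assimp.
struct Channel
{
    std::string nodeName;
};

struct Animation
{
    std::vector<Channel> channels;
};

// Hypothetical cache: node name -> index into Animation::channels.
using ChannelCache = std::unordered_map<std::string, std::size_t>;

// Built once per animation at load time, not every frame.
ChannelCache buildChannelCache( const Animation& animation )
{
    ChannelCache cache;
    cache.reserve( animation.channels.size( ) );
    for ( std::size_t i = 0; i < animation.channels.size( ); i++ )
    {
        // emplace keeps the first channel for a given name, like the linear search did.
        cache.emplace( animation.channels[i].nodeName, i );
    }
    return cache;
}

// Per-frame replacement for the linear search: one hash lookup per node.
const Channel* findChannel( const Animation& animation, const ChannelCache& cache,
                            const std::string& nodeName )
{
    const auto it = cache.find( nodeName );
    if ( it == cache.end( ) )
    {
        return nullptr; // this node has no animation channel
    }
    assert( it->second < animation.channels.size( ) );
    return &animation.channels[it->second];
}
```

The lookup still hashes the node name once per node, but it no longer walks all 120 channels and compares strings for each one.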

[attachment=32776:gamefixed20fps.gif]

I wonder what kind of fps gain I could get if I cached the interpolation matrices too?

