We have several articles and tutorials on the site about using state machines as you described, but which to recommend would depend on which problem you're encountering.
I notice a couple of things.
First, state transitions should generally be instantaneous. If you need to animate something, you need an additional state that represents the transition. You may need time for entry or exit animations, for walking or traveling, or for idling while a partner moves into position. Each of those is a full state that needs its own logic, not an instant transition.
Second, it looks like you've merged multiple state machines together. One state machine switches between State A and State B. Another state machine plays animations. When you merge state machines you get a partial combinatorial explosion: the states of one multiply by the states of the other, pruned to whichever combinations you consider valid. That grows quickly as either machine gains states.
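To put rough numbers on that, here is a minimal Python sketch. The animation state names are made up for illustration, since I don't know what your animation machine actually contains. It only shows how the merged state count multiplies while two separate machines just add.

```python
from enum import Enum
from itertools import product


class Logic(Enum):
    STATE_A = "A"
    STATE_B = "B"


class Anim(Enum):
    # Hypothetical animation states, for illustration only.
    ENTERING = "entering"
    LOOPING = "looping"
    EXITING = "exiting"


# Merged machine: every pairing is a potential state you must reason about.
merged_states = list(product(Logic, Anim))

# Separate machines: each one only knows about its own states.
separate_state_count = len(Logic) + len(Anim)


class Agent:
    """Holds the two machines side by side instead of one merged state."""

    def __init__(self):
        self.logic = Logic.STATE_A
        self.anim = Anim.ENTERING
```

With two logic states and three animation states the merged machine has six candidate states against five for the separate ones. The gap widens fast: ten logic states and ten animation states give a hundred combinations against twenty.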
With that in mind, you're probably headed in the direction of behavior trees. They're basically an organized way of asking questions, and triggering behaviors in response:
- Am I almost dead and in danger? → Run
- Am I being attacked? → Fight
- Am I tired? → Sleep
- Otherwise → Idle
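As a rough sketch, that list of questions can be written as a function that checks conditions in priority order and returns the first match. The `Agent` fields and the thresholds here are assumptions for illustration, not anything from your code.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    # Hypothetical fields; substitute whatever your game actually tracks.
    health: float = 1.0      # 0.0 = dead, 1.0 = full health
    in_danger: bool = False
    under_attack: bool = False
    fatigue: float = 0.0     # 0.0 = rested, 1.0 = exhausted


def choose_behavior(agent: Agent) -> str:
    """Ask the questions top to bottom; the first 'yes' wins."""
    if agent.health < 0.2 and agent.in_danger:  # almost dead and in danger
        return "run"
    if agent.under_attack:
        return "fight"
    if agent.fatigue > 0.8:                     # assumed tiredness threshold
        return "sleep"
    return "idle"                               # fallback when nothing else applies
```

The order of the checks is the priority. An agent that is both nearly dead and under attack runs rather than fights, because that question is asked first.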
If you need a transition within them, then each behavior would have three states: [Enter], [Run loop], [Exit]. Run the run loop until the behavior tree says to do something different.
Behaviors can have nested actions. A fight run loop might have moving around and playing variations on fighting animations, with jabs, blocks, parries, or whatever.
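One possible shape for such a behavior, sketched in Python. The class and method names (`Behavior`, `enter`, `update`, `exit`) and the logged strings are my own placeholders; a real game would play animations where this appends to a log.

```python
import random


class Behavior:
    """Base behavior with the three phases: enter, run loop, exit."""

    def enter(self, log):
        pass

    def update(self, log):
        """One pass of the run loop; called every tick while active."""

    def exit(self, log):
        pass


class Fight(Behavior):
    # Nested actions the run loop picks between.
    MOVES = ("jab", "block", "parry")

    def __init__(self, rng=None):
        # Accept an injected random source so results can be reproduced.
        self.rng = rng or random.Random()

    def enter(self, log):
        log.append("fight: draw weapon")

    def update(self, log):
        # Play a variation on the fighting animations each tick.
        log.append("fight: " + self.rng.choice(self.MOVES))

    def exit(self, log):
        log.append("fight: sheathe weapon")
```

Calling `enter` once, `update` every tick, and `exit` once when something else takes over gives the behavior room for its own entry and exit logic without the outer logic needing to know about jabs or parries.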
There are many ways to implement the concept, from states as classes in a shallow, broad inheritance tree, to families of functions and function pointers, to switch statements, to a bunch of if or if/else trees. Those are progressively easier to implement but more difficult to maintain, and all can work depending on the size and complexity of your systems.
Going further, behavior trees aren't goal directed; they're typically based on a hierarchy. You start at the beginning and check the current node: if its condition is true, run the associated machine, otherwise advance to the next node. The final node should always run as an idle loop, so if nothing else is triggered you do that behavior. There are more advanced systems that prioritize and weight behaviors, and systems that plan a series of behaviors, but even these use state machines as building blocks, running machine after machine rather than trying to merge the state machines together.
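Here is a minimal sketch of that walk, assuming a flat list of (condition, behavior) pairs rather than a full tree. `Runner`, `tick`, and the `world` dictionary are hypothetical names, and the enter and exit steps are instantaneous here; an exit animation that takes time would be its own state, as described earlier.

```python
class Behavior:
    """Tiny stand-in behavior that records its phases in a log."""

    def __init__(self, name):
        self.name = name

    def enter(self, log):
        log.append(f"enter {self.name}")

    def update(self, log):
        log.append(f"run {self.name}")

    def exit(self, log):
        log.append(f"exit {self.name}")


class Runner:
    def __init__(self, nodes):
        # nodes: list of (condition, behavior) in priority order.
        # The last condition should always be true (the idle fallback).
        self.nodes = nodes
        self.current = None

    def tick(self, world, log):
        # Start at the beginning and take the first node whose check passes.
        chosen = next(b for cond, b in self.nodes if cond(world))
        if chosen is not self.current:
            if self.current is not None:
                self.current.exit(log)   # leave the old behavior
            self.current = chosen
            chosen.enter(log)            # set up the new one
        self.current.update(log)         # one pass of the run loop


nodes = [
    (lambda w: w["attacked"], Behavior("fight")),
    (lambda w: True, Behavior("idle")),  # always-true fallback
]
runner = Runner(nodes)
history = []
runner.tick({"attacked": False}, history)
runner.tick({"attacked": True}, history)
runner.tick({"attacked": True}, history)
```

Each behavior stays a small machine of its own, and the runner only decides which one is active, so nothing has to be merged.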