BTs are not really a superset of FSM, because a tree does not contain any cycles. I'm sure, that you can implement the same or atleast similar logic in both representations, but it will most likely get more complex in one of the two representations.
BTs are very flexible and easy to expand, therefor you can integrate other AI logic as special nodes, therefor adding FSMs should not be a problem. Both, BT and FSM, have kind of memory (path in BTs and state in FSM), but the follup interaction is strongly defined in FSM (state transitions) whereas a BT have the option of interrupting running actions and initiating new actions (priority nodes).
My rule of thumb is to use BTs where you have more script like behavior, eg the approximation of the AI of a human being and to use FSM where you have more rule/state based behavior.