You can set the target matrices to their corresponding rest pose bone matrices. The model should then look the same as it does in the modeling app without animation, and the 'combined' matrices should all end up being identity.
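Here is a small sketch of that sanity check in plain Python. It assumes the 'combined' matrix of a bone is computed as target * inverse(rest pose), with column vectors and row-major nested lists; your loader may use the opposite multiplication order. The two rest pose matrices are made-up values, and all function names are only for illustration.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_inverse(m):
    """Invert a 4x4 matrix with Gauss-Jordan elimination and partial pivoting."""
    n = 4
    # Build the augmented matrix [m | identity].
    aug = [list(m[i]) + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]

def is_identity(m, eps=1e-6):
    """True if every entry of m is within eps of the identity matrix."""
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) <= eps
               for i in range(4) for j in range(4))

def combined_matrices(targets, rest_poses):
    """Assumed convention: combined = target * inverse(rest), one per bone."""
    return [mat_mul(t, mat_inverse(r)) for t, r in zip(targets, rest_poses)]

# Made-up rest pose matrices in model space: a pure translation, and a
# rotation about Z combined with a uniform scale of 2 and a translation.
rest_poses = [
    [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 2.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]],
    [[0.0, -2.0, 0.0, 1.0],
     [2.0, 0.0, 0.0, 3.0],
     [0.0, 0.0, 2.0, -1.0],
     [0.0, 0.0, 0.0, 1.0]],
]

# The test: copy the rest pose into the targets, then check the result.
targets = [[row[:] for row in m] for m in rest_poses]
combined = combined_matrices(targets, rest_poses)
all_identity = all(is_identity(c) for c in combined)
print(all_identity)
```

If this check fails with your real data, the usual suspects are the multiplication order or rest pose matrices that are stored relative to the parent bone instead of in model space.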
To test only whether the vertices load correctly, you can ignore all matrices and render the mesh without skinning. Again, the model should look like it does in the modeling app.
But there will be exceptions:
Often the model is already cut into pieces, and those pieces are parented under various bones or other nodes in the transform hierarchy.
This adds some complexity, because you now need the skeleton matrices to render the pieces correctly even without animation or skinning. (You need to transform each piece by the transform of its parent hierarchy node, including that node's own parents.) This setup makes sense at least for props like guns or swords, which are usually parented to the hand bone so they move correctly when the hand is animated.
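To illustrate transforming a piece by its parent node, here is a rough Python sketch. The node table, the names ("root", "arm", "hand") and all values are hypothetical, and it assumes column vectors with each node storing a matrix local to its parent, so the world matrix is parent_world * local. Note that the made-up hand node carries a scale of 0.5, which ends up applied to anything parented under it.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    """4x4 translation matrix for column vectors."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def scale(s):
    """4x4 uniform scale matrix."""
    return [[s, 0.0, 0.0, 0.0],
            [0.0, s, 0.0, 0.0],
            [0.0, 0.0, s, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform_point(m, p):
    """Apply a 4x4 matrix to a 3D point (w is assumed to be 1)."""
    x, y, z = p
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3]
                 for i in range(3))

# Hypothetical hierarchy: node name -> (parent name or None, local matrix).
nodes = {
    "root": (None, translation(0.0, 0.0, 0.0)),
    "arm": ("root", translation(0.0, 1.0, 0.0)),
    "hand": ("arm", mat_mul(translation(2.0, 0.0, 0.0), scale(0.5))),
}

def world_matrix(name):
    """Accumulate local matrices from the root down to the named node."""
    parent, local = nodes[name]
    if parent is None:
        return local
    return mat_mul(world_matrix(parent), local)

# A vertex of a sword mesh that is parented to the hand node.
sword_tip_local = (1.0, 0.0, 0.0)
sword_tip_model = transform_point(world_matrix("hand"), sword_tip_local)
print(sword_tip_model)
```

Rendering the sword with only its own vertex data would place the tip at (1, 0, 0); with the parent chain applied it moves to where the hand is, and it also picks up the hand's scale.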
So for your initial test models, it is easier if the whole mesh is parented to the root.
Thinking about it, if you accidentally parented your box model under one of the bones, that could explain the scaling you described.