It has been a long time since I last did the basics. These days I have my framework and don't care about the minutiae of buffer handling any more. You were right: the buffer with the data does not need to be bound, the vertex array suffices.
In principle, you can divide your vertex data among buffers however you like: interleaved, in separate buffers, or sequential. So says the Red Book. I always use interleaved just because it is simpler IMO, and I would expect access to be faster when the data for one vertex can be read as one block of memory, but that is speculation.
Padding is equally unnecessary.
You'll probably have to do classic debugging. Isolate functionality, check portions, and all that ...