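Vertical clipping can stay fast, because I can still copy each remaining row with 32-bit memcpy moves. Horizontal clipping is more of a pain in the neck: I have to check whether the clipped width is still divisible by 4 and handle any leftover bytes separately. In 32-bit color this wouldn't apply, since every pixel is already a full 32-bit word.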
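To make that concrete, here is a rough sketch in plain C of the kind of clipped copy I mean. It assumes 8-bit pixels and ordinary memory buffers rather than real DX surfaces, and the names `copy_row8` and `blit_clipped8` are made up for illustration. Vertical clipping only changes which rows get copied, while horizontal clipping changes the row width, which is where the "divisible by 4" tail handling comes in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n 8-bit pixels: whole 32-bit words first, then the 0-3 leftover bytes.
   (Hypothetical helper; memcpy is used for the word moves to avoid
   unaligned-access problems.) */
static void copy_row8(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t words = n / 4;   /* full 32-bit chunks */
    size_t tail  = n % 4;   /* nonzero when the clipped width is not divisible by 4 */

    for (size_t i = 0; i < words; i++) {
        uint32_t w;
        memcpy(&w, src + 4 * i, 4);
        memcpy(dst + 4 * i, &w, 4);
    }
    for (size_t i = 0; i < tail; i++)
        dst[4 * words + i] = src[4 * words + i];
}

/* Blit a w x h 8-bit sprite to (x, y) on a dst_w x dst_h buffer, clipping
   against the buffer edges. Pitches are in bytes. Returns pixels copied. */
static long blit_clipped8(uint8_t *dst, int dst_w, int dst_h, int dst_pitch,
                          const uint8_t *src, int w, int h, int src_pitch,
                          int x, int y)
{
    int sx = 0, sy = 0;     /* first visible column/row inside the sprite */

    assert(dst != NULL && src != NULL);

    /* Horizontal clipping: changes the start column and the row width. */
    if (x < 0) { sx = -x; w += x; x = 0; }
    if (x + w > dst_w) w = dst_w - x;

    /* Vertical clipping: only changes which rows are copied. */
    if (y < 0) { sy = -y; h += y; y = 0; }
    if (y + h > dst_h) h = dst_h - y;

    if (w <= 0 || h <= 0)
        return 0;           /* completely off-screen */

    for (int row = 0; row < h; row++) {
        copy_row8(dst + (size_t)(y + row) * dst_pitch + x,
                  src + (size_t)(sy + row) * src_pitch + sx,
                  (size_t)w);
    }
    return (long)w * h;
}
```

With 32-bit pixels the same clipping logic would work on whole pixels, so the tail loop in `copy_row8` would never be needed.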
I still have much to learn about DX.