As for surface allocation, you can specify what type of memory the surface uses by setting flags such as DDSCAPS_VIDEOMEMORY or DDSCAPS_SYSTEMMEMORY in the DDSCAPS2 member (ddsCaps) of the DDSURFACEDESC2 structure (this is DX7, but I remember the same functionality in older versions).
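Here is a rough sketch of what I mean, not tested against your setup. The helper name MakeOffscreenDesc and its parameters are made up for illustration; only the structure fields and the DDSD_*/DDSCAPS_* flags come from ddraw.h. The non-Windows branch contains simplified stand-ins (not the real structure layout) so the snippet compiles without the DirectX headers.

```cpp
#include <cassert>
#include <cstring>

#if defined(_WIN32)
#include <windows.h>
#include <ddraw.h>   // real DirectDraw 7 declarations
#else
// Simplified stand-ins so the sketch builds off Windows. These are NOT the
// real structure layouts; the flag values mirror the ones in ddraw.h.
#include <cstdint>
typedef uint32_t DWORD;
struct DDSCAPS2 { DWORD dwCaps, dwCaps2, dwCaps3, dwCaps4; };
struct DDSURFACEDESC2 {
    DWORD dwSize;
    DWORD dwFlags;
    DWORD dwHeight;
    DWORD dwWidth;
    DDSCAPS2 ddsCaps;
};
const DWORD DDSD_CAPS              = 0x00000001;
const DWORD DDSD_HEIGHT            = 0x00000002;
const DWORD DDSD_WIDTH             = 0x00000004;
const DWORD DDSCAPS_OFFSCREENPLAIN = 0x00000040;
const DWORD DDSCAPS_SYSTEMMEMORY   = 0x00000800;
const DWORD DDSCAPS_VIDEOMEMORY    = 0x00004000;
#endif

// Hypothetical helper: fills in a description for a plain offscreen surface
// and requests either video memory or system memory through ddsCaps.dwCaps.
DDSURFACEDESC2 MakeOffscreenDesc(DWORD width, DWORD height, bool preferVideoMemory)
{
    assert(width > 0 && height > 0);

    DDSURFACEDESC2 desc;
    std::memset(&desc, 0, sizeof(desc));
    desc.dwSize   = sizeof(desc);                          // must be set before use
    desc.dwFlags  = DDSD_CAPS | DDSD_HEIGHT | DDSD_WIDTH;  // which fields are valid
    desc.dwWidth  = width;
    desc.dwHeight = height;
    desc.ddsCaps.dwCaps = DDSCAPS_OFFSCREENPLAIN |
        (preferVideoMemory ? DDSCAPS_VIDEOMEMORY : DDSCAPS_SYSTEMMEMORY);
    return desc;
}
```

You would then pass the description to IDirectDraw7::CreateSurface. If the video memory request fails (for example with DDERR_OUTOFVIDEOMEMORY), one option is to build the description again with preferVideoMemory set to false and retry.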
I would guess that a lot of the performance difference comes down to how much video memory your card has. What are the specs on the card?
------------------
-Kentamanos