Slow VB lock on Win10 and DX9

Started by June 15, 2018 11:40 AM
2 comments, last by RobM 6 years, 7 months ago

Hi

I’ve been working on a game engine for years and I’ve recently come back to it after a couple of years’ break. Because my engine uses DirectX 9.0c, I thought it might be a good idea to upgrade it to DX11. I then installed Windows 10 and started tinkering around with the engine, trying to refamiliarise myself with all the code.

It all seems to work OK in the new OS, but I’ve noticed something that causes a massive slowdown in frame rate. My engine has a relatively sophisticated terrain system which includes the ability to paint roads onto it, à la CryEngine. The roads are spline curves, built up with polygons matching the terrain surface. It used to work perfectly, but now when I’m dynamically adding roads, which involves moving the spline curve control points around the surface of the terrain, the frame rate comes to a grinding halt.

There’s some relatively complex processing going on each time the mouse moves: the road on either side of the control point(s) being moved is reconstructed in real time so you can position and bend the road precisely. On my previous OS, which was Win2k Pro, this worked really smoothly and in release mode there was barely any slowdown in frame rate, but now it’s unusable. As part of the road reconstruction, I lock the vertex and index buffers and refill them with the new values. So my question is: on Windows 10 using DX9, is anyone aware of any locking issues? I’m aware that there can be contention when locking buffers dynamically, but I’m locking with LOCK_DISCARD and this has never been an issue before.

Any help would be greatly appreciated.

There are some pitfalls around the exact parameters used when creating the buffers (which pool, usage, etc.), and in how your code actually writes to the buffers. The size also matters: drivers might do optimized dynamic streaming for small buffers, but not large ones. Drivers will ideally use write-combined pages, which are faster to write to but extremely slow to read from, so if your terrain generation code ever reads from the mapped regions (even accidentally, or automatically via compiler optimizations), that can be an issue too. Some basic profiling should isolate where the time is being spent (inside lock, unlock, or your own code). To track down actual unwanted CPU<->GPU synchronization events, you can do more in-depth profiling with GPUView.
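As a rough illustration of the two suggestions above (never read from the mapped region, and time each phase separately), here is a minimal, portable C++ sketch. It is not the poster's engine code and makes no Direct3D calls. `RoadVertex`, `MappedRegion`, `buildRoadSegment`, `fillWriteOnly` and `timeMs` are all hypothetical names, and `MappedRegion` simply stands in for whatever pointer a buffer lock returns.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical vertex layout; the engine's real format is not given in the thread.
struct RoadVertex {
    float x, y, z;
    float u, v;
};

// Stand-in for the pointer and size obtained from a buffer lock (assumption:
// real code would get these from the graphics API).
struct MappedRegion {
    void* data;
    std::size_t sizeBytes;
};

// Hypothetical road rebuild: generates vertices into ordinary cached memory,
// where reads and read-modify-write operations are cheap.
std::vector<RoadVertex> buildRoadSegment(std::size_t count) {
    std::vector<RoadVertex> verts;
    verts.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const float t = static_cast<float>(i);
        verts.push_back({t, 0.0f, t * 0.5f, t * 0.1f, 0.0f});
    }
    return verts;
}

// Copies the finished vertices into the mapped region in one sequential pass.
// The destination is only ever written, never read back.
// Returns the number of bytes written, or 0 if nothing was copied.
std::size_t fillWriteOnly(const MappedRegion& dst, const std::vector<RoadVertex>& src) {
    const std::size_t bytes = src.size() * sizeof(RoadVertex);
    if (bytes == 0 || dst.data == nullptr || bytes > dst.sizeBytes) {
        return 0;
    }
    std::memcpy(dst.data, src.data(), bytes);
    return bytes;
}

// Runs a callable and returns the elapsed wall-clock time in milliseconds,
// so lock, fill and unlock can each be measured on their own.
template <typename F>
double timeMs(F&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

In real code, each of the lock call, the `fillWriteOnly` call and the unlock call would be wrapped in its own `timeMs` lambda, and the three numbers logged per mouse move. If the lock dominates, that suggests a synchronization stall. If the fill dominates, the write pattern or buffer size is the more likely suspect.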

Thanks Hodgman.  I'll have a look at GPUView.
