Hi!
There's something strange going on with my OpenGL bindless textures test performance-wise.
(My system is Windows 7 Embedded, 16 GB RAM, NVIDIA GeForce GTX 1080 Ti with 11 GB of video memory, driver version 388.00.)
I've been testing OpenGL bindless textures as part of our 3-D engine. My app creates 512 textures of 1024x1024 RGBA8 with a single mip level and fills them with constant data. After this, it gets a bindless texture handle for each texture and calls glMakeTextureHandleResidentARB on it. After the operation, each of my 512 textures returns GL_TRUE for the glIsTextureHandleResidentARB query, and the 2 gigabytes of textures are reflected in the GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX query as expected. This process is performed once in my frame loop, and I never touch the textures after this init phase is done. My test doesn't even draw anything using them. The engine in which this test is incorporated does draw a lot of other stuff, but not with those textures.
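For reference, the init phase looks roughly like the sketch below (assuming GLEW and the GL 4.5 DSA entry points; the function and variable names are made up, and the real engine code differs in details such as how the constant data is generated):

```cpp
#include <GL/glew.h>
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the init phase: 512 textures of 1024x1024 RGBA8 with one mip
// level, filled with constant data and made resident as bindless handles.
static const int kTexCount = 512;
static const int kTexSize  = 1024;

std::vector<GLuint>   gTextures(kTexCount);
std::vector<GLuint64> gHandles(kTexCount);

void initBindlessTextures()
{
    std::vector<uint8_t> pixels(kTexSize * kTexSize * 4, 0x7F); // constant fill

    glCreateTextures(GL_TEXTURE_2D, kTexCount, gTextures.data());
    for (int i = 0; i < kTexCount; ++i)
    {
        glTextureStorage2D(gTextures[i], 1, GL_RGBA8, kTexSize, kTexSize);
        glTextureSubImage2D(gTextures[i], 0, 0, 0, kTexSize, kTexSize,
                            GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());

        gHandles[i] = glGetTextureHandleARB(gTextures[i]);
        glMakeTextureHandleResidentARB(gHandles[i]);
    }

    // Sanity checks: every handle reports resident in this context, and the
    // NVX query shows roughly 2 GB less available video memory than before.
    for (int i = 0; i < kTexCount; ++i)
        assert(glIsTextureHandleResidentARB(gHandles[i]) == GL_TRUE);

    GLint availableKiB = 0;
    glGetIntegerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &availableKiB);
}
```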
So now I have 2 gigabytes of bindless textures resident.
The engine I'm working on uses two GL contexts: one for the window on the desktop and one for a hidden window. Basically, the hidden one is used when rendering to offscreen targets, and the window context is used when displaying the final result to the user.
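The two contexts are created along these lines (simplified sketch; hdcWindow, hdcHidden and the function name are made up for the example, and the pixel formats are assumed to be set up already):

```cpp
#include <windows.h>
#include <GL/gl.h>

// Create two GL contexts in the same share group: one for the visible window
// and one for the hidden window used for offscreen rendering.
void createSharedContexts(HDC hdcWindow, HDC hdcHidden,
                          HGLRC& ctxWindow, HGLRC& ctxHidden)
{
    ctxWindow = wglCreateContext(hdcWindow);
    ctxHidden = wglCreateContext(hdcHidden);
    wglShareLists(ctxWindow, ctxHidden); // put both contexts in one share group
}
```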
When my 2 gigabytes of textures are resident, switching between those two contexts with wglMakeCurrent costs a lot: around 0.4-0.5 milliseconds per switch. With all the necessary switches during a frame, that can total 1-1.5 milliseconds, which is a lot for a 60 Hz app. Without bindless textures resident, a GL context switch costs at most 0.1 ms.
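The switch cost is measured simply by wrapping the wglMakeCurrent call in QueryPerformanceCounter, roughly like this (sketch):

```cpp
#include <windows.h>

// Returns the cost of one wglMakeCurrent switch in milliseconds (sketch).
double timeContextSwitch(HDC hdc, HGLRC ctx)
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&t0);
    wglMakeCurrent(hdc, ctx);
    QueryPerformanceCounter(&t1);

    return 1000.0 * double(t1.QuadPart - t0.QuadPart) / double(freq.QuadPart);
}
```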
The GL context switch cost depends on the total size of the resident textures, so it must be doing something at a texel level. I profiled the app using AMD CodeAnalyst. When my textures are resident, the module "dxgmms1.sys" lights up, taking 3% of the profiler samples. The module contains symbols named "VidMmInterface" and is used both by my app and by the kernel (PID 4). So I guess it's doing some video memory management on the context switch. But why? There's plenty of video memory available, with gigabytes to spare, even when the textures are resident.
My test makes them resident only on one of the contexts. If I make them resident on both contexts, the cost doubles.
So my bindless textures incur a cost that's paid on every switch between the two contexts after they have been made resident. If they are resident in a context that's never made current, then there's no cost.
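To be precise about what "resident only on one of the contexts" means: residency is per-context state, so the same handle can report resident on one context of the share group and not on the other. Continuing the sketches above, the check is roughly:

```cpp
// Residency is tracked per context, even though the handles themselves are
// shared across the share group.
wglMakeCurrent(hdcHidden, ctxHidden);   // the context where the textures were made resident
GLboolean r1 = glIsTextureHandleResidentARB(gHandles[0]); // GL_TRUE

wglMakeCurrent(hdcWindow, ctxWindow);   // the other context in the share group
GLboolean r2 = glIsTextureHandleResidentARB(gHandles[0]); // GL_FALSE unless made resident here too
```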
EDIT: The two contexts are in the same share group (wglShareLists).
Has anyone ever used bindless textures with multiple GL contexts? Have you noticed performance problems?
Best regards,
Jani