Seeing effect of vertex cache
Hi,
I was messing around trying to see how much difference cache-friendly data made by rendering a 128 by 128 grid of vertices using strips of different widths, i.e. with the grid:
[pre]
000 001 002 003 004 005 ...
128 129 130 131 132 133 ...
256 257 258 259 260 261 ...
... ... ... ... ... ... ...
[/pre]
a strip of width 2 would use indices:
[pre]
000 128 001 129 128 256 129 257 ...
[/pre]
and a strip of width 4 would use indices:
[pre]
000 128 001 129 002 130 003 131 128 256 129 257 ...
[/pre]
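In case the pattern isn't clear from the listings, here is a rough sketch (not my actual code, the function name and parameters are just illustrative) of how the index list could be generated for a given strip width:
[pre]
// Sketch: build the index list for a gridWidth x gridHeight vertex grid,
// split into vertical bands of stripWidth columns, walking each band
// downwards as in the listings above. Every 2*stripWidth indices form one
// triangle strip segment. stripWidth must be at least 2.
#include <algorithm>
#include <cstdint>
#include <vector>

std::vector<std::uint32_t> buildStripIndices(int gridWidth, int gridHeight, int stripWidth)
{
    std::vector<std::uint32_t> indices;
    // Adjacent bands share one column of vertices, so advance by stripWidth - 1.
    for (int bandStart = 0; bandStart + 1 < gridWidth; bandStart += stripWidth - 1)
    {
        const int bandEnd = std::min(bandStart + stripWidth, gridWidth);
        for (int row = 0; row + 1 < gridHeight; ++row)
        {
            for (int col = bandStart; col < bandEnd; ++col)
            {
                indices.push_back(static_cast<std::uint32_t>(row * gridWidth + col));       // current row
                indices.push_back(static_cast<std::uint32_t>((row + 1) * gridWidth + col)); // row below
            }
        }
    }
    return indices;
}
[/pre]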
I was expecting to see a sudden drop in performance once the strip width exceeded about 8 vertices, since the vertices would no longer be cached under a simple cache policy. This didn't happen. Instead I saw a gradual improvement in performance right the way up to strips the entire width of the grid.
Does anybody know how the vertex cache is organised (i.e. what caching policy is used), why my program seemed to show no vertex cache effects and how to write a program that does demonstrate vertex cache effects?
Thanks,
Enigma
You would see a performance gain only if your app is transformation bound (e.g. using a complex vertex program on huge meshes)...
You should never let your fears become the boundaries of your dreams.
BTW, NV1x pipes have 16 vertex cache entries (effectively 12 due to pipelining) while NV2x have 24 (effectively 16), so you would not see any drop when your strip is just over 8 vertices wide, since the cache can still hold it.
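If you want to see what a given cache size would mean for your index list, you can run it through a simple FIFO model on the CPU and count the misses. This is only a sketch: the plain FIFO policy and the entry counts above are assumptions about the hardware, not documented behaviour.
[pre]
// Sketch: model the post-T&L vertex cache as a plain FIFO of cacheSize
// entries and count how many indices miss (i.e. how many vertices would
// have to be transformed again).
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

std::size_t countCacheMisses(const std::vector<std::uint32_t>& indices, std::size_t cacheSize)
{
    std::deque<std::uint32_t> fifo;   // front = oldest entry
    std::size_t misses = 0;
    for (std::uint32_t index : indices)
    {
        if (std::find(fifo.begin(), fifo.end(), index) == fifo.end())
        {
            ++misses;                 // not in the cache: transform the vertex again
            fifo.push_back(index);
            if (fifo.size() > cacheSize)
                fifo.pop_front();     // evict the oldest entry
        }
    }
    return misses;
}
[/pre]
Compare the miss counts for cache sizes of 12, 16 and 24 against your measured numbers and you can see which model fits best.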
As DarkWIng said, if your app is not xform bound, there will not be any gain at all.
Another thing to keep in mind is that smaller strips actually cost more CPU time, more bus traffic and more driver overhead, and leave GPU power untapped. So, unless you want to use degenerate tri strips, leave the narrow strips out and just go for the longest possible strip, which will give you the fastest performance.
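If you do go the degenerate route, stitching the row segments together looks roughly like this (just a sketch, the helper name is made up):
[pre]
// Sketch: join two triangle strip segments into one strip by repeating the
// last index of the first segment and the first index of the second. The
// repeated indices produce degenerate (zero-area) triangles, which the GPU
// rejects cheaply. Depending on segment lengths you may need one more
// repeated index to keep the winding order consistent.
#include <cstdint>
#include <vector>

void appendWithDegenerates(std::vector<std::uint32_t>& strip,
                           const std::vector<std::uint32_t>& nextSegment)
{
    if (!strip.empty() && !nextSegment.empty())
    {
        strip.push_back(strip.back());         // repeat last index of previous segment
        strip.push_back(nextSegment.front());  // repeat first index of next segment
    }
    strip.insert(strip.end(), nextSegment.begin(), nextSegment.end());
}
[/pre]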
Previously "Krohm"
Yep, it was the calls to glClear and glLoadIdentity that I'd left in that were obscuring the results. Taking those out shows an increase in indices/second up to a triangle strip width of 5 or 6 (ten or twelve vertices per strip, half of them reused in the next strip), and then performance degrades again as the strip width increases. Total performance is still better with longer strips because fewer indices are sent. It would appear that there is no need to consider the vertex cache unless the transform takes considerably longer than normal. Is there anything that can affect transform time apart from vertex programs, which _DarkWIng_ already mentioned?
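For reference, the measurement now looks roughly like this (a sketch; GL headers, context setup and a non-empty index list are assumed):
[pre]
// Sketch: draw the same index list many times between two timer reads, with
// no glClear or glLoadIdentity inside the measured loop, then work out
// indices per second. The iteration count is arbitrary.
#include <GL/gl.h>
#include <chrono>
#include <cstdint>
#include <vector>

double measureIndicesPerSecond(const std::vector<std::uint32_t>& indices, int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        glDrawElements(GL_TRIANGLE_STRIP,
                       static_cast<GLsizei>(indices.size()),
                       GL_UNSIGNED_INT,
                       &indices[0]);
    }
    glFinish();   // wait for the GPU so we time real work, not just command submission
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return (static_cast<double>(iterations) * indices.size()) / elapsed.count();
}
[/pre]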
Cheers,
Enigma
All per-vertex operations cost some time (lighting, fog, ...), but a VP can be the worst possible example. Just try using a 127-instruction-long VP (just fill it with junk instructions) and you'll see.
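Something like this is what I mean; it's only a sketch that builds the program string (uploading it with glProgramStringARB and loading the extension entry points is assumed, and the total instruction count has to stay within the hardware limit of roughly 128):
[pre]
// Sketch: build an ARB_vertex_program that transforms the vertex as usual but
// pads the program with dependent MAD instructions. The padding does nothing
// useful; it just forces extra per-vertex work so the app becomes transform bound.
#include <string>

std::string buildBloatedVertexProgram(int fillerInstructions)
{
    std::string vp =
        "!!ARBvp1.0\n"
        "TEMP r0;\n"
        "MOV r0, vertex.position;\n";
    for (int i = 0; i < fillerInstructions; ++i)
        vp += "MAD r0, r0, r0, r0;\n";          // filler: dependent, so hard to optimise away
    vp +=
        "DP4 result.position.x, state.matrix.mvp.row[0], vertex.position;\n"
        "DP4 result.position.y, state.matrix.mvp.row[1], vertex.position;\n"
        "DP4 result.position.z, state.matrix.mvp.row[2], vertex.position;\n"
        "DP4 result.position.w, state.matrix.mvp.row[3], vertex.position;\n"
        "MOV result.color, r0;\n"               // consume r0 so the filler isn't dead code
        "END\n";
    return vp;
}
[/pre]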
You should never let your fears become the boundaries of your dreams.