Quote:
Original post by Anonymous Poster
huh? just because they have generalized buffers does not mean memory access mechanisms or patterns are going to change.
They will definitely optimize unsampled memory access. Routing it through the texture units would be a big waste. The R580 is already greatly hampered by its low number of texture units. Lots of big surfaces like walls still use shaders with a high ratio of texture samples to arithmetic. Also, a unified architecture has to be able to execute vertex, geometry and pixel shaders. So even if pixel shaders evolve toward a 5:1 ratio of arithmetic to texture operations, the hardware clearly needs massive unsampled bandwidth for vertex/geometry processing.
So I'm sorry but memory access mechanisms and patterns are most definitely going to change. And GPGPU processing, including physics, will benefit.
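To make the ratio argument concrete, here is a rough back-of-the-envelope sketch. It is a toy throughput model, not a description of any real scheduler: the function name limiting_unit is made up for illustration, and the 48 ALU / 16 texture unit split is just an assumed 3:1 configuration in the spirit of R580.

```python
from fractions import Fraction

def limiting_unit(alu_ops, tex_ops, alu_units, tex_units):
    """Toy model: which unit type bounds throughput for a given shader mix.

    Assumes every unit retires one operation per clock and that the work
    spreads perfectly across units. Real hardware is far messier.
    """
    alu_time = Fraction(alu_ops, alu_units)   # clocks spent on arithmetic
    tex_time = Fraction(tex_ops, tex_units)   # clocks spent on sampling
    if alu_time > tex_time:
        return "alu"
    if tex_time > alu_time:
        return "texture"
    return "balanced"

# Assumed 3:1 hardware split (illustrative numbers only).
HW_ALU_UNITS = 48
HW_TEX_UNITS = 16

print(limiting_unit(5, 1, HW_ALU_UNITS, HW_TEX_UNITS))  # alu
print(limiting_unit(3, 1, HW_ALU_UNITS, HW_TEX_UNITS))  # balanced
print(limiting_unit(2, 1, HW_ALU_UNITS, HW_TEX_UNITS))  # texture
```

Under these assumptions a texture-heavy shader (2:1) leaves the ALUs waiting on the sampling units, which is the situation described above for big wall-like surfaces.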
Quote:
the only real changes that are going to happen are adding the geometry/primitive shader stage and to make it easier to shuffle memory around between stages. youre still going to be accessing memory coherently (nice thing about textures and vertex buffers) and not that often (in the case of shaders)
Even for coherent accesses, the latency of texture sampling is high. Just think of anisotropic filtering, now considered standard. It takes several bilinear samples per lookup, so even if all accessed texels are in the texture cache it's going to take many clock cycles. Still, we observe that the performance hit of anisotropic filtering is very modest on modern hardware. So GPUs must be excellent at hiding long latencies. Ultra-Threading, as ATI calls it, is one effective approach.
And again the essential part of it all for this discussion is that this is very beneficial for physics processing. They just have to combine it all into the next-generation Direct3D 10 graphics hardware.
Quote:
what do you think you'll be doing with those bound buffers? accessing them in shaders, at an average of 5:1 arithmetic ops to texture ops, or you'll just have a stalling alu...
They are not texture operations. They don't have to use the expensive sampling units at all. They are memory operations, and yes, they can read from textures, but that's totally different from sampling. If they do things right they could easily have a 1:1 ratio without stalling any ALU. The latencies are likely higher than those of arithmetic operations, but that's where Ultra-Threading ensures the ALUs are not idle.
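Here is a minimal sketch of why keeping many threads in flight can keep the ALUs busy even at a 1:1 ratio. It is an idealized round-robin model, not ATI's actual Ultra-Threading implementation: the function names and the 99/100-cycle latencies are invented for illustration, and it assumes the memory system can keep any number of requests outstanding.

```python
def alu_utilization(threads, alu_cycles, mem_latency):
    """Fraction of clocks the ALU is busy in an idealized round-robin model.

    Each thread repeats: alu_cycles of arithmetic, then one memory read
    that takes mem_latency cycles. While a thread waits, others may run.
    """
    period = alu_cycles + mem_latency      # one thread's full loop
    busy = threads * alu_cycles            # ALU work available per period
    return min(1.0, busy / period)

def threads_to_hide(alu_cycles, mem_latency):
    """Smallest thread count that keeps the ALU fully busy in this model."""
    period = alu_cycles + mem_latency
    return -(-period // alu_cycles)        # integer ceiling division

# Hypothetical latencies, chosen only to show the scaling.
print(threads_to_hide(1, 99))    # 1:1 ratio -> 100
print(threads_to_hide(5, 100))   # 5:1 ratio -> 21
print(alu_utilization(1, 1, 99)) # a single thread -> 0.01
```

In this model a 1:1 ratio doesn't stall the ALU as long as enough threads are resident. It just needs roughly five times as many in flight as a 5:1 ratio would.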
Quote:
first, i'll try and find the interview with one of the head ati engineers where he says they are moving toward higher ratios, as that is the real world performance desired.
I already said that I believe this, and I think I even recall reading the actual interview when R580 was introduced.
Quote:
second, binding buffers and accessing them in a shader unit are different things. just because memory is available to you doesnt mean it is free to access. this (theoretically, cause there is little literature on the physx card for us peons) would be the advantage of faster, more incoherent memory access.
The essential thing is that there's no technical reason to assume that next-generation GPUs would be significantly less efficient at accessing memory than a PPU. Note again my remark about unified architectures above. They just can't afford slow memory access. Texture access in pixel shaders, yes, that ratio is likely to change significantly, but vertex shaders running on the same unified shader units need all the efficient unsampled memory access they can get.
Quote:
don't know if english is your first language, but this comes off a bit as being an ass.
It's my fourth language but I think I wrote exactly what I intended. My sincere apologies if it was offensive, that certainly was not the intention. GPU architectures are quite obviously going to change drastically for Direct3D 10. That was my only point. Just look at Xenos for a peek into the future. It in no way resembles the 'classical' Direct3D 9 architecture introduced by R300.