
PhysX chip

Started by April 20, 2006 07:42 PM
Quote: Original post by Kylotan
So is graphics. You're not making a valid point here. The key is in which operations you optimise for. Having written software renderers and physics engines, they both have very distinctive operations.

That is very funny; I have also written software renderers and physics engines. I never came across an operation that was exclusive to physics and could not be used in graphics, and vice versa. In fact, as I remember, I was always able to break my operations down into a set of dot products, cross products, multiply-adds, memory moves, look-up tables and so on; come to think of it, I use the same instructions for any other algorithm.
I am curious as to what operation you found that was exclusive to physics and could not be implemented on a GPU or a CPU?
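
To make that concrete, here is a rough C++ sketch (my own illustration, with made-up names, not taken from any particular engine) of a physics integration step built from exactly those primitives; a software rasterizer's transform and lighting loops use the same dot, cross and multiply-add helpers.

// Minimal vector helpers -- the same ones a software renderer uses.
#include <cstddef>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Multiply-add: r = a + b * s, the same madd used for vertex transforms.
inline Vec3 madd(const Vec3& a, const Vec3& b, float s) {
    return { a.x + b.x * s, a.y + b.y * s, a.z + b.z * s };
}

// Explicit Euler integration of point masses: nothing here that the
// renderer's inner loops don't already use.
void integrate(Vec3* pos, Vec3* vel, const Vec3* force,
               const float* invMass, std::size_t count, float dt) {
    for (std::size_t i = 0; i < count; ++i) {
        vel[i] = madd(vel[i], force[i], invMass[i] * dt);  // v += (F / m) * dt
        pos[i] = madd(pos[i], vel[i], dt);                 // x += v * dt
    }
}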



Quote: Original post by Anonymous Poster
Quote: Original post by Kylotan
So is graphics. You're not making a valid point here. The key is in which operations you optimise for. Having written software renderers and physics engines, they both have very distinctive operations.

That is very funny; I have also written software renderers and physics engines. I never came across an operation that was exclusive to physics and could not be used in graphics, and vice versa. In fact, as I remember, I was always able to break my operations down into a set of dot products, cross products, multiply-adds, memory moves, look-up tables and so on; come to think of it, I use the same instructions for any other algorithm.
I am curious as to what operation you found that was exclusive to physics and could not be implemented on a GPU or a CPU?


Well, I don't know what Kylotan was referring to exactly, but one obvious example is that in modern GPUs the ideal ratio of arithmetic ops to texture ops is 3:1 or greater (and I read the average for shader programs is now around 5:1, so this is only going to get more extreme, and may have already), while many physics algorithms (solvers, relaxation techniques) approach 1:1.

In addition, the memory access patterns are very different: to get optimal performance out of a GPU you need fairly coherent memory access patterns (texture coords don't usually jump around very much), whereas in physics you have no guarantee of such a pattern; in fact, you often need very incoherent memory access.

There's a big difference between whether a program can run on a platform and whether it can run efficiently. All the theoretical FLOPS in the world won't help you if your processor is waiting for data.
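
To illustrate the 1:1 claim, here is a rough C++ sketch (my own example, with made-up names) of a distance-constraint relaxation pass: each constraint does roughly as many loads and stores as arithmetic operations, and the bodies it touches are selected by index pairs, so the access pattern is only as coherent as the constraint graph happens to be.

#include <cmath>
#include <cstddef>
#include <vector>

struct Body { float px, py, pz; float invMass; };

struct DistanceConstraint {
    std::size_t a, b;   // indices into the body array -- effectively random
    float restLength;
};

void relax(std::vector<Body>& bodies,
           const std::vector<DistanceConstraint>& constraints,
           int iterations) {
    for (int it = 0; it < iterations; ++it) {
        for (const DistanceConstraint& c : constraints) {
            Body& A = bodies[c.a];                  // incoherent load
            Body& B = bodies[c.b];                  // incoherent load
            float dx = B.px - A.px, dy = B.py - A.py, dz = B.pz - A.pz;
            float len = std::sqrt(dx * dx + dy * dy + dz * dz);
            float w = A.invMass + B.invMass;
            if (len <= 0.0f || w <= 0.0f) continue;
            float corr = (len - c.restLength) / (len * w);
            // Nudge both endpoints toward the rest length...
            A.px += dx * corr * A.invMass;
            A.py += dy * corr * A.invMass;
            A.pz += dz * corr * A.invMass;
            B.px -= dx * corr * B.invMass;
            B.py -= dy * corr * B.invMass;
            B.pz -= dz * corr * B.invMass;
            // ...and scatter the results back to the same two bodies.
        }
    }
}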
Quote: Original post by Anonymous Poster
Well, I don't know what Kylotan was referring to exactly, but one obvious example is that in modern GPUs the ideal ratio of arithmetic ops to texture ops is 3:1 or greater (and I read the average for shader programs is now around 5:1, so this is only going to get more extreme, and may have already), while many physics algorithms (solvers, relaxation techniques) approach 1:1.
In addition, the memory access patterns are very different: to get optimal performance out of a GPU you need fairly coherent memory access patterns (texture coords don't usually jump around very much), whereas in physics you have no guarantee of such a pattern; in fact, you often need very incoherent memory access.


If you need lots of random memory access, you use the CPU, not the GPU. The CPU is designed for that kind of work, and I don't think you can build hardware that does much better than current CPUs.
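
For what it's worth, here is a tiny sketch (my own, purely illustrative) of the kind of workload being talked about: a gather through an index table, where performance is decided almost entirely by cache behaviour and memory latency rather than arithmetic throughput.

#include <cstddef>
#include <vector>

// Each add depends on a load whose address comes from the index table, so
// throughput is set by how many of those loads the memory system can keep
// in flight -- exactly the case where caches and low-latency access help.
float gather_sum(const std::vector<float>& data,
                 const std::vector<std::size_t>& indices) {
    float sum = 0.0f;
    for (std::size_t idx : indices)
        sum += data[idx];
    return sum;
}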
Quote: Original post by Anonymous Poster
Quote: Original post by Anonymous Poster
Well, I don't know what Kylotan was referring to exactly, but one obvious example is that in modern GPUs the ideal ratio of arithmetic ops to texture ops is 3:1 or greater (and I read the average for shader programs is now around 5:1, so this is only going to get more extreme, and may have already), while many physics algorithms (solvers, relaxation techniques) approach 1:1.
In addition, the memory access patterns are very different: to get optimal performance out of a GPU you need fairly coherent memory access patterns (texture coords don't usually jump around very much), whereas in physics you have no guarantee of such a pattern; in fact, you often need very incoherent memory access.


If you need lots of random memory access, you use the CPU, not the GPU. The CPU is designed for that kind of work, and I don't think you can build hardware that does much better than current CPUs.


Out-of-order execution? Wasteful caches? Deep pipelines? No thank you!

There's a reason it is fairly trivial to outperform CPU physics with a GPU, even if you aren't using the GPU very efficiently.
Quote: Original post by Anonymous Poster
Well, I don't know what Kylotan was referring to exactly, but one obvious example is that in modern GPUs the ideal ratio of arithmetic ops to texture ops is 3:1 or greater (and I read the average for shader programs is now around 5:1, so this is only going to get more extreme, and may have already), while many physics algorithms (solvers, relaxation techniques) approach 1:1.

Good remark. But for GPUs that ratio is about texture accesses, not generic memory access. Texture sampling units are expensive in terms of chip area, so for example the Radeon X1900 has 48 shader units and 'only' 16 texture samplers. But nothing prevents them from adding more channels for reading from memory directly; in fact, that's what Direct3D 10's 'load' instruction is for. Also look at Direct3D 9's constant registers: they could be considered an early form of memory access, and I'm sure they're accessed 1:1 on average.
Quote: In addition, the memory access patterns are very different: to get optimal performance out of a GPU you need fairly coherent memory access patterns (texture coords don't usually jump around very much), whereas in physics you have no guarantee of such a pattern; in fact, you often need very incoherent memory access.

As far as I know, latencies for texture sampling are already much higher than for arithmetic instructions, and GPUs know how to deal with that. In fact, the Radeon X1900 uses a technique called Ultra-Threading: while some pixels wait on a long-latency instruction, other pixels execute their arithmetic instructions.

There are few details about PhysX's approach to hiding memory access latency. What's known is that it doesn't use a cache, so in the best case it does something similar to ATI's Ultra-Threading (which would explain PhysX's 'internal memory').

Also, let's not forget that for direct memory accesses, Direct3D 10 cards will in the worst case only have to deal with memory latencies (1.2 ns nowadays), not sampler latencies. So they'll definitely be able to handle physics processing efficiently.
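
A CPU-side analogue of that latency-hiding idea (my own sketch, not how the hardware actually does it): instead of stalling on one element's slow access, kick off the accesses for a whole batch and overlap them with arithmetic, much like other pixels' ALU work hides a texture fetch.

#include <cstddef>

void process(float* out, const float* table, const std::size_t* idx,
             std::size_t count) {
    const std::size_t kBatch = 8;
    std::size_t i = 0;
    for (; i + kBatch <= count; i += kBatch) {
        // Issue the long-latency accesses for the whole batch up front
        // (__builtin_prefetch is a GCC/Clang builtin).
        for (std::size_t j = 0; j < kBatch; ++j)
            __builtin_prefetch(&table[idx[i + j]]);
        // Do the arithmetic while those loads are (hopefully) in flight.
        for (std::size_t j = 0; j < kBatch; ++j)
            out[i + j] = table[idx[i + j]] * 2.0f + 1.0f;
    }
    for (; i < count; ++i)                 // remainder
        out[i] = table[idx[i]] * 2.0f + 1.0f;
}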
Quote: There's a big difference between whether a program can run on a platform and whether it can run efficiently. All the theoretical FLOPS in the world won't help you if your processor is waiting for data.

I couldn't agree more. GPUs definitely have very wide SIMD units well suited for physics, but they need to be kept busy by feeding them data efficiently. Now, nobody knows exactly what next-generation GPUs will look like, but I think we would gravely underestimate them to think they won't have at least the same capabilities as a PPU. Direct3D 10 has been in development for many years now, and there are strong indications that R600 and G80 will have radically new architectures. The Radeon X1900 and Xenos probably give the best view of where the future is headed.

By the way, memory access is one of the reasons I wouldn't rule out multi-core CPUs for reasonably efficient physics processing. They don't have as many SIMD units (although Core 2 is extremely impressive compared to the Pentium 4), but they have a high clock frequency and, most of all, memory access with very low latency thanks to large caches.
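
As a sketch of that multi-core route (my own example, names made up, assuming at least one worker thread): split the bodies across cores and keep each core's inner loop a plain streaming multiply-add, so the caches and SIMD units can do their job.

#include <cstddef>
#include <thread>
#include <vector>

// Per-core work: a streaming multiply-add that vectorizes trivially.
void integrate_range(float* pos, const float* vel,
                     std::size_t begin, std::size_t end, float dt) {
    for (std::size_t i = begin; i < end; ++i)
        pos[i] += vel[i] * dt;
}

// Split the body array into one contiguous chunk per thread.
void integrate_parallel(float* pos, const float* vel,
                        std::size_t count, float dt, unsigned numThreads) {
    std::vector<std::thread> workers;
    std::size_t chunk = (count + numThreads - 1) / numThreads;
    for (unsigned t = 0; t < numThreads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = begin + chunk < count ? begin + chunk : count;
        if (begin >= end) break;
        workers.emplace_back(integrate_range, pos, vel, begin, end, dt);
    }
    for (std::thread& w : workers)
        w.join();
}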
Quote: Original post by C0D1F1ED
Here's another intriguing fact: in 'Ghost Recon: Advanced Warfighter' there are separate settings for software mode and PPU mode. So you can't select the same level of physics detail to get a fair comparison. Now why would they do that?

I can tell you for a fact that the reason you can't enable enhanced physics without the PhysX card is that they don't want stupid end-users turning it on and then calling support because the game runs too slow.

If you don't understand the above reasoning, then you obviously don't know how many idiots call tech support, wasting company time and money on obvious things.

Quote: Original post by Saruman
I can tell you for a fact that the reason you can't enable enhanced physics without the PhysX card is that they don't want stupid end-users turning it on and then calling support because the game runs too slow.

Point taken.

Still, I'd love to see a head-to-head comparison between PhysX and a powerful dual-core processor (and Core 2 once it's out). If PhysX is significantly faster (in an actual game), then people will be more inclined to buy it. At the moment I can't help feeling that they're trying to avoid such a comparison. I fully understand the tech support argument, but they could have just made it an option in some partially hidden config file and displayed a clear warning that PhysX hardware is recommended for the highest setting. Now it just sounds like a cheap excuse.

Sorry for the scepticism. It's just that they release so few technical details and their marketing is so incredibly slick that it's hard to believe it's truly that great. To go mainstream they still have a lot to prove. GRAW is a fantastic opportunity to do that, and some easily avoidable tech support problems stop them? I don't know...
Quote: Original post by C0D1F1ED
Still, I'd love to see a head-to-head comparison between PhysX and a powerful dual-core processor (and Core 2 once it's out). If PhysX is significantly faster (in an actual game), then people will be more inclined to buy it. At the moment I can't help feeling that they're trying to avoid such a comparison. I fully understand the tech support argument, but they could have just made it an option in some partially hidden config file and displayed a clear warning that PhysX hardware is recommended for the highest setting. Now it just sounds like a cheap excuse.


I would like to see an ATI card driven by their own physics API in this comparison too. A guy from ATI told me today that the current main problem for “Physics on GPU” is the graphics APIs, but they believe that D3D10 is a great step in the right direction.

Another problem with such a comparison is the current state of the PhysX system software that runs on the 64-bit MIPS core and the VPUs. Even after all this time it still doesn't support everything that the software engine contains. This leads me to believe that they haven't implemented many optimizations yet. But even after that step, I have the strong feeling that a D3D10 GPU at the same price point, maybe with a custom API, will be stronger than PhysX.
Quote: Original post by C0D1F1ED
Sorry for the scepticism. It's just that they release so few technical details and their marketing is so incredibly slick that it's hard to believe it's truly that great.

Hey, I totally understand; I was actually sceptical of it in the beginning too. Once I saw an engine and benchmarks running the same application both in software and on the PhysX chip, I realized how uber it is :)
Quote: Original post by C0D1F1ED
Good remark. But for GPUs that ratio is about texture accesses, not generic memory access. Texture sampling units are expensive in terms of chip area, so for example the Radeon X1900 has 48 shader units and 'only' 16 texture samplers. But nothing prevents them from adding more channels for reading from memory directly; in fact, that's what Direct3D 10's 'load' instruction is for. Also look at Direct3D 9's constant registers: they could be considered an early form of memory access, and I'm sure they're accessed 1:1 on average.

No offense, but this comment seems rather silly. A new API isn't going to fundamentally change the architecture of the chip. Furthermore, card designers are going to move toward real-life shader usage, i.e. that 5:1 ratio, not away from it. It's just cost/benefit, transistors/benefit, whatever you want to call it.

Witness floating-point buffers: 32-bit/component textures are unfiltered, so any filtering ops you want to do with them you have to do manually through shaders... but it's still through the samplers.
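
Spelled out as a C++ sketch (my own illustration, not actual shader code), "filter it yourself" amounts to four point samples from the unfiltered float texture followed by two lerps:

#include <cstddef>

struct FloatTexture {
    const float* texels;        // one float per texel, for simplicity
    std::size_t width, height;
    float fetch(std::size_t x, std::size_t y) const {
        return texels[y * width + x];
    }
};

// Manual bilinear filtering: the sampler only gives us point samples,
// so the weighting has to be done as shader arithmetic.
float bilinear(const FloatTexture& tex, float u, float v) {
    float x = u * (tex.width - 1);          // normalized coords -> texel space
    float y = v * (tex.height - 1);
    std::size_t x0 = static_cast<std::size_t>(x);
    std::size_t y0 = static_cast<std::size_t>(y);
    std::size_t x1 = x0 + 1 < tex.width  ? x0 + 1 : x0;    // clamp at the edge
    std::size_t y1 = y0 + 1 < tex.height ? y0 + 1 : y0;
    float fx = x - x0, fy = y - y0;
    // Four unfiltered fetches (still through the samplers)...
    float t00 = tex.fetch(x0, y0), t10 = tex.fetch(x1, y0);
    float t01 = tex.fetch(x0, y1), t11 = tex.fetch(x1, y1);
    // ...and the filtering done by hand.
    float top    = t00 + (t10 - t00) * fx;
    float bottom = t01 + (t11 - t01) * fx;
    return top + (bottom - top) * fy;
}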

