
physx chip

Started by April 20, 2006 07:42 PM
223 comments, last by GameDev.net 18 years, 5 months ago
Quote: Original post by C0D1F1ED
Because for graphics work it's limited by the number of texture units.

as i alluded to in my last post, this is why they fail as physics processors as well...anything more than a simple advect step and youre totally screwed on the texture sampling. you could never do a complicated multi body system on that architecture without many, many passes (and nullifying the "graphics cardness" of it).
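for reference, this is roughly what each of those passes looks like in a gpgpu setup (a minimal sketch of the standard ping-pong pattern, not anyone's actual engine code; bindSolverShader() and drawFullScreenQuad() are hypothetical helpers standing in for the usual boilerplate):

```cpp
#include <utility>     // std::swap
#include <GL/glew.h>   // EXT_framebuffer_object entry points

void bindSolverShader(int pass);   // hypothetical helper: selects the shader for this solver step
void drawFullScreenQuad();         // hypothetical helper: one fragment per simulated element

// Simulation state lives in float textures; every solver step is a full-screen
// render into the other texture ("ping-pong"), and each extra data dependency
// costs yet another pass.
void runSolverPasses(GLuint fbo, GLuint stateTex[2], int numPasses)
{
    int read = 0, write = 1;
    for (int pass = 0; pass < numPasses; ++pass) {
        glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
        glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                                  GL_TEXTURE_2D, stateTex[write], 0);
        glBindTexture(GL_TEXTURE_2D, stateTex[read]);  // previous state as input
        bindSolverShader(pass);                        // one piece of the solver per pass
        drawFullScreenQuad();                          // run the step over every element
        std::swap(read, write);                        // output becomes the next input
    }
}
```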

Quote: Physics processing is perfectly fitted.

no, its really not. in most systems you'll end up spending far, far more time integrating and solving linear systems than doing quick multiplies, which requires data access. and i dont know where you got this directx10 business...maybe got mixed up with the addition of virtual memory? memory access is memory access, whether or not its disguising itself as a shader.
not to mention the fact, brought up by several people: *you cant read data back from calculations done on the gpu and still have a playable game.*

Quote: Invest that money in the next generation DirectX 10 graphics card and you'll have better physics ánd graphics than with a current generation GPU plus a PPU.

that doesnt make sense and doesn't add up any way you slice it. you've gone from "i dont think it's cost effective" to making things up.
Quote: Original post by NickGravelyn
Quote: Original post by arithma
Wouldn't PhysX constrain you to a specific set of Physical Laws? What would happen when Quantum Physics are needed in games, or when Relativity should take part in the simulation?
Who told you that gamers WANT real physics in games?

At infinity, we will simulate everything on a hyper-parallel processing unit. We are still in the transient state of development. In a hundred years, they will look back and laugh at us!


I don't think the hardware is hard-coded with how the "real world" works; it just provides faster ways of calculating the math, collisions, and other things necessary for physics. I'm sure many aspects of the physics are programmed by the developer.

100% true
IMHO a physics engine is just a collision system with a physics integrator & solver, plus some fancy functionality specific to each engine...

@arithma
The PhysX chip is, as stated before, a generic vectorized processing unit with memory access optimizations specific to physics, plus onboard memory. How can that be constraining???

Quantum physics in a game??? Are you into recreational drug use? :)
Quote: Original post by justo
as i alluded to in my last post, this is why they fail as physics processors as well...anything more than a simple advect step and youre totally screwed on the texture sampling. you could never do a complicated multi body system on that architecture without many, many passes (and nullifying the "graphics cardness" of it).

Why would that be any problem?
Quote: no, its really not. in most systems you'll end up spending far, far more time integrating and solving linear systems than doing quick multiplies, which requires data access. and i dont know where you got this directx10 business...maybe got mixed up with the addition of virtual memory? memory access is memory access, whether or not its disguising itself as a shader.
not to mention the fact, brought up by several people: *you cant read data back from calculations done on the gpu and still have a playable game.*

I'm sorry but I don't get your point. DirectX 10 shaders allow reading from and writing to memory directly, so there is no memory limitation. Also, reading back is absolutely no problem with PCI-Express. In fact the PhysX card might have a problem because it uses only legacy PCI. And if you were referring to synchronization problems, what makes you think that's any different with a physics card?
Quote:
Quote: Invest that money in the next generation DirectX 10 graphics card and you'll have better physics ánd graphics than with a current generation GPU plus a PPU.

that doesnt make sense and doesn't add up any way you slice it. you've gone from "i dont think it's cost effective" to making things up.

If it doesn't add up then why do both NVIDIA and ATI show us physics on the GPU? I guess they just make things up as well? With all due respect, if you want to make a point I'd start using sound arguments.
Quote: Original post by C0D1F1ED
Quote: Original post by justo
as i alluded to in my last post, this is why they fail as physics processors as well...anything more than a simple advect step and youre totally screwed on the texture sampling. you could never do a complicated multi body system on that architecture without many, many passes (and nullifying the "graphics cardness" of it).

Why would that be any problem?

because then you've just repurposed your gpu to be an inefficient ppu!

Quote:
Quote: no, its really not. in most systems you'll end up spending far, far more time integrating and solving linear systems than doing quick multiplies, which requires data access. and i dont know where you got this directx10 business...maybe got mixed up with the addition of virtual memory? memory access is memory access, whether or not its disguising itself as a shader.
not to mention the fact, brought up by several people: *you cant read data back from calculations done on the gpu and still have a playable game.*

I'm sorry but I don't get your point. DirectX 10 shaders allow reading from and writing to memory directly, so there is no memory limitation. Also, reading back is absolutely no problem with PCI-Express. In fact the PhysX card might have a problem because it uses only legacy PCI. And if you were referring to synchronization problems, what makes you think that's any different with a physics card?

reading back is very much a problem...it is one of the biggest bottlenecks in any gpgpu program also utilizing graphics. the main problem is not physical bandwidth, but context switches/memory throughput while doing other operations (in-game textures, etc). while i don't have a lot of experience with directx, glReadPixels, for example, is quite slow and will kill performance if you try and do it every frame.
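as a rough sketch of what that naive per-frame readback looks like (my own illustration, not code from any particular engine):

```cpp
#include <GL/gl.h>
#include <vector>

// glReadPixels cannot return until everything affecting the bound framebuffer
// has finished, so calling it every frame serializes the CPU and the GPU.
void readBackSimulationState(int width, int height, std::vector<float>& out)
{
    out.resize(static_cast<size_t>(width) * height * 4);
    // Assumes the physics results were rendered into the currently bound
    // framebuffer as RGBA float data.
    glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, &out[0]);
    // Execution only reaches this point after the GPU pipeline has drained.
}
```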

as for the directx 10 stuff, can you show me where it says that texture access is going to be fundamentally different (and not just with the addition of virtual memory)? you still have to get texture data to the shader units...this is why performance for shader ops has gotten better and better while texture access time has relatively lagged significantly.

this is the main "gotcha" for the ppu...lightning quick internal memory access and interconnects.

Quote:
Quote:
Quote: Invest that money in the next generation DirectX 10 graphics card and you'll have better physics ánd graphics than with a current generation GPU plus a PPU.

that doesnt make sense and doesn't add up any way you slice it. you've gone from "i dont think it's cost effective" to making things up.

If it doesn't add up then why do both NVIDIA and ATI show us physics on the GPU? I guess they just make things up as well?


see any of the literature about the nvidia/havok physics launch for more details. (for instance this interview here) physics on the gpu are for effects *only*, not gameplay, unless you want to waste your graphics card.


Quote: With all due respect, if you want to make a point I'd start using sound arguments.

maybe i was out of line, but i'm sorry if your rhetorical technique of condescending to people doesn't go over too well.
Quote: Original post by justo
because then you've just repurposed your gpu to be an inefficient ppu!

It really isn't inefficient. It has massive SIMD processing capabilities and a fast memory controller. The PPU is very much like a downscaled version of Cell (i.e. a CPU with multiple SIMD units). So as far as I know the PhysX chip has nothing specific that makes it more suited for physics processing than anything else, and the GPU lacks nothing to efficiently do physics processing.
Quote: reading back is very much a problem...it is one of the biggest bottlenecks in any gpgpu program also utilizing graphics. the main problem is not physical bandwidth, but context switches/memory throughput while doing other operations (in-game textures, etc). while i don't have a lot of experience with directx, glReadPixels, for example, is quite slow and will kill performance if you try and do it every frame.

Memory throughput is no problem with PCI-Express. Even the latest graphics cards don't use the full 16x bandwidth.

glReadPixels forces the graphics card to finish all rendering operations (i.e. synchronization), which can take a while. For physics processing it doesn't have to synchronize graphics work. It's up to the driver to handle this efficiently but there is no fundamental limitation that would make accessing the GPU for physics processing any slower than accessing the PPU.
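As a rough sketch of what "handling it efficiently" can mean in practice (assuming ARB_pixel_buffer_object is available; this is my own illustration, not a quote from any driver documentation), the readback can target a pixel buffer object and be mapped later instead of stalling immediately:

```cpp
#include <GL/glew.h>   // ARB_pixel_buffer_object entry points

// glReadPixels targets a buffer object instead of client memory, so the call
// returns immediately and the copy proceeds asynchronously; the buffer is
// mapped later, once other work has been submitted.
void asyncReadBack(GLuint pbo, int width, int height)
{
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, 0);  // 0 = byte offset into the PBO

    // ... submit the rest of the frame's rendering here ...

    const float* results = static_cast<const float*>(
        glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB));
    if (results) {
        // hand the data back to the gameplay/physics code here
        glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
    }
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
}
```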
Quote: as for the directx 10 stuff, can you show me where it says that texture access is going to be fundamentally different (and not just with the addition of virtual memory)? you still have to get texture data to the shader units...this is why performance for shader ops has gotten better and better while texture access time has relatively lagged significantly.

this is the main "gotcha" for the ppu...lightning quick internal memory access and interconnects.

Have a look at the 'Buffer Types' and 'load - HLSL' pages of the Direct3D 10 Technology Preview (it comes with the latest DirectX 9 SDK). Data can be read directly without filtering or sampling, so it clearly skips the texture sampling units.
Quote: see any of the literature about the nvidia/havok physics launch for more details. (for instance this interview here) physics on the gpu are for effects *only*, not gameplay, unless you want to waste your graphics card.

Where do they say that? All I see is that they are very excited about the possibilities and they see a great future for physics on the GPU. And they think about PhysX pretty much the way I think about it:
Quote: Havok will always be driven by what game developers need, and so when new hardware platforms arrive and there is clear customer need for them, we always take a serious look. That said, as you can infer from our direction with Havok FX, we really don’t see the viability of a proprietary device in the PC game space. The GPU companies seem to have this aspect of the business well in hand, and with the advances in multi-core machines from AMD and Intel, the trajectory for affordable PC/GPU compute power seems sufficiently unbounded.

Sure, this is the competition talking, but they'd be investing in physics cards themselves if they believed it would be important to keep their lead position. Also, NVIDIA and ATI are multi-billion dollar companies and they wouldn't invest time and money in physics/GPGPU if they didn't see a future for it (they'd rather let those engineers work on improving graphics performance). DirectX 10 has been in development since the introduction of DirectX 9 more than three years ago, and I expect totally new GPU architectures were conceived around the same time and will be announced in the near future (NVIDIA G80, ATI R600). And look, they also refer to future CPU roadmaps! The multi-core revolution and the focus on real performance instead of just clock frequency has only just started.
Quote: maybe i was out of line, but i'm sorry if your rhetorical technique of condescending to people doesn't go over too well.

I'm very sorry but when I read an argument that I believe is flawed then I'll do my research and respond to it. And I expect other people to be just as correct, but I hope everyone realizes it can take a little while to get the facts. So the first reaction might seem rhetorical, but it's just to give people the chance to back up their statements (or correct them if necessary). I'm a very technical person and I would never make this personal (unless it contributes to the argument, like having expertise in the matter); you're all strangers to me. So I'm just trying to figure out the true potential of PhysX, and that's what this thread is all about. I thank everyone who has added some insight for the interesting discussion so far!
Quote: Original post by C0D1F1ED
Quote: Original post by justo
because then you've just repurposed your gpu to be an inefficient ppu!

It really isn't inefficient. It has massive SIMD processing capabilities and a fast memory controller. The PPU is very much like a downscaled version of Cell (i.e. a CPU with multiple SIMD units). So as far as I know the PhysX chip has nothing specific that makes it more suited for physics processing than anything else, and the GPU lacks nothing to efficiently do physics processing.
Quote: reading back is very much a problem...it is one of the biggest bottlenecks in any gpgpu program also utilizing graphics. the main problem is not physical bandwidth, but context switches/memory throughput while doing other operations (in-game textures, etc). while i don't have a lot of experience with directx, glReadPixels, for example, is quite slow and will kill performance if you try and do it every frame.

Memory throughput is no problem with PCI-Express. Even the latest graphics cards don't use the full 16x bandwidth.

glReadPixels forces the graphics card to finish all rendering operations (i.e. synchronization), which can take a while. For physics processing it doesn't have to synchronize graphics work. It's up to the driver to handle this efficiently but there is no fundamental limitation that would make accessing the GPU for physics processing any slower than accessing the PPU.


I think the issue here sort of centers around architecture. We have to look at what makes a GPU fast when it comes to graphics. It's because it processes data differently from a CPU. GPUs work well specifically with streaming data, especially when we do the exact same operation on all the data and only occasionally swap operations or perform a state change. So the whole graphics subsystem is highly optimized for pushing pixel data in from one end and spitting the end result out directly into the display buffer, which then goes to the display.

For all those who have done GPGPU, they all know that random memory access is hell for a GPU, just because the whole streaming aspect of the data is so optimized. We're not talking about random access to main memory, but rather to texture memory, which is where all the data you need to work with is stored. So the whole PCI-Express thing doesn't come into play. The problem is, if random memory access on the card is already slow in comparison to streaming access, then that actually becomes the bottleneck when trying to get to the data. Also, for the most part, when the graphics card was designed, there was never any thought put into the possibility of streaming things off the card and back into main memory. There just wasn't a need, and it didn't make sense either. So graphics has always been a one-way trip, until recently.
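To illustrate the access-pattern difference in plain CPU terms (a toy sketch of my own, not GPU code): streaming means output i reads input i, while the gathers a physics solver needs read wherever the constraint or contact graph points, which is effectively random texture access once the data lives in texture memory.

```cpp
#include <vector>

// Streaming: output i reads input i -- the pattern a GPU's texture caches and
// prefetching are built around.
void streamingPass(const std::vector<float>& in, std::vector<float>& out)
{
    for (size_t i = 0; i < out.size(); ++i)
        out[i] = in[i] * 0.5f;
}

// Gather: output i reads wherever the neighbour/constraint index points, i.e.
// a data-dependent, effectively random access pattern.
void gatherPass(const std::vector<float>& in, const std::vector<int>& neighbour,
                std::vector<float>& out)
{
    for (size_t i = 0; i < out.size(); ++i)
        out[i] = in[static_cast<size_t>(neighbour[i])] * 0.5f;
}
```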

So the question for ATI and nVidia is: do you sacrifice graphics performance at the hardware level just so that you can add physics capability? Do you try to optimize random memory access and lose out on some of the optimizations done to speed up data streaming? Based on the theory of "no free lunch", you can pretty much say that the gain in physics performance will be roughly proportional to the loss in graphics performance.

As for Havok FX, from their campaign it would seem that they are targeting SLI setups, so that there is a semi-free GPU available to do physics. But wouldn't that kind of defeat the whole purpose of SLI mode? It would also mean that I just spent almost the same amount of money on a graphics card that won't be a graphics card when I play games.

One of the selling points of PhysX was also the ability to accelerate real-time skeletal animation and character animation, which involves a lot of inverse kinematics. Inverse kinematics involves a lot of transformations that are interdependent. As of yet, I still don't see anyone doing GPGPU that can simulate that in real time.
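To make that interdependence concrete, here is a minimal 2D CCD (cyclic coordinate descent) IK sketch of my own (not AGEIA's or anyone's shipping code): each joint update reads chain positions that were just rewritten by the previous joint, so the inner work is inherently serial rather than one big data-parallel pass.

```cpp
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

// Rotate each joint toward the target, from the tip back to the root. The
// per-joint rotation depends on positions written by the previous step, which
// is why this doesn't map cleanly to "one fragment per joint".
void ccdSolve(std::vector<Vec2>& joints, Vec2 target, int iterations)
{
    for (int it = 0; it < iterations; ++it) {
        for (int j = static_cast<int>(joints.size()) - 2; j >= 0; --j) {
            Vec2& pivot = joints[j];
            Vec2 end = joints.back();
            // Angle needed to swing the current end effector toward the target.
            float a1 = std::atan2(end.y - pivot.y, end.x - pivot.x);
            float a2 = std::atan2(target.y - pivot.y, target.x - pivot.x);
            float da = a2 - a1, c = std::cos(da), s = std::sin(da);
            // Rotate every joint after the pivot -- these writes are exactly
            // what the next joint's update immediately depends on.
            for (size_t k = static_cast<size_t>(j) + 1; k < joints.size(); ++k) {
                float rx = joints[k].x - pivot.x, ry = joints[k].y - pivot.y;
                joints[k].x = pivot.x + rx * c - ry * s;
                joints[k].y = pivot.y + rx * s + ry * c;
            }
        }
    }
}
```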
Quote: Original post by C0D1F1ED
Quote: Original post by justo
because then you've just repurposed your gpu to be an inefficient ppu!

It really isn't inefficient. It has massive SIMD processing capabilities and a fast memory controller.

fast isn't the same as efficient. you are retasking hardware optimized to do a lot of math and a little data access to do a lot of data access and a moderate amount of math.

your main bottleneck in any gpgpu application is texture access...the PPU gets around this with 256 GB/s internal memory access...now thats obviously peak, and theres not a lot of specific architecture data out there, but thats *a lot* faster than even your normal cpu, and unimaginable in shader land.

Quote:
Quote: see any of the literature about the nvidia/havok physics launch for more details. (for instance this interview here) physics on the gpu are for effects *only*, not gameplay, unless you want to waste your graphics card.

Where do they say that? All I see is that they are very excited about the possibilities and they see a great future for physics on the GPU.


the key quote is here
Quote: But we were willing to sacrifice some game-play "feedback" in order to achieve great scalability (10K inter-colliding objects, for example, is where things really start to get interesting). We have solid game-play physics in our flagship Havok Physics product - so we wanted to come up with an add-on solution that game developers could use to layer on stunning effects that look and behave correctly


and of course they sing the praises of dual core: their main product runs on the cpu! theyre the competition, as you said, so they aren't really the ones to be asking about the relative efficacy of the ppu...i was just pointing out the specifics of their product...the physics data is making a one way trip.
hard to find data beyond "huge internal bandwidth!" and "highly interconnected!" but here are two sources i just found (the first points to the second, translates the last bits google isn't doing, at least for me):

memory architecture/cell similarities

" The PhysX " of the AGEIA which being hard, actualizes physical simulation
Quote: Original post by justo
hard to find data beyond "huge internal bandwidth!" and "highly interconnected!" but here are two sources i just found (the first points to the second, translates the last bits google isn't doing, at least for me):

memory architecture/cell similarities

" The PhysX " of the AGEIA which being hard, actualizes physical simulation

what i found most interesting about the design of the PPU is that it seemed very similar to the PS2's Emotion Engine's design. Also, the PS2 had the PS1 chip act as the I/O processor when it wasn't running PS1 games. Do you think that the PS3 will have the PS2 EE chip acting as the PhysX chip?


Quote: Original post by justo
fast isn't the same as efficient. you are retasking hardware optimized to do a lot of math and a little data access to do a lot of data access and a moderate amount of math.

Then you should know the R520 chip got an entirely new memory controller using a 512-bit ring bus, just to cope with memory bandwidth demands. "A little data access" is really a gross understatement.
Quote: your main bottleneck in any gpgpu application is texture access...the PPU gets around this with 256 GB/s internal memory access...now thats obviously peak, and theres not a lot of specific architecture data out there, but thats *a lot* faster than even your normal cpu, and unimaginable in shader land.

A dual-core Pentium 4 at 3.4 GHz has 256-bit L2 cache buses, so we get 217.6 GB/s of internal memory access. And let's not forget that this uses a highly efficient cache hierarchy, plus out-of-order execution, to deal with latency. The PPU has neither a cache nor out-of-order execution.

So first of all 256 GB/s ain't *a lot* faster. Their marketing even uses '2 Tb/s' just to make it sound more impressive. Secondly, since physics processing has to do random memory accesses, this adds latency. So it's quite possible that even though the buses might be able to deliver 256 GB/s, actual memory bandwidth is much lower. Last but not least, this is confirmed by the fact that 8 SIMD units at 400 MHz simply can't consume 256 GB/s. With one vector unit and one scalar unit each, they need at most 64 GB/s; the rest is waste. And unless every instruction accesses memory, the actual bandwidth usage is going to be even lower.
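As a quick back-of-the-envelope check of those figures (using only the numbers quoted above; the operand widths of a 4-wide vector plus a scalar per cycle are my reading of "one vector unit and one scalar unit each"):

```cpp
#include <cstdio>

int main()
{
    // Dual-core Pentium 4: two 256-bit (32-byte) L2 buses at 3.4 GHz.
    double p4  = 2 * 32 * 3.4e9;          // = 217.6 GB/s

    // PPU as described above: 8 SIMD units, each reading one 4-wide float
    // vector (16 bytes) plus one scalar (4 bytes) per 400 MHz cycle.
    double ppu = 8 * (16 + 4) * 400e6;    // = 64 GB/s consumable, vs 256 GB/s claimed

    std::printf("P4 L2: %.1f GB/s, PPU consumable: %.1f GB/s\n", p4 / 1e9, ppu / 1e9);
    return 0;
}
```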
Quote: the key quote is here
Quote: But we were willing to sacrifice some game-play "feedback" in order to achieve great scalability (10K inter-colliding objects, for example, is where things really start to get interesting). We have solid game-play physics in our flagship Havok Physics product - so we wanted to come up with an add-on solution that game developers could use to layer on stunning effects that look and behave correctly

I haven't seen any Ageia demo yet that shows more 'feedback'. In fact I'm sure it's problematic for them. Legacy PCI offers only a fraction of the bandwidth of PCI-Express.

Anyway, I'm sure PhysX is a nice processor for physics processing, no doubt about that. But it's only going to be bought by hardcore gamers. For PhysX to be successful in the long run it needs much more market penetration. But very soon it will get serious competition from multi-core CPUs and DirectX 10 graphics cards that are unified and well-suited for GPGPU. And frankly I don't expect any actual game to use 10,000 objects. That's cool for a demo but pointless for gameplay. A few hundred or a few thousand pieces of debris from explosions can perfectly well be handled by a next-generation CPU/GPU. So my only point is that PhysX just has no future.
