
physx chip

Quote: Original post by C0D1F1ED
Quote: Original post by justo
fast isn't the same as efficient. you are retasking hardware optimized to do a lot of math and a little data access to do a lot of data access and a moderate amount of math.

Then you should know the R520 chip got an entirely new memory controller using a 512-bit ring bus, just to cope with memory bandwidth demands. "A little data access" is really a gross understatement.


not relative to arithmetic ops. just par for the course for gpus...that's exactly what the average shader needs.

Quote:
Quote: your main bottleneck in any gpgpu application is texture access...the PPU gets around this with 256 GB/s internal memory access...now that's obviously peak, and there's not a lot of specific architecture data out there, but that's *a lot* faster than even your normal cpu, and unimaginable in shader land.

A dual-core Pentium 4 at 3.4 GHz has 256-bit L2-cache buses, so we get 217.6 GB/s internal memory access. And let's not forget that this uses a highly efficient cache hierarchy, plus out-of-order execution, to deal with latency. The PPU has neither a cache nor out-of-order execution.

by design. it's not a general purpose cpu. on-chip caches don't work well for such incoherent data access patterns. you talk as if these are a bunch of guys who got together and fabbed the first thing that came off their prototyping board.

anyway, i'm done here. there's not enough data on the cards to really argue about it...i'll stick with my original answer of "enthusiasts will drive it, i'm looking forward to trying it."

Quote: And frankly I don't expect any actual game to use 10,000 objects. That's cool for a demo but pointless for gameplay.

i would estimate that within 6 months of ut2k7 we'll get the first full-fledged mods playing only with the ppu (geometry wars-style excess paired with actual gameplay). as soon as they're cool in the gaming world, the academic and industry demo scene will follow shortly.

cheers.
Quote: Original post by C0D1F1ED
A dual-core Pentium 4 at 3.4 GHz has 256-bit L2-cache buses, so we get 217.6 GB/s internal memory access. And let's not forget that this uses a highly efficient cache hierarchy, plus out-of-order execution, to deal with latency. The PPU has neither a cache nor out-of-order execution.

So first of all, 256 GB/s ain't *a lot* faster. Their marketing even uses '2 Tb/s' just to make it sound more impressive. Secondly, since physics processing has to do random memory accesses, this adds latency. So it's quite possible that even though the buses might be able to deliver 256 GB/s, the actual memory bandwidth is much lower. Last but not least, this is confirmed by the fact that 8 SIMD units at 400 MHz simply can't consume 256 GB/s. With one vector unit and one scalar unit each, they need at most 64 GB/s; the rest is waste. And unless every instruction accesses memory, the actual bandwidth usage is going to be even lower.
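To make that arithmetic easy to check, here is a quick back-of-the-envelope sketch (my own calculation; the per-unit access widths are assumptions taken from the figures quoted in this thread, not official specs):

#include <cstdio>

int main() {
    // Dual-core Pentium 4: two 256-bit (32-byte) L2 buses at 3.4 GHz.
    double p4_l2  = 2 * 32.0 * 3.4e9;               // bytes per second
    // PPU as discussed above: 8 SIMD units at 400 MHz, each fetching at most
    // one 16-byte vector and one 4-byte scalar per cycle (assumed widths).
    double ppu_max = 8 * (16.0 + 4.0) * 400e6;      // bytes per second
    printf("P4 L2 bandwidth:     %.1f GB/s\n", p4_l2  / 1e9);  // ~217.6
    printf("PPU max consumption: %.1f GB/s\n", ppu_max / 1e9); // ~64.0
    return 0;
}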


Well, it doesn't really matter how fast the connection between the L2 cache and the core is, because that's all on the processor die. The bottleneck is getting the data into the L2 cache by prefetching it from main memory. And the P4's pipeline is so deep that any series of branch prediction misses will cause the pipeline to be flushed, which is detrimental to performance. This is why, clock for clock, the Pentium 4 is not as fast as the Pentium-M, which is actually based on the Pentium 3 core. It's no secret that Intel has been trying to downplay the Pentium-M's performance advantage and keep it in the mobile market, because it beats the P4 in both performance and energy consumption at a lower clock speed. So the clock speed myth is back in full swing, when the truth is that a faster clock doesn't mean higher performance. Why did the P4 design exist in the first place? Marketing. The deep pipeline made it possible to scale to higher clock speeds faster.
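A minimal sketch of that misprediction cost (my own illustration, not a claim about specific P4 numbers): the same branchy loop over the same data runs noticeably faster once the data is sorted, because the branch becomes predictable and the deep pipeline stops getting flushed.

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

static long long branchy_sum(const std::vector<int>& v) {
    long long sum = 0;
    for (int x : v)
        if (x >= 128) sum += x;   // unpredictable branch on random data
    return sum;
}

int main() {
    std::vector<int> data(1 << 24);
    std::mt19937 rng(42);
    for (int& x : data) x = rng() % 256;

    auto time_it = [&](const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        volatile long long s = branchy_sum(data);
        (void)s;
        auto t1 = std::chrono::steady_clock::now();
        long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
        printf("%s: %lld ms\n", label, ms);
    };

    time_it("unsorted");                     // branch predictor guesses wrong ~50% of the time
    std::sort(data.begin(), data.end());
    time_it("sorted");                       // same work, predictable branch
}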

Also, you don't need cache to get higher performance. If you look at the PS2, it's based on an architecture that is philosophically the opposite of the PC. It gets rid of cache almost completely, but gives all processing units a ten-channel DMA path, so all the data is moved around as streams. The whole point of cache is to reduce CPU idle time, since the path between the cache and the CPU is faster than the path from main memory. Prefetching was then added to pull in data the CPU "may" use in the future, further hiding memory latency, but that brings in the problem of prediction misses. However, if you settle for a lower clock speed and feed the CPU data as fast as it can process it, you really don't need any cache at all. This is also why the P4 originally required RDRAM for maximum performance: only RDRAM had the bandwidth to feed the CPU data as fast as it could process it.
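As a small illustration of the "feed it a stream" idea (my own sketch, not PS2 or PPU code): walk a big array linearly and issue software prefetches a few elements ahead, so the CPU stays fed without relying on random accesses hitting the cache. The prefetch distance here is a guess, not a tuned value.

#include <xmmintrin.h>   // _mm_prefetch (SSE intrinsic)
#include <cstddef>

void scale_stream(float* data, std::size_t n, float k) {
    const std::size_t kPrefetchAhead = 64;   // elements ahead -- assumed, tune per machine
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kPrefetchAhead < n)
            _mm_prefetch(reinterpret_cast<const char*>(&data[i + kPrefetchAhead]),
                         _MM_HINT_T0);       // hint: pull the upcoming line into cache
        data[i] *= k;                        // simple streaming computation
    }
}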

Quote: Original post by C0D1F1ED
Quote: the key quote is here
Quote: But we were willing to sacrifice some game-play "feedback" in order to achieve great scalability (10K inter-colliding objects, for example, is where things really start to get interesting). We have solid game-play physics in our flagship Havok Physics product - so we wanted to come up with an add-on solution that game developers could use to layer on stunning effects that look and behave correctly

I haven't seen any Ageia demo yet that shows more 'feedback'. In fact I'm sure it's problematic for them. Legacy PCI offers only a fraction of the bandwidth of PCI-Express.


The point, I think, is that HavokFX is an "add-on" solution to the Havok engine itself, which means you don't really offload much to the GPU; most of the work is still done on the CPU side. This, of course, has the nice property that you can easily turn it off. Ageia's solution, on the other hand, offloads most of the calculations from the CPU.
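A rough sketch of what that "add-on" split might look like in engine code (hypothetical class names, not the actual Havok or Ageia API): gameplay physics always runs on the CPU, while an optional effects layer can be switched off without changing game behaviour.

#include <vector>

// Hypothetical types, for illustration only.
struct RigidBody {};
struct DebrisParticle {};

class GameplayPhysics {             // authoritative simulation, always on the CPU
public:
    void step(float dt, std::vector<RigidBody>& bodies) { /* ... */ }
};

class EffectsPhysics {              // eye candy only; safe to disable
public:
    virtual ~EffectsPhysics() = default;
    virtual void step(float dt, std::vector<DebrisParticle>& debris) = 0;
};

class GpuEffectsPhysics : public EffectsPhysics {   // HavokFX-style add-on path
public:
    void step(float dt, std::vector<DebrisParticle>& debris) override { /* GPU pass */ }
};

void tick(float dt, GameplayPhysics& gameplay, EffectsPhysics* effects,
          std::vector<RigidBody>& bodies, std::vector<DebrisParticle>& debris) {
    gameplay.step(dt, bodies);      // gameplay results never depend on the effects layer
    if (effects)                    // null pointer == effects layer turned off
        effects->step(dt, debris);
}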

Quote: Original post by C0D1F1ED
Anyway, I'm sure PhysX is a nice processor for physics processing, no doubt about that. But it's only going to be bought by hardcore gamers. For PhysX to be successful in the long run it needs much more market penetration. But very soon it will get serious competition from multi-core CPUs and DirectX 10 graphics cards that are unified and well-suited for GPGPU. And frankly I don't expect any actual game to use 10,000 objects. That's cool for a demo but pointless for gameplay. A few hundred or thousand pieces of debris from explosions can perfectly well be handled by a next-generation CPU/GPU. So my only point is that PhysX just has no future.


It shouldn't be forgotten that the graphics card market was started by hardcore gamers who dug deep into their pockets and bought the first Voodoo cards from 3dfx; I know, I was one of them.

And I really would like to restate my view that the whole concept of GPGPU is an oxymoron in itself. It pretty much came out of the academic need for low-cost fast processing, and that is pretty much where it should end. It really is only an academic exercise in seeing what we can force the GPU to do other than graphics, given that it has all this horsepower. Has anyone ever thought of building supercomputers from multiple Quad-SLI machines based completely on GPUs? I don't think so. The whole reason it's called "General Purpose" is to attract people to the field in the hope of getting "general purpose" processing out of it. But in the end, it is a specialized piece of hardware that does graphics, and to make it do "general purpose" stuff, we fool it into thinking it's doing graphics.
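For the curious, "fooling it into thinking it's doing graphics" roughly looks like this: the usual render-to-texture GPGPU trick, sketched below with context creation and shader compilation omitted, and assuming GL headers and extensions that expose framebuffer objects and float textures. Exact formats depend on the driver, so treat this as an outline, not working code.

#include <GL/gl.h>

// Classic GPGPU pattern: put the input array in a float texture, draw one
// full-screen quad so the fragment shader runs once per element, and read
// the rendered "image" back as the result.
void gpgpu_pass(GLuint program, const float* input, float* output, int w, int h) {
    GLuint inTex, outTex, fbo;

    // 1. Upload the input array as an RGBA float texture (one texel per element).
    glGenTextures(1, &inTex);
    glBindTexture(GL_TEXTURE_2D, inTex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, w, h, 0, GL_RGBA, GL_FLOAT, input);

    // 2. The "framebuffer" is just the output array in disguise: another float
    //    texture attached to an FBO.
    glGenTextures(1, &outTex);
    glBindTexture(GL_TEXTURE_2D, outTex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, w, h, 0, GL_RGBA, GL_FLOAT, nullptr);
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, outTex, 0);

    // 3. Draw a full-screen quad; the fragment shader does the real math.
    glUseProgram(program);
    glViewport(0, 0, w, h);
    glBindTexture(GL_TEXTURE_2D, inTex);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);   // assumes a quad's vertices are already bound

    // 4. Read the result back to the CPU.
    glReadPixels(0, 0, w, h, GL_RGBA, GL_FLOAT, output);
}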

Also, be very careful about saying things like "10k objects or more is pointless for gameplay", or you'll run into the trouble Bill Gates got into when he supposedly said, about 25 years ago, that no one would ever need more than 640KB of RAM. (I recall it was Bill.)
Today's computers usually need two kinds of processors: application logic processors, which can handle complex branching, and vector processors, which can move and process data efficiently. CPUs are the classic logic processors. Vector processors, on the other hand, are suited to repetitive jobs with high volumes of data and low levels of branching: graphics, sound and physics. A good vector or matrix processor can do all of these highly parallelizable jobs. Having dedicated video, sound and physics chips is like having separate ALUs and FPUs.

Viktor

ps: To make a really general-purpose vector processor from a GPU, we need random access to the video memory. The system must support random reads and writes without any filtering or other data modification. On a unified shader GPU, the pipeline setup can be done in software, so the code could look like: vertex shader setup code, vertex shader, fragment shader setup code, fragment shader, framebuffer code. If we remove the setup code, we get a general-purpose, highly parallel vector CPU without any graphics-specific parts.
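To make the vector-processor point concrete, here is the kind of data-parallel inner loop that graphics, sound and physics all boil down to, written with SSE intrinsics (a sketch that assumes n is a multiple of 4 and 16-byte-aligned arrays, just to keep it short):

#include <xmmintrin.h>   // SSE intrinsics

// out[i] = a[i] * scale + b[i], four floats per instruction.
void madd4(const float* a, const float* b, float* out, int n, float scale) {
    __m128 s = _mm_set1_ps(scale);
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);          // load 4 floats (aligned)
        __m128 vb = _mm_load_ps(b + i);
        __m128 r  = _mm_add_ps(_mm_mul_ps(va, s), vb);
        _mm_store_ps(out + i, r);                // store 4 results at once
    }
}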
Would it be possible to make a PCI-Express physics card?

I heard that PCI-e can be used for more than just graphics cards. Of
course, that was before it was released so it's probably not true.
F-R-E-D F-R-E-D-B-U-R...G-E-R! - Yes!
Quote: Original post by DigiDude
Would it be possible to make a PCI-Express physics card?

I heard that PCI-e can be used for more than just graphics cards. Of
course, that was before it was released so it's probably not true.


That is very possible. If you look at the market right now, the only other PCI-e application is SATA RAID cards. Those are usually PCI-e x1 instead of x16 like the graphics cards. So there's no reason why you can't have cards for other things.
WeirdoFu, I fully realize the CPU is not the ideal processor for physics. I was merely pointing out that the internal bandwidth of PhysX is not as impressive as it sounds, and peak bandwidth tells nothing about sustainable bandwidth. That doesn't mean I think the PPU is weak and has a bad architecture. I'm just saying that next generation multi-core CPUs should not be underestimated and might very well be adequate for physics processing in actual games. As long as that's the case, PhysX won't have the market penetration it needs to survive in the long term.

For more demanding physics processing, next generation GPUs will have an abundance of SIMD units and high bandwidth random memory access. So like I said before they'll pretty much have a PPU on board. As the Anonymous Poster kindly pointed out, a unified architecture using DirectX 10 will be almost like a general purpose highly parallel vector CPU. So while today's GPUs and DirectX 9 still have limitations for physics/GPGPU purposes, the next generation will be radically new and highly suited for general purpose processing.

In a nutshell: GPU + PPU = next-generation (GP)GPU. DirectX 9 introduced programmable graphics shaders. DirectX 10 introduces programmable general-purpose (unified) shaders. With this in mind I can't see how PhysX can possibly survive for long.

One more way to look at it is that the PPU architecture currently sits somewhere between a CPU and a GPU. Ageia found a small but undeniable niche market thanks to current CPU/GPU limitations (not enough parallelism / not general-purpose enough). But next-generation CPUs are gaining parallelism (multi-core, extra floating-point units) while next-generation GPUs are becoming more programmable and general-purpose, so the two are actually getting closer together. That leaves very little space in between for a processor dedicated to physics, and the trend will continue. So their market will definitely shrink, not grow! I'm no economist, but that sounds really bad for any product's future...
Quote: Original post by WeirdoFu
That is very possible. If you look at the market right now, the only other PCI-e application is SATA RAID cards. Those are usually PCI-e x1 instead of x16 like the graphics cards. So there's no reason why you can't have cards for other things.

True, there's no technical limitation.

But there might be a practical problem. The people most likely to want a PhysX card are hardcore gamers who already have a hefty SLI/CrossFire configuration. And most motherboards only have their extra PCI-Express slots in between the two x16 connectors, where they are sometimes blocked by the graphics cards.
Sorry to repeat myself, but I wanted one of the technical gurus to at least answer one part of my question [smile]
Quote: Original post by Alpha_ProgDes
Quote: Original post by justo
hard to find data beyond "huge internal bandwidth!" and "highly interconnected!" but here are two sources i just found (the first points to the second, translates the last bits google isn't doing, at least for me):

memory architecture/cell similarities

" The PhysX " of the AGEIA which being hard, actualizes physical simulation

what i found most interesting about the design of the PPU is that it seemed very similar to the PS2's Emotion Engine's design. Also, the PS2 had the PS1 chip act as the I/O processor when it wasn't running PS1 games. Do you think that the PS3 will have the PS2 EE chip acting as the PhysX chip?

Basically, for anyone who has seen or knows the design of the PS2 EE chip: is it capable of, or naturally suited to, PPU-style instructions/execution? Do you think the PhysX is primarily based on the EE?

it's been leaked that sony is going to do backwards compatibility via software emulation, so the emotion engine won't be included in the ps3. i don't know much about the ps2's architecture, but the ppu is similar to the cell in many respects, so take that as you will.
Quote: Original post by C0D1F1ED
It sounds neat but it has no chance of survival. Dual-core processors have a whole lot of extra processing power that can be used for physics without the additional overhead (PCI bandwidth, synchronization) of a separate physics card. Furthermore, investing in a dual-core processor benefits much more than just the physics in a few games. The Ageia cards are quite expensive and will rarely be used.

Besides, no game developer in his right mind would create a game that will only run on a fraction of PCs. So there always has to be a fallback without affecting gameplay. It took graphics cards about three years to become widespread, but by the time physics play a key role in games the CPUs will be multi-core with highly improved architectures...



And Intel (and no doubt others) are working on multi-core chips with as many as 16 separate execution cores (each with its own floating-point units). I doubt very much that any custom device could compete price-wise against a mass-produced mainstream CPU. You might eventually see daughterboards containing additional CPUs (a generic CPU plus local memory) become a common way to add extra computing power to a computer.

It's possible that GPUs will become versatile enough to do physics efficiently, and that dual functionality would get the physics done while keeping the cheapness of commodity hardware.
