
PCI Express Throughput

Started December 20, 2016 04:26 PM
16 comments, last by Infinisearch

Here (https://en.wikipedia.org/wiki/PCI_Express) is a nice table outlining speeds for PCI Express. I have a GeForce GTX 660 on a motherboard with PCI Express 3.0 x16. I made a test by writing a simple D3D11 app that downloads a 1920x1080, 32 bpp (~8 MB) image from GPU to CPU. The whole operation takes 8 ms. Over a second that sums to around 1 GB of data, which corresponds exactly to PCI Express 3.0 x1. Is this how it is supposed to work? Is it like all CopyResource/Map data goes through one of the 16 lanes?
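(For reference, a minimal sketch of the arithmetic behind that estimate; the frame size and the 8 ms figure are taken from the post above, and the per-lane rate comes from the linked table.)

        // Back-of-the-envelope check of the numbers quoted above.
        #include <cstdio>

        int main()
        {
            const double bytesPerFrame = 1920.0 * 1080.0 * 4.0; // 32 bpp => ~8.3 MB
            const double seconds       = 0.008;                 // measured 8 ms per download
            const double gbPerSec      = bytesPerFrame / seconds / 1e9;

            // PCIe 3.0 is roughly 0.985 GB/s per lane, so ~1 GB/s looks like x1,
            // while a full x16 link would be closer to ~15.75 GB/s.
            std::printf("effective readback rate: %.2f GB/s\n", gbPerSec);
            return 0;
        }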

I've never heard this described in detail, but I would imagine that while the interface may have 1, 2, 4, 8 or 16 lanes, the card and driver determine how the data is transmitted. I would assume that if the data can fit within a single lane, a single lane would be used.


The question is what "if the data can fit" means. I would like to copy data back from the GPU to the CPU as fast as possible, and since no other data goes that way except for my one texture download, I would ideally like to utilize all 16 lanes. If that's possible, of course.


You are not streaming data to the monitor. You are telling the card, through the driver, how much data is to be transferred and the card and driver make the appropriate decisions as to how that happens.

You have to understand that you have absolutely no control over what the graphics card and driver does in this matter. I'm not 100% convinced that the driver has control over this, and if not, the user never will.

Out of curiosity, why is this important to you? Have you found yourself bottle-necked by the number of lanes used, or are you looking at potential issues?

I'm just looking at potential uses. I'm aware that GPU->CPU traffic should be avoided as much as possible, but for some tests I needed to do this, and to make those tests reliable I wanted to utilize the full transfer potential.

On a side note, uploading data (CPU -> GPU) takes 3-5 ms (around twice as fast as the download).

Is this how it is supposed to work? Is it like all CopyResource/Map data goes through one of the 16 lanes?

No, that's not how it's supposed to work... if it is set up for PCIe 3.0 x16 then it should have all 16 lanes transferring at the same time. Maybe your video card isn't in the x16 slot, or maybe it's misconfigured.

-potential energy is easily made kinetic-


In addition, just because your motherboard supports PCIe 3.0 doesn't mean that your graphics card supports PCIe 3.0, and even where the graphics card specification states 3.0 support I would still be wary. The GPU may fall back to a lower speed if certain conditions are not met, so unless you have the low-level specifications for the GPU in question, all we are dealing with is the spec sheet.

I'm now testing my work computer, which is brand new and has a GeForce GTX 1080. See the detailed spec in this picture: https://postimg.org/image/hwhuntpn5/

Now my tests show upload (CPU->GPU) 8 GB/s and download (GPU->CPU) 3 GB/s.

PCIe is bidirectional, and all sources I've found claim the transfer rate in both directions should be identical, which is not true in my case.

Were you doing anything with the GPU at the same time as the transfer?

-potential energy is easily made kinetic-

Nothing. It's something like this (download):

        uint64 bef = TickCount();

        // Copy the render target into a CPU-readable staging texture.
        deviceContext->CopyResource(stagingCopy.texture, gbufferDiffuseRT.texture);

        // Map blocks until the GPU has finished the copy (and any GPU work
        // queued before it), so the measured time includes that, the PCIe
        // transfer itself, and the CPU-side memcpy.
        D3D11_MAPPED_SUBRESOURCE mappedSubresource;
        deviceContext->Map(stagingCopy.texture, 0, D3D11_MAP_READ, 0, &mappedSubresource);
        memcpy(mydata, mappedSubresource.pData, sizeof(mydata)); // mydata: CPU-side buffer sized for the whole image
        deviceContext->Unmap(stagingCopy.texture, 0);

        uint64 aft = TickCount();
        cout << aft - bef << endl; // elapsed time (ms, per the figures quoted above)
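One thing worth noting about this measurement: because Map on a staging texture blocks, the timed region mixes any previously queued GPU work with the copy and the CPU memcpy. Below is a minimal sketch of one way to separate those, using a D3D11 event query to wait for prior GPU work before starting the clock. It reuses the stagingCopy / gbufferDiffuseRT / mydata names and TickCount() from the snippet above, and additionally assumes an ID3D11Device* device is available; it is one possible approach, not the only one.

        // Sketch: drain previously submitted GPU work first, so only the copy
        // into the staging texture and the CPU-side read are timed.
        D3D11_QUERY_DESC queryDesc = {};
        queryDesc.Query = D3D11_QUERY_EVENT;

        ID3D11Query* gpuIdle = nullptr;
        device->CreateQuery(&queryDesc, &gpuIdle);

        deviceContext->End(gpuIdle);   // event queries only need End()
        deviceContext->Flush();
        while (deviceContext->GetData(gpuIdle, nullptr, 0, 0) == S_FALSE)
        {
            // busy-wait until the GPU has caught up with everything queued so far
        }

        uint64 bef = TickCount();

        deviceContext->CopyResource(stagingCopy.texture, gbufferDiffuseRT.texture);

        D3D11_MAPPED_SUBRESOURCE mapped;
        deviceContext->Map(stagingCopy.texture, 0, D3D11_MAP_READ, 0, &mapped); // blocks until the copy has executed
        memcpy(mydata, mapped.pData, sizeof(mydata));
        deviceContext->Unmap(stagingCopy.texture, 0);

        uint64 aft = TickCount();
        cout << aft - bef << endl;

        gpuIdle->Release();

This only cleans up the CPU-side stopwatch; timing just the CopyResource on the GPU would be a job for D3D11 timestamp queries.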

As for my home GeForce GTX 660, I've just checked in the HWiNFO app that it's running at PCIe 2.0, hence the slower speed than on my work computer.

Nevertheless, I presume the 8 GB/s and 3 GB/s figures should be higher. And identical.

This topic is closed to new replies.
