
PCI Express Throughput

Started December 20, 2016 04:26 PM
16 comments, last by Infinisearch

Here (https://en.wikipedia.org/wiki/PCI_Express) is a nice table outlining speeds for PCI Express. I have a GeForce GTX 660 on a motherboard with PCI Express 3.0 x16. I made a test by writing a simple D3D11 app that downloads a 1920x1080, 32 bpp (~8 MB) image from GPU to CPU. The whole operation takes 8 ms. Over a second that sums to around 1 GB of data, which corresponds exactly to PCI Express 3.0 x1. Is this how it is supposed to work? Is it like all CopyResource/Map data goes through one of the 16 lanes?
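(For reference, a minimal sketch of the arithmetic behind that estimate; the frame size and the 8 ms figure are taken from the post above, and the per-lane rate comes from the linked table.)

        // Back-of-the-envelope check of the numbers quoted above.
        #include <cstdio>

        int main()
        {
            const double bytesPerFrame = 1920.0 * 1080.0 * 4.0; // 32 bpp => ~8.3 MB
            const double seconds       = 0.008;                 // measured 8 ms per download
            const double gbPerSec      = bytesPerFrame / seconds / 1e9;

            // PCIe 3.0 is roughly 0.985 GB/s per lane, so ~1 GB/s looks like x1,
            // while a full x16 link would be closer to ~15.75 GB/s.
            std::printf("effective readback rate: %.2f GB/s\n", gbPerSec);
            return 0;
        }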

I've never heard this described in detail, but I would imagine that while the interface may have 1, 2, 4, 8 or 16 lanes, the card and driver determine how the data is transmitted. I would assume that if the data can fit within a single lane, a single lane would be used.


The question is what "if the data can fit" means. I would like to copy data back from the GPU to the CPU as fast as possible, and since no other data goes that way except for my one texture download, I would ideally like to utilize all 16 lanes. If that's possible, of course.


You are not streaming data to the monitor. You are telling the card, through the driver, how much data is to be transferred and the card and driver make the appropriate decisions as to how that happens.

You have to understand that you have absolutely no control over what the graphics card and driver does in this matter. I'm not 100% convinced that the driver has control over this, and if not, the user never will.

Out of curiosity, why is this important to you? Have you found yourself bottle-necked by the number of lanes used, or are you looking at potential issues?

I'm just looking at potential uses. I'm aware that GPU->CPU traffic should be avoided as much as possible, but for some tests I needed to do this, and to make those tests reliable I wanted to utilize the full transfer potential.

On a side note, uploading data (CPU -> GPU) takes 3-5 ms (around twice as fast as the download).

Is this how it is supposed to work? Is it like all CopyResource/Map data goes through one of the 16 lanes?

No, that's not how it's supposed to work... if it is set up for PCIe 3.0 x16 then it should have all 16 lanes transferring at the same time. Maybe your video card isn't in the x16 slot, or maybe it's misconfigured.

-potential energy is easily made kinetic-


In addition, just because your motherboard supports PCIe 3.0 doesn't mean that your graphics card supports PCIe 3.0, and even where the graphics card specification states 3.0 support I would still be wary. The GPU may fall back to a lower speed if certain conditions are not met, so unless you have the low-level specifications for the GPU in question, all we are dealing with is the spec sheet.

I'm now testing my work computer, which is brand new and has a GeForce GTX 1080. See the detailed spec in this picture: https://postimg.org/image/hwhuntpn5/

Now my tests show upload (CPU->GPU) 8 GB/s and download (GPU->CPU) 3 GB/s.

PCIe is bidirectional, and all sources I've found claim the transfer rate in both directions should be identical, which is not true in my case.

Were you doing anything with the GPU at the same time as the transfer?

-potential energy is easily made kinetic-

Nothing. It's something like this (download):

        uint64 bef = TickCount();

        // Copy the render target into a CPU-readable staging texture.
        deviceContext->CopyResource(stagingCopy.texture, gbufferDiffuseRT.texture);

        // Map blocks until the GPU has finished the copy (and any GPU work
        // queued before it), so the measured time includes that, the PCIe
        // transfer itself, and the CPU-side memcpy.
        D3D11_MAPPED_SUBRESOURCE mappedSubresource;
        deviceContext->Map(stagingCopy.texture, 0, D3D11_MAP_READ, 0, &mappedSubresource);
        memcpy(mydata, mappedSubresource.pData, sizeof(mydata)); // mydata: CPU-side buffer sized for the whole image
        deviceContext->Unmap(stagingCopy.texture, 0);

        uint64 aft = TickCount();
        cout << aft - bef << endl; // elapsed time (ms, per the figures quoted above)
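One thing worth noting about this measurement: because Map on a staging texture blocks, the timed region mixes any previously queued GPU work with the copy and the CPU memcpy. Below is a minimal sketch of one way to separate those, using a D3D11 event query to wait for prior GPU work before starting the clock. It reuses the stagingCopy / gbufferDiffuseRT / mydata names and TickCount() from the snippet above, and additionally assumes an ID3D11Device* device is available; it is one possible approach, not the only one.

        // Sketch: drain previously submitted GPU work first, so only the copy
        // into the staging texture and the CPU-side read are timed.
        D3D11_QUERY_DESC queryDesc = {};
        queryDesc.Query = D3D11_QUERY_EVENT;

        ID3D11Query* gpuIdle = nullptr;
        device->CreateQuery(&queryDesc, &gpuIdle);

        deviceContext->End(gpuIdle);   // event queries only need End()
        deviceContext->Flush();
        while (deviceContext->GetData(gpuIdle, nullptr, 0, 0) == S_FALSE)
        {
            // busy-wait until the GPU has caught up with everything queued so far
        }

        uint64 bef = TickCount();

        deviceContext->CopyResource(stagingCopy.texture, gbufferDiffuseRT.texture);

        D3D11_MAPPED_SUBRESOURCE mapped;
        deviceContext->Map(stagingCopy.texture, 0, D3D11_MAP_READ, 0, &mapped); // blocks until the copy has executed
        memcpy(mydata, mapped.pData, sizeof(mydata));
        deviceContext->Unmap(stagingCopy.texture, 0);

        uint64 aft = TickCount();
        cout << aft - bef << endl;

        gpuIdle->Release();

This only cleans up the CPU-side stopwatch; timing just the CopyResource on the GPU would be a job for D3D11 timestamp queries.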

As for my home GeForce GTX 660, I've just checked in the HWiNFO app that it's running at PCIe 2.0, hence the slower speed than on my work computer.

Nevertheless, I presume the 8 GB/s and 3 GB/s figures should be higher. And identical.

This topic is closed to new replies.
