I recently wrote an abstraction for this mechanism so my graphics API would not be D3D12 specific. Given that, I can only really describe this from the point of view of writing the code, but since things seem to be working, I believe the details I figured out are pretty close to accurate.
First off, look at the three related info structures again, since they most certainly do tell you exactly which images are being referenced; it is just a bit indirect. The render pass info structure holds an array of all images (attachments) used in the overall pass, and each subpass references those images by 0-based index into that array.
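As a rough sketch of that indexing relationship: the struct, function, and attachment names below are made up for illustration and are simplified stand-ins, not the real API structures. The point is only that a subpass never names an image directly; it stores an index that is resolved against the pass-level array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the three info structures.
struct AttachmentRef {
    uint32_t index;  // 0-based index into RenderPassInfo::attachments
};

struct SubpassInfo {
    std::vector<AttachmentRef> inputs;        // attachments read by this subpass
    std::vector<AttachmentRef> colorOutputs;  // attachments written by this subpass
};

struct RenderPassInfo {
    std::vector<std::string> attachments;  // every image used by the whole pass
    std::vector<SubpassInfo> subpasses;
};

// Resolve the indices a subpass writes to into the images they refer to.
std::vector<std::string> resolveOutputs(const RenderPassInfo& pass, std::size_t subpass) {
    assert(subpass < pass.subpasses.size());
    std::vector<std::string> names;
    for (const AttachmentRef& ref : pass.subpasses[subpass].colorOutputs) {
        assert(ref.index < pass.attachments.size());
        names.push_back(pass.attachments[ref.index]);
    }
    return names;
}

// Builds a three-subpass pass: scene, vertical blur, horizontal blur.
// The attachment names are assumptions chosen for this example.
RenderPassInfo makeBlurPass() {
    RenderPassInfo pass;
    pass.attachments = {"sceneColor", "blurTemp"};

    SubpassInfo scene;
    scene.colorOutputs = {AttachmentRef{0}};

    SubpassInfo vertical;
    vertical.inputs = {AttachmentRef{0}};
    vertical.colorOutputs = {AttachmentRef{1}};

    SubpassInfo horizontal;
    horizontal.inputs = {AttachmentRef{1}};
    horizontal.colorOutputs = {AttachmentRef{0}};

    pass.subpasses = {scene, vertical, horizontal};
    return pass;
}
```

With this layout, resolveOutputs(makeBlurPass(), 1) walks subpass 1's output indices and looks each one up in the pass-level attachment array, which is the indirection that makes the real structures hard to read at first.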
As to the behavior, at the start and end of each subpass the API issues an image transition barrier, if needed, to put the attachment in the requested layout. So, for instance, if you were doing a post processing blur, you might end up with the following chain of events:
BeginRenderPass (starts the first subpass)
Transition attachment 0 to writable
.. Draw your scene
NextSubPass
Transition attachment 0 to readable
Transition attachment 1 to writable
.. Draw post processing quad to run vertical blur with input attachment 0 and output attachment 1
NextSubPass
Transition attachment 0 to writable
Transition attachment 1 to readable
.. Draw post processing quad to run horizontal blur with input attachment 1 and output attachment 0
So the attachments involved ping-pong between readable and writable as each post processing step requires.
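To make the chain above concrete, here is a minimal sketch of how an abstraction layer might work out those transitions by tracking each attachment's current state. Everything in it (the Access enum, the Subpass struct, planTransitions) is hypothetical and only models the bookkeeping; it issues no real barriers and is not how any particular driver does it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of attachment state; not a real graphics API type.
enum class Access { Undefined, Readable, Writable };

struct Subpass {
    std::vector<uint32_t> inputs;   // attachment indices this subpass reads
    std::vector<uint32_t> outputs;  // attachment indices this subpass writes
};

static const char* toString(Access a) {
    switch (a) {
        case Access::Readable: return "readable";
        case Access::Writable: return "writable";
        default:               return "undefined";
    }
}

// Walk the subpasses in order and log a transition whenever an attachment's
// required state differs from the state it is currently in.
std::vector<std::string> planTransitions(uint32_t attachmentCount,
                                         const std::vector<Subpass>& subpasses) {
    std::vector<Access> state(attachmentCount, Access::Undefined);
    std::vector<std::string> log;

    auto require = [&](uint32_t index, Access wanted) {
        assert(index < state.size());
        if (state[index] != wanted) {
            state[index] = wanted;
            log.push_back("Transition attachment " + std::to_string(index) +
                          " to " + toString(wanted));
        }
    };

    for (std::size_t i = 0; i < subpasses.size(); ++i) {
        log.push_back("Subpass " + std::to_string(i));
        for (uint32_t in : subpasses[i].inputs) require(in, Access::Readable);
        for (uint32_t out : subpasses[i].outputs) require(out, Access::Writable);
    }
    return log;
}
```

Feeding it the blur example (subpass 0 writes attachment 0, subpass 1 reads 0 and writes 1, subpass 2 reads 1 and writes 0) produces the same set of transitions as the list above, although within the last subpass the read transition is logged before the write transition. No transition is logged when an attachment is already in the state the next subpass needs, which is the "if needed" part.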
Hopefully this makes sense and helps you out. I had to look at those structures quite a few times until I figured out the details. The structures themselves are pretty simple; it's just the relationships that are hard to see until you try and fail a couple of times to get the correct behavior.