So let's say you have 2 subsamples (2x MSAA) to keep things simple, and we're working with a 1000x1000 render target. If we want to unpack this so that you can read every individual subsample on the CPU, then we need to expand that 2x MSAA render target into a regular (non-multisampled) 2000x1000 render target. From there we can copy it to a 2000x1000 staging texture, and then call Map to read its contents on the CPU.
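One detail that bites people when reading the staging texture: the mapped data's RowPitch can be larger than width * bytes-per-pixel, so you should copy it out row by row. Below is a minimal sketch, assuming a tightly packed destination buffer. `MappedRows` and `CopyTight` are hypothetical names. `MappedRows` only mirrors the `pData` and `RowPitch` fields of `D3D11_MAPPED_SUBRESOURCE` so the snippet compiles without the D3D11 headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the two fields of D3D11_MAPPED_SUBRESOURCE used here.
struct MappedRows {
    const void* pData;  // start of the mapped staging texture
    uint32_t RowPitch;  // bytes between the starts of consecutive rows
};

// Copy a mapped staging texture into a tightly packed buffer, one row at a time,
// skipping any padding the driver may have added at the end of each row.
std::vector<uint8_t> CopyTight(const MappedRows& m, uint32_t width, uint32_t height,
                               uint32_t bytesPerPixel) {
    const uint32_t tightRow = width * bytesPerPixel;
    assert(m.RowPitch >= tightRow);  // a row can never be shorter than its pixels
    std::vector<uint8_t> out(static_cast<size_t>(tightRow) * height);
    const auto* src = static_cast<const uint8_t*>(m.pData);
    for (uint32_t y = 0; y < height; ++y) {
        std::memcpy(out.data() + static_cast<size_t>(y) * tightRow,
                    src + static_cast<size_t>(y) * m.RowPitch, tightRow);
    }
    return out;
}
```

For the example above you would call it with width = 2000 (source width times sample count), height = 1000, and the byte size of one pixel in your format.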
If you're rendering to a 2000x1000 render target, then the X values of SV_Position in the pixel shader will go from 0 to 1999 (strictly speaking, SV_Position holds pixel centers, 0.5 through 1999.5, so truncate it to an integer first). We can't use this X value directly as an address to read from your 1000x1000 2x MSAA source texture, since the valid range of X addresses there is 0 to 999. So instead we map each X value of the destination render target to a particular X value of the source MSAA texture plus a subsample index. Let's say that we want to output the subsamples in this format:
[Texel (0,0), Subsample 0][Texel (0,0), Subsample 1][Texel (1,0), Subsample 0][Texel (1,0), Subsample 1] ... [Texel (999,0), Subsample 0][Texel (999,0), Subsample 1]
[Texel (0,1), Subsample 0][Texel (0,1), Subsample 1][Texel (1,1), Subsample 0][Texel (1,1), Subsample 1] ... [Texel (999,1), Subsample 0][Texel (999,1), Subsample 1]
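For this layout the conversion is just a divide and a modulo by the sample count: source X = output X / N, subsample = output X % N, and Y passes through unchanged. In the pixel shader you would feed those values to the `Load` method of a `Texture2DMS`, which takes a texel location and a sample index. The sketch below does the same arithmetic in C++ so it can be checked on the CPU. `SourceSample`, `OutputToSource` and `SourceToOutputX` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Where an output pixel's value comes from in the source MSAA texture.
struct SourceSample {
    uint32_t x;            // texel X in the source (0 .. srcWidth - 1)
    uint32_t y;            // texel Y in the source (same as the output Y)
    uint32_t sampleIndex;  // which subsample (0 .. sampleCount - 1)
};

// Map a pixel of the expanded (srcWidth * sampleCount) x srcHeight target back to
// a source texel + subsample, for the layout where all subsamples of one texel
// sit next to each other in the row.
SourceSample OutputToSource(uint32_t outX, uint32_t outY, uint32_t sampleCount) {
    assert(sampleCount > 0);
    return SourceSample{outX / sampleCount, outY, outX % sampleCount};
}

// Inverse mapping: the output X where a given source texel + subsample ends up.
uint32_t SourceToOutputX(uint32_t srcX, uint32_t sampleIndex, uint32_t sampleCount) {
    return srcX * sampleCount + sampleIndex;
}
```

With N = 2, output pixel (1999, 5) reads subsample 1 of source texel (999, 5), which matches the last entry of each row in the layout above.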
Does that make more sense now? Basically you just need to convert the output pixel coordinate into a source texel coordinate plus a subsample index, and I did that by putting all N subsamples right next to each other in the output target. There are plenty of other ways you could do this: for instance, you could create N render targets (or a texture array with N slices) and draw N fullscreen quads, rendering one set of subsamples into each render target or array slice.