I don't think we should do any copy at all here.
When the source reader is D3D-aware and its pipeline contains a decoder or a video processor, those components already allocate D3D samples, so we don't need to copy anything. That covers nearly every case; the only exception is raw video playback, where the pipeline has neither.
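
To make the condition explicit, here is a minimal sketch of the decision I have in mind. `ReaderPipeline` and `needs_copy` are hypothetical names used only for illustration and are not part of any real API. The sketch also assumes the consumer wants D3D-backed samples.

```cpp
// Hypothetical model of the source reader pipeline; these names are
// illustrative only and do not correspond to a real API.
struct ReaderPipeline {
    bool d3d_aware;            // reader was configured with a D3D device
    bool has_decoder;          // pipeline contains a decoder
    bool has_video_processor;  // pipeline contains a video processor
};

// Returns true only when samples may arrive in system memory, i.e. when
// no component in the pipeline allocates D3D samples for us.
// Assumption: the consumer wants D3D-backed samples.
inline bool needs_copy(const ReaderPipeline& p) {
    return !(p.d3d_aware && (p.has_decoder || p.has_video_processor));
}
```

With this model, only the raw video case (D3D-aware reader, no decoder, no video processor) or a reader that is not D3D-aware would ever reach a copy path.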