v2:
- put the test first;
- store just the allocator, without the dxgi manager;
- use the sample copier to copy patches. This implements only the part of the optimization that avoids the extra copies back and forth when no sample is returned. When a sample is returned there is still an extra GPU->CPU copy and a temporary linear buffer. I believe these parts can be optimized in mfplat/sample.c:sample_CopyToBuffer();
- removed the "mfplat: Fix returned buffer length in dxgi_surface_buffer_lock()." patch: it is not directly related, and I hope that not mixing it in will simplify review.