If I understand correctly, the perform call triggers an ioctl on a dedicated thread? If we need the flush to be done asynchronously, we could also do it the same way winemac does: notify the drawing thread asynchronously, then take the surface lock and read the data from that thread.
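To make the idea concrete, here is a rough sketch of that pattern in plain C with pthreads. Every name in it (`struct surface`, `struct drawer`, `notify_flush`, `request_quit`, `drawing_thread`, `demo_flush`) is made up for illustration; this is not winemac's actual code, only an assumption about what "notify asynchronously, then lock and read on the drawing thread" could look like.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define SURFACE_BYTES 16

/* Hypothetical surface: pixel data guarded by a surface lock. */
struct surface
{
    pthread_mutex_t lock;
    unsigned char bits[SURFACE_BYTES];
};

/* Hypothetical drawing-thread state. */
struct drawer
{
    pthread_mutex_t mutex;      /* protects flush_pending and quit */
    pthread_cond_t cond;
    bool flush_pending;
    bool quit;
    struct surface *surface;
    unsigned char presented[SURFACE_BYTES]; /* stands in for the real present */
};

/* Called from the window thread: only sets a flag and returns immediately. */
static void notify_flush(struct drawer *d)
{
    pthread_mutex_lock(&d->mutex);
    d->flush_pending = true;
    pthread_cond_signal(&d->cond);
    pthread_mutex_unlock(&d->mutex);
}

static void request_quit(struct drawer *d)
{
    pthread_mutex_lock(&d->mutex);
    d->quit = true;
    pthread_cond_signal(&d->cond);
    pthread_mutex_unlock(&d->mutex);
}

/* Drawing thread: waits for a notification, then takes the surface lock
 * itself and reads the data. A pending flush is handled before quitting. */
static void *drawing_thread(void *arg)
{
    struct drawer *d = arg;

    for (;;)
    {
        pthread_mutex_lock(&d->mutex);
        while (!d->flush_pending && !d->quit)
            pthread_cond_wait(&d->cond, &d->mutex);
        if (!d->flush_pending)
        {
            pthread_mutex_unlock(&d->mutex);
            break;
        }
        d->flush_pending = false;
        pthread_mutex_unlock(&d->mutex);

        pthread_mutex_lock(&d->surface->lock);
        memcpy(d->presented, d->surface->bits, SURFACE_BYTES);
        pthread_mutex_unlock(&d->surface->lock);
    }
    return NULL;
}

/* Draws one fill value, requests a flush, and returns the first byte the
 * drawing thread read back, or -1 if the thread could not be created. */
static int demo_flush(unsigned char fill)
{
    struct surface s;
    struct drawer d;
    pthread_t thread;

    pthread_mutex_init(&s.lock, NULL);
    memset(s.bits, 0, sizeof(s.bits));
    pthread_mutex_init(&d.mutex, NULL);
    pthread_cond_init(&d.cond, NULL);
    d.flush_pending = false;
    d.quit = false;
    d.surface = &s;
    memset(d.presented, 0, sizeof(d.presented));

    if (pthread_create(&thread, NULL, drawing_thread, &d)) return -1;

    /* The window thread draws under the surface lock... */
    pthread_mutex_lock(&s.lock);
    memset(s.bits, fill, sizeof(s.bits));
    pthread_mutex_unlock(&s.lock);

    /* ...and then only notifies; the copy happens on the other thread. */
    notify_flush(&d);
    request_quit(&d);
    pthread_join(thread, NULL);

    pthread_cond_destroy(&d.cond);
    pthread_mutex_destroy(&d.mutex);
    pthread_mutex_destroy(&s.lock);
    return d.presented[0];
}
```

The point of the split is that the caller never blocks on the copy: `notify_flush` only flips a flag, and the surface lock is held just long enough for the drawing thread to read the bits. Calling `demo_flush(0x5a)` exercises one draw, one notification, and a clean shutdown.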
I think it's in the right ballpark, although it's only something to worry about if the current MR proves problematic.