On 2/11/22 11:29, Giovanni Mascellani wrote:
Hi,
On 11/02/22 09:17, Nikolay Sivov wrote:
I don't think this works. It should be possible to call LockRect() after the buffer was locked; if you keep the surface locked, this won't work. I don't think we have tests for that, but that's what quick testing on Windows shows.
Ouch. So Microsoft really does two more copies per Lock()? Bad!
Maybe the thing we could do anyway is to make the first LockRect() call with D3DLOCK_READONLY and the second one with D3DLOCK_DISCARD, though I doubt it will save anything near 3 ms per frame.
It would be nice if the Lock() interface had something similar to DISCARD, but alas it doesn't.
Lock2DSize() has some flags; I don't know if they map to anything in d3d or are ignored. It's probably worth looking at LockRect() flags, like you described.
Regarding the amount of added latency, it's not accurate to compare to the game running on Windows, where by default it might use a pipeline with hw decoding that has only one copy from system memory, to the input decoder surface. Instead, a manually configured test is what should give an idea of how fast the worst case is there, which is not to say that we should settle for that, of course.
We could have a shortcut in MFCopyImage() first, to have a single copy call when strides match, instead of calling per row. The next step could be to have some SIMD variants, with non-temporal copy like the docs suggest. No idea how much this improves performance, but for large enough copies it's meant to bypass the cache at least, I think.
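For illustration, the single-copy shortcut could look roughly like the sketch below. This is not the actual MFCopyImage() code: copy_image() is a hypothetical standalone helper, and it assumes the fast path is only safe when both strides are positive and equal to the row width, so that no padding bytes get touched. Relaxing that to "strides match" would also copy the padding between rows.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Wine's actual MFCopyImage() implementation.
 * Copies `lines` rows of `width` bytes each from src to dest. Strides are
 * in bytes and may be negative for bottom-up images. */
void copy_image(unsigned char *dest, long dest_stride,
        const unsigned char *src, long src_stride,
        size_t width, size_t lines)
{
    /* Fast path (assumption: only taken for tightly packed images): the
     * rows form one contiguous block, so a single memcpy() covers them. */
    if (dest_stride == src_stride && dest_stride > 0
            && (size_t)dest_stride == width)
    {
        memcpy(dest, src, width * lines);
        return;
    }

    /* General path: copy row by row, skipping the padding between rows. */
    while (lines--)
    {
        memcpy(dest, src, width);
        dest += dest_stride;
        src += src_stride;
    }
}
```

Both paths produce the same bytes in the destination rows; the only difference is the number of memcpy() calls, so any gain depends on how expensive the per-row call overhead is relative to the copy itself.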
I tried the single copy thing, but I couldn't see any significant change, so I didn't even bother submitting. As for the non-temporal copy, I don't know much about it either, but something makes me feel it won't change that much either.
Depending on buffer sizes it might have a positive impact; it's impossible for me to say unless we try. It's also possible that the copier is smarter about this.
Thanks, Giovanni.