Hi Stefan,
I did some introductory interface reading. If I understand it correctly, the dxva implementation / driver can control the pool of the input surface. Not only that, it actually creates the surface. Is that correct?
Afaics the output surface is either a dxva-created surface or a render target, is that correct?
All surfaces which are used in conjunction with the DXVA API are created through the CreateSurface method of the IDirectXVideoAccelerationService interface.
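For reference, a minimal sketch of how an application typically allocates that pool (the concrete values - size, format, surface count - are just an example, not taken from any particular application):

#define COBJMACROS
#include <d3d9.h>
#include <dxva2api.h>

#define SURFACE_COUNT 4

/* Sketch only: allocate a pool of NV12 decoder render targets.  The coded
 * size is usually rounded up to whole macroblocks by the application. */
static HRESULT create_decoder_pool(IDirectXVideoAccelerationService *service,
                                   IDirect3DSurface9 *surfaces[SURFACE_COUNT])
{
    return IDirectXVideoAccelerationService_CreateSurface(service,
            1920, 1088,                             /* coded width/height */
            SURFACE_COUNT - 1,                      /* creates BackBuffers + 1 surfaces */
            (D3DFORMAT)MAKEFOURCC('N','V','1','2'),
            D3DPOOL_DEFAULT, 0,
            DXVA2_VideoDecoderRenderTarget,
            surfaces, NULL);
}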
If you are in system memory, is there an issue with using the d3d surface's memory as the vaapi input buffer? Also take note of user pointer surfaces / textures in d3d9ex.
The surfaces are only used for storing the output image, and they may have a different size than the buffers used by vaapi. MPEG2, for example, uses macroblocks with a size of 16x16 pixels, so the frame size passed to the decoder must be divisible by 16. I noticed that VLC creates the surfaces with the size of the actual video while it initializes the decoder with a size rounded up to a multiple of 16. Moreover, I cannot specify the address to which the output data should be copied; I can only map the buffer at an address chosen by vaapi and copy it manually.
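For illustration, the manual copy on the vaapi side looks roughly like this (simplified sketch; the real code has to copy plane by plane using the offsets and pitches reported in the VAImage):

#include <string.h>
#include <va/va.h>

/* Sketch of the copy-back path: vaapi picks the mapping address, so the
 * decoded data has to be memcpy'd into the caller's buffer afterwards. */
static VAStatus copy_surface_to_sysmem(VADisplay dpy, VASurfaceID surface,
                                       void *dst, size_t dst_size)
{
    VAImage image;
    void *src;
    VAStatus status;

    status = vaDeriveImage(dpy, surface, &image);
    if (status != VA_STATUS_SUCCESS) return status;

    status = vaMapBuffer(dpy, image.buf, &src);
    if (status == VA_STATUS_SUCCESS)
    {
        /* Flat copy for brevity; a real implementation walks the planes
         * using image.offsets[] and image.pitches[]. */
        memcpy(dst, src, dst_size < image.data_size ? dst_size : image.data_size);
        vaUnmapBuffer(dpy, image.buf);
    }

    vaDestroyImage(dpy, image.image_id);
    return status;
}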
I do not know of any Windows driver that supports YUV render targets (see above). Are dxva-created output surfaces video memory surfaces (or textures) or system memory surfaces? If they are sysmem surfaces you don't have a problem - the app either has to read back to sysmem or put up with an RGB surface / texture.
DXVA supports both: direct rendering (called native mode) and reading the frame back to system memory (see http://en.wikipedia.org/wiki/DirectX_Video_Acceleration#DXVA2_implementation... )
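From the application's point of view the two modes roughly boil down to this (a sketch under my assumptions, not code from VLC or from the patches):

#define COBJMACROS
#include <d3d9.h>

/* Native mode: the decoded NV12 surface is blitted/converted on the GPU
 * straight into the swap chain's back buffer. */
static void present_native(IDirect3DDevice9 *device, IDirect3DSurface9 *decoded,
                           IDirect3DSurface9 *back_buffer)
{
    IDirect3DDevice9_StretchRect(device, decoded, NULL, back_buffer, NULL,
                                 D3DTEXF_LINEAR);
}

/* Copy-back mode: the decoder surface is locked and the NV12 planes are
 * copied back to system memory for software processing. */
static void read_back(IDirect3DSurface9 *decoded,
                      void (*consume)(const void *data, INT pitch))
{
    D3DLOCKED_RECT lock;
    if (SUCCEEDED(IDirect3DSurface9_LockRect(decoded, &lock, NULL, D3DLOCK_READONLY)))
    {
        consume(lock.pBits, lock.Pitch);
        IDirect3DSurface9_UnlockRect(decoded);
    }
}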
But even if you're copying to an RGB surface you have to get the GL texture from the IDirect3DSurface9 somehow. There may not even be one, if the surface is just the GL backbuffer. This is just a wine-internal problem though and should be solvable one way or another.
The vaapi-glx interface is also missing options for the mipmap level and cube map face. I guess you can ignore that until you find an application that wants a video decoded to the negative z face, mipmap level 2, of a rendertarget-capable d3d cube texture.
You may also want a way to make wined3d activate the device's WGL context. Right now that's not much of an issue if your code is called from the thread that created the device. The command stream will make this more difficult though.
We implemented a hack to get the OpenGL texture ID of a D3D9 surface and to make the OpenGL context current by calling acquire_context(). As mentioned in the first email, the screenshot was created using the copy-back approach.
If the vaapi buffer has a constant address you can create a user memory d3d surface. I wouldn't be surprised if dxva was a motivation for user memory surfaces.
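For reference, that would look roughly like this: the user pointer is smuggled in through pSharedHandle on a D3DPOOL_SYSTEMMEM texture (sketch only; it needs a d3d9ex device, and which formats and pitches are accepted depends on the runtime and driver):

#define COBJMACROS
#include <d3d9.h>

/* Sketch of a d3d9ex "user memory" texture: wrap an existing, stable
 * system-memory buffer (e.g. a vaapi buffer mapping) without copying it. */
static HRESULT wrap_user_memory(IDirect3DDevice9 *device, void *mem,
                                UINT width, UINT height, D3DFORMAT format,
                                IDirect3DTexture9 **texture)
{
    return IDirect3DDevice9_CreateTexture(device, width, height, 1, 0, format,
                                          D3DPOOL_SYSTEMMEM, texture,
                                          (HANDLE *)&mem);
}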
On a related note, we don't want any GLX code in wined3d, and probably not in any dxva.dll. The vaapi-glx.h header seems simple enough to use through WGL as it just says a context needs to be active. If not, you'll have to export a WGL version of vaapi from winex11.drv.
At some point we should think about equivalent interfaces on OSX and how to abstract between that and vaapi, but not today.
We actually thought about a better solution to get around these problems. We could introduce a new surface type which uses the vaapi buffers as its backend. If the user wants to read the data back to system memory, we can simply use the map function of vaapi, and if the user wants to actually present the surface, we could use the vaapi commands to convert it into an RGB texture, including things like deinterlacing. This would allow us to implement both native and copy-back mode without doing unnecessary conversions or memory copies.
Do you think it would be okay if we try to add such a new type of surface? I think we would need to put the vaapi commands into the x11 driver and export some functions which can be called from d3d.
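To make that a bit more concrete, the functions exported from winex11.drv might look something like this (all names are made up, they only illustrate the proposed split):

#include <windows.h>

/* Hypothetical winex11.drv exports for the proposed vaapi-backed surface
 * type; none of these names exist in the current patches.  The idea is
 * that d3d never has to call libva or GLX directly. */
typedef UINT va_surface_handle;   /* opaque wrapper around a VASurfaceID */

/* Copy-back path: map the decoded image and copy it into caller memory. */
extern BOOL CDECL X11DRV_va_copy_surface(va_surface_handle surface,
                                         void *dst, UINT dst_pitch);

/* Native path: convert/deinterlace the decoded image into an RGB texture
 * owned by the currently active WGL context. */
extern BOOL CDECL X11DRV_va_blit_to_texture(va_surface_handle surface,
                                            UINT gl_texture,
                                            UINT width, UINT height);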
I also uploaded the patches in their current state so that you guys can take a look at what is actually needed to implement dxva2, but they are not yet in a state in which they could go upstream (we use a separate x11 connection, link statically against libva, use inefficient algorithms for copying frames, ...)
You can find it here: https://github.com/compholio/wine-compholio-daily/tree/dxva2/patches/11-DXVA... on the dxva2 branch.
To test it with VLC you need:
1. Install the 32-bit version of libav-dev 1.2.1
On Ubuntu you can get this version of libav-dev from my PPA: https://launchpad.net/~pipelight/+archive/libva (except for Trusty Tahr, which already provides this version)
2. Install the vaapi driver; for NVIDIA you need vdpau-va-driver.
Make sure that vainfo (apt-get install vainfo) shows the MPEG2 VLD decoder.
3. You also need to apply this nasty hack to get around a problem with VLC and Direct3D: http://ix.io/bo5
4. Set the Windows version of the wine prefix to Vista, as DXVA2 is only available on Vista or later.
5. Install the current git version of VLC (the stable version has a bug in the DXVA2 code which breaks the decoding of P and B frames). You can grab it here: http://nightlies.videolan.org/build/win32/last/
(See https://trac.videolan.org/vlc/ticket/10868 for more information about the bug. It took me quite some time to figure out that this bug is in VLC and not in my code...)
6. Start VLC and enable DXVA2 in the Input/Codecs options. Test it :-)
I did not try it on anything other than NVIDIA yet, and there is some untested code in the patches which is not supported by the vdpau wrapper, so it may break on other graphics cards.
For other users who want to try out the patchset and expect a huge performance boost: I have to disappoint you! During my tests it was still slower than CPU-decoded video, but I expect better performance once all the copy overhead has been removed, and especially for other codecs like H264 the performance boost should be easier to notice. ;-)
Michael