Hi,
it is probably still a bit early, but nevertheless I would like to announce a feature I am currently working on and present the first results. As some of you have already noticed (http://bugs.winehq.org/show_bug.cgi?id=35868), I recently submitted some simple stub patches to add the dxva2 dll. The original purpose was to get a browser plugin working which expects this library to be available and otherwise refuses to run. The library exports functions which are used by several applications (like VLC, Flash, Silverlight, ...) for GPU decoding. I started to work on these functions and want to present a first result, which you can see here: https://dl.dropboxusercontent.com/u/61413222/dxva2.png
This is actually the Windows version of VLC playing an MPEG-2 movie with GPU acceleration through DXVA2. My implementation of dxva2 uses VAAPI on Linux to do the actual GPU decoding and should support AMD, Intel and NVIDIA cards.
Currently only MPEG-2 decoding is supported, as it is one of the easier codecs; other ones like H.264 need a lot more buffers, which have to be translated from the DXVA format to VAAPI. The second easiest codec to implement would be MPEG-4, but as none of my graphics cards supports MPEG-4, I will most probably continue with VC-1. Anyway, I still need to clean up the patches a bit, as they add about 3000 new lines of code, and test them with some other graphics cards before I can provide them. There are also some problems, mostly d3d9 related, for which I would like to get your opinion.
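To give an idea of how the codecs are wired up, here is roughly the kind of GUID-to-profile mapping involved (the helper name is mine, and only the MPEG-2 entry reflects what actually works today):

    #include <dxva2api.h>   /* DXVA2_Mode* decoder GUIDs */
    #include <va/va.h>      /* VAProfile */

    /* Hypothetical helper: translate a DXVA2 decoder device GUID into the
     * corresponding VAAPI profile. Only MPEG-2 is implemented so far; the
     * disabled entries show where VC-1 and H.264 would slot in. */
    static VAProfile vaprofile_from_dxva2(const GUID *device)
    {
        if (IsEqualGUID(device, &DXVA2_ModeMPEG2_VLD))
            return VAProfileMPEG2Main;
    #if 0 /* not implemented yet */
        if (IsEqualGUID(device, &DXVA2_ModeVC1_D))
            return VAProfileVC1Advanced;
        if (IsEqualGUID(device, &DXVA2_ModeH264_E))
            return VAProfileH264High;
    #endif
        return VAProfileNone;
    }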
The most difficult part is that DXVA2 is completely based on IDirect3DDevice9 and IDirect3DSurface9. DXVA2 places the output images into a surface, and the application either locks the surface to get the output data or simply presents it to the screen. Although it would be much more efficient to blit the data directly on the graphics card, VLC at least reads it back into system memory, since its decoding and output pipelines are separated.
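To illustrate what the application side looks like, this is roughly the readback path (assuming NV12 output; copy_nv12_to_system_memory() is a made-up placeholder):

    /* After IDirectXVideoDecoder::EndFrame() the decoded picture sits in an
     * IDirect3DSurface9, and the application locks it to read it back. */
    D3DLOCKED_RECT lr;
    HRESULT hr = IDirect3DSurface9_LockRect(surface, &lr, NULL, D3DLOCK_READONLY);
    if (SUCCEEDED(hr))
    {
        /* lr.pBits points at the luma plane, lr.Pitch is the stride; for NV12
         * the interleaved chroma plane follows after 'height' rows. */
        copy_nv12_to_system_memory(lr.pBits, lr.Pitch, width, height); /* placeholder */
        IDirect3DSurface9_UnlockRect(surface);
    }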
The problem is that I actually need to allocate twice the amount of memory for decoding, since I need to provide the Direct3D surfaces to the application and I also need to provide buffers to VAAPI. This is not a big problem for MPEG-2, since it only uses 3 output images (a B frame can only reference the previous and the next frame). For H.264, however, this gets out of hand, as it requires storing up to 16 output images, so I would need to allocate 16 VAAPI buffers and 16 Direct3D surfaces.
Currently I lock both kinds of buffers after rendering a frame and do the synchronization in system memory, which is rather inefficient depending on the surface type. My original idea was to do the copy on the graphics card, since I can copy the image to a texture after decoding, but after Sebastian implemented this part we found out that VAAPI implies a format conversion to RGB when copying data to a texture. This is a no-go, since VLC refuses to use hardware acceleration when the output format is RGB. I also think it is rather pointless to convert the RGB data back to YUV, so that we would end up with three color space conversions (YUV->RGB->YUV->RGB). An Intel developer wrote (see vaCopySurfaceGLX() at http://markmail.org/message/a3sav6q3dm5qvmat) that it would be possible to implement a copy in NV12 format for NVIDIA and Intel, but not AMD. We could try to ask them to implement it, so that we could at least do it efficiently for these two vendors.
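For reference, this is a rough sketch of what such a system-memory copy looks like (my own illustration, assuming NV12 on both sides; error handling omitted):

    #include <string.h>
    #include <d3d9.h>
    #include <va/va.h>

    static void copy_vaapi_to_d3d9(VADisplay dpy, VASurfaceID va_surface,
                                   IDirect3DSurface9 *d3d_surface,
                                   unsigned int width, unsigned int height)
    {
        VAImage image;
        D3DLOCKED_RECT lr;
        unsigned char *src;
        unsigned int y;

        vaSyncSurface(dpy, va_surface);            /* wait for the decoder to finish */
        vaDeriveImage(dpy, va_surface, &image);    /* expose the surface as an image */
        vaMapBuffer(dpy, image.buf, (void **)&src);

        IDirect3DSurface9_LockRect(d3d_surface, &lr, NULL, 0);

        /* luma plane */
        for (y = 0; y < height; y++)
            memcpy((char *)lr.pBits + y * lr.Pitch,
                   src + image.offsets[0] + y * image.pitches[0], width);
        /* interleaved chroma plane (half height for NV12) */
        for (y = 0; y < height / 2; y++)
            memcpy((char *)lr.pBits + (height + y) * lr.Pitch,
                   src + image.offsets[1] + y * image.pitches[1], width);

        IDirect3DSurface9_UnlockRect(d3d_surface);
        vaUnmapBuffer(dpy, image.buf);
        vaDestroyImage(dpy, image.image_id);
    }

This works, but it pushes every decoded frame through system memory, which is exactly the overhead I would like to avoid.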
Anyway, if other applications also copy the data back to system memory, it might be better to instead wrap the VAAPI buffers as Direct3D9 surfaces, so that we can directly map the VAAPI buffers when LockRect() is called instead of copying the data. However, this would cause problems when the application tries to pass such an interface to Present().
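Just to make the discussion concrete, a very rough sketch of the wrapping idea (struct layout, impl_from_IDirect3DSurface9() and the rest of the vtbl are made up; only LockRect() is shown):

    struct wrapped_surface
    {
        IDirect3DSurface9 IDirect3DSurface9_iface;
        VADisplay display;
        VASurfaceID va_surface;
        VAImage image;      /* valid while the surface is locked */
    };

    static HRESULT WINAPI wrapped_surface_LockRect(IDirect3DSurface9 *iface,
            D3DLOCKED_RECT *locked, const RECT *rect, DWORD flags)
    {
        struct wrapped_surface *surface = impl_from_IDirect3DSurface9(iface);
        void *data;

        vaSyncSurface(surface->display, surface->va_surface);
        if (vaDeriveImage(surface->display, surface->va_surface, &surface->image) != VA_STATUS_SUCCESS)
            return D3DERR_INVALIDCALL;
        if (vaMapBuffer(surface->display, surface->image.buf, &data) != VA_STATUS_SUCCESS)
        {
            vaDestroyImage(surface->display, surface->image.image_id);
            return D3DERR_INVALIDCALL;
        }

        /* hand the mapped VAAPI data straight to the application, no copy */
        locked->pBits = (char *)data + surface->image.offsets[0];
        locked->Pitch = surface->image.pitches[0];
        return D3D_OK;
    }

UnlockRect() would undo the mapping, and Present()/StretchRect() are where the glue code mentioned above would have to kick in.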
So what do the wined3d guys think? Is it better to convince the Intel developers to allow a copy in YUV format and copy the data directly into the texture of a Direct3D9 surface, or to wrap the VAAPI buffers as Direct3D9 surfaces and add some glue code for when an application tries to render them? Or do you have any better ideas?
Regards, Michael