Hello,
Following your mail on wine-devel about DDRAW, I now understand better what's going on behind the scenes. Below is the result of an experiment I did at the beginning of this month. Reading your message, I wanted to share it with you.
First, thanks for your comments, and one suggestion: You should also CC the mails to wine-devel@winehq.org, so others can read it too.
I'm an occasional Wine user for games, and an experienced developer in other projects. Age of Empires II has always been really slow, so I finally decided to run an oprofile session. The results are attached to this message (Wine CVS of 2005-12-04). When I saw the results, I immediately thought "put that conversion on the GPU". It seems you found that solution long ago :)
Yes, this is quite well known, and if you search the archives, you'll see that it has been mentioned over and over.
Of course, there is still the ~35% used by X11, but even if that part cannot be reduced (DC stuff), a potential 5x increase in game speed due to DIB would make it much more playable.
The need for a better DIB engine is also known, and there are 2 DIB engines ready to be included into Wine: One was sponsored by Transgaming, and the other one is the DIB engine from ReactOS. The ReactOS engine is said to be faster, and it's favored. We'll see, perhaps we can include it with the DirectDraw re-work.
The other problem is hardware 2D acceleration: Most DirectDraw functionality has been handled by the GPU on Windows since the beginning of DirectDraw, long before 3D accelerators were around. In Wine, there are 2 ways: The first is to use X calls, which is safe, but slow. The other one is the use of DGA, which is as fast as the Windows way, but it requires root privileges, or at least write access to /dev/mem, which is a HORRIBLE security risk. That's where OpenGL comes into play: It is hardware accelerated, and doesn't require /dev/mem access. But the drawback is that it needs a 3D accelerator and a decent driver. It works fine for nvidia cards, but it is horribly slow for ATI cards.
It seems using 3D as the 2D backend is the only option that makes sense nowadays and in the near future. In Wine's case, this supports your idea of implementing DDraw on top of D3D.
Yeah, that's perfectly true for current hardware; rumor even has it that today's GPUs don't have a 2D engine any more, and whatever you do will make use of the 3D functionality. The problem is older cards: It would be nice if games worked on the systems for which they were designed. For example, I'd love to see Anno 1602 working in Wine on my old Acer Extensa 660 with a 120 MHz Pentium CPU and a Chips and Technologies 65000 GPU.
Thanks for your interest, and I'd appreciate performance tests once my work is stable ;)
Stefan
Stefan Dösinger <stefandoesinger <at> gmx.at> writes:
least write access to /dev/mem, which is a HORRIBLE security risk. That's where OpenGL comes into play: It is hardware accelerated, and doesn't require /dev/mem access. But the drawback is that it needs a 3D accelerator and a decent driver. It works fine for nvidia cards, but it is horribly slow for ATI cards.
What is slow with ATI cards? It seems that you should only need basic 3D acceleration to do what you propose. Is fglrx missing something that would be required for 2D rendering?
Regards, Aric
Am Dienstag, 13. Dezember 2005 04:25 schrieb Aric Cyr:
What is slow with ATI cards? It seems that you should only need basic 3D acceleration to do what you propose. Is fglrx missing something that would be required for 2D rendering?
Texture upload is very slow. glReadPixels, glWritePixels and friends take ages. That means that Blt, Lock, and Unlock are really slow. There is some hope at least: Xine and Mplayer can play videos over OpenGL really fast, so maybe we can find out how they do it.
Stefan
What is slow with ATI cards? It seems that you should only need basic 3D acceleration to do what you propose. Is fglrx missing something that would be required for 2D rendering?
Texture upload is very slow. glReadPixels, glWritePixels and friends take ages. That means that blt, Lock, and Unlock is really slow. There is some hope at least: Xine and Mplayer can play videos over OpenGL really fast, maybe we can find out how they do it.
Stefan
Indeed, glReadPixels and glWritePixels might be slow with the ATI drivers, but those calls weren't used for texture uploading. For that purpose, only glTexImage2D / glTexSubImage2D calls were used. If texture uploading were slow on the ATI drivers, people would be unable to play games ;)
Roderick
On Tuesday 13 December 2005 09:28, Stefan Dösinger wrote:
Am Dienstag, 13. Dezember 2005 04:25 schrieb Aric Cyr:
What is slow with ATI cards? It seems that you should only need basic 3D acceleration to do what you propose. Is fglrx missing something that would be required for 2D rendering?
Texture upload is very slow. glReadPixels, glWritePixels and friends take ages. That means that blt, Lock, and Unlock is really slow. There is some hope at least: Xine and Mplayer can play videos over OpenGL really fast, maybe we can find out how they do it.
For glReadPixels and glWritePixels, you can use the framebuffer object extension (or/with pbuffers) :)
Keep up the good work
Stefan
Regards, Raphael
Raphael <fenix <at> club-internet.fr> writes:
On Tuesday 13 December 2005 09:28, Stefan Dösinger wrote:
Am Dienstag, 13. Dezember 2005 04:25 schrieb Aric Cyr:
What is slow with ATI cards? It seems that you should only need basic 3D acceleration to do what you propose. Is fglrx missing something that would be required for 2D rendering?
Texture upload is very slow. glReadPixels, glWritePixels and friends take ages. That means that blt, Lock, and Unlock is really slow. There is some hope at least: Xine and Mplayer can play videos over OpenGL really fast, maybe we can find out how they do it.
Texture upload is not too bad with the ATI drivers, but gl{Read/Write}Pixels will be horribly slow on almost any video card. I would hope that DDraw is using textures and not direct framebuffer writes (which I believe is what Roderick mentioned), especially glTexSubImage2D(), which should be even faster than glTexImage2D().
for glReadPixels, glWritePixels you can use the frame buffer object extension (or/with pbuffers) :)
While glReadPixels and glWritePixels can be used with FBO, there would be no performance improvement compared with using them on a standard framebuffer. The only potential advantage of the FBO extension would be if we created the framebuffer to match the ddraw pixel format, but it doesn't seem that paletted framebuffer formats are supported by FBO anyway. For other RGB(A) formats that do not match the current pixel format, it could be a win. For example, if the X server is running 24-bit colour, we should be able to attach a 16-bit colour FBO and use that. This might allow us to work around unsupported colour depths, such as in the ATI driver, which only exposes 24-bit visuals. With this extension we might be able to get a real 16-bit render target. The latest nVidia and ATI drivers both support the FBO extension.
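To make the colour depth mismatch concrete: when a game draws into a 16-bit surface but the visual is 24/32-bit, something has to expand every pixel before it can be displayed. Below is a minimal sketch of that per-pixel work done on the CPU, assuming an RGB565 source and an XRGB8888 destination. The function names and the pitch-in-pixels convention are made up for illustration and are not taken from Wine.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand one RGB565 pixel to 0x00RRGGBB. The high bits of each channel
 * are replicated into the low bits so that full scale maps to 0xFF. */
uint32_t rgb565_to_xrgb8888(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1F;
    uint32_t g = (p >> 5) & 0x3F;
    uint32_t b = p & 0x1F;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return (r << 16) | (g << 8) | b;
}

/* Convert a whole surface row by row. Pitches are given in pixels
 * (an assumption of this sketch) and may be larger than width. */
void convert_surface_565_to_8888(const uint16_t *src, size_t src_pitch,
                                 uint32_t *dst, size_t dst_pitch,
                                 size_t width, size_t height)
{
    size_t x, y;

    for (y = 0; y < height; y++) {
        const uint16_t *s = src + y * src_pitch;
        uint32_t *d = dst + y * dst_pitch;
        for (x = 0; x < width; x++)
            d[x] = rgb565_to_xrgb8888(s[x]);
    }
}
```

If the driver accepts a 16-bit render target or texture format, a pass like this would not be needed on the CPU at all, which is the potential win described above.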
Regards, Aric
--- Aric Cyr Aric.Cyr@gmail.com wrote:
Raphael <fenix <at> club-internet.fr> writes:
On Tuesday 13 December 2005 09:28, Stefan Dösinger wrote:
Am Dienstag, 13. Dezember 2005 04:25 schrieb Aric Cyr:
What is slow with ATI cards? It seems that you should only need basic 3D acceleration to do what you propose. Is fglrx missing something that would be required for 2D rendering?
Texture upload is very slow. glReadPixels, glWritePixels and friends take ages. That means that blt, Lock, and Unlock is really slow. There is some hope at least: Xine and Mplayer can play videos over OpenGL really fast, maybe we can find out how they do it.
Texture upload is not too bad with the ATI drivers, but gl{Read/Write}Pixels will be horribly slow on most any video card. I would hope that DDraw is using textures and not direct framebuffer writes (which I believe is what Roderick mentioned). Especially, glTexSubImage2D() which should be even faster than glTexImage2D().
ATI's drivers are strange: sometimes it's quicker to repack the texture so that it's byte aligned before calling glTexSubImage2D() or glTexImage2D(). Another way to improve performance is to call glReadPixels only when the data has really changed, and to implement clears & co. against a local buffer. One of the main reasons for the slowness is that the data is being sent over the PCI bus and not via AGP. An old ATI driver (< 8.18) used double buffering to improve the performance, but turning on double buffering doesn't help with newer drivers.
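The ideas above (a local buffer for clears, transferring only what changed, and repacking before upload) could be combined roughly as follows. This is only a sketch under stated assumptions: shadow_surface, dirty_rect and all function names are hypothetical, the surface lives in plain system memory, and no GL call is made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* right/bottom are exclusive; the rect is empty when left >= right. */
typedef struct { int left, top, right, bottom; } dirty_rect;

typedef struct {
    uint8_t *pixels;    /* system-memory copy of the surface */
    int width, height;  /* in pixels */
    int bpp;            /* bytes per pixel */
    size_t pitch;       /* bytes per row, may include padding */
    dirty_rect dirty;   /* region changed since the last upload */
} shadow_surface;

/* Grow the dirty region so that it also covers the given rectangle. */
void shadow_mark_dirty(shadow_surface *s, int left, int top, int right, int bottom)
{
    if (left < 0) left = 0;
    if (top < 0) top = 0;
    if (right > s->width) right = s->width;
    if (bottom > s->height) bottom = s->height;
    if (left >= right || top >= bottom) return;

    if (s->dirty.left >= s->dirty.right) {
        s->dirty.left = left;   s->dirty.top = top;
        s->dirty.right = right; s->dirty.bottom = bottom;
    } else {
        if (left < s->dirty.left) s->dirty.left = left;
        if (top < s->dirty.top) s->dirty.top = top;
        if (right > s->dirty.right) s->dirty.right = right;
        if (bottom > s->dirty.bottom) s->dirty.bottom = bottom;
    }
}

/* Clear against the local buffer only; nothing is sent to the card. */
void shadow_clear(shadow_surface *s, uint8_t fill_byte)
{
    int y;
    for (y = 0; y < s->height; y++)
        memset(s->pixels + (size_t)y * s->pitch, fill_byte,
               (size_t)s->width * (size_t)s->bpp);
    shadow_mark_dirty(s, 0, 0, s->width, s->height);
}

/* Copy the dirty region into 'out' as tightly packed rows (no pitch
 * padding), report the region in 'rect', and reset the dirty state.
 * Returns the number of bytes written, or 0 if nothing changed. */
size_t shadow_pack_dirty(shadow_surface *s, uint8_t *out, size_t out_size,
                         dirty_rect *rect)
{
    size_t row_bytes, rows, i;

    if (s->dirty.left >= s->dirty.right) return 0;
    *rect = s->dirty;
    row_bytes = (size_t)(rect->right - rect->left) * (size_t)s->bpp;
    rows = (size_t)(rect->bottom - rect->top);
    assert(out_size >= row_bytes * rows);

    for (i = 0; i < rows; i++)
        memcpy(out + i * row_bytes,
               s->pixels + ((size_t)rect->top + i) * s->pitch
                         + (size_t)rect->left * (size_t)s->bpp,
               row_bytes);

    memset(&s->dirty, 0, sizeof s->dirty);
    return row_bytes * rows;
}
```

The packed buffer and the reported rectangle could then be handed to a single glTexSubImage2D() call. Because the rows are tightly packed, GL_UNPACK_ALIGNMENT would need to be set to 1 for row sizes that are not a multiple of 4 (the GL default alignment is 4). Whether this helps with a particular ATI driver version is untested here.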
Oliver.
On Mon, 12 Dec 2005 21:11:44 +0100, Stefan Dösinger wrote:
In Wine, there are 2 ways: The first is to use X calls, which is safe, but slow.
What makes you think that? X drawing primitives as well as XRender should be hardware accelerated even though they aren't OpenGL.
On Tuesday 13 December 2005 21:43, Mike Hearn wrote:
On Mon, 12 Dec 2005 21:11:44 +0100, Stefan Dösinger wrote:
In Wine, there are 2 ways: The first is to use X calls, which is safe, but slow.
What makes you think that? X drawing primitives as well as XRender should be hardware accelerated even though they aren't OpenGL.
No, most of the basic X primitives aren't really hardware accelerated. XAA is inadequate for recent graphics cards (status: http://wiki.x.org/wiki/XorgPerformance). For example, for texturing, most of the primitives are based on software algorithms (and Mesa does it faster). With EXA it's better, but it's not perfect yet (improving little by little).
Regards, Raphael