emaentra@ngi.it wrote:
If during my tests I set the affinity of the executable (and consequently all its threads) to a particular core, I have to say that core gets fully used and the FPS are better than in the general case.
That's interesting.
What CPU, GPU, and version of Wine do you have, and what is the FPS before and after pinning the game to one CPU?
You should use, say, the FPS on launching into one of the default scenarios without touching anything, so that we can get a reproducible measurement. Tell us which mouse clicks to use to replicate your measurement.
Or, if you think it's needed, right after launch, select all units and have them start moving, and use the lowest fps seen during those few seconds.
Hi Stefan/Dan,
a couple of answers follow.
1) Apparently rendering is not multi-threaded. I ran the game as Stefan specified and saved stderr to a file. The command

grep ":d3d9:" /home/ema/sc2.log.txt | sed 's/:.*//g' | sort -u | uniq -c

produces only one value, i.e. only one thread ID ever shows up on the d3d9 channel.
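To make that measurement reproducible, here is a sketch of the whole pipeline run against a synthetic log (the trace lines below are made up for illustration; the real log would come from capturing stderr with WINEDEBUG, and every Wine debug line starts with the thread id, which is what the sed strips down to):

```shell
# Synthetic stand-in for the real log; capture the real one with:
#   WINEDEBUG=+d3d9 wine SC2.exe 2> sc2.log.txt
# Every Wine debug line starts with the thread id, e.g. "0009:trace:d3d9:...".
cat > sc2.log.txt <<'EOF'
0009:trace:d3d9:d3d9_device_BeginScene
0009:trace:d3d9:d3d9_device_DrawPrimitive
0024:trace:seh:dispatch_exception
0009:trace:d3d9:d3d9_device_EndScene
EOF

# Keep only d3d9 lines, strip everything after the thread id, count ids.
# (Plain "sort" rather than "sort -u", so uniq -c gives real per-thread counts.)
grep ":d3d9:" sc2.log.txt | sed 's/:.*//' | sort | uniq -c
# a single output line here means a single thread issues all d3d9 calls
```

With "sort -u" as in the original command the counts from uniq -c are meaningless (everything is already unique), but the number of output lines still answers the question.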
2) Stefan, do you have any hint on where we should start optimizing the calls of D3D -> OpenGL functions? Where do we waste time/CPU cycles?
3) Setting affinity is a "game changer". I get the following FPS on the exact same game replay:
- all Ultra, Full HD (with affinity on one core): 20~22 FPS
- all Ultra, Full HD (without affinity): 9~10 FPS
- all Ultra, shaders Low, Full HD (with affinity on one core): 50~60 FPS
- all Ultra, shaders Low, Full HD (without affinity): 20~25 FPS
(StarCraft II can display the FPS in real time by pressing Ctrl+Alt+F.) Apparently setting affinity pushes that core's usage (and clock speed) to 100%. My system is an AMD Phenom II X4 965 BE, nVidia 470 GTX (driver 260.19.06), 8 GB RAM, Ubuntu 10.10 x86-64, running wine-1.3.11. To set affinity I'm using a simple executable (written in C++); if you want I can share that as well.
Thanks again, Cheers,
PS. On a side note, it appears that the larger the map, the lower the FPS, even though technically speaking the number of objects should be similar.
On 10/01/11 15:45, Dan Kegel wrote:
On 10.01.2011 at 22:03, Emanuele Oriani wrote:
- Stefan, do you have any hint on where we should start optimizing the calls of D3D -> OpenGL functions?
Where do we waste time/CPU cycles?
A few bugs that are probably easier to fix (there's no really easy bug; it would have been fixed already):

*) context_validate (context.c): the way it performs the checking is expensive. We already hook the wndproc to intercept messages; we could just intercept the messages involved in window destruction and set the valid flag on the context there.
*) stream declaration parsing (device_update_stream_info, device.c): this needs better data structures to either avoid re-parsing or make parsing faster. The current problem is that we have to do this every time the shader, the vertex declaration, or a vertex buffer changes, which virtually means every frame.
*) FBO application: currently we do this on every draw, which is unnecessary. Unfortunately there are multiple conditions under which it has to be done, among them:
-> A render target is changed
-> The depth stencil is changed
-> The contents of one of those surfaces have changed (e.g. Surface::Map)
-> There are many more; compiling a list is a good starting point for fixing this issue
*) The vertex shader is re-bound needlessly. The vertex shader depends on the vertex declaration, but only minimally (D3DCOLOR input type swizzling if GL_ARB_vertex_bgra isn't supported). The question, however, is how often this happens in real apps: it causes half of the fps problem in my test app, but probably only a minor hit in most real apps.
*) render target and depth stencil dirtification in drawprimitive (directx.c): this isn't overly expensive, but it does add up, and like FBO re-application it is rarely needed. There are, however, many situations in which it is needed.
*) Some global compile settings: -fPIC costs quite a bit, and compiling out the debugging code improves performance too. This is something a user who wants fast games and doesn't care about the drawbacks currently has to do on his own.
*) Various locks are expensive: the wined3d lock, the X11 lock. You can probably compile them out for single-threaded games, but there's no general fix we can make. This is also pretty specific to my test app, although you can see the cost of locking in real apps as well (e.g. 3DMark 2001).
I think those are the main ones. You can test with my test app; it is fairly hard to break with hacks. With a number of hacks I made it run faster than on Windows, but getting all of those fixes in is highly unrealistic.
Some performance data (MacBook Pro, 2.8 GHz Core 2 Duo, GeForce 9600):
- GL version, 64 bit, Linux: 3200 fps
- GL version, 32 bit, Linux: 2400 fps
- Windows GL version, Wine, locking hacked out: 1600 fps
- GL version, 32 bit, Windows: 1400 fps
- D3D version, Windows: 730 fps (has to run in fullscreen; sometimes the driver forces vsync)
- Windows GL version, Wine: 500 fps
- D3D version, Wine: 80 fps
Those numbers are from memory, so they may be wrong, but in general you get the idea. Note that you don't have to worry too much about the 500 fps for the GL version in Wine: due to the nature of the app (many, many tiny draws) it hits the locking overhead really hard.
- Setting affinity is a "game changer".
I'm curious, have you done the same tests on Windows?
Note that you can also compile wined3d for Windows and test it with SC2. That helps separate 3D-related bugs from non-3D bugs. You cannot use this technique to split blame between wined3d and the driver, though, only between the 3D subsystem and the rest of the code.
Hi Stefan,
thanks very much. Do you think any of the above points will be prioritized by the Wine devs? Btw, I don't have Windows (a moral choice), so I can't test it (I have a virtualized WinXP, but I can't run the game in VirtualBox).
Cheers,
On 11/01/11 10:38, Stefan Dösinger wrote: