I've been doing some oprofille tests with wine running fce ultra, the 8-bit Nintendo emulator. I found that when running a rom for 60 seconds, more than 99% of the CPU utilization for winex11drv (which uses the most of all components of wine in this case) is in the same function : convert_888_to_0888_asis (in dlls/x11drv/dib_convert.c).
I've also found that marking a couple of parameters of that function const (that I believe should be marked const anyways), CPU usage in that function drops measurably with oprofile. As far as I know, parameters that aren't modified in a function should be marked const anyways, to send the right hint to the compiler. Since it actually turns out to be faster too, it seems worth it to me.
Here are the results from oprofile before the change:
21982 99.2012 convert_888_to_0888_asis 22078 99.1735 convert_888_to_0888_asis 22207 99.1605 convert_888_to_0888_asis 22161 99.1544 convert_888_to_0888_asis 22158 99.2253 convert_888_to_0888_asis
and after:
21828 99.1731 convert_888_to_0888_asis 21769 99.2296 convert_888_to_0888_asis 21835 99.3313 convert_888_to_0888_asis 21868 99.1027 convert_888_to_0888_asis 21601 99.1508 convert_888_to_0888_asis
On average, it went from 22117 (context switches I believe?) to 21780, about a 1.5% improvement. The same test was run with a script each time (run fceu with the same rom, kill it after 60 seconds). I'm assuming the percentages (the second column in this chart) don't matter much, because this function we're examining takes up an overwhelming amount of the function's CPU (> 99%), so its percentage depends largely on how many context switches other functions took up.
I've taken the liberty of providing a diff that patches all the functions in this file to use const parameters where appropriate. I don't get any compiler warnings with this compiling with gcc 4.0.1. Should I submit this to wine-patches? This would be my first patch to wine (or any open-source project, period). I'd appreciate some feedback before I try for it.
Thanks in advance!
Michael Carlson wrote:
I've also found that marking a couple of parameters of that function const (that I believe should be marked const anyways), CPU usage in that function drops measurably with oprofile. As far as I know, parameters that aren't modified in a function should be marked const anyways, to send the right hint to the compiler. Since it actually turns out to be faster too, it seems worth it to me.
-static void convert_5x5_asis(int width, int height,
const void* srcbits, int srclinebytes,
void* dstbits, int dstlinebytes)
+static void convert_5x5_asis(const int width, const int height,
const void* srcbits, const int srclinebytes,
void* dstbits, const int dstlinebytes)
There is no need to make anything except the pointers const - I don't think I've ever seen that in real world code. In theory this would give the compiler slightly more information... but if the optimizer is unable to figure out that the parameter is never used as an lvalue by himself it sucks so badly that it probably won't do much better with that hints anyway. :-)
Constifying the pointers is fine ofcourse (but rather because it helps finding bugs than for those 1.5% performance improvements.
Felix
On Wed, 27 Jul 2005 09:25, Felix Nawothnig wrote:
There is no need to make anything except the pointers const - I don't think I've ever seen that in real world code. In theory this would give the compiler slightly more information... but if the optimizer is unable to figure out that the parameter is never used as an lvalue by himself it sucks so badly that it probably won't do much better with that hints anyway. :-)
In fact I don't know that making these const would make any difference - I'd like to see the assembly code generated by the two versions for comparison.
Constifying the pointers is fine ofcourse (but rather because it helps finding bugs than for those 1.5% performance improvements.
The 1.5% appears to be within the noise.
A few of us analysed those routines to death on IRC a couple of months back and decided the real problem was that they were being called too frequently, not that there was a problem with the routines themselves. The routines could probably be made faster by rewriting them in assembly language, but bigger benefits could be gained by figuring out why there are so many calls and reducing the number of calls.
On 7/26/05, Troy Rollo wine@troy.rollo.name wrote:
On Wed, 27 Jul 2005 09:25, Felix Nawothnig wrote:
There is no need to make anything except the pointers const - I don't think I've ever seen that in real world code. In theory this would give the compiler slightly more information... but if the optimizer is unable to figure out that the parameter is never used as an lvalue by himself it sucks so badly that it probably won't do much better with that hints anyway. :-)
In fact I don't know that making these const would make any difference - I'd like to see the assembly code generated by the two versions for comparison.
I agree, the compiler should realise the parameters aren't used as an l-value in the function anywhere - but it apparently doesn't, or marking the parameters const is having some other effect on the binary - with gcc 4.0.1 (at least on my system, Debian sid), the compiled size of winex11.drv.so changes with my patch (it increases by 368 bytes from 5335409 -> 5335777). Just changing the parameters to const for the function in question (and none of the other functions in the file) increases the size from the original size by 64 bytes.
I won't pretend to know why my patch increases the size of the binary, but it shows that even without looking at the assembly code, this patch is making some kind of difference (whether good or bad) in the compiled code. I would look further into it, but I don't know assembly.
The 1.5% appears to be within the noise.
A few of us analysed those routines to death on IRC a couple of months back and decided the real problem was that they were being called too frequently, not that there was a problem with the routines themselves. The routines could probably be made faster by rewriting them in assembly language, but bigger benefits could be gained by figuring out why there are so many calls and reducing the number of calls.
I agree it would be good to have these routines called less in the future. However, until we reduce the number of times that routine is called (whenever that may be), it would be nice to know we're doing everything we can to make the routine as fast as possible.
Anyway, I'm the amateur wine-hacker here, so if you guys still say we shouldn't change it, I won't argue the matter further. I just want to point out that this patch is certainly doing SOMETHING to the resulting binary, and it's possibly good.
Hi,
On Tue, Jul 26, 2005 at 11:08:12PM -0400, Michael Carlson wrote:
I agree, the compiler should realise the parameters aren't used as an l-value in the function anywhere - but it apparently doesn't, or marking the parameters const is having some other effect on the binary
- with gcc 4.0.1 (at least on my system, Debian sid), the compiled
size of winex11.drv.so changes with my patch (it increases by 368 bytes from 5335409 -> 5335777). Just changing the parameters to const for the function in question (and none of the other functions in the file) increases the size from the original size by 64 bytes.
I won't pretend to know why my patch increases the size of the binary, but it shows that even without looking at the assembly code, this patch is making some kind of difference (whether good or bad) in the compiled code. I would look further into it, but I don't know assembly.
You probably need to strip the binary or omit gcc -g in order to obtain meaningful numbers, since otherwise it probably encodes the additional const in debug information tables, too, which thus increases the size of the binary.
Anyway, I'm the amateur wine-hacker here, so if you guys still say we shouldn't change it, I won't argue the matter further. I just want to point out that this patch is certainly doing SOMETHING to the resulting binary, and it's possibly good.
While I'm not too convinced in this case (1.5% improvement sounds like within statistic noise), it should be a good idea to mark things in Wine const whenever possible (objdump -x helps here), since it improves reliability (erroneous random memory area writes will crash in const areas) and performance (hints for the compiler in some non-optimized compiler cases, page discarding instead of swapping).
Andreas Mohr
Andreas Mohr wrote:
While I'm not too convinced in this case (1.5% improvement sounds like within statistic noise), it should be a good idea to mark things in Wine const whenever possible (objdump -x helps here), since it improves reliability
Huh? He did stuff like...
-void foo(int i) +void foo(const int i)
This will most likely not improve reliability. He also changed some pointers to const which is wanted and will ofcourse improve reliability.
Assuming that gcc has a pretty good optimizer the only reason I can think of (besides a GCC bug :-) why performance increased is that gcc doesn't have to perform aliasing-analysis and can be sure that the dst pointer will never point into local parameters (constifying local variables could speed up things slightly more).
I'm not sure about the "restrict" semantics but maybe it could be used instead of constifying "int i" to archieve the same effects?
Felix
Has anyone checked if the performance improvement stays if only the pointers are set to const?
I'd also suggest running the test multiple times and averaging.
Felix Nawothnig wrote:
Andreas Mohr wrote:
While I'm not too convinced in this case (1.5% improvement sounds like within statistic noise), it should be a good idea to mark things in Wine const whenever possible (objdump -x helps here), since it improves reliability
Huh? He did stuff like...
-void foo(int i) +void foo(const int i)
This will most likely not improve reliability. He also changed some pointers to const which is wanted and will ofcourse improve reliability.
Assuming that gcc has a pretty good optimizer the only reason I can think of (besides a GCC bug :-) why performance increased is that gcc doesn't have to perform aliasing-analysis and can be sure that the dst pointer will never point into local parameters (constifying local variables could speed up things slightly more).
I'm not sure about the "restrict" semantics but maybe it could be used instead of constifying "int i" to archieve the same effects?
Felix