Francois Jacques wrote:
Group, This patch comes from investigation of a bug that was observed in RedHat 7.0 and not observed in RedHat 6.1. After a debugging session that involved Stephane and I, we found out that it was a compiler issue (which I feared from the start, but wasn't considering it seriously... arghhh). The RedHat 6.1 installation has gcc version 2.91.66 and the RedHat 7.0 installation has gcc version 2.96-20000731 (RedHat Linux 7.0) with latest updates from RedHat. What's happening : 1) HOOK_CallHook is being called (notice wParam and lParam arguments)2) Local variables wParamOrig and lParamOrig are set to wParam and lParam values3) Mapping function is called (MapFunc) with the *addresses* of wParam and lParam so those get changed before we call the hook4) After MapFunc is called, wParamOrig and lParamOrig values also changed to the new values of wParam and lParam (!) Explanation : Down to the assembly level code, the compiler doesn't create wParamOrig and lParamOrig on the stack. Instead, it directly use wParam and lParam. Fix : add volatile to wParamOrig and lParamOrig to prevent optimizations on those variables (aka. force their presence on the stack)
This kind of begs the question: how many more places in Wine should the volatile keyword be used? I find it hard to believe that this is the only spot in a large very low-level project like Wine.
-Dave
This kind of begs the question: how many more places in Wine should the volatile keyword be used? I find it hard to believe that this is the only spot in a large very low-level project like Wine.
You're comment is absolutely right. The thing is, those bugs are especially hard to find out and require very careful code examination.
IMHO, performing code review of the whole tree for missing volatile keywords would be a waste of time compared to do it on a case by case basis. Simply keep in mind that those bugs may happen - especially with "aggressive" compilers such as gcc 2.96. If a bug didn't happen with the previous compiler and does show up with the latest release, it might be a good candidate.
Now try to don't go paranoid about compiler issues when WINE crashes with your new Linux installations ;)
Francois
IMHO, performing code review of the whole tree for missing volatile keywords would be a waste of time compared to do it on a case by case basis. Simply keep in mind that those bugs may happen - especially with "aggressive" compilers such as gcc 2.96. If a bug didn't happen with the previous compiler and does show up with the latest release, it might be a good candidate.
But the point is, this is *not* a Wine bug that was just exposed by a new compiler -- this is a compiler bug, plain and simple.
Making these variables 'volatile' is not required at all -- 'volatile' tells the compiler that the variables might be changed behind the compiler's back, e.g. due to being asynchroneously modified by a signal handler or something like that. In this particular case, however, the variables are not only not changed behind the compiler's back, they are *never* changed at all!
If the compiler generates incorrect code here, this is just a bug, probably somewhere in the common subexpression elimination (or maybe reload) code. Now if the compiler has this bug, it might manifest itself also at different places -- which we *cannot* find by a code review, as the code itself is completely correct ...
Instead of circumventing the compiler bug in this particular case, it might be preferable to find out which optimization phase contains the bug, and write a configure check that switches that phase off if it detects the bug (like we did with the 2.95.2 strength-reduce bug).
Bye, Ulrich
IMHO, performing code review of the whole tree for missing volatile
keywords
would be a waste of time compared to do it on a case by case basis.
Simply
keep in mind that those bugs may happen - especially with "aggressive" compilers such as gcc 2.96. If a bug didn't happen with the previous compiler and does show up with the latest release, it might be a good candidate.
But the point is, this is *not* a Wine bug that was just exposed by a new compiler -- this is a compiler bug, plain and simple.
Did I write it was a WINE bug? No. It is very clear for me that it *is* a compiler issue.
I had three choices
a) find a workaround and have people comment on it; b) disable the optimize phase for gcc 2.96 and have the whole community on my back; c) fix gcc.
I went for the easy one.
Making these variables 'volatile' is not required at all -- 'volatile' tells the compiler that the variables might be changed behind the compiler's back, e.g. due to being asynchroneously modified by a signal handler or something like that. In this particular case, however, the variables are not only not changed behind the compiler's back, they are *never* changed at all!
100% agree. Again, it's a quick-yet-efficient workaround to the compiler problem that just happened to work. While volatile is for variables that are modified by external factors, it was a good way to prevent all the optimization - which was just what I wanted.
Instead of circumventing the compiler bug in this particular case, it might be preferable to find out which optimization phase contains the bug, and write a configure check that switches that phase off if it detects the bug (like we did with the 2.95.2 strength-reduce bug).
We'll do some research on the optimization phase. Once we found it, can anybody help us with that autoconf kungfu?
Ciao! Francois
Did I write it was a WINE bug? No. It is very clear for me that it *is* a compiler issue.
OK, I must have misunderstood you then. Sorry ...
Instead of circumventing the compiler bug in this particular case, it might be preferable to find out which optimization phase contains the bug, and write a configure check that switches that phase off if it detects the bug (like we did with the 2.95.2 strength-reduce bug).
We'll do some research on the optimization phase. Once we found it, can anybody help us with that autoconf kungfu?
I've managed to reduce the problem to a simple testcase, which is appended below. It fails on RedHat gcc-2.96-69 (as of 12/2000) ...
Culprit *appears* to be the global common subexpression elimination, although this is not quite definite. I haven't tracked the problem down yet.
However, when switching off GCSE (-fno-gcse), correct code is produced. I'm not sure I'd like to do a complete Wine build without GCSE though; this might significantly reduce code quality ...
I'll write a configure check to switch off gcse, using the testcase below. We should probably contact RedHat (I think Jacub Jelinek handles 2.96 bugs ...) to confirm that GCSE is really at fault.
Bye, Ulrich
void clobber(int *); void compare(int, int); void use(int);
int global;
void test(int trigger, int param) { int data = trigger? global : 0; int orig_param = param;
clobber(¶m); compare(orig_param, param);
use(data); }
void clobber(int *param) { (*param)++; }
void compare(int orig_param, int param) { if (orig_param != param) printf( "PASS\n" ); else printf( "FAIL\n" ); }
void use(int data) { }
int main(void) { test(0, 0); }
Ulrich Weigand wrote: [...]
I've managed to reduce the problem to a simple testcase, which is appended below. It fails on RedHat gcc-2.96-69 (as of 12/2000) ...
Running this program with Red Hat's gcc-2.96-74 still produces the error. I'm downloading -79 now and will test that. Has this been reported to Red Hat's Bugzilla? I'll submit it if you don't want to.
james[1]%gcc -v Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs gcc version 2.96 20000731 (Red Hat Linux 7.0) james[2]%rpm -qi gcc Name : gcc Relocations: (not relocateable) Version : 2.96 Vendor: Red Hat, Inc. Release : 74 Build Date: Sat 03 Feb 2001 06:49:13 PM EST Install date: Mon 12 Feb 2001 12:02:51 AM EST Build Host: porky.devel.redhat.com Group : Development/Languages Source RPM: gcc-2.96-74.src.rpm Size : 8444185 License: GPL Packager : Red Hat, Inc. http://bugzilla.redhat.com/bugzilla URL : http://gcc.gnu.org Summary : Various compilers (C, C++, Objective-C, Chill, ...) Description : The gcc package contains the GNU Compiler Collection: cc and gcc. You'll need this package in order to compile C/C++ code. james[3]%gcc -O2 test.c james[4]%./a.out FAIL james[6]%gcc -O2 -fno-gcse test.c james[7]%./a.out PASS james[8]%exit
Script done on Fri Mar 30 00:09:36 2001
-- James Juran jamesjuran@alumni.psu.edu
James Juran wrote:
Ulrich Weigand wrote: [...]
I've managed to reduce the problem to a simple testcase, which is appended below. It fails on RedHat gcc-2.96-69 (as of 12/2000) ...
Running this program with Red Hat's gcc-2.96-74 still produces the error. I'm downloading -79 now and will test that. Has this been reported to Red Hat's Bugzilla? I'll submit it if you don't want to.
Same results with -79. It fails with -O2, but passes with -O2 -fno-gcse.
-- James Juran jamesjuran@alumni.psu.edu
On Fri, Mar 30, 2001 at 12:54:06AM -0500, James Juran wrote:
James Juran wrote:
Ulrich Weigand wrote: [...]
I've managed to reduce the problem to a simple testcase, which is appended below. It fails on RedHat gcc-2.96-69 (as of 12/2000) ...
Running this program with Red Hat's gcc-2.96-74 still produces the error. I'm downloading -79 now and will test that. Has this been reported to Red Hat's Bugzilla? I'll submit it if you don't want to.
Same results with -79. It fails with -O2, but passes with -O2 -fno-gcse.
Tested also the very last version -80, same problem. I've forwarded yesterday Francois email to Jakub Jelinek and I'm making now the bugzilla entry.
bye michael
I tested this with 2 newer gcc versions: gcc version 3.0 20010305 (prerelease) gcc version 2.97 20010122 (experimental)
Both print PASS, when compiling with -O2.
Michael
Francois Jacques wrote:
IMHO, performing code review of the whole tree for missing volatile
keywords
would be a waste of time compared to do it on a case by case basis.
Simply
keep in mind that those bugs may happen - especially with "aggressive" compilers such as gcc 2.96. If a bug didn't happen with the previous compiler and does show up with the latest release, it might be a good candidate.
But the point is, this is *not* a Wine bug that was just exposed by a new compiler -- this is a compiler bug, plain and simple.
Did I write it was a WINE bug? No. It is very clear for me that it *is* a compiler issue.
I had three choices
a) find a workaround and have people comment on it; b) disable the optimize phase for gcc 2.96 and have the whole community on my back; c) fix gcc.
I went for the easy one.
Making these variables 'volatile' is not required at all -- 'volatile' tells the compiler that the variables might be changed behind the compiler's back, e.g. due to being asynchroneously modified by a signal handler or something like that. In this particular case, however, the variables are not only not changed behind the compiler's back, they are *never* changed at all!
100% agree. Again, it's a quick-yet-efficient workaround to the compiler problem that just happened to work. While volatile is for variables that are modified by external factors, it was a good way to prevent all the optimization - which was just what I wanted.
To be honest, I didn't read the patch and simply assumed this was a simple case of forgetting to add volatile and getting bitten by a compiler that optimizes more agressively.
Either way it still is a valid point to be concerned about not only compiler bugs but also new compiler features.
Instead of circumventing the compiler bug in this particular case, it might be preferable to find out which optimization phase contains the bug, and write a configure check that switches that phase off if it detects the bug (like we did with the 2.95.2 strength-reduce bug).
We'll do some research on the optimization phase. Once we found it, can anybody help us with that autoconf kungfu?
That would be a good idea. Of course I don't think the volatile keyword in this case is hurting anything except maybe speed (negligble). Hopefully this can get fixed soon in gcc. Looking at the gcc project page, it seems that 3.0 is very close to be done.
-Dave