On 2/22/22 16:42, Zebediah Figura wrote:
Creating new separate 64-bit data structures for the 64-bit calls would necessitate adding a bunch of new and redundant code (to initialize the new structs, etc.), but would be the most performant. Increasing the 32-bit members to 64-bit would waste the most memory and be the least performant for 32-bit calls, but would eliminate redundant code. My choice of unions is a compromise; new data structures are unnecessary and the separate 32-bit and 64-bit calls each access their native data types, which is good for performance.
Using unions in this way is a traditional approach in OpenMP runtime implementations. For example, if you look at the code for libgomp, the most widely used OpenMP implementation, you will see that it uses unions to handle 32/64-bit variants in a similar manner:
https://github.com/gcc-mirror/gcc/blob/1931cbad498e625b1e24452dcfffe02539b12...
Increasing the existing fields to 64-bit uses the exact same amount of memory as using a union. I don't know why libgomp did that, but it seems pointless to me.
The reason that libgomp (and my submitted patches) use unions is so that the OpenMP runtime functions can operate on either 32-bit or 64-bit values, as needed. This yields better performance on some architectures; for example, the 32-bit functions would run slower if they had to do type conversions and operate on 64-bit data types using 32-bit registers.
Note that I also asked for tests as this may help determine what the correct approach is. E.g. if mixing 32- and 64-bit types works without a hitch, that suggests that widening the existing fields is the correct approach.
For each dynamically-scheduled loop, OpenMP-aware compilers will emit *either* a series of calls to the 32-bit functions, *or else* a series of calls to the 64-bit functions. The same instance of the data structure will never be accessed by a mix of 32-bit functions and 64-bit functions. I think it is better to have separate union fields, so that the 32-bit and 64-bit functions can each access their "native" data types.
Yes, if the top priority was to minimize redundant code, then refactoring these functions to handle both the 32-bit and 64-bit cases is possible. However, performance would suffer. If you look at the other previously-implemented vcomp functions, you'll see that earlier Wine contributors implemented separate 32-bit and 64-bit versions of functions for other scheduling modes: _vcomp_for_static_init() and _vcomp_for_static_init_i8(), _vcomp_for_static_simple_init() and _vcomp_for_static_simple_init_i8(), etc. My patches follow this same pre-established pattern.
Just from reading the code, it seems that the "static" initializers are something of a different case, since they're writing to fields which are by definition either 32- or 64-bit. If we were to use the same fields, that concern wouldn't seem to apply here, although I may be missing something.
Consider the two Wine functions _vcomp_for_static_simple_init() and _vcomp_for_static_simple_init_i8(). The code for these functions is identical, except one operates on 32-bit values, and the other on 64-bit values. I assume the original Wine author used duplicative code for performance reasons, as did I in my patches that implement the dynamic schedule routines. I could re-write my patches to minimize code duplication (e.g. implement the 32-bit functions as wrappers around the 64-bit ones), but the result would be lower performance due to the type conversions, which is important in performance-sensitive code such as an OpenMP implementation.
To summarize, my vcomp patches use unions and some redundant code so that both the 32-bit and 64-bit functions have optimal performance, and are consistent with previously-implemented Wine vcomp routines. As you've pointed out, the same functionality could be achieved by widening the data structures to 64-bit and re-factoring the code; this would reduce duplicate code (DRY), but would also reduce the 32-bit performance. I prefer things the way I submitted them, but I respect your years of experience with the Wine codebase and will happily edit and resubmit my patches if you think DRY is more important than optimal performance. The main goal is to improve support for the many OpenMP applications that have been unusable for years due to Wine's incomplete and buggy vcomp implementation.