Re: [PATCH v2 1/3] vcomp: Implement _vcomp_for_dynamic_init_i8()

3 Mar 2022


      On 2/22/22 16:42, Zebediah Figura wrote:
...
...
Creating new separate 64-bit data structures for the 64-bit calls
would necessitate adding a bunch of new and redundant code (to
initialize the new structs, etc.), but would be the most performant.
Increasing the 32-bit members to 64-bit would waste the most memory
and be the least performant for 32-bit calls, but would eliminate
redundant code. My choice of unions is a compromise; new data
structures are unnecessary and the separate 32-bit and 64-bit calls
each access their native data types, which is good for performance.
Using unions in this way is a traditional approach in OpenMP runtime
implementations. For example, if you look at the code for libgomp, the
most widely used OpenMP implementation, you will see that it uses
unions to handle 32/64-bit variants in a similar manner:
https://github.com/gcc-mirror/gcc/blob/1931cbad498e625b1e24452dcfffe02539b12...
Increasing the existing fields to 64-bit uses the exact same amount of
memory as using a union. I don't know why libgomp did that, but it seems
pointless to me.
The reason that libgomp (and my submitted patches) use unions is so
that the OpenMP runtime functions can operate on either 32-bit or 64-bit
values, as needed. This yields better performance on some architectures;
for example, the 32-bit functions would run slower if they had to do
type conversions and operate on 64-bit data types using 32-bit registers.
...
Note that I also asked for tests as this may help determine what the
correct approach is. E.g. if mixing 32- and 64-bit types works without a
hitch, that suggests that widening the existing fields is the correct
approach.
For each dynamically-scheduled loop, OpenMP-aware compilers will emit
*either* a series of calls to the 32-bit functions, *or else* a series
of calls to the 64-bit functions. The same instance of the data structure
will never be accessed by a mix of 32-bit functions and 64-bit functions.
I think it is better to have separate union fields, so that the 32-bit
and 64-bit functions can each access their "native" data types.
...
...
Yes, if the top priority was to minimize redundant code, then
refactoring these functions to handle both the 32-bit and 64-bit cases
is possible. However, performance would suffer. If you look at the
other previously-implemented vcomp functions, you'll see that earlier
Wine contributors implemented separate 32-bit and 64-bit versions of
functions for other scheduling modes: _vcomp_for_static_init() and
_vcomp_for_static_init_i8(), _vcomp_for_static_simple_init() and
_vcomp_for_static_simple_init_i8(), etc. My patches follow this same
pre-established pattern.
Just from reading the code, it seems that the "static" initializers are
something of a different case, since they're writing to fields which are
by definition either 32- or 64-bit. If we were to use the same fields,
that concern wouldn't seem to apply here, although I may be missing
something.
Consider the two Wine functions _vcomp_for_static_simple_init() and
_vcomp_for_static_simple_init_i8(). The code for these functions is
identical, except one operates on 32-bit values, and the other on
64-bit values. I assume the original Wine author used duplicative code
for performance reasons, as did I in my patches that implement the
dynamic schedule routines.  I could re-write my patches to minimize
code duplication (e.g. implement the 32-bit functions as wrappers around
the 64-bit ones), but the result would be lower performance due to the
type conversions, which is important in performance-sensitive code
such as an OpenMP implementation.
To summarize, my vcomp patches use unions and some redundant code so
that both the 32-bit and 64-bit functions have optimal performance, and
are consistent with previously-implemented Wine vcomp routines. As you've
pointed out, the same functionality could be achieved by widening the
data structures to 64-bit and re-factoring the code; this would reduce
duplicate code (DRY), but would also reduce the 32-bit performance.
I prefer things the way I submitted them, but I respect your years of
experience with the Wine codebase and will happily edit and resubmit my
patches if you think DRY is more important than optimal performance. The
main goal is to improve support for the many OpenMP applications that
have been unusable for years due to Wine's incomplete and buggy vcomp
implementation.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

Re: [PATCH v2 1/3] vcomp: Implement _vcomp_for_dynamic_init_i8()