H.264 uses a 16-pixel alignment, and the stream sink media type should
have the aligned height after the session has started.
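
As a rough illustration of the arithmetic involved, rounding a frame height up to a multiple of 16 can be sketched as below. The helper names are hypothetical and are not taken from the Wine sources or tests.

```c
#include <assert.h>

/* Round `value` up to the next multiple of `alignment`.
 * `alignment` must be a non-zero power of two. */
static unsigned int align_up(unsigned int value, unsigned int alignment)
{
    assert(alignment && !(alignment & (alignment - 1)));
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: frame height padded to the 16-pixel
 * alignment mentioned above. */
static unsigned int h264_aligned_height(unsigned int height)
{
    return align_up(height, 16);
}
```

Under this sketch a 1080-line frame is padded to 1088 lines, while a 720-line frame is already aligned and stays at 720.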
--
v12: mf/tests: Test H.264 decoder alignment.
mf/tests: Test H.264 sink media type height alignment.
https://gitlab.winehq.org/wine/wine/-/merge_requests/8887
--
v5: comctl32: Remove v6 only exports.
comctl32: Remove syslink from comctl32 v5.
comctl32: Remove taskdialog from comctl32 v5.
comctl32/tests: Test v6 only exports.
comctl32: Remove user32 control copies in comctl32 v5.
https://gitlab.winehq.org/wine/wine/-/merge_requests/9150
The `RtlRunOnce` family of functions is implemented using (a variant of) the Double-Checked Locking Pattern (DCLP). The DCLP requires memory fences for correctness on both the writer and the reader side. The need for a fence on the writer side is fairly obvious, but it is more surprising that one is needed on the reader side as well. On strong memory model architectures (x86 and x86_64), only compiler-level fences are required. On weak memory model architectures such as ARM64, both CPU and compiler fences are needed.
That's explained well in books like _Concurrent Programming on Windows_ by Joe Duffy and in online resources like [1].
The Wine implementation has fences on the writer side (`RtlRunOnceComplete`), because `InterlockedCompareExchangePointer` inserts a full memory fence. However, some code paths on the reader side (`RtlRunOnceBeginInitialize`) are missing fences, specifically the `RTL_RUN_ONCE_CHECK_ONLY` branch and the `!RTL_RUN_ONCE_CHECK_ONLY && (val & 3) == 2` branch.
Add the missing fences using GCC's atomic builtins [2].
Note: with this MR, the generated code should change only for ARM64.
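
For readers unfamiliar with the pattern, here is a minimal sketch of double-checked locking with an acquire load on the reader side, written with GCC's `__atomic` builtins. It is not the Wine implementation and does not mirror the `RTL_RUN_ONCE` layout; `struct config`, `get_config` and the toy spinlock are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lazily initialized object. */
struct config { int value; };

static struct config storage;
static struct config *instance; /* NULL until the object is published */
static int init_lock;           /* toy spinlock: 0 = free, 1 = held */

static struct config *get_config(void)
{
    /* Reader fast path: the acquire load pairs with the release store
     * below, so a thread that sees the pointer also sees the fields
     * written before it was published. */
    struct config *p = __atomic_load_n(&instance, __ATOMIC_ACQUIRE);
    if (p) return p;

    /* Slow path: take the lock, then check again. */
    while (__atomic_exchange_n(&init_lock, 1, __ATOMIC_ACQUIRE)) {}
    p = __atomic_load_n(&instance, __ATOMIC_RELAXED);
    if (!p)
    {
        storage.value = 42;
        p = &storage;
        /* Writer side: publish only after initialization is complete. */
        __atomic_store_n(&instance, p, __ATOMIC_RELEASE);
    }
    __atomic_store_n(&init_lock, 0, __ATOMIC_RELEASE);

    assert(p != NULL);
    return p;
}
```

On x86 and x86_64 the acquire load typically compiles to a plain load and only constrains the compiler, whereas on ARM64 it is expected to emit a load-acquire instruction, which matches the note above about where the generated code changes.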
### References:
1. [Double-Checked Locking is Fixed In C++11](https://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp…
2. [GCC's Built-in Functions for Memory Model Aware Atomic Operations](https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.ht…
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/9178
Convert the surface lock to a recursive mutex so that the same thread
can acquire it multiple times, preventing deadlocks when calls to
lock_surface() and window_surface_lock() are nested.
Signed-off-by: Zhao Yi <zhaoyi(a)uniontech.com>
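
A minimal sketch of the idea, assuming POSIX threads: a mutex created with PTHREAD_MUTEX_RECURSIVE can be locked again by the thread that already holds it. The function names below (outer_lock, inner_lock and so on) are hypothetical stand-ins and are not the Wine functions named above.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for PTHREAD_MUTEX_RECURSIVE under strict ISO C modes */
#endif
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t surface_mutex;
static int lock_depth; /* only touched while surface_mutex is held */

/* Create the mutex with the recursive type; returns 0 on success. */
static int init_surface_mutex(void)
{
    pthread_mutexattr_t attr;
    int ret;

    if ((ret = pthread_mutexattr_init(&attr))) return ret;
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (!ret) ret = pthread_mutex_init(&surface_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Hypothetical inner lock, taken while the outer lock is already held. */
static void inner_lock(void)   { pthread_mutex_lock(&surface_mutex); lock_depth++; }
static void inner_unlock(void) { lock_depth--; pthread_mutex_unlock(&surface_mutex); }

/* Hypothetical outer lock. */
static void outer_lock(void)   { pthread_mutex_lock(&surface_mutex); lock_depth++; }
static void outer_unlock(void) { lock_depth--; pthread_mutex_unlock(&surface_mutex); }

/* Nest the two locks on one thread and report the depth reached. */
static int nested_lock_depth(void)
{
    int depth;

    outer_lock();
    inner_lock(); /* would deadlock with a default, non-recursive mutex */
    depth = lock_depth;
    inner_unlock();
    outer_unlock();
    return depth;
}
```

Each lock call must still be balanced by an unlock on the same thread; the mutex is only released to other threads once the outermost unlock runs.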
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/9177