This fixes data race in ARM/ARM64 platforms, and prevents potential memory access reordering by the compiler.
From: Jinoh Kang jinoh.kang.kr@gmail.com
This fixes data race in ARM/ARM64 platforms, and prevents potential memory access reordering by the compiler. --- dlls/gdiplus/gdiplus_private.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/dlls/gdiplus/gdiplus_private.h b/dlls/gdiplus/gdiplus_private.h index 7ae8cc564c9..c5a1e1adc59 100644 --- a/dlls/gdiplus/gdiplus_private.h +++ b/dlls/gdiplus/gdiplus_private.h @@ -623,7 +623,7 @@ static inline BOOL image_lock(GpImage *image, BOOL *unlock)
static inline void image_unlock(GpImage *image, BOOL unlock) { - if (unlock) image->busy = 0; + if (unlock) WriteRelease(&image->busy, 0); }
static inline void set_rect(GpRectF *rect, REAL x, REAL y, REAL width, REAL height)
On 10/3/22 07:46, Jinoh Kang wrote:
From: Jinoh Kang jinoh.kang.kr@gmail.com
This fixes data race in ARM/ARM64 platforms, and prevents potential memory access reordering by the compiler.
dlls/gdiplus/gdiplus_private.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/dlls/gdiplus/gdiplus_private.h b/dlls/gdiplus/gdiplus_private.h index 7ae8cc564c9..c5a1e1adc59 100644 --- a/dlls/gdiplus/gdiplus_private.h +++ b/dlls/gdiplus/gdiplus_private.h @@ -623,7 +623,7 @@ static inline BOOL image_lock(GpImage *image, BOOL *unlock)
static inline void image_unlock(GpImage *image, BOOL unlock) {
- if (unlock) image->busy = 0;
if (unlock) WriteRelease(&image->busy, 0); }
static inline void set_rect(GpRectF *rect, REAL x, REAL y, REAL width, REAL height)
Why does this need release semantics, and where is the corresponding acquire?
On Tue, Oct 4, 2022, 11:57 PM Zebediah Figura zfigura@codeweavers.com wrote:
On 10/3/22 07:46, Jinoh Kang wrote:
From: Jinoh Kang jinoh.kang.kr@gmail.com
This fixes data race in ARM/ARM64 platforms, and prevents potential memory access reordering by the compiler.
dlls/gdiplus/gdiplus_private.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/dlls/gdiplus/gdiplus_private.h
b/dlls/gdiplus/gdiplus_private.h
index 7ae8cc564c9..c5a1e1adc59 100644 --- a/dlls/gdiplus/gdiplus_private.h +++ b/dlls/gdiplus/gdiplus_private.h @@ -623,7 +623,7 @@ static inline BOOL image_lock(GpImage *image, BOOL
*unlock)
static inline void image_unlock(GpImage *image, BOOL unlock) {
- if (unlock) image->busy = 0;
if (unlock) WriteRelease(&image->busy, 0); }
static inline void set_rect(GpRectF *rect, REAL x, REAL y, REAL width,
REAL height)
Why does this need release semantics,
This function can be inlined, and the write to `busy` can be reordered before any other operations.
and where is the corresponding
acquire?
In image_lock's `InterlockedCompareExchange(&image->busy, ...)`
On 10/4/22 10:36, Jin-oh Kang wrote:
Why does this need release semantics,
This function can be inlined, and the write to `busy` can be reordered before any other operations.
Well, sure, but why is that a problem?
and where is the corresponding
acquire?
In image_lock's `InterlockedCompareExchange(&image->busy, ...)`
The point is that both of these questions should be clearly answered in the patch itself :-)
On Wed, Oct 5, 2022, 12:38 AM Zebediah Figura zfigura@codeweavers.com wrote:
On 10/4/22 10:36, Jin-oh Kang wrote:
Why does this need release semantics,
This function can be inlined, and the write to `busy` can be reordered before any other operations.
Well, sure, but why is that a problem?
gdiplus uses image_lock() and image_unlock() to guard accesses to shared mutable state of GpImage. Not using mutual exclusion will result in a data race.
and where is the corresponding
acquire?
In image_lock's `InterlockedCompareExchange(&image->busy, ...)`
The point is that both of these questions should be clearly answered in the patch itself :-)
Yes. I planned to address them in my reply to Esme's question, and the commit message as well. That would, however, need to be accompanied with a good explanation of memory models in general, so that I could convince anyone else reviewing them as well. I was just too busy to get back to it this week and finish up the reply. Sorry.
Sorry, I don't think I can review this because I don't understand what WriteRelease does.
Zebediah has graciously linked me to a resource on memory barriers which I am planning to read in the hope of understanding this better. She also brought up the question of whether we'd be better served by a standard locking mechanism.
The image_lock and image_unlock functions were added with the intention of making image access thread-safe, so we should make sure they have the correct semantics. AFAICT the lock itself doesn't contain any information except the current thread id, so as long as the lock can be cheaply created and destroyed without creating any kernel objects (because we can create and destroy a lot of images), and can be used without blocking, I think a standard mechanism will work fine.
It looks to me like if we create a critical section without debug info and only use TryEnter, it should be as light as it can reasonably be. Using debug info should also be fine as it's just a single heap allocation, but I don't think we need it if we never block on the critical section.
On 04/10/2022 19:18, Esme Povirk (@madewokherd) wrote:
Zebediah has graciously linked me to a resource on memory barriers which I am planning to read in the hope of understanding this better. She also brought up the question of whether we'd be better served by a standard locking mechanism.
The image_lock and image_unlock functions were added with the intention of making image access thread-safe, so we should make sure they have the correct semantics. AFAICT the lock itself doesn't contain any information except the current thread id, so as long as the lock can be cheaply created and destroyed without creating any kernel objects (because we can create and destroy a lot of images), and can be used without blocking, I think a standard mechanism will work fine.
It looks to me like if we create a critical section without debug info and only use TryEnter, it should be as light as it can reasonably be. Using debug info should also be fine as it's just a single heap allocation, but I don't think we need it if we never block on the critical section.
I don't think a critical section is needed if it works now without one. These memory barriers do nothing on x86, and don't even exist (compile to no code at all), it's only useful for architectures without strong memory-ordering guarantees (like ARM or PowerPC, etc).
Those architectures have specific instructions to deal with this, because the CPU can reorder memory writes or reads without them. It's sort of like using "sync" on disk, but on the CPU's memory. You want the other pieces of memory to be synced (flushed) before you update the atomic lock because when you read the lock in some other thread, you expect to read the rest of the stuff and have it synced with it.
Anyway this is a very simplified explanation, but might be easier to grasp, at least as to why it's needed.
On 10/4/22 12:09, Gabriel Ivăncescu wrote:
On 04/10/2022 19:18, Esme Povirk (@madewokherd) wrote:
Zebediah has graciously linked me to a resource on memory barriers which I am planning to read in the hope of understanding this better. She also brought up the question of whether we'd be better served by a standard locking mechanism.
The image_lock and image_unlock functions were added with the intention of making image access thread-safe, so we should make sure they have the correct semantics. AFAICT the lock itself doesn't contain any information except the current thread id, so as long as the lock can be cheaply created and destroyed without creating any kernel objects (because we can create and destroy a lot of images), and can be used without blocking, I think a standard mechanism will work fine.
It looks to me like if we create a critical section without debug info and only use TryEnter, it should be as light as it can reasonably be. Using debug info should also be fine as it's just a single heap allocation, but I don't think we need it if we never block on the critical section.
I don't think a critical section is needed if it works now without one. These memory barriers do nothing on x86, and don't even exist (compile to no code at all), it's only useful for architectures without strong memory-ordering guarantees (like ARM or PowerPC, etc).
Those architectures have specific instructions to deal with this, because the CPU can reorder memory writes or reads without them. It's sort of like using "sync" on disk, but on the CPU's memory. You want the other pieces of memory to be synced (flushed) before you update the atomic lock because when you read the lock in some other thread, you expect to read the rest of the stuff and have it synced with it.
Anyway this is a very simplified explanation, but might be easier to grasp, at least as to why it's needed.
The point is that memory barriers are categorically harder to reason about than mutexes. With mutexes, you can reason with things like "only one thread can modify these variables at a time", and "any two parts of code protected by a critical section will have their side effects serialized with each other".
Memory barriers require one to understand acquire/release semantics (which can often be very subtle), and they require a complete explanation of why they're in use in the code. On the other hand, it's rare that code is in such a hot path that they're actually necessary. I don't know whether gdiplus is in such a hot path (I could see it going either way), but if not I'd advise against trying to use any such atomics.
I think it's clear that we don't *need* a critical section, because anything we need can be built out of primitives we have. But I don't think we need to reinvent them either. It seems to me that using a common and well-tested implementation probably is better than maintaining a separate one.
On Mon Oct 10 14:02:23 2022 +0000, Esme Povirk wrote:
I think it's clear that we don't *need* a critical section, because anything we need can be built out of primitives we have. But I don't think we need to reinvent them either. It seems to me that using a common and well-tested implementation probably is better than maintaining a separate one.
Are you open to refactoring gdiplus instead so that recursive locks can be eliminated altogether? That way, we can just replace them with SRWLOCKs.
On Mon Oct 10 14:02:23 2022 +0000, Jinoh Kang wrote:
Are you open to refactoring gdiplus instead so that recursive locks can be eliminated altogether? That way, we can just replace them with SRWLOCKs.
Would that just mean when we currently have a function that locks and calls another function that locks, we make a helper for the inner function that assumes the image is already locked?
If we do recurse with an SRWLOCK, that'll hang, right?
I think it's more work than necessary, but I'd be open to it, yes.
On Mon Oct 10 19:53:20 2022 +0000, Esme Povirk wrote:
Would that just mean when we currently have a function that locks and calls another function that locks, we make a helper for the inner function that assumes the image is already locked? If we do recurse with an SRWLOCK, that'll hang, right? I think it's more work than necessary, but I'd be open to it, yes.
Actually, we don't want to ever block so I guess instead of hanging on recurse the function will just fail.