Enforce proper atomic update so that other threads do not read stale data from IO_STATUS_BLOCK.
Signed-off-by: Jinoh Kang jinoh.kang.kr@gmail.com
-- v2: ntdll: Fix reading stale Information from IOSB.
From: Jinoh Kang jinoh.kang.kr@gmail.com
Enforce proper atomic update so that other threads do not read stale data from IO_STATUS_BLOCK.
Signed-off-by: Jinoh Kang jinoh.kang.kr@gmail.com --- dlls/ntdll/unix/unix_private.h | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-)
diff --git a/dlls/ntdll/unix/unix_private.h b/dlls/ntdll/unix/unix_private.h index 795fc148479..fb208f54e45 100644 --- a/dlls/ntdll/unix/unix_private.h +++ b/dlls/ntdll/unix/unix_private.h @@ -357,6 +357,8 @@ static inline BOOL in_wow64_call(void)
static inline void set_async_iosb( client_ptr_t iosb, NTSTATUS status, ULONG_PTR info ) { + NTSTATUS *status_ptr; + if (!iosb) return;
if (in_wow64_call()) @@ -366,19 +368,37 @@ static inline void set_async_iosb( client_ptr_t iosb, NTSTATUS status, ULONG_PTR NTSTATUS Status; ULONG Information; } *io = wine_server_get_ptr( iosb ); - io->Status = status; + status_ptr = &io->Status; io->Information = info; } else { IO_STATUS_BLOCK *io = wine_server_get_ptr( iosb ); #ifdef NONAMELESSUNION - io->u.Status = status; + status_ptr = &io->u.Status; #else - io->Status = status; + status_ptr = &io->Status; #endif io->Information = info; } + + /* GetOverlappedResultEx() skips waiting for the event/file handle if the + * Status field indicates completion, and returns the result from the + * Information field. + * + * Hence, we have to ensure that writes to the Information field are seen + * from the other threads *before* writes to the Status field. + * + * Otherwise, a race condition results: if writing the Status field has + * been completed but the subsequent update to the Information field has + * not, the application may end up reading stale value from it. + * + * The race condition can be especially problematic with applications that + * use the HasOverlappedIoCompleted() macro in a tight loop to busy-wait + * for completion of overlapped I/O. + */ + MemoryBarrier(); /* TODO: consider replacing with a store-release. */ + *(volatile NTSTATUS *)status_ptr = status; }
static inline client_ptr_t iosb_client_ptr( IO_STATUS_BLOCK *io )
On 5/30/22 14:20, Jinoh Kang wrote:
/* GetOverlappedResultEx() skips waiting for the event/file handle if the
* Status field indicates completion, and returns the result from the
* Information field.
*
* Hence, we have to ensure that writes to the Information field are seen
* from the other threads *before* writes to the Status field.
*
* Otherwise, a race condition results: if writing the Status field has
* been completed but the subsequent update to the Information field has
* not, the application may end up reading stale value from it.
*
* The race condition can be especially problematic with applications that
* use the HasOverlappedIoCompleted() macro in a tight loop to busy-wait
* for completion of overlapped I/O.
*/
MemoryBarrier(); /* TODO: consider replacing with a store-release. */
*(volatile NTSTATUS *)status_ptr = status; }
static inline client_ptr_t iosb_client_ptr( IO_STATUS_BLOCK *io )
Unpaired MemoryBarrier() is pretty much always wrong. (In fact, I think unpaired anything would be, here.)
It also doesn't seem clear to me that it's actually better than a store-release, as you point out. Ideally what we want is ReadAcquire() and WriteRelease(), but even setting that aside, a pair of interlocked operations will probably be an improvement by themselves.
On Tue, May 31, 2022, 7:52 AM Zebediah Figura wrote:
On 5/30/22 14:20, Jinoh Kang wrote:
- /* GetOverlappedResultEx() skips waiting for the event/file handle
if the
* Status field indicates completion, and returns the result from
the
* Information field.
*
* Hence, we have to ensure that writes to the Information field
are seen
* from the other threads *before* writes to the Status field.
*
* Otherwise, a race condition results: if writing the Status field
has
* been completed but the subsequent update to the Information
field has
* not, the application may end up reading stale value from it.
*
* The race condition can be especially problematic with
applications that
* use the HasOverlappedIoCompleted() macro in a tight loop to
busy-wait
* for completion of overlapped I/O.
*/
- MemoryBarrier(); /* TODO: consider replacing with a store-release.
*/
*(volatile NTSTATUS *)status_ptr = status; }
static inline client_ptr_t iosb_client_ptr( IO_STATUS_BLOCK *io )
Unpaired MemoryBarrier() is pretty much always wrong. (In fact, I think unpaired anything would be, here.)
By pairing, do you mean that another MemoryBarrier() should follow the status write?
It also doesn't seem clear to me that it's actually better than a store-release, as you point out. Ideally what we want is ReadAcquire() and WriteRelease(), but even setting that aside, a pair of interlocked operations will probably be an improvement by themselves.
As in, convert both status and information writes to interlocked operations, or replace the barrier with a dummy atomic op?
On 5/30/22 20:23, Jin-oh Kang wrote:
On Tue, May 31, 2022, 7:52 AM Zebediah Figura wrote:
On 5/30/22 14:20, Jinoh Kang wrote:
- /* GetOverlappedResultEx() skips waiting for the event/file handle
if the
* Status field indicates completion, and returns the result from
the
* Information field.
*
* Hence, we have to ensure that writes to the Information field
are seen
* from the other threads *before* writes to the Status field.
*
* Otherwise, a race condition results: if writing the Status field
has
* been completed but the subsequent update to the Information
field has
* not, the application may end up reading stale value from it.
*
* The race condition can be especially problematic with
applications that
* use the HasOverlappedIoCompleted() macro in a tight loop to
busy-wait
* for completion of overlapped I/O.
*/
- MemoryBarrier(); /* TODO: consider replacing with a store-release.
*/
*(volatile NTSTATUS *)status_ptr = status; }
static inline client_ptr_t iosb_client_ptr( IO_STATUS_BLOCK *io )
Unpaired MemoryBarrier() is pretty much always wrong. (In fact, I think unpaired anything would be, here.)
By pairing, do you mean that another MemoryBarrier() should follow the status write?
No. We need another memory barrier in GetOverlappedResult().
The point of a memory barrier is that if you have two pairs of memory operations (read or write) which must be perceived as happening "in order" across threads, you place a memory barrier in the middle of each pair. That is, if thread 1 executes ops A and B, and thread 2 executes ops C and D, you need a barrier between A and B, but also a barrier between C and D.
The following image (stolen and adapted from the Linux kernel documentation) may be illustrative:
CPU 1 CPU 2 =============== =============== WRITE_ONCE(a, 1); x = READ_ONCE(b); <write barrier> <--------> <read barrier> WRITE_ONCE(b, 2); y = READ_ONCE(a);
You need to do this because the compiler—and, more importantly, the CPU—is free to reorder the instructions on *both* sides. If, as in the above example, you need "x == 2" to imply "y == 1", the write barrier alone is not enough to achieve this, because the CPU can *also* reorder the reads.
The same applies to atomic operations that aren't just RMW. E.g. a store-release (to Status) would be correct here, and more efficient than a barrier, but it has to be paired with a read-acquire (also of Status) in GetOverlappedResult().
On Tue, May 31, 2022, 11:10 AM Zebediah Figura zfigura@codeweavers.com wrote:
On 5/30/22 20:23, Jin-oh Kang wrote:
On Tue, May 31, 2022, 7:52 AM Zebediah Figura wrote:
On 5/30/22 14:20, Jinoh Kang wrote:
- /* GetOverlappedResultEx() skips waiting for the event/file handle
if the
* Status field indicates completion, and returns the result from
the
* Information field.
*
* Hence, we have to ensure that writes to the Information field
are seen
* from the other threads *before* writes to the Status field.
*
* Otherwise, a race condition results: if writing the Status
field
has
* been completed but the subsequent update to the Information
field has
* not, the application may end up reading stale value from it.
*
* The race condition can be especially problematic with
applications that
* use the HasOverlappedIoCompleted() macro in a tight loop to
busy-wait
* for completion of overlapped I/O.
*/
- MemoryBarrier(); /* TODO: consider replacing with a
store-release.
*/
*(volatile NTSTATUS *)status_ptr = status; }
static inline client_ptr_t iosb_client_ptr( IO_STATUS_BLOCK *io )
Unpaired MemoryBarrier() is pretty much always wrong. (In fact, I think unpaired anything would be, here.)
By pairing, do you mean that another MemoryBarrier() should follow the status write?
No. We need another memory barrier in GetOverlappedResult().
Ah yes, of course. I'm sending it in a separate patch set, though. Thanks for the reminder.
The point of a memory barrier is that if you have two pairs of memory operations (read or write) which must be perceived as happening "in order" across threads, you place a memory barrier in the middle of each pair. That is, if thread 1 executes ops A and B, and thread 2 executes ops C and D, you need a barrier between A and B, but also a barrier between C and D.
The following image (stolen and adapted from the Linux kernel documentation) may be illustrative:
CPU 1 CPU 2 =============== =============== WRITE_ONCE(a, 1); x = READ_ONCE(b); <write barrier> <--------> <read barrier> WRITE_ONCE(b, 2); y = READ_ONCE(a);
That said, the *_ONCE macros would be quite helpful for Wine too. Ideally, however, we should just use the stdatomic.h primitives whenever possible (old GCC has bugs in them though), shimming in our own implementation on older compilers.
You need to do this because the compiler—and, more importantly, the CPU—is free to reorder the instructions on *both* sides. If, as in the above example, you need "x == 2" to imply "y == 1", the write barrier alone is not enough to achieve this, because the CPU can *also* reorder the reads.
The same applies to atomic operations that aren't just RMW. E.g. a store-release (to Status) would be correct here, and more efficient than a barrier, but it has to be paired with a read-acquire (also of Status) in GetOverlappedResult().
The problem is that the Wine codebase doesn't have one (yet). Otherwise an acquire/release barrier pair would be much more efficient than a full barrier (at least on weak-memory-order architectures).
On 5/30/22 21:17, Jin-oh Kang wrote:
The point of a memory barrier is that if you have two pairs of memory operations (read or write) which must be perceived as happening "in order" across threads, you place a memory barrier in the middle of each pair. That is, if thread 1 executes ops A and B, and thread 2 executes ops C and D, you need a barrier between A and B, but also a barrier between C and D.
The following image (stolen and adapted from the Linux kernel documentation) may be illustrative:
CPU 1 CPU 2 =============== =============== WRITE_ONCE(a, 1); x = READ_ONCE(b); <write barrier> <--------> <read barrier> WRITE_ONCE(b, 2); y = READ_ONCE(a);
That said, the *_ONCE macros would be quite helpful for Wine too. Ideally, however, we should just use the stdatomic.h primitives whenever possible (old GCC has bugs in them though), shimming in our own implementation on older compilers.
We've been preferring to use win32 atomic definitions instead.
I believe win32 guarantees that all access to suitably aligned types is atomic (not targeting the Alpha has its advantages), so there's no need for READ_ONCE or WRITE_ONCE. On the other hand, ReadNoFence() and WriteNoFence() also exist in recent winnt.h. They lack documentation, of course, and so do ReadAcquire() and WriteRelease()...
You need to do this because the compiler—and, more importantly, the CPU—is free to reorder the instructions on *both* sides. If, as in the above example, you need "x == 2" to imply "y == 1", the write barrier alone is not enough to achieve this, because the CPU can *also* reorder the reads.
The same applies to atomic operations that aren't just RMW. E.g. a store-release (to Status) would be correct here, and more efficient than a barrier, but it has to be paired with a read-acquire (also of Status) in GetOverlappedResult().
The problem is that the Wine codebase doesn't have one (yet). Otherwise an acquire/release barrier pair would be much more efficient than a full barrier (at least on weak-memory-order architectures).
We don't have any way to encode store-release and write-acquire, but they would be easy enough to add that I think it wouldn't delay this patch too much.