On 5/30/22 20:23, Jin-oh Kang wrote:
On Tue, May 31, 2022, 7:52 AM Zebediah Figura wrote:
On 5/30/22 14:20, Jinoh Kang wrote:
- /* GetOverlappedResultEx() skips waiting for the event/file handle
if the
* Status field indicates completion, and returns the result from
the
* Information field.
*
* Hence, we have to ensure that writes to the Information field
are seen
* from the other threads *before* writes to the Status field.
*
* Otherwise, a race condition results: if writing the Status field
has
* been completed but the subsequent update to the Information
field has
* not, the application may end up reading stale value from it.
*
* The race condition can be especially problematic with
applications that
* use the HasOverlappedIoCompleted() macro in a tight loop to
busy-wait
* for completion of overlapped I/O.
*/
- MemoryBarrier(); /* TODO: consider replacing with a store-release.
*/
*(volatile NTSTATUS *)status_ptr = status; }
static inline client_ptr_t iosb_client_ptr( IO_STATUS_BLOCK *io )
Unpaired MemoryBarrier() is pretty much always wrong. (In fact, I think unpaired anything would be, here.)
By pairing, do you mean that another MemoryBarrier() should follow the status write?
No. We need another memory barrier in GetOverlappedResult().
The point of a memory barrier is that if you have two pairs of memory operations (read or write) which must be perceived as happening "in order" across threads, you place a memory barrier in the middle of each pair. That is, if thread 1 executes ops A and B, and thread 2 executes ops C and D, you need a barrier between A and B, but also a barrier between C and D.
The following image (stolen and adapted from the Linux kernel documentation) may be illustrative:
CPU 1 CPU 2 =============== =============== WRITE_ONCE(a, 1); x = READ_ONCE(b); <write barrier> <--------> <read barrier> WRITE_ONCE(b, 2); y = READ_ONCE(a);
You need to do this because the compiler—and, more importantly, the CPU—is free to reorder the instructions on *both* sides. If, as in the above example, you need "x == 2" to imply "y == 1", the write barrier alone is not enough to achieve this, because the CPU can *also* reorder the reads.
The same applies to atomic operations that aren't just RMW. E.g. a store-release (to Status) would be correct here, and more efficient than a barrier, but it has to be paired with a read-acquire (also of Status) in GetOverlappedResult().