On 5/30/22 20:23, Jin-oh Kang wrote:
On Tue, May 31, 2022, 7:52 AM Zebediah Figura wrote:
On 5/30/22 14:20, Jinoh Kang wrote:
+ + /* GetOverlappedResultEx() skips waiting for the event/file handle if the + * Status field indicates completion, and returns the result from the + * Information field. + * + * Hence, we have to ensure that writes to the Information field are seen + * from the other threads *before* writes to the Status field. + * + * Otherwise, a race condition results: if writing the Status field has + * been completed but the subsequent update to the Information field has + * not, the application may end up reading stale value from it. + * + * The race condition can be especially problematic with applications that + * use the HasOverlappedIoCompleted() macro in a tight loop to busy-wait + * for completion of overlapped I/O. + */ + MemoryBarrier(); /* TODO: consider replacing with a store-release. */ + *(volatile NTSTATUS *)status_ptr = status; }
static inline client_ptr_t iosb_client_ptr( IO_STATUS_BLOCK *io )
Unpaired MemoryBarrier() is pretty much always wrong. (In fact, I think unpaired anything would be, here.)
By pairing, do you mean that another MemoryBarrier() should follow the status write?
No. We need another memory barrier in GetOverlappedResult(). The point of a memory barrier is that if you have two pairs of memory operations (read or write) which must be perceived as happening "in order" across threads, you place a memory barrier in the middle of each pair. That is, if thread 1 executes ops A and B, and thread 2 executes ops C and D, you need a barrier between A and B, but also a barrier between C and D. The following image (stolen and adapted from the Linux kernel documentation) may be illustrative: CPU 1 CPU 2 =============== =============== WRITE_ONCE(a, 1); x = READ_ONCE(b); <write barrier> <--------> <read barrier> WRITE_ONCE(b, 2); y = READ_ONCE(a); You need to do this because the compiler—and, more importantly, the CPU—is free to reorder the instructions on *both* sides. If, as in the above example, you need "x == 2" to imply "y == 1", the write barrier alone is not enough to achieve this, because the CPU can *also* reorder the reads. The same applies to atomic operations that aren't just RMW. E.g. a store-release (to Status) would be correct here, and more efficient than a barrier, but it has to be paired with a read-acquire (also of Status) in GetOverlappedResult().